All-digital Time-domain CNN Engine for Energy Efficient Edge Computing


Book Description

Machine learning is finding applications in a wide variety of areas, ranging from autonomous cars to genomics. Machine learning tasks such as image classification, speech recognition, and object detection are used in most modern computing systems. In particular, convolutional neural networks (CNNs, a class of artificial neural networks) are extensively used for many such ML applications because they achieve state-of-the-art classification accuracy at much lower complexity than their fully connected counterparts. However, CNN inference requires intensive compute and memory resources, making it challenging to implement on energy-constrained edge devices. The dominant operation in a CNN is the multiply-and-accumulate (MAC) operation. These operations are traditionally performed by digital adders and multipliers, which dissipate a large amount of power. In this two-phase work, an energy-efficient time-domain approach is used to perform the MAC operation using the concept of a Memory Delay Line (MDL). Phase I of this work implements the LeNet-5 CNN to classify the MNIST dataset (handwritten digits) and is demonstrated on a commercial 40 nm CMOS test chip. Phase II scales the work up to multi-bit weights and implements the AlexNet CNN to classify images from the 1000-class ImageNet dataset.
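
Since this description centers on the MAC operation as the dominant CNN workload, a minimal NumPy sketch may help make it concrete. The function and variable names below are illustrative only and are not taken from the thesis.

```python
import numpy as np

def conv2d_mac(feature_map, kernel):
    """Valid 2-D convolution written as explicit multiply-accumulate
    (MAC) operations, the workload the description refers to."""
    fh, fw = feature_map.shape
    kh, kw = kernel.shape
    out = np.zeros((fh - kh + 1, fw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            acc = 0.0
            for m in range(kh):          # each output pixel is a chain
                for n in range(kw):      # of kh * kw MACs
                    acc += feature_map[i + m, j + n] * kernel[m, n]
            out[i, j] = acc
    return out

# A 5x5 input convolved with a 3x3 kernel needs 9 MACs per output pixel.
x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0                # simple averaging kernel
print(conv2d_mac(x, k))
```

Counting the inner two loops shows why MAC efficiency dominates: every output pixel of every channel of every layer repeats this kernel-sized accumulation.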







Time Domain Multiply and Accumulate Engine for Convolutional Neural Networks


Book Description

As machine learning rapidly progresses, convolutional neural networks (CNNs) have emerged as a successful, although computationally intensive, approach, in part due to their ability to recognize spatial features. The main computation in these CNNs is the multiply-and-accumulate (MAC) operation, in which two matrices are multiplied together element-wise and summed, corresponding to the Frobenius inner product of the two matrices. Because of this, an increase in the efficiency of the MAC operation will significantly increase the efficiency of these networks, making it crucial to design the MAC engine efficiently. This thesis explores a near-memory time-domain multiply-and-accumulate (MAC) engine for convolutional neural networks. Time-domain computing is chosen for efficiency, as it allows a compact representation of multi-bit inputs within a single wire. This reduces the gate count and switching capacitance (Cdyn) within the arithmetic circuit compared to an all-digital implementation. The input features are encoded in time by modulating the pulse width of the feature signal. A delay-line digital-to-time converter (DTC) generates these encoded input features. Local static random-access memory (SRAM) stores the weights, which are then used to gate the input feature pulses. The gated product is then passed to a proposed digitally controlled gated ring oscillator (DCGRO) time-to-digital converter (TDC). The DCGRO TDC functions as a time accumulator: partial pulses are stored within the DCGRO, and quantized pulses are tracked in the counter. Because of the digital control, the DCGRO can switch between two operating frequencies, allowing quantization of two pulses in parallel. To speed up the accumulation, partial sums are accumulated and summed together in the digital domain. To support signed accumulation, two time accumulators are used, and products are switched between the two depending on the sign of the weight read from memory. The proposed design is implemented in a 28 nm process. For 5-bit input precision, the proposed design achieves an energy efficiency of 4.6 TOPS/W and a throughput of 819 GOPS at 900 mV. For 8-bit input precision, the power efficiency is estimated to be 854 GOPS/W, and the throughput is estimated to be 102 GOPS.
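
The equivalence the description draws between the MAC over a window and the Frobenius inner product, and the idea of weights gating time-encoded pulses, can be sketched behaviorally. This is a toy model under simplifying assumptions (binary weights, an ideal accumulator), not the thesis's DTC/DCGRO circuit.

```python
import numpy as np

def frobenius_inner_product(A, B):
    """Element-wise product summed over all entries, <A, B>_F,
    i.e. one MAC over a feature/weight window."""
    return float(np.sum(A * B))

def time_domain_mac(features, weights, t_lsb=1.0):
    """Toy behavioral model (not the thesis circuit): each feature is
    encoded as a pulse whose width is a multiple of t_lsb; a weight of
    0 gates (blocks) the pulse, and the accumulator integrates the
    total gated pulse width."""
    gated = features * weights           # weight gates the pulse
    return float(np.sum(gated)) * t_lsb  # accumulated pulse width

A = np.array([[1, 2], [3, 4]], dtype=float)   # time-encoded features
B = np.array([[1, 0], [1, 1]], dtype=float)   # binary weights
assert frobenius_inner_product(A, B) == time_domain_mac(A, B)
print(frobenius_inner_product(A, B))          # 1 + 3 + 4 = 8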







Energy Efficient Computation Offloading in Mobile Edge Computing


Book Description

This book provides a comprehensive review and in-depth discussion of the state-of-the-art research literature and proposes energy-efficient computation offloading and resource management for mobile edge computing (MEC), covering task offloading, channel allocation, frequency scaling, and resource scheduling. Since the task arrival process and channel conditions are stochastic and dynamic, the authors first propose an energy-efficient dynamic computation offloading scheme to minimize energy consumption while guaranteeing the delay performance of end devices. To further improve energy efficiency in the presence of tail energy, the authors present a computation offloading and frequency scaling scheme that jointly handles stochastic task allocation and CPU-cycle frequency scaling for minimal energy consumption while guaranteeing system stability. They also investigate delay-aware and energy-efficient computation offloading in a dynamic MEC system with multiple edge servers, and introduce an end-to-end deep reinforcement learning (DRL) approach that selects the best edge server for offloading and allocates the optimal computational resources so that the expected long-term utility is maximized. Finally, the authors study multi-task computation offloading in multi-access MEC via non-orthogonal multiple access (NOMA), accounting for time-varying channel conditions. An online algorithm based on DRL is proposed to efficiently learn near-optimal offloading solutions. Researchers working in mobile edge computing, task offloading, and resource management, as well as advanced-level students in electrical and computer engineering, telecommunications, computer science, or other related disciplines, will find this book useful as a reference. Professionals working in these related fields will also benefit from this book.
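
The core local-versus-offload trade-off the book studies can be illustrated with energy models that are standard in the offloading literature: a frequency-cubed dynamic CPU power term for local execution (the basis of frequency scaling) and a transmission-plus-tail-energy term for offloading. The constants and function names below are invented for illustration and are not the book's formulations.

```python
def local_energy(cycles, freq_hz, kappa=1e-27):
    """Dynamic CPU energy for running a task locally. Power scales as
    kappa * f^3 (kappa is an illustrative effective-capacitance
    constant), so energy = kappa * f^3 * (cycles / f) = kappa*cycles*f^2.
    Lowering f via frequency scaling saves energy at the cost of delay."""
    return kappa * freq_hz ** 2 * cycles

def offload_energy(bits, rate_bps, tx_power_w, tail_power_w=0.0, tail_s=0.0):
    """Energy to transmit the task input to an edge server, plus the
    'tail energy' the radio burns staying active after transmission."""
    return tx_power_w * bits / rate_bps + tail_power_w * tail_s

# Offload only when transmission (including tail energy) is cheaper
# than computing locally -- the decision a DRL agent would learn to
# make under stochastic arrivals and channel rates.
cycles, bits = 5e8, 2e6
e_local = local_energy(cycles, freq_hz=1e9)
e_off = offload_energy(bits, rate_bps=10e6, tx_power_w=0.5,
                       tail_power_w=0.1, tail_s=2.0)
print("offload" if e_off < e_local else "local", e_local, e_off)
```

With these toy numbers, local execution costs 0.5 J versus 0.3 J to offload; a lower channel rate or longer radio tail flips the decision, which is why the book treats offloading jointly with channel conditions and tail energy.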




Energy-Efficient Time-Domain Computation for Edge Devices


Book Description

This monograph reviews state-of-the-art time-domain accelerators and discusses system considerations and hardware implementations.




Efficient Processing of Deep Neural Networks


Book Description

This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks to improve key metrics, such as energy efficiency, throughput, and latency, without sacrificing accuracy or increasing hardware costs are critical to enabling the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas.




TinyML


Book Description

Deep learning networks are getting smaller. Much smaller. The Google Assistant team can detect words with a model just 14 kilobytes in size, small enough to run on a microcontroller. With this practical book you'll enter the field of TinyML, where deep learning and embedded systems combine to make astounding things possible with tiny devices. Pete Warden and Daniel Situnayake explain how you can train models small enough to fit into any environment. Ideal for software and hardware developers who want to build embedded systems using machine learning, this guide walks you through creating a series of TinyML projects, step-by-step. No machine learning or microcontroller experience is necessary.

- Build a speech recognizer, a camera that detects people, and a magic wand that responds to gestures
- Work with Arduino and ultra-low-power microcontrollers
- Learn the essentials of ML and how to train your own models
- Train models to understand audio, image, and accelerometer data
- Explore TensorFlow Lite for Microcontrollers, Google's toolkit for TinyML
- Debug applications and provide safeguards for privacy and security
- Optimize latency, energy usage, and model and binary size
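
The shrink-and-deploy workflow the book teaches can be sketched with the public TensorFlow Lite converter API. The tiny architecture below is illustrative (not one of the book's exact projects), but the conversion and post-training quantization steps use the real `tf.lite` API.

```python
import tensorflow as tf

# A deliberately tiny keyword-spotting-sized model; the architecture
# and input shape are illustrative, not the book's exact example.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 40, 1)),        # e.g. spectrogram frames
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # e.g. 4 keywords
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# ... model.fit(...) on audio features would go here ...

# Convert with post-training quantization to shrink the model for
# TensorFlow Lite for Microcontrollers.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
open("model.tflite", "wb").write(tflite_model)
print(f"model size: {len(tflite_model)} bytes")
```

The resulting `.tflite` flatbuffer is what gets compiled into a microcontroller firmware image, which is how models on the order of kilobytes, like the 14 KB word-detection model mentioned above, end up running on-device.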




Computing at the Edge


Book Description

This book describes solutions to the problems of energy efficiency, resiliency, and cyber security in the domain of edge computing, and reports on early deployments of the technology in commercial settings. The book takes a business-focused view, relating the technological outcomes to new business opportunities made possible by the edge paradigm. Drawing on the experience of end users deploying prototype edge technology, the authors discuss applications in financial management, wireless management, and social networks. Coverage includes a chapter on the analysis of total cost of ownership, enabling readers to calculate the efficiency gain from using the technology in their business.

- Provides a single-source reference to the state of the art of edge computing
- Describes how researchers across the world are addressing challenges relating to power efficiency, ease of programming, and emerging cyber security threats in this domain
- Discusses total cost of ownership for applications in financial management and social networks
- Discusses security challenges in wireless management




Hardware Accelerator Systems for Artificial Intelligence and Machine Learning


Book Description

Hardware Accelerator Systems for Artificial Intelligence and Machine Learning, Volume 122 delves into artificial intelligence and the growth it has seen with the advent of deep neural networks (DNNs) and machine learning. Updates in this release include chapters on Introduction to Hardware Accelerator Systems for Artificial Intelligence and Machine Learning, Deep Learning with GPUs, Edge Computing Optimization of Deep Learning Models for Specialized Tensor Processing Architectures, Architecture of NPU for DNN, Hardware Architecture for Convolutional Neural Network for Image Processing, FPGA-based Neural Network Accelerators, and much more.

- Updates on new information on the architecture of GPUs, NPUs, and DNNs
- Discusses in-memory computing, machine intelligence, and quantum computing
- Includes sections on hardware accelerator systems to improve processing efficiency and performance