Low-power Neural Network Accelerators


Book Description

This dissertation investigates design techniques based on custom Floating-Point (FP) computation for low-power neural network accelerators in resource-constrained embedded systems. It addresses the sustainability of increasingly ubiquitous Artificial Intelligence (AI) by developing efficient hardware engines, emphasizing the balance between energy-efficient computation, inference quality, application versatility, and cross-platform compatibility. The research presents a hardware design methodology for low-power inference of Spike-by-Spike (SbS) neural networks. Despite the reduced complexity and noise robustness of SbS networks, their deployment in constrained embedded devices is challenging due to high memory and computational costs. The dissertation proposes a novel Multiply-Accumulate (MAC) hardware module that optimizes the trade-off between computational accuracy and resource efficiency in FP operations. The module employs a hybrid approach, combining standard FP with custom 8-bit FP and 4-bit logarithmic numerical representations, allowing customization under application-specific constraints and enabling, for the first time, SbS inference acceleration on embedded systems. Additionally, the study introduces a hardware design for low-power inference of Convolutional Neural Networks (CNNs) targeting sensor analytics applications, comprising a Hybrid-Float6 (HF6) quantization scheme and a dedicated hardware accelerator. The proposed Quantization-Aware Training (QAT) method improves inference quality despite the numerical quantization, and the design remains compatible with standard ML frameworks such as TensorFlow Lite, highlighting its potential for practical deployment in real-world applications. Overall, the dissertation addresses the critical challenge of harmonizing computational accuracy with energy efficiency in AI hardware engines, with inference quality, application versatility, and cross-platform compatibility as its design philosophy.
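To make the idea of a custom reduced-width FP representation concrete, the following is a minimal Python sketch of rounding a value to an 8-bit floating-point format with 1 sign, 4 exponent, and 3 mantissa bits. The bit widths, rounding scheme, and function name are illustrative assumptions for exposition only; they are not the dissertation's actual FP8, logarithmic, or HF6 definitions.

```python
# Illustrative quantization to a custom 8-bit FP format (1 sign / 4 exp / 3 mantissa).
# Not the dissertation's exact format; overflow and subnormal handling are omitted.
import math

def quantize_fp8(x, exp_bits=4, man_bits=3):
    """Round x to the nearest value representable with the given widths
    (exponent clamped to the representable range)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    bias = (1 << (exp_bits - 1)) - 1          # e.g. 7 for 4 exponent bits
    exp = math.floor(math.log2(abs(x)))       # unbiased exponent of x
    exp = max(min(exp, bias), 1 - bias)       # clamp exponent
    scale = 2.0 ** (exp - man_bits)           # spacing of representable values
    mantissa = round(abs(x) / scale)          # round-to-nearest mantissa
    return sign * mantissa * scale

print(quantize_fp8(0.1), quantize_fp8(3.14159))   # e.g. 0.1015625 3.25
```

The gap between the input and the returned value is the quantization error that QAT-style training is meant to compensate for.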




Efficient Processing of Deep Neural Networks


Book Description

This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks to improve key metrics, such as energy efficiency, throughput, and latency, without sacrificing accuracy or increasing hardware costs are critical to enabling the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas.
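As a toy illustration of how such metrics relate, the sketch below derives throughput and energy per inference from a handful of accelerator parameters. Every number is a made-up assumption for illustration; none comes from the book.

```python
# Hypothetical back-of-the-envelope metric calculation (all values assumed).
macs_per_inference = 500e6        # assumed MAC operations per inference
peak_macs_per_s    = 2e12         # assumed accelerator peak (2 TMAC/s)
utilization        = 0.40         # assumed fraction of peak actually achieved
power_w            = 1.5          # assumed average power draw in watts

throughput = peak_macs_per_s * utilization / macs_per_inference   # inferences/s
ms_per_inference = 1e3 / throughput        # unpipelined time estimate, not latency
energy_mj = power_w / throughput * 1e3     # mJ per inference

print(f"{throughput:.0f} inf/s, {ms_per_inference:.2f} ms/inf, {energy_mj:.2f} mJ/inf")
```

The interplay between utilization, bandwidth, and power is exactly what the book's evaluation framework is meant to expose when comparing designs.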




Low-Power Computer Vision


Book Description

Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insight on how to improve the efficiency of machine learning systems.




Learning in Energy-Efficient Neuromorphic Computing: Algorithm and Architecture Co-Design


Book Description

Explains current co-design and co-optimization methodologies for building hardware neural networks and algorithms for machine learning applications.

This book focuses on how to build energy-efficient hardware for neural networks with learning capabilities, and provides co-design and co-optimization methodologies for building hardware neural networks that can learn. Presenting a complete picture from high-level algorithm to low-level implementation details, Learning in Energy-Efficient Neuromorphic Computing: Algorithm and Architecture Co-Design also covers many fundamentals and essentials in neural networks (e.g., deep learning), as well as hardware implementation of neural networks. The book begins with an overview of neural networks. It then discusses algorithms for utilizing and training rate-based artificial neural networks. Next comes an introduction to various options for executing neural networks, ranging from general-purpose processors to specialized hardware, and from digital to analog accelerators. A design example on building an energy-efficient accelerator for adaptive dynamic programming with neural networks is also presented. An examination of fundamental concepts and popular learning algorithms for spiking neural networks follows, along with a look at the hardware for spiking neural networks. Then comes a chapter offering readers three design examples (two based on conventional CMOS, and one on emerging nanotechnology) to implement the learning algorithms found in the previous chapter. The book concludes with an outlook on the future of neural network hardware.

- Includes a cross-layer survey of hardware accelerators for neuromorphic algorithms
- Covers the co-design of architecture and algorithms with emerging devices for much-improved computing efficiency
- Focuses on the co-design of algorithms and hardware, which is especially critical for using emerging devices, such as traditional memristors or diffusive memristors, for neuromorphic computing

Learning in Energy-Efficient Neuromorphic Computing: Algorithm and Architecture Co-Design is an ideal resource for researchers, scientists, software engineers, and hardware engineers dealing with the ever-increasing requirements on power consumption and response time. It is also excellent for teaching and training undergraduate and graduate students about the latest generation of neural networks with powerful learning capabilities.
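Since the description above centers on spiking neural networks, here is a minimal Python sketch of a leaky integrate-and-fire (LIF) neuron, one of the fundamental spiking-neuron models a text like this typically builds on. The parameter values and function name are illustrative assumptions, not taken from the book.

```python
# Minimal LIF neuron: leaky integration of an input current with
# threshold-and-reset spiking. All constants are illustrative.
import numpy as np

def lif_neuron(input_current, dt=1e-3, tau=20e-3, v_rest=0.0,
               v_thresh=1.0, v_reset=0.0):
    """Simulate the membrane potential over time; emit a spike (1)
    whenever it crosses the threshold, then reset it."""
    v = v_rest
    spikes = []
    for i_t in input_current:
        # leaky integration: dv/dt = (-(v - v_rest) + i_t) / tau
        v += dt * (-(v - v_rest) + i_t) / tau
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return np.array(spikes)

# A constant supra-threshold input produces a regular spike train.
spike_train = lif_neuron(np.full(200, 1.5))
print("spikes emitted:", spike_train.sum())
```

Hardware implementations of such neurons trade off the precision of the membrane state and the spike-routing fabric against energy, which is the co-design question the book examines.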




2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)


Book Description

This Symposium explores emerging trends and novel ideas and concepts in the area of VLSI. The Symposium covers a range of topics, from VLSI circuits, systems, and design methods to system-level design and system-on-chip issues, and to bringing VLSI experience to new areas and technologies. Future design methodologies, as well as new CAD tools to support them, will also be key topics.




Data Orchestration in Deep Learning Accelerators


Book Description

This Synthesis Lecture focuses on techniques for efficient data orchestration within DNN accelerators. The end of Moore's Law, coupled with the increasing growth in deep learning and other AI applications, has led to the emergence of custom Deep Neural Network (DNN) accelerators for energy-efficient inference on edge devices. Modern DNNs have millions of parameters and involve billions of computations; this necessitates extensive data movement from memory to on-chip processing engines. It is well known that the cost of data movement today surpasses the cost of the actual computation; therefore, DNN accelerators require careful orchestration of data across on-chip compute, network, and memory elements to minimize the number of accesses to external DRAM. The book covers DNN dataflows, data reuse, buffer hierarchies, networks-on-chip, and automated design-space exploration. It concludes with the data orchestration challenges posed by compressed and sparse DNNs, and with future trends. The target audience is students, engineers, and researchers interested in designing high-performance and low-energy accelerators for DNN inference.
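The kind of data-reuse accounting such dataflow analyses formalize can be sketched with a toy loop nest. The example below counts off-chip operand fetches for a weight-stationary schedule of a fully connected layer; the loop order, layer sizes, and the "one weight row fits on chip" assumption are mine for illustration, not the book's notation.

```python
# Toy data-reuse accounting for a weight-stationary fully connected layer.
def weight_stationary_accesses(batch=32, in_dim=256, out_dim=128):
    dram_weight_reads = 0
    dram_input_reads = 0
    for o in range(out_dim):            # keep one weight row on chip...
        dram_weight_reads += in_dim     # ...fetched from DRAM exactly once
        for b in range(batch):          # ...and reuse it across the whole batch
            dram_input_reads += in_dim  # inputs re-fetched per output (no reuse)
    return dram_weight_reads, dram_input_reads

w_reads, x_reads = weight_stationary_accesses()
print(f"weight reads: {w_reads}, input reads: {x_reads}")
```

Switching to an output-stationary or tiled schedule shrinks the input re-fetches at the cost of weight traffic or buffer capacity, which is precisely the trade-off space that automated design-space exploration searches over.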




Efficient Inference Acceleration


Book Description

Forward progress in computing technology is expected to involve high degrees of heterogeneity and specialization. Emerging applications integrating neural networks are becoming more common, and as a result the development of specialized hardware for accelerating neural networks is increasingly economical. As Moore's law wanes and applications utilizing neural networks benefit from the high-performance, low-power execution provided by widely available specialized hardware, algorithms using neural networks are poised to continue to outpace alternative approaches. This dissertation explores the design space of neural network inference accelerators, spanning from monolithic systolic arrays that store weights in off-chip DRAM to tiled matrix-vector units whose tightly coupled on-chip weight storage supplies high-bandwidth weights without depending on off-chip memory. Targeting efficient microarchitectural techniques and neural network inference sequencing schemes, it identifies three key design points of interest. The first is a monolithic systolic-array-based accelerator in which pipeline depths are reduced to eliminate clocked-element overheads. These optimizations primarily target energy efficiency but also improve performance subject to bandwidth limitations. The accelerator includes the weight permutation considerations required to better support processing convolutional layers on wide arrays, using scheduling policies that preserve the temporal locality of weight sub-matrices. The second accelerator uses codebook quantization for both weights and activations to reduce the power associated with both on-chip communication and synapse calculation. Codebook-based quantization and dequantization are tightly integrated into the accelerator datapath, enabling the bulk of on-chip communication to remain in the quantized format. Training experiments are presented to provide insight into training techniques for inference accelerators that use codebook quantization of both activations and weights. The third accelerator design considers communication power reduction within a tiled accelerator using temporally coded interconnects for both activations and weights. Tolerance for the latency of the temporal codes is achieved through scheduling schemes that facilitate reuse of temporally communicated values and through buffer capacities provisioned to support these schedules. Within the accelerator with temporally coded links, the adverse effects of temporal-code latency thus amount to performance degradation rather than increased power consumption.
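The codebook quantization used by the second accelerator can be sketched in a few lines: values travel and are stored as small integer indices into a shared codebook and are only dequantized at the point of use. The 16-entry uniform codebook, the nearest-neighbor mapping, and the layer shapes below are illustrative assumptions, not the dissertation's actual codebook or training procedure.

```python
# Illustrative codebook (de)quantization: on-chip traffic carries 4-bit
# indices; the arithmetic uses the dequantized values.
import numpy as np

codebook = np.linspace(-1.0, 1.0, 16).astype(np.float32)   # 16 entries = 4-bit index

def quantize(x, codebook):
    """Map each value to the index of its nearest codebook entry."""
    return np.abs(x[..., None] - codebook).argmin(axis=-1).astype(np.uint8)

def dequantize(idx, codebook):
    """Look indices back up; in hardware this happens next to the MAC."""
    return codebook[idx]

weights = np.random.uniform(-1, 1, size=(4, 8)).astype(np.float32)
acts    = np.random.uniform(-1, 1, size=(8,)).astype(np.float32)

w_idx, a_idx = quantize(weights, codebook), quantize(acts, codebook)
y = dequantize(w_idx, codebook) @ dequantize(a_idx, codebook)
print(y)
```

Keeping communication in index form is what reduces interconnect switching energy; training with such a codebook in the loop is what the dissertation's training experiments investigate.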




Progresses in Artificial Intelligence and Neural Systems


Book Description

This book provides an overview of the current advances in artificial intelligence and neural nets. Artificial intelligence (AI) methods have shown great capabilities in modelling, prediction and recognition tasks supporting human–machine interaction. At the same time, the issue of emotion has gained increasing attention due to its relevance in achieving human-like interaction with machines. The real challenge is taking advantage of the emotional characterization of human interactions to make the computers that interface with them emotionally and socially credible. The book assesses how and to what extent current sophisticated computational intelligence tools might support the multidisciplinary research on the characterization of appropriate system reactions to human emotions and expressions in interactive scenarios. Discussing the latest research trends, innovative approaches and future challenges in AI from interdisciplinary perspectives, it is a valuable resource for researchers and practitioners in academia and industry.




TinyML


Book Description

Deep learning networks are getting smaller. Much smaller. The Google Assistant team can detect words with a model just 14 kilobytes in size, small enough to run on a microcontroller. With this practical book you’ll enter the field of TinyML, where deep learning and embedded systems combine to make astounding things possible with tiny devices. Pete Warden and Daniel Situnayake explain how you can train models small enough to fit into any environment. Ideal for software and hardware developers who want to build embedded systems using machine learning, this guide walks you through creating a series of TinyML projects, step-by-step. No machine learning or microcontroller experience is necessary.

- Build a speech recognizer, a camera that detects people, and a magic wand that responds to gestures
- Work with Arduino and ultra-low-power microcontrollers
- Learn the essentials of ML and how to train your own models
- Train models to understand audio, image, and accelerometer data
- Explore TensorFlow Lite for Microcontrollers, Google’s toolkit for TinyML
- Debug applications and provide safeguards for privacy and security
- Optimize latency, energy usage, and model and binary size
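One item in the list above mentions TensorFlow Lite for Microcontrollers. As a hedged sketch of that workflow, the snippet below uses the TensorFlow Lite Python interpreter to sanity-check a converted model on a desktop before deploying it; on the microcontroller itself the book uses the C++ TensorFlow Lite for Microcontrollers runtime instead. The file name "micro_speech.tflite" and the dummy input are hypothetical placeholders, not assets shipped with the book.

```python
# Run a converted .tflite model once with a dummy input to verify shapes/dtypes.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="micro_speech.tflite")  # placeholder path
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one dummy input with the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

scores = interpreter.get_tensor(output_details[0]["index"])
print("output scores:", scores)
```

Checking the model this way before flashing helps catch quantization and shape mismatches early, when they are still cheap to fix.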




Hardware Accelerator Systems for Artificial Intelligence and Machine Learning


Book Description

Hardware Accelerator Systems for Artificial Intelligence and Machine Learning, Volume 122 delves into artificial intelligence and the growth it has seen with the advent of Deep Neural Networks (DNNs) and Machine Learning. Updates in this release include chapters on Hardware accelerator systems for artificial intelligence and machine learning, Introduction to Hardware Accelerator Systems for Artificial Intelligence and Machine Learning, Deep Learning with GPUs, Edge Computing Optimization of Deep Learning Models for Specialized Tensor Processing Architectures, Architecture of NPU for DNN, Hardware Architecture for Convolutional Neural Network for Image Processing, FPGA based Neural Network Accelerators, and much more.

- Updates on new information on the architecture of GPU, NPU and DNN
- Discusses in-memory computing, machine intelligence and quantum computing
- Includes sections on hardware accelerator systems to improve processing efficiency and performance