Time Domain Multiply and Accumulate Engine for Convolutional Neural Networks


Book Description

As machine learning rapidly progresses, convolutional neural networks (CNNs) have emerged as a successful although computationally intensive approach, in part due to their ability to recognize spatial features. The main computation in these CNNs is the multiply-and-accumulate (MAC) operation, in which two matrices are multiplied element-wise and summed, corresponding to the Frobenius inner product of the two matrices. Consequently, improving the efficiency of the MAC operation significantly improves the efficiency of the whole network, making an efficient MAC engine design crucial. This thesis explores a near-memory time-domain MAC engine for convolutional neural networks. Time-domain computing is chosen for efficiency because it allows a compact representation of multi-bit inputs within a single wire, reducing the gate count and switching capacitance (Cdyn) of the arithmetic circuit compared to an all-digital implementation. The input features are encoded in time by modulating the pulse width of the feature signal; a delay-line digital-to-time converter (DTC) generates these encoded input features. Local static random-access memory (SRAM) stores the weights, which gate the input feature pulses. Each gated product is then passed to a proposed digitally controlled gated ring oscillator (DCGRO) time-to-digital converter (TDC). The DCGRO TDC functions as a time accumulator: partial pulses are stored within the DCGRO, and quantized pulses are tracked in a counter. Thanks to the digital control, the DCGRO can switch between two operating frequencies, allowing two pulses to be quantized in parallel. To speed up the accumulation, partial sums are accumulated and then summed in the digital domain. To support signed accumulation, two time accumulators are used, and products are routed between them according to the sign of the weight read from memory. The proposed design is implemented in a 28 nm process. For 5-bit input precision, it achieves an energy efficiency of 4.6 TOPS/W and a throughput of 819 GOPS at 900 mV. For 8-bit input precision, the energy efficiency is estimated to be 854 GOPS/W and the throughput is estimated to be 102 GOPS.
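For illustration only (this sketch is not from the thesis), the MAC operation described above can be written as the Frobenius inner product of a feature patch and a weight kernel, and the dual-accumulator signed scheme can be given a loose behavioral analogy in Python; the function names mac and signed_mac are hypothetical:

    import numpy as np

    def mac(features: np.ndarray, weights: np.ndarray) -> float:
        """Multiply-and-accumulate: element-wise product summed over all
        entries, i.e. the Frobenius inner product <features, weights>_F."""
        assert features.shape == weights.shape
        return float(np.sum(features * weights))

    def signed_mac(features, weights) -> float:
        """Loose behavioral analogy for the dual-accumulator scheme:
        products with positive weights feed one accumulator, products with
        negative weights feed the other, and the two accumulators are
        combined (subtracted) in the digital domain."""
        acc_pos = acc_neg = 0.0
        for x, w in zip(features, weights):
            product = x * abs(w)       # stand-in for the time-gated pulse
            if w >= 0:
                acc_pos += product     # positive-weight time accumulator
            else:
                acc_neg += product     # negative-weight time accumulator
        return acc_pos - acc_neg

    # Example: one 3x3 convolution window
    patch  = np.array([[1, 2, 0], [0, 1, 3], [2, 0, 1]])
    kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
    print(mac(patch, kernel))                         # -1.0
    print(signed_mac(patch.ravel(), kernel.ravel()))  # -1.0 as well

In the actual hardware the products would be pulse widths and the accumulators DCGRO TDCs; the code above reflects only the functional arithmetic, not the circuit.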




Efficient Processing of Deep Neural Networks


Book Description

This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks to improve key metrics—such as energy efficiency, throughput, and latency—without sacrificing accuracy or increasing hardware costs are critical to enabling the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas.




Neural Computing for Advanced Applications


Book Description

The two-volume Proceedings set CCIS 1637 and 1638 constitutes the refereed proceedings of the Third International Conference on Neural Computing for Advanced Applications, NCAA 2022, held in Jinan, China, during July 8–10, 2022. The 77 papers included in these proceedings were carefully reviewed and selected from 205 submissions. These papers were categorized into 10 technical tracks: neural network theory and cognitive sciences; machine learning, data mining, data security & privacy protection, and data-driven applications; computational intelligence, nature-inspired optimizers, and their engineering applications; cloud/edge/fog computing, the Internet of Things/Vehicles (IoT/IoV), and their system optimization; control systems, network synchronization, system integration, and industrial artificial intelligence; fuzzy logic, neuro-fuzzy systems, decision making, and their applications in management sciences; computer vision, image processing, and their industrial applications; natural language processing, machine translation, knowledge graphs, and their applications; neural computing-based fault diagnosis, fault forecasting, prognostic management, and system modeling; and spreading dynamics, forecasting, and other intelligent techniques against coronavirus disease (COVID-19).




Neuromorphic Photonics


Book Description

This book sets out to build bridges between the domains of photonic device physics and neural networks, providing a comprehensive overview of the emerging field of "neuromorphic photonics." It includes a thorough discussion of the evolution of neuromorphic photonics, from the advent of fiber-optic neurons to today’s state-of-the-art integrated laser neurons, which are a current focus of international research. Neuromorphic Photonics explores candidate interconnection architectures and devices for integrated neuromorphic networks, along with key functionality such as learning. It is written at a level accessible to graduate students, while also intending to serve as a comprehensive reference for experts in the field.




TinyML


Book Description

Deep learning networks are getting smaller. Much smaller. The Google Assistant team can detect words with a model just 14 kilobytes in size, small enough to run on a microcontroller. With this practical book you’ll enter the field of TinyML, where deep learning and embedded systems combine to make astounding things possible with tiny devices. Pete Warden and Daniel Situnayake explain how you can train models small enough to fit into any environment. Ideal for software and hardware developers who want to build embedded systems using machine learning, this guide walks you through creating a series of TinyML projects, step by step. No machine learning or microcontroller experience is necessary. You will:

- Build a speech recognizer, a camera that detects people, and a magic wand that responds to gestures
- Work with Arduino and ultra-low-power microcontrollers
- Learn the essentials of ML and how to train your own models
- Train models to understand audio, image, and accelerometer data
- Explore TensorFlow Lite for Microcontrollers, Google’s toolkit for TinyML
- Debug applications and provide safeguards for privacy and security
- Optimize latency, energy usage, and model and binary size




Graph Representation Learning


Book Description

Graph-structured data is ubiquitous throughout the natural and social sciences, from telecommunication networks to quantum chemistry. Building relational inductive biases into deep learning architectures is crucial for creating systems that can learn, reason, and generalize from this kind of data. Recent years have seen a surge in research on graph representation learning, including techniques for deep graph embeddings, generalizations of convolutional neural networks to graph-structured data, and neural message-passing approaches inspired by belief propagation. These advances in graph representation learning have led to new state-of-the-art results in numerous domains, including chemical synthesis, 3D vision, recommender systems, question answering, and social network analysis. This book provides a synthesis and overview of graph representation learning. It begins with a discussion of the goals of graph representation learning as well as key methodological foundations in graph theory and network analysis. Following this, the book introduces and reviews methods for learning node embeddings, including random-walk-based methods and applications to knowledge graphs. It then provides a technical synthesis and introduction to the highly successful graph neural network (GNN) formalism, which has become a dominant and fast-growing paradigm for deep learning with graph data. The book concludes with a synthesis of recent advancements in deep generative models for graphs—a nascent but quickly growing subset of graph representation learning.
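As a concrete illustration of the neural message-passing idea mentioned above (a generic sketch, not code from the book), the following shows one simplified GNN layer in Python with mean-over-neighbors aggregation; the function and variable names are hypothetical:

    import numpy as np

    def message_passing_layer(A, H, W):
        """One round of (simplified) neural message passing: each node
        aggregates its neighbors' features (A @ H), averages them,
        applies a learned linear map W, then a ReLU nonlinearity."""
        # A: (n, n) adjacency matrix with self-loops added
        # H: (n, d) node feature matrix; W: (d, d_out) learned weights
        deg = A.sum(axis=1, keepdims=True)      # node degrees
        msgs = (A @ H) / np.maximum(deg, 1)     # mean over neighbors
        return np.maximum(msgs @ W, 0)          # update + ReLU

    # Toy graph: 3 nodes in a path, 2-dim features, 4-dim output
    A = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]], dtype=float)      # self-loops included
    H = np.random.randn(3, 2)
    W = np.random.randn(2, 4)
    print(message_passing_layer(A, H, W).shape)  # (3, 4)

Stacking several such layers lets information propagate over multi-hop neighborhoods, which is the core of the GNN formalism the book synthesizes.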




Multivariate Statistical Machine Learning Methods for Genomic Prediction


Book Description

This open access book, available under a CC BY 4.0 license, brings together the latest genome-based prediction models currently being used by statisticians, breeders, and data scientists. It provides an accessible way to understand the theory behind each statistical learning tool, the required pre-processing, the basics of model building, how to train statistical learning methods, the basic R scripts needed to implement each statistical learning tool, and the output of each tool. To do so, for each tool the book provides background theory, some elements of the R statistical software for its implementation, the conceptual underpinnings, and at least two illustrative examples with data from real-world genomic selection experiments. Lastly, worked-out examples help readers check their own comprehension. The book will greatly appeal to readers in plant (and animal) breeding, geneticists, and statisticians, as it provides in a very accessible way the necessary theory, the appropriate R code, and illustrative examples for a complete understanding of each statistical learning tool. In addition, it weighs the advantages and disadvantages of each tool.




Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays


Book Description

FPGA '17: The 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 22–24, 2017, Monterey, California, USA. More information about this proceeding and all of ACM's other published conference proceedings is available from the ACM Digital Library: http://www.acm.org/dl.




Deep Learning for Computer Architects


Book Description

Machine learning, and specifically deep learning, has been hugely disruptive in many fields of computer science. The success of deep learning techniques in solving notoriously difficult classification and regression problems has resulted in their rapid adoption in solving real-world problems. The emergence of deep learning is widely attributed to a virtuous cycle whereby fundamental advancements in training deeper models were enabled by the availability of massive datasets and high-performance computer hardware. This text serves as a primer for computer architects in a new and rapidly evolving field. We review how machine learning has evolved since its inception in the 1960s and track the key developments leading up to the powerful deep learning techniques that emerged in the last decade. Next we review representative workloads, including the most commonly used datasets and seminal networks across a variety of domains. In addition to discussing the workloads themselves, we also detail the most popular deep learning tools and show how aspiring practitioners can use the tools with the workloads to characterize and optimize DNNs. The remainder of the book is dedicated to the design and optimization of hardware and architectures for machine learning. Because high-performance hardware was so instrumental in making machine learning a practical solution, this part of the book recounts a variety of recently proposed optimizations to further improve future designs. Finally, we present a review of recent research published in the area as well as a taxonomy to help readers understand how various contributions fall in context.




Neural Machine Translation


Book Description

Learn how to build machine translation systems with deep learning from the ground up, from basic concepts to cutting-edge research.