A Runtime Reconfigurable Design of Compute-in-Memory–Based Hardware Accelerator for Deep Learning Inference

2021 ◽  
Vol 26 (6) ◽  
pp. 1-18
Author(s):  
Anni Lu ◽  
Xiaochen Peng ◽  
Yandong Luo ◽  
Shanshi Huang ◽  
Shimeng Yu

Compute-in-memory (CIM) is an attractive solution to address the “memory wall” challenges for the extensive computation in deep learning hardware accelerators. For custom ASIC design, a specific chip instance is restricted to a specific network during runtime. However, the development cycle of the hardware normally lags far behind the emergence of new algorithms. Although some of the reported CIM-based architectures can adapt to different deep neural network (DNN) models, few details about the dataflow or control were disclosed to support such a claim. An instruction set architecture (ISA) could support high flexibility, but its complexity would be an obstacle to efficiency. In this article, a runtime reconfigurable design methodology of CIM-based accelerators is proposed to support a class of convolutional neural networks running on one prefabricated chip instance with ASIC-like efficiency. First, several design aspects are investigated: (1) the reconfigurable weight mapping method; (2) the input side of data transmission, mainly the weight reloading; and (3) the output side of data processing, mainly the reconfigurable accumulation. Then, a system-level performance benchmark is performed for the inference of different DNN models, such as VGG-8 on the CIFAR-10 dataset and AlexNet, GoogLeNet, ResNet-18, and DenseNet-121 on the ImageNet dataset, to measure the trade-offs between runtime reconfigurability, chip area, memory utilization, throughput, and energy efficiency.
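To make the link between weight mapping and memory utilization concrete, the following is a minimal sketch of a generic mapping, not the reconfigurable method proposed in the article. It assumes a convolution kernel is unrolled into a matrix with K·K·C_in rows and one group of columns per output channel, then tiled onto fixed-size arrays. The function name `crossbar_mapping` and all of its parameters are hypothetical.

```python
import math

def crossbar_mapping(k, c_in, c_out, weight_bits, bits_per_cell,
                     array_rows, array_cols):
    """Estimate array count and utilization for one conv layer.

    Assumed (hypothetical) mapping: the K x K x C_in kernel volume is
    unrolled along the rows, and each output channel occupies
    ceil(weight_bits / bits_per_cell) adjacent columns.
    """
    rows = k * k * c_in
    cols = c_out * math.ceil(weight_bits / bits_per_cell)
    # Tile the unrolled weight matrix onto fixed-size arrays.
    row_tiles = math.ceil(rows / array_rows)
    col_tiles = math.ceil(cols / array_cols)
    arrays = row_tiles * col_tiles
    # Fraction of allocated cells that actually hold weights.
    utilization = (rows * cols) / (arrays * array_rows * array_cols)
    return arrays, utilization

# Example: a 3x3 layer with 64 input and 128 output channels,
# 8-bit weights, 2-bit cells, 128x128 arrays.
arrays, utilization = crossbar_mapping(3, 64, 128, 8, 2, 128, 128)
```

Under these assumptions, a layer whose unrolled row count is not a multiple of the array height leaves part of the last row tile empty, which is one source of the utilization trade-off the abstract mentions.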

Energies ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 1568
Author(s):  
Bernhard Wunsch ◽  
Stanislav Skibin ◽  
Ville Forsström ◽  
Ivica Stevanovic

EMC simulations are an indispensable tool to analyze EMC noise propagation in power converters and to assess the best filtering options. In this paper, we first show how to set up EMC simulations of power converters and then we demonstrate their use on the example of an industrial AC motor drive. Broadband models of key power converter components are reviewed and combined into a circuit model of the complete power converter setup enabling detailed EMC analysis. The approach is demonstrated by analyzing the conducted noise emissions of a 75 kW power converter driving a 45 kW motor. Based on the simulations, the critical impedances, the dominant noise propagation, and the most efficient filter component and location within the system are identified. For the analyzed system, maxima of EMC noise are caused by resonances of the long motor cable and can be accurately predicted as functions of type, length, and layout of the motor cable. The common-mode noise at the LISN is shown to have a dominant contribution caused by magnetic coupling between the noisy motor side and the AC input side of the drive. All the predictions are validated by measurements and highlight the benefit of simulation-based EMC analysis and filter design.


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1514
Author(s):  
Seung-Ho Lim ◽  
WoonSik William Suh ◽  
Jin-Young Kim ◽  
Sang-Young Cho

Optimizing hardware processors and systems for deep learning operations such as Convolutional Neural Networks (CNNs) on resource-limited embedded devices is an active research area. To run an optimized deep neural network model with the limited computational units and memory of an embedded device, it is necessary to quickly apply various hardware module configurations to various deep neural network models and find the optimal combination. An Electronic System Level (ESL) simulator based on SystemC is very useful for rapid hardware modeling and verification. In this paper, we design and implement a Deep Learning Accelerator (DLA) that performs Deep Neural Network (DNN) operations on a RISC-V Virtual Platform implemented in SystemC, in order to enable rapid and diverse analysis of deep learning operations on embedded devices based on the RISC-V processor, a recently emerging embedded processor. The developed RISC-V based DLA prototype can analyze hardware requirements according to the CNN data set through the configuration of the CNN DLA architecture, and because RISC-V compiled software can run on the platform, it can execute a real neural network model such as Darknet. We ran the Darknet CNN model on the developed DLA prototype and confirmed that computational overhead and inference errors can be analyzed by evaluating the DLA architecture for various data sets.


Author(s):  
L. Chen ◽  
F. Rottensteiner ◽  
C. Heipke

Matching images with large changes in viewpoint and viewing direction, which result in large perspective differences, is still a very challenging problem. Affine shape estimation, orientation assignment and feature description algorithms based on detected hand-crafted features have been shown to be error prone. In this paper, affine shape estimation, orientation assignment and description of local features are achieved through deep learning. These three modules are trained with loss functions that optimize the matching performance of input patch pairs. The trained descriptors are first evaluated on the Brown dataset (Brown et al., 2011), a standard descriptor performance benchmark. The whole pipeline is then tested on images of small blocks acquired with an aerial penta camera, to compute image orientation. The results show that learned features perform significantly better than alternatives based on hand-crafted features.


Author(s):  
Emanuele Lopelli ◽  
Johan van der Tang ◽  
Arthur van Roermund

Technologies ◽  
2018 ◽  
Vol 7 (1) ◽  
pp. 3
Author(s):  
Panagiotis Oikonomou ◽  
Antonios Dadaliaris ◽  
Kostas Kolomvatsos ◽  
Thanasis Loukopoulos ◽  
Athanasios Kakarountas ◽  
...  

In standard cell placement, a circuit is given consisting of cells with a standard height but different widths, and the problem is to place the cells in the standard rows of a chip area so that no overlaps occur and some target function is optimized. The process is usually split into at least two phases. In the first phase, a global placement algorithm distributes the cells across the circuit area, while in the second phase, a legalization algorithm aligns the cells to the standard rows of the power grid and removes any overlaps. While a few legalization schemes have been proposed in the past for the basic problem formulation, few obstacle-aware extensions exist. Furthermore, they usually provide extreme trade-offs between time performance and optimization efficiency. In this paper, we focus on the legalization step in the presence of pre-allocated modules acting as obstacles. We extend two known algorithmic approaches, namely Tetris and Abacus, so that they become obstacle-aware. Furthermore, we propose a parallelization scheme to tackle the computational complexity. The experiments illustrate that the proposed parallelization method achieves good scalability, while it also efficiently prunes the search space, resulting in a superlinear speedup. Furthermore, this time performance comes at only a small cost (and sometimes even an improvement) in the typical optimization metrics.
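The idea of an obstacle-aware, Tetris-style legalizer can be illustrated with a minimal sketch. This is a simplified greedy variant written for illustration, not the algorithm from the paper: cells are processed in order of their global-placement x-coordinate, each row keeps a left-to-right frontier, and a candidate position that collides with an obstacle is pushed past it. The names `Cell`, `first_free_x`, and `tetris_legalize`, and the assumption that obstacles are given per row as non-overlapping `(lo, hi)` intervals, are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    name: str
    x: float    # x-coordinate from global placement
    y: float    # y-coordinate from global placement
    width: int

def first_free_x(start, width, obstacles):
    """Smallest x >= start where [x, x + width) avoids all obstacles.

    Assumes `obstacles` is a list of non-overlapping (lo, hi) intervals.
    """
    x = start
    for lo, hi in sorted(obstacles):
        if x + width <= lo:
            break          # the cell fits before this obstacle
        if x < hi:
            x = hi         # overlap: jump past the obstacle
    return x

def tetris_legalize(cells, num_rows, row_height, row_width, obstacles_by_row):
    """Greedy legalization; returns {cell name: (x, row index)}."""
    frontier = [0] * num_rows      # leftmost free x in each row
    placed = {}
    for cell in sorted(cells, key=lambda c: c.x):
        best = None
        for r in range(num_rows):
            x = first_free_x(frontier[r], cell.width,
                             obstacles_by_row.get(r, []))
            if x + cell.width > row_width:
                continue           # the cell does not fit in this row
            # Manhattan displacement from the global-placement position.
            cost = abs(x - cell.x) + abs(r * row_height - cell.y)
            if best is None or cost < best[0]:
                best = (cost, r, x)
        if best is None:
            raise ValueError(f"no legal position for cell {cell.name}")
        _, r, x = best
        placed[cell.name] = (x, r)
        frontier[r] = x + cell.width
    return placed

cells = [Cell("a", 0, 0, 3), Cell("b", 1, 0, 2), Cell("c", 4, 0, 2)]
result = tetris_legalize(cells, num_rows=2, row_height=1, row_width=10,
                         obstacles_by_row={0: [(3, 5)]})
```

In this example, cell `b` is moved to the second row because the obstacle in the first row would otherwise push it far from its global-placement position.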


Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 3929
Author(s):  
Grigorios Tsagkatakis ◽  
Anastasia Aidini ◽  
Konstantina Fotiadou ◽  
Michalis Giannopoulos ◽  
Anastasia Pentari ◽  
...  

Deep Learning, and Deep Neural Networks in particular, have established themselves as the new norm in signal and data processing, achieving state-of-the-art performance in image, audio, and natural language understanding. In remote sensing, a large body of research has been devoted to the application of deep learning for typical supervised learning tasks such as classification. A smaller yet equally important effort has been devoted to addressing the challenges associated with the enhancement of low-quality observations from remote sensing platforms. Addressing such challenges is of paramount importance, both in itself, since high-altitude imaging, environmental conditions, and imaging system trade-offs lead to low-quality observations, and as a means to facilitate subsequent analysis, such as classification and detection. In this paper, we provide a comprehensive review of deep-learning methods for the enhancement of remote sensing observations, focusing on critical tasks including single and multi-band super-resolution, denoising, restoration, pan-sharpening, and fusion, among others. In addition to the detailed analysis and comparison of recently presented approaches, different research avenues which could be explored in the future are also discussed.


2020 ◽  
Vol 143 (2) ◽  
Author(s):  
Yaser Hadad ◽  
Vahideh Radmard ◽  
Srikanth Rangarajan ◽  
Mahdi Farahikia ◽  
Gamal Refai-Ahmed ◽  
...  

The industry shift to multicore microprocessor architecture will likely cause higher temperature nonuniformity on chip surfaces, exacerbating the problem of chip reliability and lifespan. While advanced cooling technologies like two-phase embedded cooling exist, the technological risks of such solutions make conventional cooling technologies more desirable. One such solution is remote cooling with heatsinks with sequential conduction resistance from chip to module. The objective of this work is to numerically demonstrate a novel concept to remotely cool chips with hotspots and maximize chip temperature uniformity using an optimized flow distribution under constrained geometric parameters for the heatsink. The optimally distributed flow conditions presented here are intended to maximize the heat transfer from a nonuniform chip power map by actively directing flow to a hotspot region. The hotspot-targeted parallel microchannel liquid cooling design is evaluated against a baseline uniform flow conventional liquid cooling design for the industry pressure drop limit of approximately 20 kPa. For an average steady-state heat flux of 145 W/cm² on core areas (hotspots) and 18 W/cm² on the remaining chip area (background), the chip temperature uniformity is improved by 10%. Moreover, the heatsink design has improved chip temperature uniformity without a need for any additional system-level complexity, which also reduces reliability risks.

