3D Memory
Recently Published Documents


TOTAL DOCUMENTS: 91 (five years: 19)

H-INDEX: 9 (five years: 1)

2021 ◽ Author(s): Younwoo Yoo, Hayoung Lee, Seung Ho Shin, Sungho Kang

2021 ◽ Author(s): Seung Ho Shin, Hayoung Lee, Younwoo Yoo, Sungho Kang

2021 ◽ Vol 26 (6) ◽ pp. 1-20 ◽ Author(s): Naebeom Park, Sungju Ryu, Jaeha Kung, Jae-Joon Kim

This article discusses a high-performance near-memory neural network (NN) accelerator architecture that utilizes the logic die in three-dimensional (3D) High Bandwidth Memory (HBM)-like memory. Because most previously reported 3D-memory-based near-memory NN accelerator designs used the Hybrid Memory Cube (HMC), we first focus on identifying the key differences between HBM and HMC in terms of near-memory NN accelerator design. One major difference between the two 3D memories is that HBM has centralized through-silicon via (TSV) channels, whereas HMC distributes its TSV channels across separate vaults. Based on this observation, we introduce the Round-Robin Data Fetching and Groupwise Broadcast schemes, which exploit the centralized TSV channels to improve the data feeding rate to the processing elements. Using designs synthesized in a 28-nm CMOS technology, we evaluate the performance and energy consumption of the proposed architectures under various dataflow models. Experimental results show that the proposed schemes reduce runtime by 16.4–39.3% on average and energy consumption by 2.1–5.1% on average compared to conventional data fetching schemes.
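To make the two fetching schemes concrete, here is a minimal Python sketch of how round-robin data fetching over centralized TSV channels and a groupwise broadcast to processing elements (PEs) could be modeled. The channel, PE, and group counts, and all function names, are illustrative assumptions, not the paper's actual hardware design.

# Minimal sketch (not the authors' RTL): models round-robin fetching over
# centralized TSV channels plus a groupwise broadcast so that one TSV
# transfer serves several PEs. All names and sizes here are assumptions.

from collections import deque

NUM_CHANNELS = 8   # assumed number of centralized TSV channels
NUM_PES = 16       # assumed number of processing elements
GROUP_SIZE = 4     # assumed PEs per broadcast group

def round_robin_fetch(tiles):
    """Assign each data tile to a TSV channel in round-robin order."""
    schedule = [deque() for _ in range(NUM_CHANNELS)]
    for i, tile in enumerate(tiles):
        schedule[i % NUM_CHANNELS].append(tile)
    return schedule

def groupwise_broadcast(tile):
    """Deliver one fetched tile to every PE in each broadcast group,
    so a single TSV transfer feeds GROUP_SIZE PEs at once."""
    deliveries = {}
    for g in range(NUM_PES // GROUP_SIZE):
        pes = list(range(g * GROUP_SIZE, (g + 1) * GROUP_SIZE))
        deliveries[g] = {"tile": tile, "pes": pes}
    return deliveries

if __name__ == "__main__":
    tiles = [f"tile_{t}" for t in range(32)]
    for ch, queue in enumerate(round_robin_fetch(tiles)):
        print(f"channel {ch}: {list(queue)}")
    print("broadcast targets:", groupwise_broadcast("tile_0"))

The design intuition the sketch captures: with centralized channels, every fetch is visible to all PEs, so broadcasting a tile to a whole group amortizes one memory transfer across several consumers, raising the effective data feeding rate.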


2021 ◽ Vol 17 (2) ◽ pp. 1-25 ◽ Author(s): Palash Das, Hemangee K. Kapoor

Convolutional/Deep Neural Networks (CNNs/DNNs) are rapidly growing workloads for emerging AI-based systems. The gap between processing speed and memory-access latency in multi-core systems limits the performance and energy efficiency of CNN/DNN tasks. This article aims to close this gap with a simple yet efficient near-memory accelerator-based system that expedites CNN inference. Toward this goal, we first design an efficient parallel algorithm to accelerate CNN/DNN tasks; the data are partitioned across multiple memory channels (vaults) to support the algorithm's parallel execution. Second, we design a hardware unit, the convolutional logic unit (CLU), that implements this algorithm and operates in three phases for layer-wise processing of the data. Last, to harness the benefits of near-memory processing (NMP), we integrate homogeneous CLUs on the logic layer of the 3D memory, specifically the Hybrid Memory Cube (HMC). Together, these yield a high-performance, energy-efficient system for CNNs/DNNs. The proposed system achieves substantial performance gains and energy reductions compared with multi-core CPU- and GPU-based systems, with a minimal area overhead of 2.37%.
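As a rough illustration of the vault-wise data partitioning, the following Python sketch splits a feature map into row bands, one per HMC vault, and runs a naive convolution on each band as a stand-in for a CLU. The vault count, partitioning scheme, and function names are assumptions made for illustration; the paper's CLU is a hardware unit with a three-phase pipeline that this sketch does not model.

# Minimal sketch, assuming a simple row-wise partitioning of the input
# feature map across HMC vaults. The real CLU and its three-phase,
# layer-wise pipeline are hardware; only the partitioning idea is shown.

import numpy as np

NUM_VAULTS = 16  # HMC exposes multiple vaults; 16 is an assumption here

def partition_rows(fmap, num_vaults=NUM_VAULTS):
    """Split an (H, W) feature map into row bands, one band per vault."""
    return np.array_split(fmap, num_vaults, axis=0)

def clu_convolve(band, kernel):
    """Stand-in for one CLU: a naive 2D convolution over its row band.
    (A real design would also exchange halo rows between adjacent bands.)"""
    kh, kw = kernel.shape
    h, w = band.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(band[i:i+kh, j:j+kw] * kernel)
    return out

if __name__ == "__main__":
    fmap = np.random.rand(64, 64)   # single-channel input for brevity
    kernel = np.random.rand(3, 3)
    outputs = [clu_convolve(b, kernel) for b in partition_rows(fmap)]
    print([o.shape for o in outputs])

Because each band lives in its own vault, each CLU reads its operands locally over that vault's TSVs, which is what lets the homogeneous CLUs on the logic layer work in parallel without contending for a shared channel.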


Author(s): Jun-Han Han, Robert E. West, Karina Torres-Castro, Nathan Swami, Samira Khan, ...

Author(s): Kaori Sasaki, Takaki Hashimoto, Yenting Kuo, Hiroshi Tsukada, Hiroyuki Tanizaki
