High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn

Author(s):  
Basilio B. Fraguela ◽  
Diego Andrade
2015 ◽  
Vol 2015 (1) ◽  
pp. 000050-000054
Author(s):  
Andy Heinig ◽  
Muhammad Waqas Chaudhary ◽  
Robert Fischbach ◽  
Michael Dittrich

Further improvements in system performance are often limited by the achievable bandwidth between processor and memory. In this paper we look at interposer-based and stacked solutions for integrating a processor and 3D memory into a high-performance system. The comparison covers the technological decisions and design problems involved in choosing among the 3D memory types Wide I/O 1–2, High Bandwidth Memory (HBM), and Hybrid Memory Cube (HMC). We also discuss how the routing requirements of these memory systems affect logic die size, metal layers, and interposer material.


2019 ◽  
Vol 16 (2) ◽  
pp. 1-26 ◽  
Author(s):  
Xiaoyuan Wang ◽  
Haikun Liu ◽  
Xiaofei Liao ◽  
Ji Chen ◽  
Hai Jin ◽  
...  

2013 ◽  
Vol 23 (04) ◽  
pp. 1340011 ◽  
Author(s):  
FAISAL SHAHZAD ◽  
MARKUS WITTMANN ◽  
MORITZ KREUTZER ◽  
THOMAS ZEISER ◽  
GEORG HAGER ◽  
...  

The road to exascale computing poses many challenges for the High Performance Computing (HPC) community. Each step on the exascale path is mainly the result of a higher level of parallelism of the basic building blocks (i.e., CPUs, memory units, networking components, etc.). The reliability of each of these basic components does not increase at the same rate as the rate of hardware parallelism. This results in a reduction of the mean time to failure (MTTF) of the whole system. A fault tolerance environment is thus indispensable to run large applications on such clusters. Checkpoint/Restart (C/R) is the classic and most popular method to minimize failure damage. Its ease of implementation makes it useful, but typically it introduces significant overhead to the application. Several efforts have been made to reduce the C/R overhead. In this paper we compare various C/R techniques for their overheads by implementing them on two different categories of applications. These approaches are based on parallel-file-system (PFS)-level checkpoints (synchronous/asynchronous) and node-level checkpoints. We utilize the Scalable Checkpoint/Restart (SCR) library for the comparison of node-level checkpoints. For asynchronous PFS-level checkpoints, we use the Damaris library, the SCR asynchronous feature, and application-based checkpointing via dedicated threads. Our baseline for overhead comparison is the naïve application-based synchronous PFS-level checkpointing method. A 3D lattice-Boltzmann (LBM) flow solver and a Lanczos eigenvalue solver are used as prototypical applications in which all the techniques considered here may be applied.
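
The abstract's point about system-level MTTF can be illustrated with a short sketch. It assumes a simplified model that is not taken from the paper: all components fail independently with a constant failure rate, and any single failure brings down the whole system, so the system MTTF is the component MTTF divided by the component count. The function name and the numbers are hypothetical.

```python
def system_mttf_hours(component_mttf_hours: float, n_components: int) -> float:
    """Estimate the MTTF of a system built from identical components.

    Assumption (illustrative, not from the paper): failures are independent
    with a constant rate, and one component failure fails the whole system.
    The failure rates then add up, so the MTTF shrinks by a factor of N.
    """
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    return component_mttf_hours / n_components


# Hypothetical numbers: each node lasts about 1,000,000 hours on average.
for nodes in (1, 1_000, 100_000):
    print(nodes, "nodes ->", system_mttf_hours(1_000_000.0, nodes), "hours")
```

Under these assumptions, growing the machine from one thousand to one hundred thousand nodes cuts the expected time between failures from weeks to hours, which is why checkpoint overhead becomes a first-order concern.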


Author(s):  
M. Ben Olson ◽  
Tong Zhou ◽  
Michael R. Jantz ◽  
Kshitij A. Doshi ◽  
M. Graham Lopez ◽  
...  

Author(s):  
Michael Bader ◽  
Hans-Joachim Bungartz ◽  
Martin Schreiber

2021 ◽  
Vol 27 (12) ◽  
pp. 625-633
Author(s):  
N. N. Levchenko ◽  
D. N. Zmejev

When developing high-performance multiprocessor computing systems, much attention is paid to ensuring uninterrupted operation, in terms of both hardware and software. In traditional computing systems, software is the main focus in addressing these issues. The article discusses a solution for ensuring uninterrupted operation of the parallel dataflow computing system (PDCS), which implements the dataflow computational model with a dynamically formed context. Given the features of the PDCS, it is proposed to implement this type of control in hardware, which will increase its efficiency, since the computational process will be controlled dynamically at run time rather than only statically.

