High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn

Design challenges in Interposer based 3-D Memory Logic Interface

International Symposium on Microelectronics ◽

10.4071/isom-2015-tp24 ◽

2015 ◽

Vol 2015 (1) ◽

pp. 000050-000054

Author(s):

Andy Heinig ◽

Muhammad Waqas Chaudhary ◽

Robert Fischbach ◽

Michael Dittrich

Keyword(s):

System Performance ◽

High Performance ◽

Memory Systems ◽

Design Problems ◽

Hybrid Memory ◽

High Bandwidth ◽

Performance System ◽

Memory Type ◽

Metal Layers ◽

3D Memory

Further improvements in system performance are often limited by the achievable bandwidth between processor and memory. In this paper we look at interposer-based and stacked solutions for integrating a processor and 3D memory into a high-performance system. We compare different technological decisions and the design problems faced when choosing among the 3D memory types Wide I/O and Wide I/O 2, High Bandwidth Memory (HBM), and Hybrid Memory Cube (HMC). We also discuss how the routing requirements of these memory systems affect the logic die size, the number of metal layers, and the interposer material.
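The bandwidth argument above comes down to interface width times per-pin data rate, which is also why wide parallel interfaces place heavy routing demands on the interposer. The following is a minimal Python sketch of that arithmetic. The function name is hypothetical, and the example parameters (a 1024-bit interface at 1 Gb/s per pin, roughly first-generation HBM, and a 512-bit interface at 200 Mb/s per pin, roughly Wide I/O) are illustrative assumptions, not figures from the paper. HMC is left out because it uses serialized links rather than a wide parallel bus.

```python
def peak_bandwidth_gb_s(width_bits: int, pin_rate_gbit_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes/s) of a parallel memory interface.

    bandwidth = interface width [bits] * per-pin data rate [Gb/s] / 8 bits per byte
    """
    return width_bits * pin_rate_gbit_s / 8


# Assumed example interfaces (illustrative values, not taken from the paper).
examples = {
    "HBM-like stack (1024 bits @ 1 Gb/s per pin)": (1024, 1.0),
    "Wide-I/O-like die (512 bits @ 0.2 Gb/s per pin)": (512, 0.2),
}

if __name__ == "__main__":
    for name, (width, rate) in examples.items():
        # Every data bit is a separate signal that has to be routed between
        # the logic die and the memory, which is what makes routing demanding.
        print(f"{name}: {peak_bandwidth_gb_s(width, rate):.1f} GB/s over {width} data signals")
```

Under these assumptions the HBM-like stack reaches 128 GB/s and the Wide-I/O-like die 12.8 GB/s; doubling bandwidth means doubling either the per-pin rate or the number of wires to route.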

Supporting Superpages and Lightweight Page Migration in Hybrid Memory Systems

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3310133 ◽

2019 ◽

Vol 16 (2) ◽

pp. 1-26 ◽

Cited By ~ 6

Author(s):

Xiaoyuan Wang ◽

Haikun Liu ◽

Xiaofei Liao ◽

Ji Chen ◽

Hai Jin ◽

...

Keyword(s):

Memory Systems ◽

Hybrid Memory ◽

Page Migration

Keynote Address 2: “Hybrid memory cube: Achieving high performance and high reliability”

2015 IEEE International Reliability Physics Symposium ◽

10.1109/irps.2015.7112657 ◽

2015 ◽

Author(s):

Brent Keeth

Keyword(s):

High Performance ◽

High Reliability ◽

Keynote Address ◽

Hybrid Memory

MALRU: Miss-penalty aware LRU-based cache replacement for hybrid memory systems

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 ◽

10.23919/date.2017.7927151 ◽

2017 ◽

Cited By ~ 7

Author(s):

Di Chen ◽

Hai Jin ◽

Xiaofei Liao ◽

Haikun Liu ◽

Rentong Guo ◽

...

Keyword(s):

Memory Systems ◽

Cache Replacement ◽

Hybrid Memory

A Survey of Checkpoint/Restart Techniques on Distributed Memory Systems

Parallel Processing Letters ◽

10.1142/s0129626413400112 ◽

2013 ◽

Vol 23 (04) ◽

pp. 1340011 ◽

Cited By ~ 7

Author(s):

Faisal Shahzad ◽

Markus Wittmann ◽

Moritz Kreutzer ◽

Thomas Zeiser ◽

Georg Hager ◽

...

Keyword(s):

High Performance ◽

Building Blocks ◽

Memory Systems ◽

Time To Failure ◽

Flow Solver ◽

The Road ◽

System A ◽

Node Level ◽

Mean Time ◽

Performance Computing

The road to exascale computing poses many challenges for the High Performance Computing (HPC) community. Each step on the exascale path is mainly the result of a higher level of parallelism in the basic building blocks (e.g., CPUs, memory units, and networking components). The reliability of these basic components does not increase as fast as hardware parallelism does, so the mean time to failure (MTTF) of the whole system decreases. A fault-tolerant environment is thus indispensable for running large applications on such clusters. Checkpoint/Restart (C/R) is the classic and most popular method to minimize failure damage. It is easy to implement, but it typically introduces significant overhead to the application, and several efforts have been made to reduce this overhead. In this paper we compare the overheads of various C/R techniques by implementing them in two different categories of applications. These approaches are based on parallel-file-system (PFS)-level checkpoints (synchronous/asynchronous) and node-level checkpoints. We use the Scalable Checkpoint/Restart (SCR) library to compare node-level checkpoints. For asynchronous PFS-level checkpoints, we use the Damaris library, the SCR asynchronous feature, and application-based checkpointing via dedicated threads. Our baseline for overhead comparison is the naïve application-based synchronous PFS-level checkpointing method. A 3D lattice-Boltzmann (LBM) flow solver and a Lanczos eigenvalue solver serve as prototypical applications to which all the techniques considered here can be applied.
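The difference between the synchronous baseline and application-based asynchronous checkpointing via dedicated threads can be sketched in a few lines. In this minimal Python sketch, which is a hypothetical illustration and not the authors' code or the SCR/Damaris APIs, the solver loop only copies its state and hands the snapshot to a dedicated writer thread through a queue, then keeps computing while the writer serializes the snapshot; a synchronous version would instead write the file inside the loop and wait for it to finish. The toy solver, the file naming, the JSON format, and the local temporary directory standing in for the parallel file system are all assumptions.

```python
import json
import os
import queue
import tempfile
import threading


def checkpoint_writer(q: queue.Queue, directory: str) -> None:
    """Dedicated checkpoint thread: write each queued snapshot to its own file."""
    while True:
        item = q.get()
        if item is None:                      # sentinel: solver has finished
            break
        step, snapshot = item
        path = os.path.join(directory, f"ckpt_{step:06d}.json")
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"step": step, "state": snapshot}, f)
        os.replace(tmp, path)                 # rename so only complete files are visible


def run(num_steps: int, interval: int, directory: str) -> list:
    """Toy iterative solver that checkpoints asynchronously every `interval` steps."""
    state = [0.0] * 8                         # hypothetical solver state
    q: queue.Queue = queue.Queue()
    writer = threading.Thread(target=checkpoint_writer, args=(q, directory))
    writer.start()
    for step in range(1, num_steps + 1):
        for i in range(len(state)):           # stand-in for one in-place solver iteration
            state[i] += 1.0
        if step % interval == 0:
            # Copy the state: the solver keeps mutating `state` while the
            # writer thread serializes the snapshot in the background.
            q.put((step, list(state)))
    q.put(None)
    writer.join()                             # wait for outstanding checkpoints
    return state


def restart(directory: str):
    """Return the most recent complete checkpoint, or None if there is none."""
    files = sorted(f for f in os.listdir(directory) if f.endswith(".json"))
    if not files:
        return None
    with open(os.path.join(directory, files[-1])) as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:  # stands in for the parallel file system
        final = run(num_steps=10, interval=3, directory=d)
        latest = restart(d)
        print(final[0], latest["step"], latest["state"][0])  # 10.0 9 9.0
```

Writing to a temporary file and renaming it with `os.replace` means a failure in the middle of a write leaves earlier checkpoints untouched, so `restart` only ever sees complete files.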

MemBrain: Automated Application Guidance for Hybrid Memory Systems

2018 IEEE International Conference on Networking, Architecture and Storage (NAS) ◽

10.1109/nas.2018.8515694 ◽

2018 ◽

Author(s):

M. Ben Olson ◽

Tong Zhou ◽

Michael R. Jantz ◽

Kshitij A. Doshi ◽

M. Graham Lopez ◽

...

Keyword(s):

Memory Systems ◽

Hybrid Memory

Pattern-Aware Staging for Hybrid Memory Systems

Lecture Notes in Computer Science - High Performance Computing ◽

10.1007/978-3-030-50743-5_24 ◽

2020 ◽

pp. 474-495

Author(s):

Eishi Arima ◽

Martin Schulz

Keyword(s):

Memory Systems ◽

Hybrid Memory

Temporal Locality with a Long Interval: Hybrid Memory System for High-Performance and Low-Power

Software Engineering Research, Management and Applications - Studies in Computational Intelligence ◽

10.1007/978-3-319-98881-8_1 ◽

2018 ◽

pp. 1-15

Author(s):

Bo-Sung Jung ◽

Jung-Hoon Lee

Keyword(s):

Low Power ◽

High Performance ◽

Memory System ◽

Hybrid Memory ◽

Temporal Locality

Invasive Computing on High Performance Shared Memory Systems

Facing the Multicore-Challenge III - Lecture Notes in Computer Science ◽

10.1007/978-3-642-35893-7_1 ◽

2013 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Michael Bader ◽

Hans-Joachim Bungartz ◽

Martin Schreiber

Keyword(s):

Shared Memory ◽

High Performance ◽

Memory Systems

Dynamic Control of Computation Consistency in the Parallel Dataflow Computing System

Informacionnye Tehnologii (Information Technologies) ◽

10.17587/it.27.625-633 ◽

2021 ◽

Vol 27 (12) ◽

pp. 625-633

Author(s):

N. N. Levchenko ◽

◽

D. N. Zmejev ◽

Keyword(s):

Computational Model ◽

High Performance ◽

Dynamic Control ◽

Computing System ◽

Computational Process ◽

Computing Systems ◽

Systems Software ◽

Dataflow Computing

When developing high-performance multiprocessor computing systems, much attention is paid to ensuring uninterrupted operation of both the hardware and the software. In traditional computing systems, these issues are addressed mainly in software. This article discusses how to ensure uninterrupted operation of the parallel dataflow computing system (PDCS), which implements the dataflow computational model with a dynamically formed context. Because of the specific features of the PDCS, the authors propose implementing this type of control in hardware, which increases its efficiency, since the computational process is then controlled dynamically and not only statically.
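To make the idea of a dataflow model with a dynamically formed context more concrete, here is a sketch of generic tagged-token matching. An instruction fires only when all of its operand tokens carrying the same context value have arrived, so independent contexts (for example, loop iterations) can be in flight at the same time. This is a hypothetical Python illustration of the general technique, not the PDCS hardware design. The two checks it performs (rejecting a second operand for an already-filled slot, and reporting tokens that never found a partner) are assumed examples of consistency conditions such control might monitor.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    instr: str      # destination instruction
    port: int       # operand position at the destination
    context: int    # dynamically formed context (e.g. a loop iteration)
    value: float


@dataclass
class Instr:
    arity: int
    op: Callable[..., float]
    dests: List[Tuple[str, int]]   # (instruction, port) pairs fed by the result


def execute(program: Dict[str, Instr], initial: List[Token]):
    """Fire each instruction once all operands with the same context have arrived."""
    pending = deque(initial)
    store: Dict[Tuple[str, int], Dict[int, float]] = defaultdict(dict)  # matching store
    outputs = []
    while pending:
        tok = pending.popleft()
        key = (tok.instr, tok.context)
        slots = store[key]
        if tok.port in slots:
            # Hypothetical consistency check: two operands for one slot in one context.
            raise RuntimeError(f"duplicate operand for {key}, port {tok.port}")
        slots[tok.port] = tok.value
        instr = program[tok.instr]
        if len(slots) == instr.arity:            # all operands matched: fire
            args = [slots[p] for p in range(instr.arity)]
            del store[key]
            result = instr.op(*args)
            if not instr.dests:                  # sink instruction: record the result
                outputs.append((tok.context, result))
            for dest, port in instr.dests:       # result tokens inherit the context
                pending.append(Token(dest, port, tok.context, result))
    # Hypothetical consistency check: operands still waiting for a partner.
    leftovers = {k: v for k, v in store.items() if v}
    return outputs, leftovers


if __name__ == "__main__":
    program = {
        "add": Instr(2, lambda x, y: x + y, [("mul", 0)]),
        "mul": Instr(2, lambda x, y: x * y, []),
    }
    # Two contexts whose tokens arrive interleaved.
    initial = [Token("add", 0, 0, 1.0), Token("add", 0, 1, 3.0), Token("add", 1, 0, 2.0),
               Token("mul", 1, 0, 10.0), Token("add", 1, 1, 4.0), Token("mul", 1, 1, 2.0)]
    print(execute(program, initial))  # ([(0, 30.0), (1, 14.0)], {})
```

Here the two contexts compute (1 + 2) × 10 and (3 + 4) × 2 independently even though their tokens arrive interleaved, and the context tag alone keeps the operands apart.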
