FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and their heavily layered software designs leave high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, which redesigns file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory copying overhead, and actively fetches and pushes data all in clients to rebalance the load between the server and network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between file systems and networking, and an efficient distributed transaction mechanism for consistency. Octopus+ also supports replication to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.

Revisiting the design of LSM-tree Based OLTP storage engine with persistent memory

Proceedings of the VLDB Endowment ◽

10.14778/3467861.3467875 ◽

2021 ◽

Vol 14 (10) ◽

pp. 1872-1885

Author(s):

Baoyue Yan ◽

Xuntao Cheng ◽

Bo Jiang ◽

Shibin Chen ◽

Canfang Shang ◽

...

Keyword(s):

Recovery Time ◽

High Performance ◽

Transaction Processing ◽

Light Weight ◽

Global Index ◽

Persistent Memory ◽

Write Amplification ◽

Database As A Service ◽

Overall Evaluation ◽

Memory Compaction

The recent byte-addressable and large-capacity commercialized persistent memory (PM) is promising to drive database as a service (DBaaS) into uncharted territories. This paper investigates how to leverage PM to revisit the conventional LSM-tree based OLTP storage engines designed for the DRAM-SSD hierarchy for DBaaS instances. Specifically, we (1) propose a light-weight PM allocator named Halloc customized for LSM-tree, (2) build a high-performance Semi-persistent Memtable utilizing the persistent in-memory writes of PM, (3) design a concurrent commit algorithm named Reorder Ring to achieve log-free transaction processing for OLTP workloads, and (4) present a Global Index as the new globally sorted persistent level with non-blocking in-memory compaction. The design of Reorder Ring and Semi-persistent Memtable achieves fast writes without synchronized logging overheads and achieves near-instant recovery time. Moreover, the design of Semi-persistent Memtable and Global Index with in-memory compaction enables byte-addressable persistent levels in PM, which significantly reduces the read and write amplification as well as the background compaction overheads. The overall evaluation shows that our proposal over the PM-SSD hierarchy outperforms the baseline by up to 3.8x in the YCSB benchmark and by 2x in the TPC-C benchmark.

Persistent memory hash indexes

Proceedings of the VLDB Endowment ◽

10.14778/3446095.3446101 ◽

2021 ◽

Vol 14 (5) ◽

pp. 785-798

Author(s):

Daokun Hu ◽

Zhiwen Chen ◽

Jianbing Wu ◽

Jianhua Sun ◽

Hao Chen

Keyword(s):

Future Development ◽

High Performance ◽

Performance Metrics ◽

Comprehensive Evaluation ◽

State Of The Art ◽

Hash Tables ◽

Trade Offs ◽

Depth Analysis ◽

Persistent Memory ◽

Memory Modules

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially with the recent release of Intel Optane DC Persistent Memory Modules. However, most of them are evaluated on DRAM-based emulators with unrealistic assumptions, or focus on the evaluation of specific metrics while sidestepping important properties. Thus, it is essential to understand how well the proposed hash indexes perform on real PM and how they differ from each other when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we focus on the evaluation of six state-of-the-art hash tables, including Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT, with real PM hardware. Our evaluation was conducted using a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. With our in-depth analysis, we identify design trade-offs and good paradigms in prior work, and suggest desirable optimizations and directions for the future development of PM-based hash tables.

A SURVEY OF CHECKPOINT/RESTART TECHNIQUES ON DISTRIBUTED MEMORY SYSTEMS

Parallel Processing Letters ◽

10.1142/s0129626413400112 ◽

2013 ◽

Vol 23 (04) ◽

pp. 1340011 ◽

Cited By ~ 7

Author(s):

FAISAL SHAHZAD ◽

MARKUS WITTMANN ◽

MORITZ KREUTZER ◽

THOMAS ZEISER ◽

GEORG HAGER ◽

...

Keyword(s):

High Performance ◽

Building Blocks ◽

Memory Systems ◽

Time To Failure ◽

Flow Solver ◽

The Road ◽

System A ◽

Node Level ◽

Mean Time ◽

Performance Computing

The road to exascale computing poses many challenges for the High Performance Computing (HPC) community. Each step on the exascale path is mainly the result of a higher level of parallelism of the basic building blocks (i.e., CPUs, memory units, networking components, etc.). The reliability of each of these basic components does not increase at the same rate as the rate of hardware parallelism. This results in a reduction of the mean time to failure (MTTF) of the whole system. A fault tolerance environment is thus indispensable to run large applications on such clusters. Checkpoint/Restart (C/R) is the classic and most popular method to minimize failure damage. Its ease of implementation makes it useful, but typically it introduces significant overhead to the application. Several efforts have been made to reduce the C/R overhead. In this paper we compare various C/R techniques for their overheads by implementing them on two different categories of applications. These approaches are based on parallel-file-system (PFS)-level checkpoints (synchronous/asynchronous) and node-level checkpoints. We utilize the Scalable Checkpoint/Restart (SCR) library for the comparison of node-level checkpoints. For asynchronous PFS-level checkpoints, we use the Damaris library, the SCR asynchronous feature, and application-based checkpointing via dedicated threads. Our baseline for overhead comparison is the naïve application-based synchronous PFS-level checkpointing method. A 3D lattice-Boltzmann (LBM) flow solver and a Lanczos eigenvalue solver are used as prototypical applications in which all the techniques considered here may be applied.
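The abstract's claim that system-level MTTF shrinks as hardware parallelism grows can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that components fail independently with exponentially distributed lifetimes and that any single component failure brings down the whole system; under those assumptions the failure rates add, so the system MTTF is the component MTTF divided by the component count. The function name and the example figures are hypothetical.

```python
def system_mttf(component_mttf_hours: float, n_components: int) -> float:
    """Estimate the MTTF of a system built from identical components.

    Assumption (not from the abstract): failures are independent and
    exponentially distributed, and one component failure fails the
    whole system. Failure rates then add up, giving
    MTTF_system = MTTF_component / N.
    """
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    return component_mttf_hours / n_components


if __name__ == "__main__":
    # Hypothetical figures: each node has an MTTF of 1,000,000 hours.
    for nodes in (1, 1_000, 100_000):
        print(nodes, "nodes ->", system_mttf(1_000_000.0, nodes), "hours")
```

With these illustrative numbers, growing the machine from one node to 100,000 nodes cuts the expected time between failures from a million hours to ten hours, which is why the checkpoint overhead becomes a first-order concern.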

Invasive Computing on High Performance Shared Memory Systems

Facing the Multicore-Challenge III - Lecture Notes in Computer Science ◽

10.1007/978-3-642-35893-7_1 ◽

2013 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Michael Bader ◽

Hans-Joachim Bungartz ◽

Martin Schreiber

Keyword(s):

Shared Memory ◽

High Performance ◽

Memory Systems

Command vector memory systems: high performance at low cost

Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192) ◽

10.1109/pact.1998.727154 ◽

2002 ◽

Cited By ~ 24

Author(s):

J. Corbal ◽

R. Espasa ◽

M. Valero

Keyword(s):

High Performance ◽

Low Cost ◽

Memory Systems

Closed yet Open DRAM: Achieving Low Latency and High Performance in DRAM Memory Systems

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) ◽

10.1109/dac.2018.8465817 ◽

2018 ◽

Author(s):

Lavanya Subramanian ◽

Kaushik Vaidyanathan ◽

Anant Nori ◽

Sreenivas Subramoney ◽

Tanay Karnik ◽

...

Keyword(s):

High Performance ◽

Memory Systems ◽

Low Latency

Using DRAM as Cache for Non-Volatile Main Memory Swapping

International Journal of Software Innovation ◽

10.4018/ijsi.2016010105 ◽

2016 ◽

Vol 4 (1) ◽

pp. 61-71

Author(s):

Hirotaka Kawata ◽

Gaku Nakagawa ◽

Shuichi Oikawa

Keyword(s):

Power Consumption ◽

Power Management ◽

Mobile Devices ◽

Memory Management ◽

High Performance ◽

Reducing Power ◽

Memory Systems ◽

Main Memory ◽

Dynamic Power Management ◽

Memory Space

The performance of mobile devices such as smartphones and tablets has been rapidly improving in recent years. However, these improvements have seriously increased power consumption. One of the greatest challenges is to achieve efficient power management for battery-equipped mobile devices. To solve this problem, the authors focus on the emerging non-volatile memory (NVM), which has been receiving increasing attention in recent years. Since its performance is comparable with that of DRAM, it is possible to replace the main memory with NVM, thereby reducing power consumption. However, the price and capacity of NVM are problematic. Therefore, the authors provide a large memory space without performance degradation by combining NVM with other memory devices. In this study, they propose a design for non-volatile main memory systems that use DRAM as a swap space. This enables both high-performance and energy-efficient memory management through dynamic power management in NVM and DRAM.
