UNAT: UNstructured Acceleration Toolkit on SW26010 many-core processor

2020 · Vol. 37 (9) · pp. 3187-3208
Author(s):  
Hongbin Liu ◽  
Hu Ren ◽  
Hanfeng Gu ◽  
Fei Gao ◽  
Guangwen Yang

Purpose
The purpose of this paper is to provide an automatic parallelization toolkit for unstructured mesh-based computation. Among all mesh types, unstructured meshes dominate engineering simulation scenarios and play an essential role in scientific computing because of their geometric flexibility. However, high-fidelity applications based on unstructured grids remain time-consuming, both to program and to run.

Design/methodology/approach
This study develops an efficient UNstructured Acceleration Toolkit (UNAT), which provides friendly high-level programming interfaces together with an elaborate lower-level implementation on the target hardware to deliver nearly hand-optimized performance. At the present state, two efficient strategies, a multi-level blocks method and a row-subsections method, are designed and implemented on the Sunway architecture. The random memory access and write-write conflict issues of unstructured meshes are handled by partitioning, coloring and other hardware-specific techniques. Moreover, a data-reuse mechanism is developed to increase the computational intensity and alleviate the memory-bandwidth bottleneck.

Findings
The authors select sparse matrix-vector multiplication as a performance benchmark of UNAT across different data layouts and matrix formats. Experimental results show speed-ups of up to 26× over a single management processing element, and utilization-ratio tests indicate that nearly hand-optimized performance is achievable. Finally, the authors adopt UNAT to accelerate a well-tuned unstructured solver and obtain average speed-ups of 19× and 10× for the main kernels and the overall solver, respectively.

Originality/value
The authors design an unstructured mesh toolkit, UNAT, to link the hardware and the numerical algorithm, so that engineers can focus on the algorithms and solvers rather than on the parallel implementation. On the SW26010 many-core processor of the fastest supercomputer in China, UNAT yields up to 26× speed-ups and achieves nearly hand-optimized performance.
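The abstract states that write-write conflicts on shared mesh entities are avoided by coloring. As a rough illustration only, and not UNAT's actual interface, the C sketch below greedily colors the edges of a tiny owner/neighbour cell graph so that no two edges of the same color touch the same cell; edges within one color group can then update their cells in parallel without atomic operations. The mesh connectivity and array sizes are invented for the example.

/*
 * Minimal sketch (not UNAT's API): greedy coloring of mesh edges so that
 * no two edges sharing a cell receive the same color. Edges within one
 * color group can update their owner/neighbour cells concurrently without
 * write-write conflicts. Mesh data below is illustrative only.
 */
#include <stdio.h>
#include <string.h>

#define NCELLS 6
#define NEDGES 7
#define MAX_COLORS 16

int main(void) {
    /* Each edge connects an owner cell and a neighbour cell. */
    int owner[NEDGES]     = {0, 0, 1, 2, 3, 3, 4};
    int neighbour[NEDGES] = {1, 2, 3, 4, 4, 5, 5};
    int color[NEDGES];

    /* used[c][k] != 0 means cell c already touches color k. */
    unsigned char used[NCELLS][MAX_COLORS];
    memset(used, 0, sizeof used);

    int ncolors = 0;
    for (int e = 0; e < NEDGES; ++e) {
        int c = 0;
        /* Pick the smallest color not yet used by either endpoint cell. */
        while (c < MAX_COLORS &&
               (used[owner[e]][c] || used[neighbour[e]][c]))
            ++c;
        color[e] = c;
        used[owner[e]][c] = used[neighbour[e]][c] = 1;
        if (c + 1 > ncolors) ncolors = c + 1;
    }

    /* Edges of the same color never share a cell, so each color group can
     * be distributed across compute elements without atomic updates. */
    for (int k = 0; k < ncolors; ++k) {
        printf("color %d:", k);
        for (int e = 0; e < NEDGES; ++e)
            if (color[e] == k)
                printf(" edge %d(%d-%d)", e, owner[e], neighbour[e]);
        printf("\n");
    }
    return 0;
}

In practice the groups would be sized to the local scratchpad memory and scheduled across the 64 computing processing elements of an SW26010 core group, but the conflict-free grouping idea is the same.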

Author(s):  
Saira Banu Jamalmohammed ◽  
Lavanya K. ◽  
Sumaiya Thaseen I. ◽  
Biju V.

Sparse matrix-vector multiplication (SpMV) is a challenging computational kernel in linear algebra applications such as data mining, image processing and machine learning. The performance of this kernel depends heavily on the size of the input matrix and on the underlying hardware features. Various sparse matrix storage formats, commonly referred to as sparse formats, have been proposed in the literature to reduce the size of the matrix. On modern multi-core and many-core architectures, the kernel's performance is limited mainly by the memory wall and power wall problems. Reviews of sparse formats are normally confined to a specific architecture or a specific application. This chapter presents a comparative study of various sparse formats across platform architectures such as CPUs, graphics processing units (GPUs) and single instruction multiple data (SIMD) registers. A space-complexity analysis of each format, together with its representation, is discussed. Finally, the merits and demerits of each format are summarized in a table.
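For concreteness, the C sketch below computes y = A*x with the matrix stored in CSR (compressed sparse row), one of the widely used sparse formats such a survey covers: only the nonzero values, their column indices and per-row offsets are kept. The 4x4 matrix and its CSR arrays are illustrative only.

/*
 * Minimal CSR SpMV sketch: y = A*x for an illustrative 4x4 sparse matrix.
 * CSR stores the nonzeros (val), their column indices (col) and the
 * offset of each row's first nonzero (row_ptr).
 */
#include <stdio.h>

#define N 4
#define NNZ 7

int main(void) {
    double val[NNZ]       = {10, 2, 3, 9, 7, 8, 4};
    int    col[NNZ]       = { 0, 1, 1, 2, 2, 0, 3};
    int    row_ptr[N + 1] = { 0, 2, 4, 5, 7};

    double x[N] = {1, 2, 3, 4};
    double y[N] = {0};

    /* One dot product of a compressed row with x per output entry. */
    for (int i = 0; i < N; ++i)
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            y[i] += val[k] * x[col[k]];

    for (int i = 0; i < N; ++i)
        printf("y[%d] = %g\n", i, y[i]);
    return 0;
}

Other formats surveyed in such chapters (COO, ELL, DIA, hybrid schemes) trade index storage against padding and access regularity, which is what drives their differing behavior on CPUs, GPUs and SIMD registers.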
