Performance Analysis of Effective Symbolic Methods for Solving Band Matrix SLAEs

2019 ◽  
Vol 214 ◽  
pp. 05004
Author(s):  
Milena Veneva ◽  
Alexander Ayriyan

This paper presents an experimental performance study of implementations of three symbolic algorithms for solving band-matrix systems of linear algebraic equations with heptadiagonal, pentadiagonal, and tridiagonal coefficient matrices. The only assumption on the coefficient matrix required for the algorithms to be stable is nonsingularity. The algorithms are implemented using the C++ library GiNaC and the Python library SymPy, considering five different data-storage classes. Performance analysis of the implementations is carried out on the high-performance computing (HPC) platforms “HybriLIT” and “Avitohol”. The experimental setup and the results of the computations on the individual computer systems are presented and discussed, and the three algorithms are analysed.
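As a rough illustration of the exact-arithmetic setting in which such symbolic algorithms operate (this is not the paper's GiNaC or SymPy code, and the three-diagonal storage below is just one plausible layout), a tridiagonal system can be solved exactly in pure Python with `fractions.Fraction`, where nonzero pivots of a nonsingular matrix are the only requirement:

```python
from fractions import Fraction as F

def solve_tridiagonal_exact(sub, diag, sup, rhs):
    """Exact forward-elimination/back-substitution on a tridiagonal
    system stored as three diagonals. All arithmetic is rational, so
    there is no rounding error; we only assume the pivots stay nonzero."""
    n = len(diag)
    d = [F(x) for x in diag]
    r = [F(x) for x in rhs]
    # Forward elimination: eliminate the subdiagonal.
    for i in range(1, n):
        m = F(sub[i - 1]) / d[i - 1]
        d[i] -= m * F(sup[i - 1])
        r[i] -= m * r[i - 1]
    # Back substitution.
    x = [F(0)] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - F(sup[i]) * x[i + 1]) / d[i]
    return x

# 3x3 example: [[2,1,0],[1,2,1],[0,1,2]] x = [3,4,3] has solution x = [1,1,1].
x = solve_tridiagonal_exact([1, 1], [2, 2, 2], [1, 1], [3, 4, 3])
```

Because every intermediate value is an exact rational, no pivoting for numerical stability is needed, which is the sense in which nonsingularity alone suffices.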

2019 ◽  
pp. 112-115
Author(s):  
M. Z. Benenson

The article discusses the use of graphics processing units (GPUs) for solving large systems of linear algebraic equations (SLAEs). A heterogeneous multiprocessor computing platform produced by NIIVK, whose architecture allows general-purpose microprocessor modules to be combined with graphics-processor modules, was used as the hardware for solving the SLAEs. A description is given of the SLAE solution program, developed on the basis of the cuBLAS library of the CUDA software interface. A method is proposed for increasing the accuracy of linear-system calculations based on a modified Gauss method. It is established that the modified Gauss method adds practically no running time to the program while significantly increasing the accuracy of the calculations. It is concluded that using graphics processors to solve SLAEs makes it possible to process larger matrices than general-purpose microprocessors allow.
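The article's specific modification of the Gauss method is not spelled out here; one standard accuracy-improving modification is iterative refinement, sketched below in plain Python (no GPU or cuBLAS involved) purely to show the idea of correcting a computed solution with its residual:

```python
def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting on dense lists of floats."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def refine(A, b, iters=3):
    """Iterative refinement: repeatedly solve for the residual and correct."""
    x = gauss_solve(A, b)
    for _ in range(iters):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(A, b)]
        d = gauss_solve(A, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x

x = refine([[2.0, 1.0], [1.0, 3.0]], [3.0, 4.0])  # exact solution [1, 1]
```

Each refinement pass is only a residual computation plus a re-solve with the already factored system, which is consistent with the observation that the extra accuracy costs almost no additional running time.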


2004 ◽  
Vol 14 (08) ◽  
pp. 2991-2997 ◽  
Author(s):  
PETER C. CHU ◽  
LEONID M. IVANOV ◽  
TATYANA M. MARGOLINA

Reconstruction of processes and fields from noisy data reduces to solving a set of linear algebraic equations. Three factors affect the accuracy of the reconstruction: (a) a large condition number of the coefficient matrix, (b) a high noise-to-signal ratio in the source term, and (c) the absence of a priori knowledge of the noise statistics. To improve reconstruction accuracy, the set of linear algebraic equations is transformed into a new set with minimal condition number and noise-to-signal ratio using a rotation matrix. The procedure does not require any knowledge of the low-order statistics of the noise. Several examples, including a highly distorted Lorenz attractor, illustrate the benefit of this procedure.
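Factor (a) can be illustrated with a hypothetical 2x2 example (unrelated to the paper's rotation procedure): when the coefficient matrix is nearly singular, a tiny perturbation of the source term produces a large change in the reconstructed solution.

```python
def solve2x2(A, b):
    """Solve a 2x2 system via the explicit inverse (Cramer's rule)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(a22 * b[0] - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]

A = [[1.0, 0.99], [0.99, 0.98]]   # nearly singular: det = -1e-4
b = [1.99, 1.97]                  # exact solution is [1, 1]
x_clean = solve2x2(A, b)
b_noisy = [1.99 + 1e-4, 1.97]     # tiny noise in the source term
x_noisy = solve2x2(A, b_noisy)
# The O(1e-4) perturbation produces an O(1) change in the solution,
# because the condition number of A is of order 1e4.
```

Reducing the condition number of the transformed system is precisely what damps this amplification of source-term noise.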


2016 ◽  
Author(s):  
Jens Krüger ◽  
Oliver Kohlbacher

Practical experiences with implementing a workflow for the prediction of mass spectra are reported. QCEIMS is used to simulate fragmentation trajectories, leading to predicted mass spectra for small molecules such as metabolites. The individual calculations are embedded into UNICORE workflow nodes, with the applications themselves packaged in Docker containers. Challenges, caveats, and advantages are discussed, providing guidance for the deployment of a scientific protocol on high-performance computing resources.


2017 ◽  
Author(s):  
Yang-Min Kim ◽  
Jean-Baptiste Poline ◽  
Guillaume Dumas

Abstract
Reproducibility has been shown to be limited in many scientific fields. It is a fundamental tenet of scientific activity, but the related issue of the reusability of scientific data is poorly documented. Here we present a case study of our attempt to reproduce a promising bioinformatics method [1] and illustrate the challenges of using a published method for which both code and data were available. First, we tried to re-run the analysis with the code and data provided by the authors. Second, we reimplemented the method in Python to avoid depending on a MATLAB licence and to ease execution of the code on an HPCC (High-Performance Computing Cluster). Third, we assessed the reusability of our reimplementation and the quality of our documentation. We then experimented with our own software and tested how easy it would be to start from our implementation to reproduce the results, thereby attempting to estimate the robustness of the reproducibility. Finally, we propose solutions drawn from this case study and other observations to improve reproducibility and research efficiency at both the individual and the collective level.
Availability
The latest version of StratiPy (Python), with two reproducibility examples, is available at GitHub [2].
Contact
[email protected]


The article deals with high-performance computing (HPC) technology for stress-strain analysis at all stages of the life cycle of buildings and structures: construction, operation, and reconstruction. Results are presented of the numerical simulation of high-rise buildings using, as the processor component of the software, a new hybrid algorithm for solving systems of linear algebraic equations [1] with symmetric positive-definite matrices that combines computation on multi-core processors and graphics accelerators. It has been found that hybrid systems combining multi-core CPUs with accelerator coprocessors, including GPUs, are promising for accelerating the calculations [5]. To test the effectiveness of the proposed parallel algorithm for solving systems of linear algebraic equations [1], numerical experiments were carried out for the most dangerous load cases of a 27-storey building. Results of numerical studies are presented that use the LIRA-SAPR software complex for preprocessing (input of initial data) and postprocessing (output of calculation results) [2, 4, 6]. The numerical studies of the behaviour of high-rise building structures showed a multiple reduction in the time needed to solve systems of linear algebraic equations with symmetric matrices on multiprocessor (multi-core) computers with graphics accelerators using the proposed hybrid algorithms [1]. High-performance technologies based on parallel computation yield even greater benefit for more complex tasks: modelling the life cycle of high-rise buildings, bridges, and especially complex structures such as NPPs under static and dynamic loads, including emergencies, in both normal and difficult geological conditions; the latter cover 70% of Ukraine's territory.
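The hybrid algorithm of [1] itself is not reproduced here; the serial core that such solvers parallelise, factor-and-substitute for a symmetric positive-definite system, can be sketched with a plain Cholesky decomposition (a generic illustration, not the code of [1]):

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive-definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def spd_solve(A, b):
    """Solve A x = b via forward substitution (L y = b), then back
    substitution (L^T x = y)."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

x = spd_solve([[4.0, 2.0], [2.0, 3.0]], [8.0, 7.0])  # solution [1.25, 1.5]
```

The factorisation dominates the cost and is the part that hybrid CPU/GPU schemes distribute across devices; the triangular substitutions are comparatively cheap.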


2015 ◽  
Vol 24 (05) ◽  
pp. 1550074 ◽  
Author(s):  
Ali A. El-Moursy ◽  
Wael S. Afifi ◽  
Fadi N. Sibai ◽  
Salwa M. Nassar

STRIKE is an algorithm that predicts protein–protein interactions (PPIs); it determines that two proteins interact if they contain similar substrings of amino acids. Unlike other methods for PPI prediction, STRIKE is able to achieve a reasonable improvement over the existing prediction methods. Despite its high accuracy as a PPI prediction method, STRIKE has a large execution time and is therefore considered a compute-intensive application. In this paper, we develop and implement a parallel STRIKE algorithm for high-performance computing (HPC) systems. Using a large-scale cluster, the execution time of the parallel implementation of this bioinformatics algorithm was reduced from about a week on a serial uniprocessor machine to about 16.5 h on 16 computing nodes, and down to about 2 h on 128 parallel nodes. Communication overheads between nodes are studied thoroughly.
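Taking "about a week" as roughly 168 hours (an approximation), the reported times imply the following speedups and parallel efficiencies:

```python
serial_h = 168.0              # "about a week" on one processor (approximate)
runs = {16: 16.5, 128: 2.0}   # nodes -> reported wall time in hours

for nodes, hours in runs.items():
    speedup = serial_h / hours       # how many times faster than serial
    efficiency = speedup / nodes     # fraction of ideal linear scaling
    print(f"{nodes:3d} nodes: speedup ~{speedup:.1f}x, "
          f"efficiency ~{efficiency:.0%}")
```

Both configurations retain roughly two-thirds parallel efficiency, so the scaling from 16 to 128 nodes is close to linear despite the inter-node communication overheads the paper studies.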


2013 ◽  
Vol 756-759 ◽  
pp. 3070-3073 ◽  
Author(s):  
Er Yan Zhang ◽  
Xiao Feng Zhu

Toeplitz matrices arise in a remarkable variety of applications such as signal processing, time-series analysis, and image processing. The Yule-Walker equations of generalized stationary prediction are linear algebraic equations whose coefficient matrix is a Toeplitz matrix. Making use of the structure of the Toeplitz matrix, we present a recursive algorithm for linear algebraic equations with a Toeplitz coefficient matrix, and we also prove the recursive formula. The algorithm, by exploiting the structure of Toeplitz matrices, effectively reduces the computational cost. For an n-order Toeplitz coefficient matrix, the computational complexity of the usual Gaussian elimination is about O(n³), while that of this algorithm is about O(n²), a reduction of one order of magnitude.
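The paper's own recursive formula is not reproduced here; the classic Levinson-Durbin recursion, which solves the Yule-Walker equations in O(n²) operations by exploiting the same Toeplitz structure, conveys the idea:

```python
def levinson_durbin(r):
    """Solve the Yule-Walker equations for an AR(p) model given the
    autocorrelations r[0..p], in O(p^2) operations instead of the O(p^3)
    of general Gaussian elimination. Returns the predictor coefficients
    a (with a[0] = 1) and the final prediction-error power."""
    p = len(r) - 1
    a = [1.0] + [0.0] * p
    err = r[0]
    for k in range(1, p + 1):
        # Reflection coefficient from the current residual correlation.
        acc = r[k] + sum(a[j] * r[k - j] for j in range(1, k))
        lam = -acc / err
        # Order-update of the coefficient vector (Levinson recursion).
        a = [a[j] + lam * a[k - j] if 0 < j < k else a[j]
             for j in range(p + 1)]
        a[k] = lam
        err *= 1.0 - lam * lam
    return a, err

# Autocorrelations of an AR(1) process with coefficient 0.5:
a, err = levinson_durbin([1.0, 0.5, 0.25])
```

Each order-update touches only O(p) entries, which is where the order-of-magnitude saving over elimination comes from.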


Author(s):  
Alexander Khimich ◽  
Victor Polyanko ◽  
Tamara Chistyakova

Introduction. At present, new computational problems with large volumes of data constantly arise in science and technology, and solving them requires powerful supercomputers. Most of these problems come down to solving systems of linear algebraic equations (SLAEs). The main challenge of solving problems on a computer is obtaining reliable solutions with minimal computing resources. However, a problem solved on a computer always contains data that are only approximations of the original task (due to errors in the initial data, errors when entering numerical data into the computer, etc.). Thus, the mathematical properties of the computer problem can differ significantly from those of the original problem. Problems must therefore be solved taking the approximate data into account, and the computed results must be analysed. Despite the significant results of research in linear algebra, work on overcoming the existing difficulties of computer solution of problems with approximate data has not lost its significance and requires further development, all the more so with the use of contemporary supercomputers. Today, the highest-performance supercomputers are parallel machines with graphics processors. The architectural and technological features of these computers make it possible to significantly increase the efficiency of solving large problems at relatively low energy cost. The purpose of the article is to develop new parallel algorithms for solving systems of linear algebraic equations with approximate data on supercomputers with graphics processors; the algorithms automatically adjust themselves to the available computer architecture and to the mathematical properties of the problem as identified in the computer, and they provide estimates of the reliability of the results.
Results. A methodology is described for creating parallel algorithms for supercomputers with graphics processors that investigate the mathematical properties of linear systems with approximate data, together with algorithms that analyse the reliability of the results. The results of computational experiments on the SKIT-4 supercomputer are presented.
Conclusions. Parallel algorithms have been created for investigating and solving linear systems with approximate data on supercomputers with graphics processors. Numerical experiments with the new algorithms showed a significant acceleration of the calculations with a guarantee of the reliability of the results.
Keywords: systems of linear algebraic equations, hybrid algorithm, approximate data, reliability of the results, GPU computers.
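One simple form of a-posteriori reliability estimate (a generic textbook bound, not the authors' specific machinery): since A(x̂ − x) = −r with residual r = b − Ax̂, the error obeys ‖x̂ − x‖∞ ≤ ‖A⁻¹‖∞ ‖r‖∞. A 2x2 sketch:

```python
def residual_error_bound(A, b, x_hat):
    """A-posteriori error bound for a 2x2 system: from A(x_hat - x) = -r
    it follows that ||x_hat - x||_inf <= ||A^{-1}||_inf * ||r||_inf."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    inv = [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]
    inv_norm = max(abs(inv[0][0]) + abs(inv[0][1]),   # max row sum
                   abs(inv[1][0]) + abs(inv[1][1]))
    r = [b[i] - sum(A[i][j] * x_hat[j] for j in range(2)) for i in range(2)]
    r_norm = max(abs(r[0]), abs(r[1]))
    return inv_norm * r_norm

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 4.0]                            # exact solution is [1, 1]
bound = residual_error_bound(A, b, [1.001, 0.999])
# The true error is 1e-3 per component; the bound is guaranteed to cover it.
```

A bound like this certifies a computed solution after the fact, which is the flavour of reliability guarantee the abstract refers to; for large systems ‖A⁻¹‖ would of course be estimated rather than formed explicitly.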


2019 ◽  
Vol 2019 ◽  
pp. 1-13
Author(s):  
Hamish J. Macintosh ◽  
Jasmine E. Banks ◽  
Neil A. Kelson

Solving diagonally dominant tridiagonal linear systems is a common problem in scientific high-performance computing (HPC), and it is becoming more commonplace for HPC platforms to utilise a heterogeneous combination of computing devices. Whilst it is desirable to design faster implementations of parallel linear system solvers, power consumption concerns are increasing in priority. This work presents the oclspkt routine, a heterogeneous OpenCL implementation of the truncated SPIKE algorithm that can use FPGAs, GPUs, and CPUs concurrently to accelerate the solving of diagonally dominant tridiagonal linear systems. The routine is designed to solve tridiagonal systems of any size and can dynamically allocate optimised workloads to each accelerator in a heterogeneous environment depending on the accelerator's compute performance. The truncated SPIKE FPGA solver is developed first, optimising OpenCL device kernel performance, global memory bandwidth, and interleaved host-to-device memory transactions. The FPGA OpenCL kernel code is then refactored and optimised to best exploit the underlying architecture of the CPU and GPU. An optimised TDMA OpenCL kernel is also developed to act as a serial baseline performance comparison for the parallel truncated SPIKE kernel, since no FPGA tridiagonal solver capable of solving large tridiagonal systems was available at the time of development. The individual GPU, CPU, and FPGA solvers of the oclspkt routine are 110%, 150%, and 170% faster, respectively, than comparable device-optimised third-party solvers and applicable baselines. Assessing heterogeneous combinations of compute devices, the GPU + FPGA combination is found to have the best compute performance, and the FPGA-only configuration is found to have the best overall estimated energy efficiency.
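The TDMA baseline mentioned above is the classic Thomas algorithm; a serial Python sketch follows (the paper's version is an OpenCL kernel) for the diagonally dominant case, where no pivoting is needed:

```python
def tdma(sub, diag, sup, rhs):
    """Thomas algorithm (TDMA): O(n) solution of a tridiagonal system
    given the sub-, main, and super-diagonals. Safe without pivoting
    when the matrix is diagonally dominant."""
    n = len(diag)
    c = [0.0] * n   # modified superdiagonal
    d = [0.0] * n   # modified right-hand side
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the subdiagonal.
    for i in range(1, n):
        m = diag[i] - sub[i - 1] * c[i - 1]
        c[i] = (sup[i] / m) if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i - 1] * d[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# Diagonally dominant example: [[2,1,0],[1,2,1],[0,1,2]] x = [3,4,3].
x = tdma([1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0], [3.0, 4.0, 3.0])
```

The forward sweep's strict left-to-right data dependence is what makes TDMA inherently serial, and why partitioned schemes such as SPIKE are needed to extract parallelism across devices.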

