scholarly journals The Parallel Tiled WZ Factorization Algorithm for Multicore Architectures

2019 ◽  
Vol 29 (2) ◽  
pp. 407-419
Author(s):  
Beata Bylina ◽  
Jarosław Bylina

Abstract The aim of this paper is to investigate dense linear algebra algorithms on shared memory multicore architectures. The design and implementation of a parallel tiled WZ factorization algorithm which can fully exploit such architectures are presented. Three parallel implementations of the algorithm are studied. The first one relies only on exploiting multithreaded BLAS (basic linear algebra subprograms) operations. The second implementation, except for BLAS operations, employs the OpenMP standard to use the loop-level parallelism. The third implementation, except for BLAS operations, employs the OpenMP task directive with the depend clause. We report the computational performance and the speedup of the parallel tiled WZ factorization algorithm on shared memory multicore architectures for dense square diagonally dominant matrices. Then we compare our parallel implementations with the respective LU factorization from a vendor implemented LAPACK library. We also analyze the numerical accuracy. Two of our implementations can be achieved with near maximal theoretical speedup implied by Amdahl’s law.

2003 ◽  
Vol 11 (2) ◽  
pp. 95-104 ◽  
Author(s):  
C. Addison ◽  
Y. Ren ◽  
M. van Waveren

Dense linear algebra libraries need to cope efficiently with a range of input problem sizes and shapes. Inherently this means that parallel implementations have to exploit parallelism wherever it is present. While OpenMP allows relatively fine grain parallelism to be exploited in a shared memory environment it currently lacks features to make it easy to partition computation over multiple array indices or to overlap sequential and parallel computations. The inherent flexible nature of shared memory paradigms such as OpenMP poses other difficulties when it becomes necessary to optimise performance across successive parallel library calls. Notions borrowed from distributed memory paradigms, such as explicit data distributions help address some of these problems, but the focus on data rather than work distribution appears misplaced in an SMP context.


2015 ◽  
Vol 13 (1) ◽  
Author(s):  
Feng Wang ◽  
Deshu Sun

AbstractThe theory of Schur complement plays an important role in many fields, such as matrix theory and control theory. In this paper, applying the properties of Schur complement, some new estimates of diagonally dominant degree on the Schur complement of I(II)-block strictly diagonally dominant matrices and I(II)-block strictly doubly diagonally dominant matrices are obtained, which improve some relative results in Liu [Linear Algebra Appl. 435(2011) 3085-3100]. As an application, we present several new eigenvalue inclusion regions for the Schur complement of matrices. Finally, we give a numerical example to illustrate the advantages of our derived results.


2010 ◽  
Vol 18 (1) ◽  
pp. 35-50 ◽  
Author(s):  
Hatem Ltaief ◽  
Jakub Kurzak ◽  
Jack Dongarra ◽  
Rosa M. Badia

The objective of this paper is to describe, in the context of multicore architectures, three different scheduler implementations for the two-sided linear algebra transformations, in particular the Hessenberg and Bidiagonal reductions which are the first steps for the standard eigenvalue problems and the singular value decompositions respectively. State-of-the-art dense linear algebra softwares, such as the LAPACK and ScaLAPACK libraries, suffer performance losses on multicore processors due to their inability to fully exploit thread-level parallelism. At the same time the fine-grain dataflow model gains popularity as a paradigm for programming multicore architectures. Buttari et al. (Parellel Comput. Syst. Appl. 35 (2009), 38–53) introduced the concept oftile algorithmsin which parallelism is no longer hidden inside Basic Linear Algebra Subprograms but is brought to the fore to yield much better performance. Along with efficient scheduling mechanisms for data-driven execution, these tile two-sided reductions achieve high performance computing by reaching up to 75% of the DGEMM peak on a 12000×12000 matrix with 16 Intel Tigerton 2.4 GHz processors. The main drawback of thetile algorithmsapproach for two-sided transformations is that the full reduction cannot be obtained in one stage. Other methods have to be considered to further reduce the band matrices to the required forms.


Author(s):  
Yuzhu Wang ◽  
Akihiro Tanaka ◽  
Akiko Yoshise

AbstractWe develop techniques to construct a series of sparse polyhedral approximations of the semidefinite cone. Motivated by the semidefinite (SD) bases proposed by Tanaka and Yoshise (Ann Oper Res 265:155–182, 2018), we propose a simple expansion of SD bases so as to keep the sparsity of the matrices composing it. We prove that the polyhedral approximation using our expanded SD bases contains the set of all diagonally dominant matrices and is contained in the set of all scaled diagonally dominant matrices. We also prove that the set of all scaled diagonally dominant matrices can be expressed using an infinite number of expanded SD bases. We use our approximations as the initial approximation in cutting plane methods for solving a semidefinite relaxation of the maximum stable set problem. It is found that the proposed methods with expanded SD bases are significantly more efficient than methods using other existing approximations or solving semidefinite relaxation problems directly.


2017 ◽  
Vol 5 (1) ◽  
pp. 158-201
Author(s):  
Jan Brandts ◽  
Apo Cihangir

Abstract The convex hull of n + 1 affinely independent vertices of the unit n-cube In is called a 0/1-simplex. It is nonobtuse if none its dihedral angles is obtuse, and acute if additionally none of them is right. In terms of linear algebra, acute 0/1-simplices in In can be described by nonsingular 0/1-matrices P of size n × n whose Gramians G = PTP have an inverse that is strictly diagonally dominant, with negative off-diagonal entries [6, 7]. The first part of this paper deals with giving a detailed description of how to efficiently compute, by means of a computer program, a representative from each orbit of an acute 0/1-simplex under the action of the hyperoctahedral group Bn [17] of symmetries of In. A side product of the investigations is a simple code that computes the cycle index of Bn, which can in explicit form only be found in the literature [11] for n ≤ 6. Using the computed cycle indices for B3, . . . ,B11 in combination with Pólya’s theory of enumeration shows that acute 0/1-simplices are extremely rare among all 0/1-simplices. In the second part of the paper, we study the 0/1-matrices that represent the acute 0/1-simplices that were generated by our code from a mathematical perspective. One of the patterns observed in the data involves unreduced upper Hessenberg 0/1-matrices of size n × n, block-partitioned according to certain integer compositions of n. These patterns will be fully explained using a so-called One Neighbor Theorem [4]. Additionally, we are able to prove that the volumes of the corresponding acute simplices are in one-to-one correspondence with the part of Kepler’s Tree of Fractions [1, 24] that enumerates ℚ ⋂ (0, 1). Another key ingredient in the proofs is the fact that the Gramians of the unreduced upper Hessenberg matrices involved are strictly ultrametric [14, 26] matrices.


2020 ◽  
Vol 13 (12) ◽  
pp. 6265-6284
Author(s):  
Emmanuel Wyser ◽  
Yury Alkhimenkov ◽  
Michel Jaboyedoff ◽  
Yury Y. Podladchikov

Abstract. We present an efficient MATLAB-based implementation of the material point method (MPM) and its most recent variants. MPM has gained popularity over the last decade, especially for problems in solid mechanics in which large deformations are involved, such as cantilever beam problems, granular collapses and even large-scale snow avalanches. Although its numerical accuracy is lower than that of the widely accepted finite element method (FEM), MPM has proven useful for overcoming some of the limitations of FEM, such as excessive mesh distortions. We demonstrate that MATLAB is an efficient high-level language for MPM implementations that solve elasto-dynamic and elasto-plastic problems. We accelerate the MATLAB-based implementation of the MPM method by using the numerical techniques recently developed for FEM optimization in MATLAB. These techniques include vectorization, the use of native MATLAB functions and the maintenance of optimal RAM-to-cache communication, among others. We validate our in-house code with classical MPM benchmarks including (i) the elastic collapse of a column under its own weight; (ii) the elastic cantilever beam problem; and (iii) existing experimental and numerical results, i.e. granular collapses and slumping mechanics respectively. We report an improvement in performance by a factor of 28 for a vectorized code compared with a classical iterative version. The computational performance of the solver is at least 2.8 times greater than those of previously reported MPM implementations in Julia under a similar computational architecture.


Author(s):  
Ramon Amela ◽  
Cristian Ramon-Cortes ◽  
Jorge Ejarque ◽  
Javier Conejero ◽  
Rosa M. Badia

Python is a popular programming language due to the simplicity of its syntax, while still achieving a good performance even being an interpreted language. The adoption from multiple scientific communities has evolved in the emergence of a large number of libraries and modules, which has helped to put Python on the top of the list of the programming languages [1]. Task-based programming has been proposed in the recent years as an alternative parallel programming model. PyCOMPSs follows such approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. Also, we present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.


Sign in / Sign up

Export Citation Format

Share Document