A HIGH PERFORMANCE LARGE SPARSE SYMMETRIC SOLVER FOR THE MESHFREE GALERKIN METHOD

2008 ◽  
Vol 05 (04) ◽  
pp. 533-550 ◽  
Author(s):  
S. C. WU ◽  
H. O. ZHANG ◽  
C. ZHENG ◽  
J. H. ZHANG

One main disadvantage of meshfree methods is that their memory requirement and computational cost are much higher than those of the usual finite element method (FEM). This paper presents an efficient and reliable solver for the large sparse symmetric positive definite (SPD) system resulting from the element-free Galerkin (EFG) approach. A compact mathematical model of heat transfer problems is first established using the EFG procedure. Based on the widely used Symmetric Successive Over-Relaxation–Preconditioned Conjugate Gradient (SSOR–PCG) scheme, a novel solver named FastPCG is then proposed for solving the SPD linear system. To decrease the computational time in each iteration step, a new algorithm for multiplying the global stiffness matrix by a vector is presented for this solver. The global matrix and load vector are modified according to a special rule, and in this way a large amount of calculation is avoided without decreasing the solution's accuracy. In addition, a double data structure is designed to handle the frequent and unpredictable insertion or removal of nodes that arises in dynamic adaptive analysis or in moving high-gradient field analysis. An information matrix is also built to avoid the drastic transformation of the coefficient matrix caused by the initial-boundary values. Numerical results show that the memory requirement of the FastPCG solver is only one-third of that of the well-developed AGGJE solver, and its computational cost remains comparable with that of the traditional method as the solution scale and order increase.
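The SSOR–PCG iteration underlying FastPCG can be sketched in a few lines. Below is a minimal dense NumPy illustration of conjugate gradients with an SSOR preconditioner; the paper's actual contributions (sparse storage, the tailored matrix-vector product, the double data structure) are not reproduced here:

```python
import numpy as np

def ssor_pcg(A, b, omega=1.2, tol=1e-10, max_iter=500):
    """Conjugate gradients with an SSOR preconditioner for an SPD system Ax = b.
    Dense illustrative version; FastPCG additionally exploits sparsity."""
    d = np.diag(A)                                 # diagonal of A
    lower = np.diag(d) + omega * np.tril(A, -1)    # D + omega*L
    upper = lower.T                                # D + omega*L^T (A symmetric)

    def apply_Minv(r):
        # M^{-1} r = omega(2-omega) (D+wL)^{-T} D (D+wL)^{-1} r
        t = np.linalg.solve(lower, r)
        return omega * (2.0 - omega) * np.linalg.solve(upper, d * t)

    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small SPD test problem: a 1-D Laplacian-style stiffness matrix
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = ssor_pcg(A, b)
```

The relaxation factor omega in (0, 2) keeps the preconditioner SPD; 1.2 is an arbitrary illustrative choice.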

2020 ◽  
Author(s):  
Marcelo Damasceno ◽  
Hélio Ribeiro Neto ◽  
Tatiane Costa ◽  
Aldemir Cavalini Júnior ◽  
Ludimar Aguiar ◽  
...  

Abstract
Fluid-structure interaction modeling tools based on computational fluid dynamics (CFD) produce interesting results that can be used in the design of submerged structures. However, the computational cost of the simulations associated with the design of submerged offshore structures is high, and there are no high-performance platforms devoted to the analysis and optimization of these structures using CFD techniques. In this context, this work presents a computational tool dedicated to the construction of Kriging surrogate models that represent the time-domain force responses of submerged risers. The force responses obtained from high-cost computational simulations are used as outputs for training and validating the surrogate models. Different excitations are applied to the riser to evaluate the representativeness of the obtained Kriging surrogate model, and a similar investigation is performed by changing the number of samples and the total time used for training. The present methodology can be used to perform dynamic analysis of different submerged structures at low computational cost: instead of solving the equations of motion of the fluid-structure system, a Kriging surrogate model is used. A significant reduction in computational time is expected, allowing different analyses and optimization procedures to be carried out quickly and efficiently in the design of this type of structure.
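As a sketch of the surrogate idea, the snippet below fits a minimal simple-kriging (zero-mean Gaussian-process) interpolator with a squared-exponential kernel to samples of a smooth signal standing in for a force response. The kernel, length scale, and training signal are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def kriging_fit(X, y, length=0.5, nugget=1e-8):
    """Simple-kriging (zero-mean GP) weights with a squared-exponential kernel."""
    K = np.exp(-((X[:, None] - X[None, :]) ** 2) / (2.0 * length**2))
    return np.linalg.solve(K + nugget * np.eye(len(X)), y)

def kriging_predict(X_train, weights, X_new, length=0.5):
    """Predict at new points from the fitted kriging weights."""
    k = np.exp(-((X_new[:, None] - X_train[None, :]) ** 2) / (2.0 * length**2))
    return k @ weights

# Train on sparse samples of a smooth "force response" and interpolate between them
t_train = np.linspace(0.0, 2.0 * np.pi, 20)
f_train = np.sin(t_train)
w = kriging_fit(t_train, f_train)
t_new = np.array([1.0, 2.5])
pred = kriging_predict(t_train, w, t_new)
```

Once fitted, each prediction is a single matrix-vector product, which is the source of the speedup over re-running the CFD model.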


2016 ◽  
Vol 713 ◽  
pp. 248-253
Author(s):  
M. Caicedo ◽  
J. Oliver ◽  
A.E. Huespe ◽  
O. Lloberas-Valls

Nowadays, model order reduction techniques have become an intensive research field because of the increasing interest in the computational modeling of complex phenomena in multi-physics problems and the consequent growth in high-performance computing demands. It is well known that the availability of high-performance computing capacity is in most cases limited; model order reduction therefore becomes a valuable tool to overcome this limitation, which represents an immediate challenge for our research community. In computational multiscale modeling, for instance, a different numerical model has to be solved at each scale in order to study the interaction between components, and this radically increases the computational cost. We present a reduced model based on a multi-scale framework for the numerical modeling of structural failure of heterogeneous quasi-brittle materials using the Continuum Strong Discontinuity Approach (CSD). The model is assessed by application to cementitious materials. Proper Orthogonal Decomposition (POD) and Reduced Order Integration Cubature are the proposed techniques for developing the reduced model; these two techniques work together to reduce both the complexity and the computational time of the high-fidelity model, in our case the standard FE2 model.
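The POD step can be illustrated compactly: collect solution snapshots, take an SVD, and keep the modes carrying almost all of the energy. The synthetic two-mode snapshot set below is an assumption for illustration (the paper builds its snapshots from FE2 computations):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
# Synthetic snapshots: random combinations of two spatial modes (illustrative)
snapshots = np.column_stack([
    rng.uniform(0.5, 2.0) * np.sin(np.pi * x)
    + rng.uniform(-1.0, 1.0) * np.sin(2.0 * np.pi * x)
    for _ in range(30)
])

# POD basis from the SVD of the snapshot matrix
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.9999)) + 1   # modes for 99.99% of the energy
basis = U[:, :r]                               # reduced basis
```

Projecting the full-order equations onto `basis` yields the reduced model; here the snapshot set has rank two, so two modes reconstruct it essentially exactly.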


2017 ◽  
Vol 42 (2) ◽  
Author(s):  
Tülay Akal ◽  
Vilda Purutçuoğlu ◽  
Gerhard-Wilhelm Weber

Abstract
Background: Microarray technology aims to measure the amount of change in transcribed messages (RNA) for each gene by quantifying the colour intensity on the arrays. However, owing to differing experimental conditions, these measurements can include both systematic and random erroneous signals. For this reason, we present a novel gene expression index, called multi-RGX (Multiple-probe Robust Gene Expression Index), for one-channel microarrays.
Methods: Multi-RGX, unlike other gene expression indices, considers the long-tailed symmetric (LTS) density, which covers a wider range of distributions for modelling gene expressions on the log-scale and results in robust inference; it also takes into account both probe- and gene-specific intensities. Furthermore, we derive the variance-covariance matrix of the model parameters from the observed Fisher information matrix and test the performance of the multi-RGX method on three different datasets.
Results: Our method is found to be promising relative to its alternatives in terms of accuracy and computational time.
Conclusion: Multi-RGX gives accurate results with respect to its alternatives, with a reduction in computational cost.


Author(s):  
Tu Huynh-Kha ◽  
Thuong Le-Tien ◽  
Synh Ha ◽  
Khoa Huynh-Van

This research work develops a new method to detect copy-move forgery in images by combining the wavelet transform with modified Zernike moments (MZMs), whose features are computed from more pixels than traditional Zernike moments. The tested image is first converted to grayscale, and a one-level Discrete Wavelet Transform (DWT) is applied to halve the image size in both dimensions. The approximation sub-band (LL), which is used for further processing, is then divided into overlapping blocks, and the modified Zernike moments of each block are calculated as its feature vector. The more pixels are considered, the richer the extracted features. Lexicographical sorting and computation of correlation coefficients on the feature vectors are the next steps to find similar blocks. The purpose of applying the DWT to reduce the dimension of the image before using Zernike moments with updated coefficients is to improve the computational time and increase detection accuracy. Copied or duplicated parts are detected as traces of copy-move forgery based on a threshold on the correlation coefficients and confirmed by a Euclidean-distance constraint. Comparison results between the proposed method and related ones prove the feasibility and efficiency of the proposed algorithm.
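The block-matching pipeline (one-level DWT, overlapping blocks, lexicographic sorting, distance check) can be sketched as below. For brevity, raw block pixels stand in for the modified Zernike moments, and exact equality replaces the correlation threshold; both are simplifying assumptions:

```python
import numpy as np

def haar_ll(img):
    """One-level approximation (LL) sub-band: 2x2 average pooling,
    equivalent to the Haar DWT low-pass branch up to scaling."""
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2].astype(float)
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

def find_duplicate_blocks(ll, bsize=8, min_dist=8):
    """Collect overlapping blocks, sort their feature vectors lexicographically,
    and report matching neighbours that satisfy the distance constraint."""
    h, w = ll.shape
    feats, coords = [], []
    for i in range(h - bsize + 1):
        for j in range(w - bsize + 1):
            feats.append(ll[i:i + bsize, j:j + bsize].ravel())
            coords.append((i, j))
    feats = np.array(feats)
    order = np.lexsort(feats.T[::-1])          # lexicographic block order
    pairs = []
    for a, b in zip(order[:-1], order[1:]):
        if np.allclose(feats[a], feats[b]):    # stand-in for the correlation test
            (i1, j1), (i2, j2) = coords[a], coords[b]
            if (i1 - i2) ** 2 + (j1 - j2) ** 2 >= min_dist ** 2:
                pairs.append((coords[a], coords[b]))
    return pairs

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)
img[32:48, 32:48] = img[0:16, 0:16]            # simulate a copy-move forgery
pairs = find_duplicate_blocks(haar_ll(img))
```

Working in the LL sub-band quarters the number of pixels, which is where the computational-time saving comes from.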


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Israel F. Araujo ◽  
Daniel K. Park ◽  
Francesco Petruccione ◽  
Adenilton J. da Silva

Abstract
Advantages in several fields of research and industry are expected with the rise of quantum computers. However, the computational cost of loading classical data into quantum computers can impose restrictions on possible quantum speedups. Known algorithms to create arbitrary quantum states require quantum circuits with depth O(N) to load an N-dimensional vector. Here, we show that it is possible to load an N-dimensional vector with an exponential time advantage using a quantum circuit with polylogarithmic depth and entangled information in ancillary qubits. The results show that we can efficiently load data into quantum devices using a divide-and-conquer strategy that exchanges computational time for space. We demonstrate a proof of concept on a real quantum device and present two applications for quantum machine learning. We expect that this new loading strategy will enable quantum speedups for tasks that require loading a significant volume of information into quantum devices.
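The classical preprocessing behind amplitude encoding can be sketched as a binary tree of rotation angles; walking the tree reproduces the amplitudes, which is what the circuit does with (controlled) Ry rotations, and what the divide-and-conquer construction parallelizes across ancillas. Real, non-negative amplitudes are assumed for simplicity:

```python
import numpy as np

def angle_tree(v):
    """Rotation-angle tree for amplitude-encoding a non-negative unit vector.
    Returns one array of angles per tree level, root level first."""
    levels = []
    probs = v.astype(float) ** 2
    while len(probs) > 1:
        pairs = probs.reshape(-1, 2)
        parents = pairs.sum(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            frac = np.where(parents > 0, pairs[:, 0] / parents, 1.0)
        levels.append(2.0 * np.arccos(np.sqrt(frac)))
        probs = parents
    return levels[::-1]

def reconstruct(levels):
    """Walk the tree multiplying cos/sin half-angles: recovers the amplitudes."""
    amps = np.array([1.0])
    for angles in levels:
        left = amps * np.cos(angles / 2.0)
        right = amps * np.sin(angles / 2.0)
        amps = np.column_stack([left, right]).ravel()
    return amps

v = np.sqrt(np.array([0.1, 0.2, 0.3, 0.4]))   # amplitudes of a 2-qubit state
tree = angle_tree(v)
```

The tree has log2(N) levels; the paper's divide-and-conquer circuit applies each level's rotations concurrently, trading ancilla qubits (space) for depth (time).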


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 223
Author(s):  
Yen-Ling Tai ◽  
Shin-Jhe Huang ◽  
Chien-Chang Chen ◽  
Henry Horng-Shing Lu

Nowadays, deep learning methods with high structural complexity and flexibility inevitably lean on the computational capability of the hardware. A platform with high-performance GPUs and large amounts of memory can support neural networks with large numbers of layers and kernels. However, naively pursuing high-cost hardware would probably hamper the technical development of deep learning methods. In this article, we therefore establish a new preprocessing method to reduce the computational complexity of the neural networks. Inspired by the band theory of solids in physics, we map the image space isomorphically onto a non-interacting physical system and treat image voxels as particle-like clusters. We then reconstruct the Fermi–Dirac distribution as a correction function for the normalization of the voxel intensity and as a filter of insignificant cluster components. The filtered clusters can then delineate the morphological heterogeneity of the image voxels. We used the BraTS 2019 datasets and the dimensional fusion U-net for algorithmic validation, and the proposed Fermi–Dirac correction function exhibited performance comparable to the other preprocessing methods employed. Compared with the conventional z-score normalization function and the Gamma correction function, the proposed algorithm saves at least 38% of the computational time cost on a low-cost hardware architecture. Even though global histogram equalization has the lowest computational time among the employed correction functions, the proposed Fermi–Dirac correction function exhibits better image augmentation and segmentation capabilities.
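A Fermi–Dirac-shaped intensity correction can be sketched as a logistic map of voxel intensities into (0, 1); choosing the mean intensity as the "chemical potential" and the kT value below are illustrative assumptions, not the paper's calibration:

```python
import numpy as np

def fermi_dirac_correction(volume, kT=0.1):
    """Map voxel intensities into (0, 1) with a Fermi-Dirac-shaped (logistic)
    curve centred on the mean intensity. Intensities far below the 'chemical
    potential' are squashed towards 0, acting as a soft filter; kT controls
    the sharpness of the cutoff. Illustrative sketch only."""
    x = np.asarray(volume, dtype=float)
    mu = x.mean()                       # 'chemical potential' (assumption)
    scale = x.std() + 1e-12             # normalize the energy axis
    return 1.0 / (np.exp(-(x - mu) / (kT * scale)) + 1.0)

vox = np.array([0.0, 50.0, 100.0, 200.0, 255.0])
out = fermi_dirac_correction(vox)
```

Unlike a hard threshold, the soft cutoff preserves ordering of intensities while suppressing insignificant components, which is the filtering behaviour the abstract describes.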


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 645
Author(s):  
Muhammad Farooq ◽  
Sehrish Sarfraz ◽  
Christophe Chesneau ◽  
Mahmood Ul Hassan ◽  
Muhammad Ali Raza ◽  
...  

Expectiles have gained considerable attention in recent years due to wide applications in many areas. In this study, the k-nearest neighbours approach, together with the asymmetric least squares loss function, called ex-kNN, is proposed for computing expectiles. Firstly, the effect of various distance measures on ex-kNN is evaluated in terms of test error and computational time. It is found that the Canberra, Lorentzian, and Soergel distance measures lead to minimum test error, whereas the Euclidean, Canberra, and Average of (L1, L∞) measures lead to a low computational cost. Secondly, the performance of ex-kNN is compared with the existing packages er-boost and ex-svm for computing expectiles on nine real-life examples. Depending on the nature of the data, ex-kNN showed two to ten times better performance than er-boost and comparable performance with ex-svm regarding test error. Computationally, ex-kNN is found to be two to five times faster than ex-svm and much faster than er-boost, particularly in the case of high-dimensional data.
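A minimal sketch of the ex-kNN idea: find the k nearest neighbours and return the τ-expectile of their responses, computed by iterating the asymmetric-least-squares first-order condition. Euclidean distance is used here for brevity; the paper evaluates many alternative distance measures:

```python
import numpy as np

def expectile(values, tau, iters=100):
    """tau-expectile via fixed-point iteration on the asymmetric least squares
    first-order condition: weight tau above the current estimate, 1-tau below."""
    m = values.mean()
    for _ in range(iters):
        w = np.where(values >= m, tau, 1.0 - tau)
        m = np.sum(w * values) / np.sum(w)
    return m

def ex_knn_predict(X_train, y_train, x, k=5, tau=0.5):
    """ex-kNN sketch: tau-expectile of the k nearest neighbours' responses."""
    d = np.linalg.norm(X_train - x, axis=1)    # Euclidean; paper tries others
    idx = np.argsort(d)[:k]
    return expectile(y_train[idx], tau)

X = np.arange(10, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2
mid = ex_knn_predict(X, y, np.array([3.2]), k=5, tau=0.5)
```

For tau = 0.5 the expectile reduces to the neighbours' mean, so ex-kNN contains ordinary kNN regression as a special case.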


2021 ◽  
pp. 146808742199863
Author(s):  
Aishvarya Kumar ◽  
Ali Ghobadian ◽  
Jamshid Nouri

This study assesses the predictive capability of the ZGB (Zwart-Gerber-Belamri) cavitation model combined with the RANS (Reynolds-Averaged Navier-Stokes) approach, the realizable k-epsilon turbulence model, and gas/liquid compressibility models for cavitation simulation in a multi-hole fuel injector at different cavitation numbers (CN) for diesel and biodiesel fuels. The predictions were assessed quantitatively by comparing predicted velocity profiles with measured LDV (Laser Doppler Velocimetry) data, and qualitatively by visually comparing the predicted void fraction with experimental CCD (Charge-Coupled Device) images. Both comparisons showed that the model can predict the fluid behavior under such conditions with a high level of confidence. Additionally, flow-field analysis of the numerical results revealed the formation of two main types of vortex structures in the injector sac volume. The first kind appears to connect two adjacent holes and is known as a "hole-to-hole" connecting vortex. The second appears as a pair of "counter-rotating" vortices emerging from the needle wall and entering the injector hole facing it. The use of RANS proved to save significant computational cost and time while predicting the cavitating flow with good accuracy.
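For reference, the ZGB interphase mass-transfer rate can be written down in a few lines. The coefficients below are the commonly cited ZGB defaults, and the fluid properties are rough water/vapour values recalled from the general literature, not the paper's diesel and biodiesel data:

```python
import numpy as np

def zgb_mass_transfer(p, alpha_v, p_v=2300.0, rho_v=0.017, rho_l=1000.0,
                      r_b=1e-6, alpha_nuc=5e-4, f_vap=50.0, f_cond=0.01):
    """Zwart-Gerber-Belamri mass-transfer rate (kg m^-3 s^-1), sign convention:
    positive = vapour production. p is local pressure, alpha_v the vapour
    volume fraction, p_v the vapour pressure, r_b the nucleation bubble radius.
    Coefficient values are the commonly cited ZGB defaults (assumption)."""
    if p < p_v:   # vaporization: driven by the nucleation-site fraction
        return (f_vap * 3.0 * alpha_nuc * (1.0 - alpha_v) * rho_v / r_b
                * np.sqrt(2.0 / 3.0 * (p_v - p) / rho_l))
    # condensation: driven by the existing vapour fraction
    return (-f_cond * 3.0 * alpha_v * rho_v / r_b
            * np.sqrt(2.0 / 3.0 * (p - p_v) / rho_l))
```

The asymmetry between the empirical vaporization and condensation coefficients (50 vs. 0.01) reflects the much faster physics of bubble growth than collapse.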


2021 ◽  
Vol 11 (2) ◽  
pp. 813
Author(s):  
Shuai Teng ◽  
Zongchao Liu ◽  
Gongfa Chen ◽  
Li Cheng

This paper compares the crack detection performance (in terms of precision and computational cost) of the YOLO_v2 using 11 feature extractors, which provides a basis for realizing fast and accurate crack detection on concrete structures. Cracks on concrete structures are an important indicator for assessing their durability and safety, and real-time crack detection is an essential task in structural maintenance. The object detection algorithm, especially the YOLO series of networks, has significant potential in crack detection, and the feature extractor is the most important component of the YOLO_v2. Hence, this paper employs 11 well-known CNN models as the feature extractor of the YOLO_v2 for crack detection. The results confirm that different feature extractor models of the YOLO_v2 network lead to different detection results: the AP value is 0.89, 0, and 0 for 'resnet18', 'alexnet', and 'vgg16', respectively; meanwhile, 'googlenet' (AP = 0.84) and 'mobilenetv2' (AP = 0.87) also demonstrate comparable AP values. In terms of computing speed, 'alexnet' takes the least computational time, with 'squeezenet' and 'resnet18' ranked second and third, respectively; therefore, 'resnet18' is the best feature extractor model when precision and computational cost are considered together. Additionally, a parametric study of the influence of the training epoch, feature extraction layer, and testing image size shows that these parameters indeed have an impact on the detection results. It is demonstrated that excellent crack detection results can be achieved by the YOLO_v2 detector when an appropriate feature extractor model, training epoch, feature extraction layer, and testing image size are chosen.


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 627
Author(s):  
David Marquez-Viloria ◽  
Luis Castano-Londono ◽  
Neil Guerrero-Gonzalez

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated on the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for inter-channel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of 8-connected clusters from image processing. The real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with results obtained in previous work using the same data from the same experimental setup but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN shows a reduction in operations of 43 to 75 percent, depending on the symbol's position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared with the original KNN algorithm.
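The 8-connected reduction can be illustrated on an ideal 16-QAM grid: make a coarse per-axis decision, then compare only against the (at most 9) centroids in the 8-connected neighbourhood instead of all 16. This is a hypothetical NumPy sketch of the comparison-count idea, not the paper's FPGA kernel:

```python
import numpy as np

levels = np.array([-3.0, -1.0, 1.0, 3.0])                      # 16-QAM axis levels
centroids = np.array([i + 1j * q for q in levels for i in levels])

def demod_full(z):
    """Baseline KNN-style decision: distance to all 16 centroids."""
    return centroids[np.argmin(np.abs(centroids - z))]

def demod_8connected(z):
    """Coarse per-axis decision, then refine among the 8-connected neighbours."""
    ci = int(np.argmin(np.abs(levels - z.real)))
    cq = int(np.argmin(np.abs(levels - z.imag)))
    cand = np.array([levels[i] + 1j * levels[q]
                     for i in range(max(ci - 1, 0), min(ci + 2, 4))
                     for q in range(max(cq - 1, 0), min(cq + 2, 4))])
    return cand[np.argmin(np.abs(cand - z))]

# Noisy received symbols: the reduced search reproduces the full search
rng = np.random.default_rng(1)
tx = centroids[rng.integers(0, 16, 500)]
rx = tx + 0.3 * (rng.standard_normal(500) + 1j * rng.standard_normal(500))
agree = all(demod_full(z) == demod_8connected(z) for z in rx)
```

Corner and edge symbols have fewer than 8 neighbours, which is why the paper's operation savings vary with the symbol's position in the constellation.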

