Parallel TLM Procedures for NVIDIA GPU

Author(s):  
Poman So
Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2595
Author(s):  
Balakrishnan Ramalingam ◽  
Abdullah Aamir Hayat ◽  
Mohan Rajesh Elara ◽  
Braulio Félix Gómez ◽  
Lim Yi ◽  
...  

Pavement inspection, which mainly comprises crack and garbage detection, is an essential and frequently performed task. Inspection, whether human-based or carried out by a dedicated system, can readily be integrated with pavement sweeping machines. This work proposes a deep learning-based pavement inspection framework for a self-reconfigurable robot named Panthera. The semantic segmentation framework SegNet was adopted to segment the pavement region from other objects. Deep Convolutional Neural Network (DCNN) based object detection is used to detect and localize pavement defects and garbage. Furthermore, a Mobile Mapping System (MMS) was adopted for geotagging the defects. The proposed system was implemented and tested on the Panthera robot equipped with NVIDIA GPU cards. Experimental results on crack and garbage detection are presented; they show that the proposed technique detects pavement defects and litter or garbage with high accuracy. The proposed technique is found to be suitable for real-time deployment for garbage detection and, eventually, sweeping or cleaning tasks.


2019 ◽  
Author(s):  
Tomás Antonio Valencia Pérez ◽  
Javier Miguel Hernández López ◽  
Eduardo Moreno Barbosa ◽  
Mario Iván Martínez Hernández ◽  
Guillermo Tejeda Muñoz ◽  
...  

Author(s):  
Sreeram Potluri ◽  
Anshuman Goswami ◽  
Davide Rossetti ◽  
C.J. Newburn ◽  
Manjunath Gorentla Venkata ◽  
...  

2021 ◽  
Author(s):  
Nathanael Schaeffer

Most new supercomputers now use acceleration technology such as GPUs. They promise much higher performance than traditional CPU-only servers, both in terms of floating-point operation throughput and memory bandwidth. Furthermore, electricity consumption is significantly reduced, resulting in lower carbon emissions. However, such high computation speeds can only be achieved if a set of more or less stringent rules is followed with respect to memory access and program flow. As a consequence, some algorithms approach peak performance more easily than others.

Here, we present the results of an effort to achieve high performance on recent NVIDIA GPU accelerators for the spherical harmonic transform. The spherical harmonic transform can be split into a Legendre transform (which is compute bound) and a Fourier transform (which is memory bound). By taking advantage of recent algorithmic improvements as well as by tuning the Fourier transform, a full forward or backward spherical harmonic transform up to degree 8191 can now be computed on a single 16 GB Volta GPU in less than 0.35 seconds. For lower resolution (up to degree 1023), a single Volta GPU performs a full transform more than 3 times faster than a 48-core dual-socket Skylake Xeon Platinum server.

We also present results of an ongoing effort to port the simulation of planetary core fluid and magnetic field dynamics to GPU-accelerated computers.


2011 ◽  
pp. 1339-1345
Author(s):  
Laxmikant V. Kalé ◽  
Abhinav Bhatele ◽  
Eric J. Bohm ◽  
James C. Phillips ◽  
David H. Bailey ◽  
...  

2015 ◽  
Vol 25 (03) ◽  
pp. 1541001 ◽  
Author(s):  
Christian Obrecht ◽  
Bernard Tourancheau ◽  
Frédéric Kuznik

A portable OpenCL implementation of the lattice Boltzmann method targeting emerging many-core architectures is described. The main purpose of this work is to evaluate and compare the performance of this code on three mainstream hardware architectures available today, namely an Intel CPU, an Nvidia GPU, and the Intel Xeon Phi. Because of the similarities between OpenCL and CUDA, we chose to follow some of the strategies devised to implement efficient lattice Boltzmann solvers on Nvidia GPUs, while remaining as generic as possible. Being fairly configurable, this program makes it possible to ascertain the best options for each hardware platform. The achieved performance is quite satisfactory for both the CPU and the GPU. For the Xeon Phi, however, the results are below expectations. Nevertheless, comparison with data from the literature shows that on this architecture the code seems memory-bound.

