A GPU based multidimensional amplitude analysis to search for tetraquark candidates

2020 ◽  
Nairit Sur ◽  
Leonardo Cristella ◽  
Adriano Di Florio ◽  
Vincenzo Mastrapasqua

Abstract The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data and physicists pursue more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics, where the analyses often depend on complex multidimensional unbinned maximum-likelihood fits with several dozen free parameters, aimed at studying the internal structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures and are becoming a popular tool for high energy physicists to meet their computational demands. GooFit is an upcoming open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs. It acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be evaluated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the first of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in $B^0 \rightarrow J/\psi K \pi$ decays and can also be seamlessly adapted for other similar analyses. The GooFit fitter, running on GPUs, shows a remarkable improvement in computing speed compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.
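
As a rough illustration of the unbinned maximum-likelihood fit the abstract refers to, the sketch below fits a single parameter of a toy exponential PDF. Everything in it is an assumption made for illustration: the PDF, the names `nll` and `golden_section_minimize`, and the golden-section search, which merely stands in for MINUIT. It does not use GooFit or RooFit and is not the authors' four-dimensional fitter. The point it shows is that the negative log-likelihood is a sum of independent per-event terms, which is the part a GPU-based fitter can evaluate in parallel.

```python
import math
import random


def nll(tau, events):
    """Negative log-likelihood for the toy PDF f(t) = exp(-t / tau) / tau.

    Every event contributes one independent term, so the terms could be
    computed in parallel and then summed.
    """
    return sum(math.log(tau) + t / tau for t in events)


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a unimodal 1-D function on [lo, hi].

    Hypothetical stand-in for a real minimizer such as MINUIT.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


random.seed(1)
true_tau = 1.5  # assumed value used only to generate toy data
events = [random.expovariate(1.0 / true_tau) for _ in range(2000)]

tau_hat = golden_section_minimize(lambda tau: nll(tau, events), 0.1, 10.0)
sample_mean = sum(events) / len(events)
print(f"fitted tau = {tau_hat:.4f}, sample mean = {sample_mean:.4f}")
```

For this toy PDF the maximum-likelihood estimate is known analytically to equal the sample mean, which gives a simple check on the numerical fit. A realistic amplitude analysis has several dozen free parameters and a far costlier PDF, which is where parallel evaluation pays off.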

2021 ◽  
Vol 8 (1) ◽  
Nairit Sur ◽  
Leonardo Cristella ◽  
Adriano Di Florio ◽  
Vincenzo Mastrapasqua

Abstract The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data and physicists pursue more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics, where the analyses often depend on complex multidimensional unbinned maximum-likelihood fits with several dozen free parameters, aimed at studying the internal structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures and are becoming a popular tool for high energy physicists to meet their computational demands. GooFit is an upcoming open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs. It acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be evaluated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the first of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in $B^0 \rightarrow J/\psi K \pi$ decays and can also be seamlessly adapted for other similar analyses. The GooFit fitter, running on GPUs, shows a remarkable improvement in computing speed compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.

2021 ◽  
Vol 38 (2) ◽  
Nicholas Torres Okita ◽  
Tiago A. Coimbra ◽  
José Ribeiro ◽  
Martin Tygel

ABSTRACT. Graphics processing units (GPUs) are an established alternative to traditional multi-core CPU processing, running parallel tasks dozens of times faster. Cloud computing is another new paradigm, replacing traditional in-house clusters with seemingly unlimited computation power, no maintenance costs, and cutting-edge technology, provisioned dynamically on user demand. Previously, those two tools were used to accelerate the estimation of Common Reflection Surface (CRS) traveltime parameters, in both the zero-offset and finite-offset domains, delivering very satisfactory results with large time savings from GPU devices alongside cost savings on the cloud. This work extends those results by using GPUs on the cloud to accelerate Offset Continuation Trajectory (OCT) traveltime parameter estimation. The results show that the time and cost savings from using GPU devices are even larger than those seen in the CRS results: up to fifty times faster and sixty times cheaper. This analysis reaffirms that it is possible to save both time and money when using GPU devices on the cloud, and concludes that the larger the data sets and the more computationally intensive the traveltime operators, the larger the improvements. Keywords: cloud computing, GPU, seismic processing.
Extending the use of graphics cards in the cloud for savings in seismic data regularization. RESUMO. The use of graphics accelerators for processing is already a known alternative to multi-core CPUs, offering performance on the order of dozens of times faster in parallel tasks. Another new computing paradigm is the use of the cloud as a replacement for traditional in-house clusters, providing seemingly unlimited computing power with no maintenance cost and with cutting-edge technology, dynamically on user demand. Previously, these two tools were used to accelerate the estimation of Common Reflection Surface (CRS) traveltime parameters, both in zero-offset and in finite offsets, obtaining satisfactory results with large savings of both time and money on the cloud. This work extends the earlier results, this time using GPUs on the cloud to accelerate the estimation of Offset Continuation Trajectory (OCT) traveltime parameters. The results showed that the time and money savings were even larger than those obtained for CRS, being up to fifty times faster and sixty times cheaper. This analysis reaffirms that it is possible to save both time and money using GPUs on the cloud, and concludes that the larger the data and the more computationally intensive the operator, the larger the observed performance gains and savings. Keywords: cloud computing, GPU, seismic processing.

Christopher J. Reid ◽  
Biswanath Samanta ◽  
Christopher Kadlec

The use of robots in complex tasks such as search and rescue operations is becoming more and more common. These robots often work independently, with no cooperation with other robots or control software, and are very limited in their ability to perform dynamic tasks and interact with both humans and other robots. To this end, a system must be developed to facilitate the cooperation of heterogeneous robots to complete complex tasks. To model and study human-robot and robot-robot interactions in a multi-system environment, a robust network infrastructure must be implemented to support the broad nature of these studies. The work presented here details the creation of a cloud-based infrastructure designed to support the introduction and implementation of multiple heterogeneous robots to the environment utilizing the Robot Operating System (ROS). Implemented robots include both ground-based (e.g. Turtlebot) and air-based (e.g. Parrot ARDrone2.0) systems. Additional hardware is also implemented, such as embedded vision systems, host computers to support virtual machines for software implementation, and machines with graphics processing units (GPUs) for additional computational resources. Control software for the robots is implemented in the system with complexities ranging from simple teleoperation to skeletal tracking and neural network simulators. A robust integration of multiple heterogeneous components, including both hardware and software, is achieved.

2013 ◽  
pp. 488-509
Lodovico Marziale ◽  
Santhi Movva ◽  
Golden G. Richard ◽  
Vassil Roussev ◽  
Loren Schwiebert

Digital forensics comprises the set of techniques to recover, preserve, and examine digital evidence, and has applications in a number of important areas, including investigation of child exploitation, identity theft, counter-terrorism, and intellectual property disputes. Digital forensics tools must exhaustively examine and interpret data at a low level, because data of evidentiary value may have been deleted, partially overwritten, obfuscated, or corrupted. While forensics investigation is typically seen as an off-line activity, improving case turnaround time is crucial, because in many cases lives or livelihoods may hang in the balance. Furthermore, if more computational resources can be brought to bear, we believe that preventative network security (which must be performed on-line) and digital forensics can be merged into a common research focus. In this chapter we consider recent hardware trends and argue that multicore CPUs and Graphics Processing Units (GPUs) offer one solution to the problem of maximizing available compute resources.

2015 ◽  
Vol 8 (9) ◽  
pp. 2815-2827 ◽  
S. Xu ◽  
X. Huang ◽  
L.-Y. Oey ◽  
F. Xu ◽  
H. Fu ◽  

Abstract. Graphics processing units (GPUs) are an attractive solution in many scientific applications due to their high performance. However, most existing GPU conversions of climate models use GPUs for only a few computationally intensive regions. In the present study, we redesign the mpiPOM (a parallel version of the Princeton Ocean Model) with GPUs. Specifically, we first convert the model from its original Fortran form to a new Compute Unified Device Architecture C (CUDA-C) code, then we optimize the code on each of the GPUs, the communications between the GPUs, and the I/O between the GPUs and the central processing units (CPUs). We show that the performance of the new model on a workstation containing four GPUs is comparable to that on a powerful cluster with 408 standard CPU cores, and it reduces the energy consumption by a factor of 6.8.

2020 ◽  
Ryan N Gutenkunst

Extracting insight from population genetic data often demands computationally intensive modeling. dadi is a popular program for fitting models of demographic history and natural selection to such data. Here, I show that running dadi on a Graphics Processing Unit (GPU) can speed computation by orders of magnitude compared to the CPU implementation, with minimal user burden. This speed increase enables the analysis of more complex models, which motivated the extension of dadi to four- and five-population models. Remarkably, dadi performs almost as well on inexpensive consumer-grade GPUs as on expensive server-grade GPUs. GPU computing thus offers large and accessible benefits to the community of dadi users. This functionality is available in dadi version 2.1.0.

2020 ◽  
Vol 245 ◽  
pp. 05037
Caterina Marcon ◽  
Oxana Smirnova ◽  
Servesh Muralidharan

Experimental observations and advanced computer simulations in High Energy Physics (HEP) paved the way for the recent discoveries at the Large Hadron Collider (LHC) at CERN. Currently, Monte Carlo simulations account for a very significant amount of computational resources of the Worldwide LHC Computing Grid (WLCG). The current growth in available computing performance will not be enough to fulfill the expected demand for the forthcoming High Luminosity run (HL-LHC). More efficient simulation codes are therefore required. This study focuses on evaluating the impact of different build methods on the simulation execution time. The Geant4 toolkit, the standard simulation code for the LHC experiments, consists of a set of libraries which can be either dynamically or statically linked to the simulation executable. Dynamic libraries are currently the preferred build method. In this work, three versions of the GCC compiler, namely 4.8.5, 6.2.0 and 8.2.0, have been used. In addition, a comparison between four optimization levels (Os, O1, O2 and O3) has also been performed. Static builds for all the GCC versions considered exhibit a reduction in execution times of about 10%. Switching to a newer GCC version results in an average improvement of 30% in the execution time regardless of the build type. In particular, a static build with GCC 8.2.0 leads to an improvement of about 34% with respect to the default configuration (GCC 4.8.5, dynamic, O2). The different GCC optimization flags do not affect the execution times.
