MetaMap: An atlas of metatranscriptomic reads in human disease-related RNA-seq data

2018 ◽  
Author(s):  
LM Simon ◽  
S Karg ◽  
AJ Westermann ◽  
M Engel ◽  
AHA Elbehery ◽  
...  

Abstract
Background: With the advent of big data in bioinformatics, large volumes of data and high performance computing power enable researchers to re-analyse publicly available datasets at an unprecedented scale. A growing number of studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts, but its generic nature also enables the detection of microbial and viral transcripts.
Findings: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from 6 independent controlled infection experiments of cell line models and by comparison with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from >17,000 samples from >400 studies relevant to human disease using state-of-the-art high performance computing systems. The resulting data of this large-scale re-analysis are made available in the presented MetaMap resource.
Conclusions: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for generating hypotheses about the role of the microbiome in human disease.
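The first step described in the Findings, isolating the non-human-mapping read fraction, can be illustrated with a minimal sketch. It assumes the human alignments are available as SAM-format text, where bit 0x4 of the FLAG field marks an unmapped read. The abstract does not say which aligner or file format the authors used, so the function name `unmapped_reads` and the example records below are hypothetical and not the MetaMap implementation.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) for reads that did not map to the human reference.

    Hypothetical helper: assumes tab-separated SAM text with the 11 mandatory fields.
    """
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # malformed record, ignore in this sketch
        flag = int(fields[1])
        if flag & UNMAPPED_FLAG:
            # Field 1 is the read name, field 10 is the read sequence.
            yield fields[0], fields[9]


# Made-up records for illustration only.
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
    "read3\t16\tchr2\t500\t60\t8M\t*\t0\t0\tGGCCTTAA\tIIIIIIII",
]

candidates = list(unmapped_reads(example_sam))
print(candidates)
```

In a screen of this kind, the reads collected in `candidates` would then be passed to a separate microbial and viral classification step, which this sketch does not attempt.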

2016 ◽  
Author(s):  
Arnald Alonso ◽  
Brittany N. Lasseigne ◽  
Kelly Williams ◽  
Josh Nielsen ◽  
Ryne C. Ramaker ◽  
...  

Abstract
Summary: The wide range of RNA-seq applications and their high computational needs require the development of pipelines orchestrating the entire workflow and optimizing usage of available computational resources. We present aRNApipe, a project-oriented pipeline for processing of RNA-seq data in high performance cluster environments. aRNApipe is highly modular and can be easily migrated to any high performance computing (HPC) environment. The current applications included in aRNApipe combine the essential RNA-seq primary analyses, including quality control metrics, transcript alignment, count generation, transcript fusion identification, alternative splicing, and sequence variant calling. aRNApipe is project-oriented and dynamic so users can easily update analyses to include or exclude samples or enable additional processing modules. Workflow parameters are easily set using a single configuration file that provides centralized tracking of all analytical processes. Finally, aRNApipe incorporates interactive web reports for sample tracking and a tool for managing the genome assemblies available to perform an analysis.
Availability and documentation: https://github.com/HudsonAlpha/aRNAPipe; DOI:10.5281/[email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Mohsen Hadianpour ◽  
Ehsan Rezayat ◽  
Mohammad-Reza Dehaqani

Abstract
Rapid progress in neurophysiological recording technologies has left neuroscientists facing various complexities in dealing with unstructured large-scale neural data. In the neuroscience community, these complexities can create serious bottlenecks in storing, sharing, and processing neural datasets. In this article, we develop a distributed high-performance computing (HPC) framework called the 'Big neuronal data framework' (BNDF) to overcome these complexities. BNDF is based on the open-source big data frameworks Hadoop and Spark, providing a flexible and scalable structure. We tested BNDF on three different large-scale electrophysiological recording datasets from nonhuman primate brains. Our results showed faster, scalable runtimes owing to the distributed nature of BNDF. We compared BNDF with a widely used platform, MATLAB, on equivalent computational resources. Compared with other similar methods, BNDF performs spike sorting, a common neuroscience application, more than five times faster.


2017 ◽  
Vol 33 (2) ◽  
pp. 119-130
Author(s):  
Vinh Van Le ◽  
Hoai Van Tran ◽  
Hieu Ngoc Duong ◽  
Giang Xuan Bui ◽  
Lang Van Tran

Metagenomics is a powerful approach for studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is to identify the origin of reads, referred to as taxonomic assignment. Because each metagenomic project has to analyze large-scale datasets, metagenomic assignment is highly computation-intensive. This study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL, which aims to address this computational challenge. The proposed algorithm is evaluated with both simulated and real datasets on a high performance computing system. Experimental results demonstrate that the algorithm achieves good performance and uses the resources of the system efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.


Author(s):  
Adrian Jackson ◽  
Michèle Weiland

This chapter describes experiences using Cloud infrastructures for scientific computing, both for serial and parallel computing. Amazon’s High Performance Computing (HPC) Cloud computing resources were compared to traditional HPC resources to quantify performance and to assess the complexity and cost of using the Cloud. Furthermore, a shared Cloud infrastructure is compared to standard desktop resources for scientific simulations. Whilst this is only a small-scale evaluation of these Cloud offerings, it does allow some conclusions to be drawn, particularly that the Cloud cannot currently match the parallel performance of dedicated HPC machines for large-scale parallel programs but can match the performance of standard computing resources for serial and small-scale parallel programs. Also, the shared Cloud infrastructure cannot match dedicated computing resources for low-level benchmarks, although for an actual scientific code, performance is comparable.


Green computing is a contemporary research topic to address climate and energy challenges. In this chapter, the authors envision the duality of green computing with technological trends in other fields of computing such as High Performance Computing (HPC) and cloud computing on one hand and economy and business on the other hand. For instance, in order to provide electricity for large-scale cloud infrastructures and to reach exascale computing, we need huge amounts of energy. Thus, green computing is a challenge for the future of cloud computing and HPC. Alternatively, clouds and HPC provide solutions for green computing and climate change. In this chapter, the authors discuss this proposition by looking at the technology in detail.


Author(s):  
Atta ur Rehman Khan ◽  
Abdul Nasir Khan

Mobile devices are becoming highly popular because they support a wide range of applications. However, mobile devices are resource constrained, and many applications require substantial resources. To address this issue, researchers envision the use of mobile cloud computing technology, which offers high performance computing, execution of resource-intensive applications, and energy efficiency. This chapter highlights the importance of mobile devices, high performance applications, and the computing challenges of mobile devices. It also provides a brief introduction to mobile cloud computing technology, its architecture, types of mobile applications, the computation offloading process, the challenges of effective offloading, and the high performance computing applications on mobile devices that are enabled by mobile cloud computing technology.


2019 ◽  
Vol 3 (4) ◽  
pp. 902-904
Author(s):  
Alexander Peyser ◽  
Sandra Diaz Pier ◽  
Wouter Klijn ◽  
Abigail Morrison ◽  
Jochen Triesch

Large-scale in silico experimentation depends on the generation of connectomes beyond available anatomical structure. We suggest that linking research across the fields of experimental connectomics, theoretical neuroscience, and high-performance computing can enable a new generation of models bridging the gap between biophysical detail and global function. This Focus Feature on “Linking Experimental and Computational Connectomics” aims to bring together some examples from these domains as a step toward the development of more comprehensive generative models of multiscale connectomes.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Emmanuel Imuetinyan Aghimien ◽  
Lerato Millicent Aghimien ◽  
Olutomilayo Olayemi Petinrin ◽  
Douglas Omoregie Aghimien

Purpose: This paper aims to present the result of a scientometric analysis conducted using studies on high-performance computing in computational modelling. This was done with a view to showcasing the need for high-performance computers (HPC) within the architecture, engineering and construction (AEC) industry in developing countries, particularly in Africa, where the use of HPC in developing computational models (CMs) for effective problem solving is still low.
Design/methodology/approach: An interpretivist philosophical stance was adopted for the study, which informed a scientometric review of existing studies gathered from the Scopus database. Keywords such as high-performance computing and computational modelling were used to extract papers from the database. Visualisation of Similarities viewer (VOSviewer) was used to prepare co-occurrence maps based on the bibliographic data gathered.
Findings: The findings revealed the scarcity of research emanating from Africa in this area of study. Furthermore, past studies had placed focus on high-performance computing in the development of computational modelling and theory, parallel computing and improved visualisation, large-scale application software, computer simulations and computational mathematical modelling. Future studies can also explore areas such as cloud computing, optimisation, high-level programming languages, natural science computing, computer graphics equipment and Graphics Processing Units as they relate to the AEC industry.
Research limitations/implications: The study assessed a single database for the search of related studies.
Originality/value: The findings of this study serve as an excellent theoretical background for AEC researchers seeking to explore the use of HPC for CMs development in the quest for solving complex problems in the industry.
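The co-occurrence maps mentioned in the methodology rest on a simple count: how often two keywords appear together in the same bibliographic record. The following is a minimal sketch of that counting step only, not of VOSviewer itself or of the authors' workflow. The function name `keyword_cooccurrence` and the three keyword lists are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how many records contain each unordered pair of keywords.

    Hypothetical helper: `records` is an iterable of keyword lists, one per paper.
    """
    counts = Counter()
    for keywords in records:
        # Normalise case and whitespace, and count each keyword once per record.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorting makes each pair appear in one canonical order.
        counts.update(combinations(unique, 2))
    return counts


# Illustrative keyword lists, not data from the study.
records = [
    ["High-performance computing", "Computational modelling"],
    ["high-performance computing", "Parallel computing", "computational modelling"],
    ["Cloud computing", "High-performance computing"],
]

pairs = keyword_cooccurrence(records)
for pair, n in pairs.most_common(3):
    print(pair, n)
```

A mapping tool would then lay out keywords as nodes and use counts like these as link weights. That layout step is outside the scope of this sketch.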

