MetaMap: An atlas of metatranscriptomic reads in human disease-related RNA-seq data

Abstract. Background: With the advent of the age of big data in bioinformatics, large volumes of data and high performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. A growing number of studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts, but its generic nature also enables the detection of microbial and viral transcripts. Findings: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from 6 independent controlled infection experiments of cell line models and by comparison with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from >17,000 samples from >400 studies relevant to human disease using state-of-the-art high performance computing systems. The resulting data of this large-scale re-analysis are made available in the presented MetaMap resource. Conclusions: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation towards the role of the microbiome in human disease.
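
The idea of re-inspecting the non-human-mapping read fraction can be illustrated with a minimal sketch. It is not the MetaMap pipeline itself: it only assumes that reads were first aligned to a human reference and that the alignments are available as SAM text, where bit 0x4 of the FLAG field marks an unmapped read. The function name `split_by_mapping` and the toy records are hypothetical.

```python
from typing import Iterable, List, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_by_mapping(sam_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (mapped_read_names, unmapped_read_names) from SAM text lines."""
    mapped: List[str] = []
    unmapped: List[str] = []
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            unmapped.append(name)  # candidate for microbial/viral screening
        else:
            mapped.append(name)    # aligned to the host reference
    return mapped, unmapped


# Hypothetical toy records for illustration only (not MetaMap data).
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    "read3\t16\tchr2\t250\t60\t4M\t*\t0\t0\tGGCC\tIIII",
]

mapped, unmapped = split_by_mapping(EXAMPLE_SAM)
print(f"{len(unmapped)} of {len(mapped) + len(unmapped)} reads left for screening")
```

In a real workflow the unmapped reads would then be passed to a separate classification step against microbial and viral references; that step is outside the scope of this sketch.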

aRNApipe: A balanced, efficient and distributed pipeline for processing RNA-seq data in high performance computing environments

10.1101/060277 ◽

2016 ◽

Cited By ~ 2

Author(s):

Arnald Alonso ◽

Brittany N. Lasseigne ◽

Kelly Williams ◽

Josh Nielsen ◽

Ryne C. Ramaker ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Variant Calling ◽

Supplementary Information ◽

Sequence Variant ◽

RNA-Seq ◽

Wide Range ◽

Additional Processing ◽

Computational Resources ◽

Performance Computing

Abstract. Summary: The wide range of RNA-seq applications and their high computational needs require the development of pipelines orchestrating the entire workflow and optimizing usage of available computational resources. We present aRNApipe, a project-oriented pipeline for processing of RNA-seq data in high performance cluster environments. aRNApipe is highly modular and can be easily migrated to any high performance computing (HPC) environment. The current applications included in aRNApipe combine the essential RNA-seq primary analyses, including quality control metrics, transcript alignment, count generation, transcript fusion identification, alternative splicing, and sequence variant calling. aRNApipe is project-oriented and dynamic so users can easily update analyses to include or exclude samples or enable additional processing modules. Workflow parameters are easily set using a single configuration file that provides centralized tracking of all analytical processes. Finally, aRNApipe incorporates interactive web reports for sample tracking and a tool for managing the genome assemblies available to perform an analysis. Availability and documentation: https://github.com/HudsonAlpha/aRNAPipe; DOI:10.5281/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining ◽

10.1145/3219819.3219927 ◽

2018 ◽

Cited By ~ 2

Author(s):

Alex Gittens ◽

Kai Rothauge ◽

Shusen Wang ◽

Michael W. Mahoney ◽

Lisa Gerhardt ◽

...

Keyword(s):

Data Analysis ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Large Scale Data ◽

Performance Computing ◽

Scale Data

High-Performance Computing Framework Based on Distributed Systems for Large-Scale Neurophysiological Data

10.21203/rs.3.rs-136986/v1 ◽

2021 ◽

Author(s):

Mohsen Hadianpour ◽

Ehsan Rezayat ◽

Mohammad-Reza Dehaqani

Keyword(s):

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Electrophysiological Recording ◽

Neural Data ◽

Data Framework ◽

Neurophysiological Data ◽

Computing Framework ◽

Performance Computing ◽

Neuroscience Community

Abstract. Owing to rapid progress in neurophysiological recording technologies, neuroscientists face growing complexity in dealing with unstructured large-scale neural data. In the neuroscience community, these complexities can create serious bottlenecks in storing, sharing, and processing neural datasets. In this article, we developed a distributed high-performance computing (HPC) framework called 'Big neuronal data framework' (BNDF) to overcome these complexities. BNDF is based on the open-source big data frameworks Hadoop and Spark, providing a flexible and scalable structure. We examined BNDF on three different large-scale electrophysiological recording datasets from nonhuman primate brains. Our results exhibited faster runtimes with scalability due to the distributed nature of BNDF. We compared BNDF results with a widely used platform, MATLAB, on equivalent computational resources. Compared with other similar methods, BNDF provides more than five times faster performance in spike sorting, a common neuroscience application.

Measuring and tuning energy efficiency on large scale high performance computing platforms.

10.2172/1035312 ◽

2011 ◽

Cited By ~ 1

Author(s):

James H. Laros III

Keyword(s):

Energy Efficiency ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Computing Platforms ◽

Performance Computing

Taxonomic assignment for large-scale metagenomic data on high-performance systems

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/33/2/10753 ◽

2017 ◽

Vol 33 (2) ◽

pp. 119-130

Author(s):

Vinh Van Le ◽

Hoai Van Tran ◽

Hieu Ngoc Duong ◽

Giang Xuan Bui ◽

Lang Van Tran

Keyword(s):

High Performance Computing ◽

Assignment Problem ◽

High Performance ◽

Large Scale ◽

Computing System ◽

Metagenomic Data ◽

Taxonomic Assignment ◽

High Performance Computing System ◽

Powerful Approach ◽

Performance Computing

Metagenomics is a powerful approach to studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is to identify the origin of reads, referred to as taxonomic assignment. Because each metagenomic project has to analyze large-scale datasets, taxonomic assignment is computationally intensive. This study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL, which aims to deal with this computational challenge. The proposed algorithm is evaluated with both simulated and real datasets on a high performance computing system. Experimental results demonstrate that the algorithm achieves good performance and uses the resources of the system efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.

Cloud Computing for Scientific Simulation and High Performance Computing

Principles, Methodologies, and Service-Oriented Approaches for Cloud Computing ◽

10.4018/978-1-4666-2854-0.ch003 ◽

2013 ◽

pp. 51-70

Author(s):

Adrian Jackson ◽

Michèle Weiland

Keyword(s):

Cloud Computing ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Parallel Programs ◽

Small Scale ◽

Cloud Infrastructure ◽

Scientific Simulations ◽

Cloud Infrastructures ◽

Performance Computing

This chapter describes experiences using Cloud infrastructures for scientific computing, both for serial and parallel computing. Amazon’s High Performance Computing (HPC) Cloud computing resources were compared to traditional HPC resources to quantify performance and to assess the complexity and cost of using the Cloud. Furthermore, a shared Cloud infrastructure is compared to standard desktop resources for scientific simulations. Whilst this is only a small scale evaluation of these Cloud offerings, it does allow some conclusions to be drawn, particularly that the Cloud cannot currently match the parallel performance of dedicated HPC machines for large scale parallel programs but can match the serial performance of standard computing resources for serial and small scale parallel programs. Also, the shared Cloud infrastructure cannot match dedicated computing resources for low level benchmarks, although for an actual scientific code, performance is comparable.

Green Computing

Pervasive Cloud Computing Technologies - Advances in Systems Analysis, Software Engineering, and High Performance Computing ◽

10.4018/978-1-4666-4683-4.ch012 ◽

2014 ◽

pp. 248-260

Keyword(s):

Climate Change ◽

Cloud Computing ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Green Computing ◽

Research Topic ◽

The Other ◽

Cloud Infrastructures ◽

Performance Computing

Green computing is a contemporary research topic to address climate and energy challenges. In this chapter, the authors envision the duality of green computing with technological trends in other fields of computing such as High Performance Computing (HPC) and cloud computing on one hand and economy and business on the other hand. For instance, in order to provide electricity for large-scale cloud infrastructures and to reach exascale computing, we need huge amounts of energy. Thus, green computing is a challenge for the future of cloud computing and HPC. Alternatively, clouds and HPC provide solutions for green computing and climate change. In this chapter, the authors discuss this proposition by looking at the technology in detail.

High Performance Computing on Mobile Devices

Innovative Research and Applications in Next-Generation High Performance Computing - Advances in Systems Analysis, Software Engineering, and High Performance Computing ◽

10.4018/978-1-5225-0287-6.ch013 ◽

2016 ◽

pp. 334-348

Author(s):

Atta ur Rehman Khan ◽

Abdul Nasir Khan

Keyword(s):

Cloud Computing ◽

High Performance Computing ◽

Mobile Devices ◽

High Performance ◽

Mobile Cloud Computing ◽

Computation Offloading ◽

Mobile Cloud ◽

Computing Technology ◽

Wide Range ◽

Performance Computing

Mobile devices are gaining popularity because they support a wide range of applications. However, mobile devices are resource constrained, and many applications require substantial resources. To address this issue, researchers envision the use of mobile cloud computing technology, which offers high performance computing, execution of resource-intensive applications, and energy efficiency. This chapter highlights the importance of mobile devices, high performance applications, and the computing challenges of mobile devices. It also provides a brief introduction to mobile cloud computing technology, its architecture, types of mobile applications, the computation offloading process, the challenges of effective offloading, and high performance computing applications on mobile devices that are enabled by mobile cloud computing technology.

Editorial: Linking experimental and computational connectomics

Network Neuroscience ◽

10.1162/netn_e_00108 ◽

2019 ◽

Vol 3 (4) ◽

pp. 902-904

Author(s):

Alexander Peyser ◽

Sandra Diaz Pier ◽

Wouter Klijn ◽

Abigail Morrison ◽

Jochen Triesch

Keyword(s):

High Performance Computing ◽

In Silico ◽

High Performance ◽

Large Scale ◽

Generative Models ◽

Anatomical Structure ◽

Global Function ◽

Theoretical Neuroscience ◽

New Generation ◽

Performance Computing

Large-scale in silico experimentation depends on the generation of connectomes beyond available anatomical structure. We suggest that linking research across the fields of experimental connectomics, theoretical neuroscience, and high-performance computing can enable a new generation of models bridging the gap between biophysical detail and global function. This Focus Feature on “Linking Experimental and Computational Connectomics” aims to bring together some examples from these domains as a step toward the development of more comprehensive generative models of multiscale connectomes.

High-performance computing for computational modelling in built environment-related studies – a scientometric review

Journal of Engineering Design and Technology ◽

10.1108/jedt-07-2020-0294 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Emmanuel Imuetinyan Aghimien ◽

Lerato Millicent Aghimien ◽

Olutomilayo Olayemi Petinrin ◽

Douglas Omoregie Aghimien

Keyword(s):

High Performance Computing ◽

Graphics Processing Units ◽

High Performance ◽

Large Scale ◽

Computational Models ◽

Computational Modelling ◽

Scientometric Analysis ◽

Content Type ◽

Performance Computing ◽

AEC Industry

Purpose: This paper presents the results of a scientometric analysis of studies on high-performance computing in computational modelling. The aim was to showcase the need for high-performance computers (HPC) within the architecture, engineering and construction (AEC) industry in developing countries, particularly in Africa, where the use of HPC in developing computational models (CMs) for effective problem solving is still low. Design/methodology/approach: An interpretivist philosophical stance was adopted for the study, which informed a scientometric review of existing studies gathered from the Scopus database. Keywords such as high-performance computing and computational modelling were used to extract papers from the database. Visualisation of Similarities viewer (VOSviewer) was used to prepare co-occurrence maps based on the bibliographic data gathered. Findings: Findings revealed the scarcity of research emanating from Africa in this area of study. Furthermore, past studies had focused on high-performance computing in the development of computational modelling and theory, parallel computing and improved visualisation, large-scale application software, computer simulations and computational mathematical modelling. Future studies can also explore areas such as cloud computing, optimisation, high-level programming languages, natural science computing, computer graphics equipment and Graphics Processing Units as they relate to the AEC industry. Research limitations/implications: The study assessed a single database for the search of related studies. Originality/value: The findings of this study serve as an excellent theoretical background for AEC researchers seeking to explore the use of HPC for CMs development in the quest for solving complex problems in the industry.
