A proteomics sample metadata representation for multiomics integration and big data analysis

Chengxin Dai; Anja Füllgrabe; Julianus Pfeuffer; Elizaveta M. Solovyeva; Jingwen Deng; Pablo Moreno; Selvakumar Kamatchinathan; Deepti Jaiswal Kundu; Nancy George; Silvie Fexova; Björn Grüning; Melanie Christine Föll; Johannes Griss; Marc Vaudel; Enrique Audain; Marie Locard-Paulet; Michael Turewicz; Martin Eisenacher; Julian Uszkoreit; Tim Van Den Bossche; Veit Schwämmle; Henry Webel; Stefan Schulze; David Bouyssié; Savita Jayaram; Vinay Kumar Duggineni; Patroklos Samaras; Mathias Wilhelm; Meena Choi; Mingxun Wang; Oliver Kohlbacher; Alvis Brazma; Irene Papatheodorou; Nuno Bandeira; Eric W. Deutsch; Juan Antonio Vizcaíno; Mingze Bai; Timo Sachsenberg; Lev I. Levitsky; Yasset Perez-Riverol

doi:10.1038/s41467-021-26111-3

Nature Communications, 2021, Vol 12 (1)

Keyword(s): Big Data; Data Analysis; Big Data Analysis; Proteomics Data; Data Format; Standard Representation; Related Information; Transcriptomics Data; Public Datasets

Abstract: The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.

Big Data Analysis Techniques for Visualization of Genomics in Medicinal Plants

Biotechnology, 2019, pp. 804-837. doi:10.4018/978-1-5225-8903-7.ch032

Author(s): Hithesh Kumar; Vivek Chandramohan; Smrithy M. Simon; Rahul Yadav; Shashi Kumar

Keyword(s): Big Data; Data Analysis; Next Generation Sequencing; Medicinal Plants; Big Data Analysis; Next Generation; Genome Data; Huge Data; Transcriptomics Data; Generation Sequencing

This chapter provides an overview of Big Data analysis and its applications in the health care industry, clinical informatics, personalized medicine and bioinformatics. The major tools and databases used for Big Data analysis are discussed. Advances in sequencing machines have made it possible to generate DNA, RNA, whole-genome and transcriptomics data quickly and efficiently, within a matter of hours. The complete Next Generation Sequencing (NGS) data analysis workflow for medicinal plants is discussed in the chapter. This chapter serves as an introduction to big data analysis in Next Generation Sequencing and concludes with a summary of the topics covered in the remaining chapters of the book.

Big Data Analysis Techniques for Visualization of Genomics in Medicinal Plants

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques, 2018, pp. 749-781. doi:10.4018/978-1-5225-3142-5.ch026

Author(s): Hithesh Kumar; Vivek Chandramohan; Smrithy M. Simon; Rahul Yadav; Shashi Kumar

Keyword(s): Big Data; Data Analysis; Next Generation Sequencing; Medicinal Plants; Big Data Analysis; Next Generation; Genome Data; Huge Data; Transcriptomics Data; Generation Sequencing

A proteomics sample metadata representation for multiomics integration, and big data analysis.

2021. doi:10.1101/2021.05.21.445143

Author(s): Chengxin Dai; Anja Füllgrabe; Julianus Pfeuffer; Elizaveta Solovyeva; Jingwen Deng; ...

Keyword(s): Data Analysis; Protein Interactions; Ad Hoc; Data File; Protein Protein Interactions; Proteomics Data; Related Information; File Formats; Data Files; Public Datasets

The amount of public proteomics data is increasing at an extraordinary rate. Hundreds of datasets are submitted each month to ProteomeXchange repositories, representing many types of proteomics studies that focus on different aspects such as quantitative experiments, post-translational modifications, protein-protein interactions, or subcellular localization, among many others. For every proteomics dataset, two levels of data are captured: the dataset description, and the data files (encoded in different file formats). Whereas the dataset description and data file formats are supported by all ProteomeXchange partner repositories, there is no standardized format to properly describe the sample metadata and their relationship with the dataset files in a way that fully allows their understanding or re-analysis. It is left to the user's choice whether or not to provide an ad hoc document containing this information. Therefore, in many cases, understanding the study design and data requires going back to the associated publication. This can be tedious, and access may be restricted in the case of non-open access publications. This problem often limits the generalization and reuse of public proteomics data. Here we present a standard representation for sample metadata tailored to proteomics datasets, produced by the HUPO Proteomics Standards Initiative and supported by ProteomeXchange resources. We repurposed the existing data format MAGE-TAB, used routinely in the transcriptomics field, to represent and annotate proteomics datasets. MAGE-TAB-Proteomics defines a set of annotation rules, ranging from sample properties to data analysis protocols, that the datasets submitted to ProteomeXchange should follow. We also introduce a crowdsourcing project that enabled the manual curation of over 200 public datasets using MAGE-TAB-Proteomics.
In addition, we describe an ecosystem of tools and libraries that were developed to validate and submit sample metadata-related information to ProteomeXchange. We expect that these tools will improve the reproducibility of published results and facilitate the reanalysis and integration of public proteomics datasets.
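
To illustrate the sample-to-file relationship this abstract describes, here is a minimal Python sketch. MAGE-TAB records samples in a tab-delimited Sample and Data Relationship Format (SDRF) table, where each row links a sample to a data file, with `characteristics[...]` columns for sample properties and `comment[...]` columns for file-level information. The required-column list, the function name `validate_sample_table`, and the checks it performs are illustrative assumptions, not the actual MAGE-TAB-Proteomics rules or the behavior of the PRIDE validation tools.

```python
from __future__ import annotations

import csv
import io

# Hypothetical subset of required columns, loosely modeled on SDRF-style
# naming (an assumption); the actual MAGE-TAB-Proteomics rules define more.
REQUIRED_COLUMNS = (
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[data file]",
)


def validate_sample_table(text: str) -> tuple[list[str], dict[str, list[str]]]:
    """Check a tab-delimited sample table and map each data file to its samples.

    Returns (errors, file_to_samples). An empty error list only means the
    table passed these deliberately simple checks.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []

    # Stop early if whole columns are missing: row checks would be meaningless.
    errors = [f"missing column: {col}" for col in REQUIRED_COLUMNS if col not in header]
    if errors:
        return errors, {}

    file_to_samples: dict[str, list[str]] = {}
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Short rows yield None for missing cells, so normalize to "".
        for col in REQUIRED_COLUMNS:
            if not (row.get(col) or "").strip():
                errors.append(f"line {line_no}: empty value in '{col}'")

        data_file = (row.get("comment[data file]") or "").strip()
        sample = (row.get("source name") or "").strip()
        if data_file and sample:
            # A single file may contain several (e.g. multiplexed) samples.
            file_to_samples.setdefault(data_file, []).append(sample)

    return errors, file_to_samples


if __name__ == "__main__":
    table = (
        "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
        "sample 1\thomo sapiens\trun 1\tfile1.raw\n"
        "sample 2\thomo sapiens\trun 2\tfile2.raw\n"
    )
    errors, mapping = validate_sample_table(table)
    print(errors)   # []
    print(mapping)  # {'file1.raw': ['sample 1'], 'file2.raw': ['sample 2']}
```

Because one raw file can hold several labeled (multiplexed) samples, the sketch keeps a list of sample names per file instead of treating a repeated file name as an error.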

A Study on Effectiveness of a New Network Marketing Model with M-Code Compensation System Using Big Data Analysis

2018, Vol 40 (8), pp. 1015-1042. doi:10.33645/cnc.2018.12.40.8.1015

Author(s): Koono Kim; Hyebong Choi

Keyword(s): Big Data; Data Analysis; Big Data Analysis; Compensation System; Network Marketing; Marketing Model

Insights into seismic hazard from big data analysis of ground motion simulations

International Journal of Safety and Security Engineering, 2019, Vol 9 (1), pp. 01-12. doi:10.2495/safe-v9-n1-01-12

Cited By ~ 1

Author(s): Kristy F. Tiampo; Javad Kazemian; Hadi Ghofrani; Yelena Kropivnitskaya; Gero Michel

Keyword(s): Big Data; Data Analysis; Seismic Hazard; Ground Motion; Big Data Analysis; Ground Motion Simulations

A Study on Social Cognitions of ‘Chinese Education’ in Korea Using Big Data Analysis

The Journal of Chinese Language and Literature, 2019, Vol 116, pp. 83-112. doi:10.25021/jcll.2019.6.116.83

Author(s): Eun-jae Choi

Keyword(s): Big Data; Data Analysis; Big Data Analysis; Social Cognitions; Chinese Education

An Efficient Deduplication Mechanism for Big Data Analysis in Cloud Environments

International Journal of Computer Sciences and Engineering, 2018, Vol 6 (4), pp. 389-395. doi:10.26438/ijcse/v6i4.389395

Author(s): M. Murugesan; A. Kalaiyarasi

Keyword(s): Big Data; Data Analysis; Big Data Analysis; Cloud Environments

A Social Big Data Analysis on Sport Participation

Korean Journal of Sport Management, 2020, Vol 25 (2), pp. 18-30. doi:10.31308/kssm.25.2.2

Author(s): Seung Wook Oh; Jin-Wook Han; Min Soo Kim

Keyword(s): Big Data; Data Analysis; Sport Participation; Big Data Analysis; Social Big Data

Bipolar Disorder and Oxidative Stress Injury Mechanism - Clinical Big Data Analysis Based on Machine Learning

Case Medical Research, 2019. doi:10.31525/ct1-nct03949218

Keyword(s): Oxidative Stress; Machine Learning; Bipolar Disorder; Big Data; Data Analysis; Big Data Analysis; Injury Mechanism; Stress Injury; Oxidative Stress Injury; And Oxidative Stress

A Study on Perception of Golf Lesson Using Big Data Analysis

Journal of Golf Studies, 2020, Vol 14 (1), pp. 151-163. doi:10.34283/ksgs.2020.14.1.13

Author(s): Joon-Seo Choi; Su-in Park

Keyword(s): Big Data; Data Analysis; Big Data Analysis
