A proteomics sample metadata representation for multiomics integration and big data analysis

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Chengxin Dai ◽  
Anja Füllgrabe ◽  
Julianus Pfeuffer ◽  
Elizaveta M. Solovyeva ◽  
Jingwen Deng ◽  
...  

Abstract: The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose adapting the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve reproducibility and facilitate the reanalysis and integration of public proteomics datasets.

Biotechnology ◽  
2019 ◽  
pp. 804-837
Author(s):  
Hithesh Kumar ◽  
Vivek Chandramohan ◽  
Smrithy M. Simon ◽  
Rahul Yadav ◽  
Shashi Kumar

This chapter provides a complete overview of Big Data analysis and its applications in the health care industry, clinical informatics, personalized medicine and bioinformatics, and discusses the major tools and databases used for such analysis. Advances in sequencing machines now make it possible to generate DNA, RNA, whole-genome and transcriptomics data, among others, quickly and efficiently, often within a matter of hours. The chapter also discusses the complete Next Generation Sequencing (NGS) big data analysis workflow for medicinal plants. It serves as an introduction to big data analysis in NGS and concludes with a summary of the topics covered in the remaining chapters of this book.


2021 ◽  
Author(s):  
Chengxin Dai ◽  
Anja Füllgrabe ◽  
Julianus Pfeuffer ◽  
Elizaveta Solovyeva ◽  
Jingwen Deng ◽  
...  

The amount of public proteomics data is increasing at an extraordinary rate. Hundreds of datasets are submitted each month to ProteomeXchange repositories, representing many types of proteomics studies that focus on different aspects such as quantitative experiments, post-translational modifications, protein-protein interactions, or subcellular localization, among many others. For every proteomics dataset, two levels of data are captured: the dataset description and the data files (encoded in different file formats). Whereas the dataset description and data file formats are supported by all ProteomeXchange partner repositories, there is no standardized format to properly describe the sample metadata and their relationship with the dataset files in a way that fully allows their understanding or reanalysis. It is left to the user's choice whether or not to provide an ad hoc document containing this information. Therefore, in many cases, understanding the study design and data requires going back to the associated publication. This can be tedious and may be restricted in the case of non-open access publications. This problem often limits the generalization and reuse of public proteomics data. Here we present a standard representation for sample metadata tailored to proteomics datasets, produced by the HUPO Proteomics Standards Initiative and supported by ProteomeXchange resources. We repurposed the existing data format MAGE-TAB, used routinely in the transcriptomics field, to represent and annotate proteomics datasets. MAGE-TAB-Proteomics defines a set of annotation rules, ranging from sample properties to data analysis protocols, that datasets submitted to ProteomeXchange should follow. We also introduce a crowdsourcing project that enabled the manual curation of over 200 public datasets using MAGE-TAB-Proteomics.
In addition, we describe an ecosystem of tools and libraries that were developed to validate and submit sample metadata-related information to ProteomeXchange. We expect that these tools will improve the reproducibility of published results and facilitate the reanalysis and integration of public proteomics datasets.
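To make the idea of a sample table linked to data files more concrete, here is a minimal sketch in Python of the kind of check such validation tools might perform. It is not the validator described above and does not use any ProteomeXchange or PRIDE API. The column names in `REQUIRED_COLUMNS` are an assumption, loosely modeled on MAGE-TAB-style headers, and the function `check_sample_table` is hypothetical. The sketch reads a tab-delimited table, reports missing columns and empty required values, and groups sample names by the data file they are annotated with.

```python
import csv
import io
from collections import defaultdict

# ASSUMPTION: a hypothetical, minimal set of required columns loosely modeled on
# MAGE-TAB-style headers; the real specification defines many more rules.
REQUIRED_COLUMNS = (
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[data file]",
)


def check_sample_table(tsv_text):
    """Validate a tab-delimited sample table and map data files to samples.

    Returns (errors, file_to_samples): a list of error strings and a dict
    mapping each data file name to the source names annotated for it.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return ["table is empty"], {}

    # Header names are compared case-insensitively.
    header = [name.strip().lower() for name in rows[0]]
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing], {}

    # Position of the first occurrence of each required column.
    index = {col: header.index(col) for col in REQUIRED_COLUMNS}
    errors = []
    file_to_samples = defaultdict(list)

    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            errors.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        values = {col: row[i].strip() for col, i in index.items()}
        empty = [col for col, value in values.items() if not value]
        for col in empty:
            errors.append(f"line {line_no}: empty value in '{col}'")
        if not empty:
            # One data file may be linked to several samples (e.g. multiplexed
            # runs), so files are grouped rather than required to be unique.
            file_to_samples[values["comment[data file]"]].append(
                values["source name"]
            )

    return errors, dict(file_to_samples)


if __name__ == "__main__":
    example = (
        "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
        "sample 1\thomo sapiens\trun 1\tfile1.raw\n"
        "sample 2\thomo sapiens\trun 1\tfile1.raw\n"
        "sample 3\t\trun 2\tfile2.raw\n"
    )
    errors, mapping = check_sample_table(example)
    print(errors)   # ["line 4: empty value in 'characteristics[organism]'"]
    print(mapping)  # {'file1.raw': ['sample 1', 'sample 2']}
```

Grouping samples by data file, instead of rejecting repeated file names, is a deliberate choice in this sketch: in labeled (multiplexed) experiments a single raw file can legitimately carry several samples, so the sample-to-file relationship is many-to-one.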


2019 ◽  
Vol 9 (1) ◽  
pp. 01-12 ◽  
Author(s):  
Kristy F. Tiampo ◽  
Javad Kazemian ◽  
Hadi Ghofrani ◽  
Yelena Kropivnitskaya ◽  
Gero Michel

2020 ◽  
Vol 25 (2) ◽  
pp. 18-30
Author(s):  
Seung Wook Oh ◽  
Jin-Wook Han ◽  
Min Soo Kim

2020 ◽  
Vol 14 (1) ◽  
pp. 151-163
Author(s):  
Joon-Seo Choi ◽  
◽  
Su-in Park
