RNA-combine: a toolkit for comprehensive analyses on transcriptome data from different sequencing platforms

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Xuemin Dong ◽  
Shanshan Dong ◽  
Shengkai Pan ◽  
Xiangjiang Zhan

Background: Understanding the transcriptome has become an essential step towards the full interpretation of the biological function of a cell, a tissue or even an organ. Many tools are available for processing or analysing transcriptome data, or for visualizing analysis results. However, most existing tools are limited to data from a single sequencing platform, and only a few can handle more than one analysis module, which falls far short of what users need, especially those without advanced programming skills. Hence, we still lack an open-source toolkit that enables both bioinformatician and non-bioinformatician users to process and analyze large transcriptome datasets from different sequencing platforms and to visualize the results. Results: We present a Linux-based toolkit, RNA-combine, that automatically performs quality assessment, downstream analysis and visualization of results for transcriptome data generated on different sequencing platforms, including bulk RNA-seq (Illumina platform), single-cell RNA-seq (10x Genomics) and Iso-Seq (PacBio). In addition, the toolkit implements at least 10 more analysis modules than the other toolkits examined in this study. The source code of RNA-combine is available on GitHub: https://github.com/dongxuemin666/RNA-combine. Conclusion: Our results suggest that RNA-combine is a reliable tool for transcriptome data processing and result interpretation for both bioinformaticians and non-bioinformaticians.

2016 ◽  
Vol 14 (05) ◽  
pp. 1650027 ◽  
Author(s):  
Ashis Kumer Biswas ◽  
Jean X. Gao

RNA-seq, the next-generation sequencing platform, enables researchers to explore the transcriptome of organisms in depth, for example by identifying functional non-coding RNAs (ncRNAs) and quantifying their expression in tissues. The functions of ncRNAs are mostly related to their secondary structures. Clustering read segments by their structural profiles is therefore essential, and this motivates the present research. In this manuscript we propose PR2S2Clust (Patched RNA-seq Read Segments’ Structure-oriented Clustering), an analysis platform that extracts features to prepare secondary structure profiles of RNA-seq read segments. It provides a strategy for using these profiles to annotate the segments into ncRNA classes with several clustering strategies. While clustering the segments, the system considers seven pairwise structural distance metrics that take into account short-read mappings onto each structure, which we term the “patched structure”. In this regard, we show applications of both classical and ensemble clusterings of the partitional and hierarchical variations. Extensive real-world experiments over three publicly available RNA-seq datasets and a comparative analysis over four competitive systems confirm the effectiveness and superiority of the proposed system. The source code and dataset of PR2S2Clust are available at http://biomecis.uta.edu/~ashis/res/PR2S2Clust-suppl/.


2021 ◽  
Author(s):  
Robert A Player ◽  
Angeline M Aguinaldo ◽  
Brian B Merritt ◽  
Lisa N Maszkiewicz ◽  
Oluwaferanmi E Adeyemo ◽  
...  

A major challenge in the field of metagenomics is the selection of the correct combination of sequencing platform and downstream metagenomic analysis algorithm, or classifier. Here, we present the Metagenomic Evaluation Tool Analyzer (META), which produces simulated data and facilitates platform and algorithm selection for any given metagenomic use case. META-generated in silico read data are modular, scalable, and reflect user-defined community profiles, while the downstream analysis is done using a variety of metagenomic classifiers. Reported results include information on resource utilization, time-to-answer, and performance. Real-world data can also be analyzed using selected classifiers and the results benchmarked against simulations. To test the utility of the META software, simulated data were compared to real-world viral and bacterial metagenomic samples run on four different sequencers and analyzed using 12 metagenomic classifiers. Lastly, we introduce META Score: a unified, quantitative value that rates a classifier's ability to both identify and count taxa in a representative sample.


Genes ◽  
2019 ◽  
Vol 10 (1) ◽  
pp. 35 ◽  
Author(s):  
Yuri Motorin ◽  
Mark Helm

New analytics of post-transcriptional RNA modifications have paved the way for a tremendous upswing of the biological and biomedical research in this field. This especially applies to methods that include RNA-Seq techniques, and which typically result in what is termed global-scale modification mapping. In this process, positions inside a cell's transcriptome receive the status of potential modification sites (so-called modification calling), typically based on a score of some kind that issues from the particular method applied. The resulting data are thought to represent information that goes beyond what is contained in typical transcriptome data, and hence the field has taken to using the term “epitranscriptome”. Due to the high rate of newly published mapping techniques, a significant number of chemically distinct RNA modifications have become amenable to mapping, albeit with varying accuracy and precision, depending on the nature of the technique. This review gives a brief overview of known techniques and how they were applied to modification calling.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Matthew Chung ◽  
Vincent M. Bruno ◽  
David A. Rasko ◽  
Christina A. Cuomo ◽  
José F. Muñoz ◽  
...  

Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zefang Sun ◽  
Jia Tan ◽  
Minqiong Zhao ◽  
Qiyao Peng ◽  
Mingqing Zhou ◽  
...  

tRNAs and tRNA-derived RNA fragments (tRFs) play various roles in many cellular processes outside of protein synthesis. However, comprehensive investigations of tRNA/tRF regulation are rare. In this study, we used new algorithms to extensively analyze the publicly available data from 1332 ChIP-Seq and 42 small-RNA-Seq experiments in human cell lines and tissues to investigate the transcriptional and posttranscriptional regulatory mechanisms of tRNAs. We found that histone acetylation, cAMP, and pluripotency pathways play important roles in the regulation of tRNA gene transcription in a cell-specific manner. Analysis of RNA-Seq data identified 950 high-confidence tRFs, and the results suggested that tRNA pools are dramatically distinct across the samples in terms of expression profiles and tRF composition. The mismatch analysis identified new potential modification sites and specific modification patterns in tRNA families. The results also show that RNA library preparation technologies have a considerable impact on tRNA profiling and need to be optimized in the future.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Xueyi Dong ◽  
Luyi Tian ◽  
Quentin Gouil ◽  
Hasaru Kariyawasam ◽  
Shian Su ◽  
...  

Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to high sequence error rates and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.


2021 ◽  
Vol 49 (4) ◽  
pp. 1779-1790 ◽  
Author(s):  
Lorenzo Ceccarelli ◽  
Chiara Giacomelli ◽  
Laura Marchetti ◽  
Claudia Martini

Extracellular vesicles (EVs) are a heterogeneous family of cell-derived, lipid-bound vesicles comprising exosomes and microvesicles. They are potentially produced by all types of cells and serve as a means of cell-to-cell communication that allows the exchange of protein, lipid, and genetic material. Microglia cells produce a large number of EVs in both resting and activated conditions, in the latter case changing their production and related biological effects. Several actions of microglia in the central nervous system are ascribed to EVs, but the molecular mechanisms by which each effect occurs are still largely unknown. Conflicting functions have been ascribed to microglia-derived EVs, ranging from neuronal support to the propagation of inflammation and neurodegeneration, confirming the crucial role of these organelles in tuning brain homeostasis. Despite the increasing number of studies on microglia-derived EVs, knowledge of the mechanisms underlying their production and the modification of their cargo remains highly fragmented. This review collects literature data about the surface and cargo proteins and lipids, as well as the miRNA content, of EVs produced by microglial cells. Special emphasis is given to the works in which the EV molecular composition is linked to a precise biological function.


Viruses ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 2093
Author(s):  
Shen-Yuan Hsieh ◽  
Mohammad A. Tariq ◽  
Andrea Telatin ◽  
Rebecca Ansorge ◽  
Evelien M. Adriaenssens ◽  
...  

The human intestinal microbiota is abundant in viruses, mainly bacteriophages, which occasionally outnumber bacteria 10:1; this viral community is termed the virome. Due to its high genetic diversity and the lack of suitable tools and reference databases, the virome remains poorly characterised and is often referred to as “viral dark matter”. In addition, the choice of sequencing platforms, read lengths and library preparation makes study design challenging with respect to the virome. Here we have compared the use of PCR and PCR-free methods for sequence-library construction on the Illumina sequencing platform for characterising the human faecal virome. Viral DNA was extracted from faecal samples of three healthy donors and sequenced. Our analysis shows that most variation reflected the individual-specific faecal virome. However, we observed differences between PCR and PCR-free library preparation that affected the recovery of low-abundance viral genomes. Across the three faecal samples in this study, PCR library preparation led to a loss of lower-abundance vOTUs that were evident in the PCR-free pairs (vOTUs 128, 6202 and 8364) and decreased the alpha-diversity indices (Chao1 p-value = 0.045 and Simpson p-value = 0.044). Thus, differences between PCR and PCR-free methods are important to consider when investigating “rare” members of the gut virome, with these biases likely negligible when investigating moderately and highly abundant viruses.
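
For readers unfamiliar with the two alpha-diversity indices named above, the following is a minimal sketch of their textbook definitions, not the authors' analysis code. It assumes the bias-corrected form of Chao1 and the Gini-Simpson form of the Simpson index (1 minus the sum of squared proportions); the abstract does not say which variants were used. The function names and the example abundance vector are hypothetical.

```python
def chao1(abundances):
    """Bias-corrected Chao1 richness estimate from per-vOTU counts.

    Assumption: the bias-corrected formula S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singletons and doubletons.
    """
    observed = [c for c in abundances if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # vOTUs seen exactly once
    f2 = sum(1 for c in observed if c == 2)  # vOTUs seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def simpson(abundances):
    """Gini-Simpson diversity: 1 - sum(p_i^2) over relative abundances."""
    total = sum(abundances)
    return 1 - sum((c / total) ** 2 for c in abundances)


# Hypothetical read counts per vOTU for a single sample.
sample = [10, 5, 2, 2, 1, 1, 1, 0]
print(chao1(sample), simpson(sample))
```

Losing low-abundance vOTUs removes singletons and doubletons, which is why both the observed richness and the Chao1 estimate would be expected to drop in that case.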


2020 ◽  
Author(s):  
Snehalika Lall ◽  
Abhik Ghosh ◽  
Sumanta Ray ◽  
Sanghamitra Bandyopadhyay

Many single-cell typing methods require pure clustering of cells, which is susceptible to technical noise and heavily dependent on the high-quality informative genes selected in the preliminary steps of downstream analysis. Techniques for gene selection in single-cell RNA sequencing (scRNA-seq) data are seemingly simple, which causes problems for the resolution of (sub-)type detection and marker selection, and ultimately affects cell annotation. We introduce sc-REnF, a novel and robust entropy-based feature (gene) selection method that leverages the landmark advantages of ‘Renyi’ and ‘Tsallis’ entropy, achieved in their original applications, in single-cell clustering. As a result, gene selection is robust and less sensitive to the technical noise present in the data, producing a pure clustering of cells, in addition to classifying independent and unknown samples with utmost accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF
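
As background on the two entropies mentioned above, here is a minimal sketch of their standard definitions for a discrete probability distribution. It is not the sc-REnF implementation, and the abstract does not describe how the method applies these quantities to gene selection; the function names and the example distribution are hypothetical.

```python
import math


def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)


def tsallis_entropy(probs, q):
    """Tsallis entropy of index q (q != 1)."""
    if q == 1:
        raise ValueError("q must be different from 1")
    return (1 - sum(p ** q for p in probs if p > 0)) / (q - 1)


# Hypothetical example: a uniform distribution over four states.
uniform = [0.25, 0.25, 0.25, 0.25]
print(renyi_entropy(uniform, 2), tsallis_entropy(uniform, 2))
```

Both quantities reduce to Shannon entropy in the limit where the order parameter approaches 1, which is why that value is excluded here.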


2021 ◽  
Author(s):  
Saket Choudhary ◽  
Rahul Satija

Heterogeneity in single-cell RNA-seq (scRNA-seq) data is driven by multiple sources, including biological variation in cellular state as well as technical variation introduced during experimental processing. Deconvolving these effects is a key challenge for preprocessing workflows. Recent work has demonstrated the importance and utility of count models for scRNA-seq analysis, but there is a lack of consensus on which statistical distributions and parameter settings are appropriate. Here, we analyze 58 scRNA-seq datasets that span a wide range of technologies, systems, and sequencing depths in order to evaluate the performance of different error models. We find that while a Poisson error model appears appropriate for sparse datasets, we observe clear evidence of overdispersion for genes with sufficient sequencing depth in all biological systems, necessitating the use of a negative binomial model. Moreover, we find that the degree of overdispersion varies widely across datasets, systems, and gene abundances, which argues for a data-driven approach to parameter estimation. Based on these analyses, we provide a set of recommendations for modeling variation in scRNA-seq data, particularly when using generalized linear models or likelihood-based approaches for preprocessing and downstream analysis.
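
The contrast between the Poisson and negative binomial error models can be illustrated with a minimal sketch, assuming the common parameterisation in which the negative binomial variance is mu + mu^2/theta (a Poisson model corresponds to variance equal to the mean). This uses a simple method-of-moments estimate and is not the authors' estimation procedure; the function name and the counts are hypothetical.

```python
from statistics import mean, variance


def overdispersion_summary(counts):
    """Return (mean, sample variance, method-of-moments theta) for one gene.

    Assumption: NB variance = mu + mu**2 / theta. When the variance does not
    exceed the mean, there is no evidence of overdispersion and theta is
    reported as infinity (the Poisson limit).
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance, n - 1 denominator
    theta = mu ** 2 / (var - mu) if var > mu else float("inf")
    return mu, var, theta


# Hypothetical UMI counts for one gene across eight cells.
counts = [0, 3, 1, 12, 0, 7, 2, 25]
mu, var, theta = overdispersion_summary(counts)
print(f"mean={mu:.2f} variance={var:.2f} theta={theta:.3f}")
```

A small theta indicates strong overdispersion; computing it per gene and per dataset is one simple way to see why a single fixed parameter setting may not suit every dataset.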

