A consensus-based ensemble approach to improve de novo transcriptome assembly

bioRxiv ◽

10.1101/2020.06.08.139964 ◽

2020 ◽

Cited By ~ 1

Author(s):

Adam Voshall ◽

Sairam Behera ◽

Xiangjun Li ◽

Xiao-Hong Yu ◽

Kushagra Kapil ◽

...

Keyword(s):

Gene Expression ◽

Expression Analysis ◽

Reference Genome ◽

De Novo ◽

Transcriptome Assembly ◽

Model Organisms ◽

Pathway Reconstruction ◽

Rnaseq Data ◽

Metabolic Pathway Reconstruction ◽

Assembly Performance

Abstract

Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high-quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms, where adequate reference genomes are often not available. Different methods produce different transcriptome models, and there is no easy way to determine which are more accurate. Furthermore, alternative splicing events can exacerbate these already difficult assembly problems. While benchmarking transcriptome assemblies is critical, it is also not trivial due to the general lack of true reference transcriptomes. In this study, we provide a pipeline to generate a set of benchmark transcriptomes and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches, including genome-guided, de novo, and ensemble methods. The results showed that assembly performance deteriorates significantly when the reference is not available from the same genome (for genome-guided methods) or when alternative transcripts (isoforms) exist. We demonstrated the value of consensus between de novo assemblers in transcriptome assembly. Leveraging the overlapping predictions between the four de novo assemblers, we further present ConSemble, a consensus-based de novo ensemble transcriptome assembly pipeline. Without using a reference genome, ConSemble achieved an accuracy up to twice as high as any of the de novo assemblers we compared. It matched or exceeded the best-performing genome-guided assemblers even when the transcriptomes included isoforms.
The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the ConSemble pipeline are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/.

Author summary

Obtaining an accurate representation of gene expression is critical in many analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction. State-of-the-art high-throughput RNA-sequencing (RNAseq) technologies can be used to sequence the set of all transcripts in a cell, the transcriptome. Although many computational tools are available for transcriptome assembly from RNAseq data, assembling high-quality transcriptomes is difficult, especially for non-model organisms. Different methods often produce different transcriptome models, and there is no easy way to determine which are more accurate. In this study, we present an approach to evaluate transcriptome assembly performance using simulated benchmarking read sets. The results showed that the assembly performance of genome-guided assembly methods deteriorates significantly when an adequate reference genome is not available. The assembly performance of all methods is affected when alternative transcripts (isoforms) exist. We further demonstrated the value of consensus among assemblers in improving transcriptome assembly. Leveraging the overlapping predictions between the four de novo assemblers, we present ConSemble. Without using a reference genome, ConSemble achieved a much higher accuracy than any of the de novo assemblers we compared. It matched or exceeded the best-performing genome-guided assemblers even when the transcriptomes included isoforms.
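
The consensus idea (keep what several assemblers agree on) can be illustrated with a minimal Python sketch. It is not the ConSemble implementation: the abstract does not say how contigs from different assemblers are matched, so this sketch assumes exact, strand-independent sequence identity, and the function names and the `min_support` threshold are hypothetical.

```python
from collections import defaultdict

# Complement table for A/C/G/T/N (assumption: contigs use only these letters).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, seq.translate(_COMPLEMENT)[::-1])

def consensus_contigs(assemblies: dict[str, list[str]], min_support: int) -> set[str]:
    """Return canonical contigs reported by at least `min_support` assemblers."""
    support = defaultdict(set)  # canonical contig -> names of assemblers reporting it
    for assembler, contigs in assemblies.items():
        for contig in contigs:
            support[canonical(contig)].add(assembler)
    return {key for key, tools in support.items() if len(tools) >= min_support}

# Toy input: four hypothetical assemblers.
assemblies = {
    "assembler_A": ["ATGAAA", "CCCC"],
    "assembler_B": ["TTTCAT", "GGGT"],  # TTTCAT is the reverse complement of ATGAAA
    "assembler_C": ["ATGAAA"],
    "assembler_D": ["CCCC"],
}
print(consensus_contigs(assemblies, min_support=3))          # {'ATGAAA'}
print(sorted(consensus_contigs(assemblies, min_support=2)))  # ['ATGAAA', 'CCCC']
```

Real assemblers rarely produce byte-identical contigs, so a practical consensus step would need a more tolerant matching rule; exact matching simply keeps the sketch short.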

A consensus-based ensemble approach to improve transcriptome assembly

BMC Bioinformatics ◽

10.1186/s12859-021-04434-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Adam Voshall ◽

Sairam Behera ◽

Xiangjun Li ◽

Xiao-Hong Yu ◽

Kushagra Kapil ◽

...

Keyword(s):

Expression Analysis ◽

Reference Genome ◽

De Novo ◽

Transcriptome Assembly ◽

Model Organisms ◽

Individual Genome ◽

Minimal Impact ◽

Pathway Reconstruction ◽

Rnaseq Data ◽

Assembly Performance

Abstract

Background: Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high-quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms, where adequate reference genomes are often not available. Different methods produce different transcriptome models, and there is no easy way to determine which are more accurate. Furthermore, alternative-splicing events exacerbate these already difficult assembly problems. While benchmarking transcriptome assemblies is critical, it is also not trivial due to the general lack of true reference transcriptomes.

Results: In this study, we first provide a pipeline to generate a set of simulated benchmark transcriptomes and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches, including both de novo and genome-guided methods. The results showed that assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or, for genome-guided methods, when the reference is not available from the same genome. To improve transcriptome assembly performance by leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble.

Conclusions: Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any of the de novo assemblers we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best-performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy can improve transcriptome assembly for both de novo and genome-guided assemblers. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/.
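
Why removing incorrect contigs raises precision while barely touching recall can be seen with a small scoring sketch. This is an illustration under stated assumptions, not the paper's evaluation code: it assumes each contig has already been matched to a benchmark transcript ID (or left unmatched), and it uses F1 as an accuracy-style summary, which may differ from the paper's own definition of accuracy. The function name is hypothetical.

```python
def assembly_scores(assembled: set[str], reference: set[str]) -> dict[str, float]:
    """Precision, recall and F1 of an assembly against a benchmark transcript set.

    Assumption: each assembled contig is represented by the ID of the benchmark
    transcript it matches; unmatched contigs carry IDs absent from `reference`.
    """
    true_pos = len(assembled & reference)   # benchmark transcripts recovered
    false_pos = len(assembled - reference)  # contigs matching no benchmark transcript
    false_neg = len(reference - assembled)  # benchmark transcripts missed
    precision = true_pos / (true_pos + false_pos) if assembled else 0.0
    recall = true_pos / (true_pos + false_neg) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

reference = {"t1", "t2", "t3", "t4"}
individual = {"t1", "t2", "x1", "x2"}  # two correct contigs, two incorrect ones
consensus = {"t1", "t2"}               # incorrect contigs filtered out
print(assembly_scores(individual, reference))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
print(assembly_scores(consensus, reference))
```

In this toy case, filtering raises precision from 0.5 to 1.0 at unchanged recall, so F1 rises from 0.5 to about 0.67.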

Assembly-free rapid differential gene expression analysis in non-model organisms using DNA-protein alignment

10.1101/2021.04.23.441097 ◽

2021 ◽

Author(s):

Anish M.S. Shrestha ◽

Joyce Emlyn B. Guiao ◽

Kyle Christian R. Santiago

Keyword(s):

Gene Expression ◽

Differential Expression ◽

Expression Analysis ◽

De Novo ◽

Transcriptome Assembly ◽

Differential Expression Analysis ◽

Homology Search ◽

Model Organisms ◽

Rna Seq ◽

Protein Database

Abstract

RNA-seq is being increasingly adopted for gene expression studies in a panoply of non-model organisms, with applications spanning the fields of agriculture, aquaculture, ecology, and environment. Conventional differential expression analysis for organisms without reference sequences requires computationally expensive and error-prone de novo transcriptome assembly, followed by a homology search against a high-confidence protein database for functional annotation. We propose a shortcut in which the counts for differential expression analysis are obtained by directly aligning RNA-seq reads to the protein database. Through experiments on simulated and real data, we show drastic reductions in run time and memory usage with no loss in accuracy. A Snakemake implementation of our workflow is available at: https://bitbucket.org/project_samar/samar
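
The output of the shortcut is a per-sample table of read counts per protein, which is what a differential expression tool consumes. The sketch below shows one plausible way to turn read-to-protein alignments into such counts. The `(read_id, protein_id, score)` input format, the best-hit rule, and dropping reads whose best score is tied are all assumptions for illustration, not SAMAR's documented behavior; the function name is hypothetical.

```python
from collections import Counter

def count_reads_per_protein(hits):
    """Count reads per protein from (read_id, protein_id, score) alignment tuples.

    Each read is counted once, for its highest-scoring protein; reads whose
    best score is shared by several proteins are treated as ambiguous and skipped.
    """
    best = {}  # read_id -> (best score so far, proteins achieving that score)
    for read_id, protein_id, score in hits:
        current = best.get(read_id)
        if current is None or score > current[0]:
            best[read_id] = (score, {protein_id})
        elif score == current[0]:
            current[1].add(protein_id)
    counts = Counter()
    for _score, proteins in best.values():
        if len(proteins) == 1:
            counts[next(iter(proteins))] += 1
    return counts

hits = [
    ("r1", "P1", 50), ("r1", "P2", 30),  # r1 -> P1
    ("r2", "P2", 40),                    # r2 -> P2
    ("r3", "P1", 20), ("r3", "P2", 20),  # tie: ambiguous, not counted
    ("r4", "P1", 60),                    # r4 -> P1
]
print(dict(count_reads_per_protein(hits)))  # {'P1': 2, 'P2': 1}
```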

The brain transcriptome of the wolf spider, Schizocosa ocreata

BMC Research Notes ◽

10.1186/s13104-021-05648-y ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Daniel Stribling ◽

Peter L. Chang ◽

Justin E. Dalton ◽

Christopher A. Conow ◽

Malcolm Rosenthal ◽

...

Keyword(s):

Gene Expression ◽

De Novo ◽

Transcriptome Assembly ◽

Model Organisms ◽

De Novo Transcriptome Assembly ◽

De Novo Transcriptome ◽

Wolf Spiders ◽

Schizocosa Ocreata ◽

Genomic Studies ◽

The Brain

Abstract

Objectives: Arachnids have fascinating and unique biology, particularly for questions on sex differences and behavior, creating the potential to develop powerful emerging models in this group. Recent advances in genomic techniques have paved the way for a significant increase in the breadth of genomic studies in non-model organisms. One growing area of research is comparative transcriptomics. When phylogenetic relationships to model organisms are known, comparative genomic studies provide context for the analysis of homologous genes and pathways. The goal of this study was to lay the groundwork for comparative transcriptomics of sex differences in the brain of wolf spiders, non-model organisms of the phylum Euarthropoda, by generating transcriptomes and analyzing gene expression.

Data description: To examine sex-differential gene expression, short-read transcript sequencing and de novo transcriptome assembly were performed. Messenger RNA was isolated from brain tissue of male and female subadult and mature wolf spiders (Schizocosa ocreata). The raw data consist of sequences for the two life stages in each sex. Computational analyses on these data include de novo transcriptome assembly and differential expression analyses. Sample-specific and combined transcriptomes, gene annotations, and differential expression results are described in this data note and are available from public databases.

De novo Transcriptome Assembly and Dynamic Spatial Gene Expression Analysis in Red Clover

The Plant Genome ◽

10.3835/plantgenome2015.06.0048 ◽

2016 ◽

Vol 9 (2) ◽

Cited By ~ 9

Author(s):

Manohar Chakrabarti ◽

Randy D. Dinkins ◽

Arthur G. Hunt

Keyword(s):

Gene Expression ◽

Expression Analysis ◽

Gene Expression Analysis ◽

De Novo ◽

Transcriptome Assembly ◽

Red Clover ◽

De Novo Transcriptome Assembly ◽

De Novo Transcriptome

De novo transcriptome assembly, functional annotation and differential gene expression analysis of juvenile and adult E. fetida, a model oligochaete used in ecotoxicological studies

Biological Research ◽

10.1186/s40659-017-0114-y ◽

2017 ◽

Vol 50 (1) ◽

Cited By ~ 7

Author(s):

Michelle Thunders ◽

Jo Cavanagh ◽

Yinsheng Li

Keyword(s):

Gene Expression ◽

Expression Analysis ◽

Functional Annotation ◽

Gene Expression Analysis ◽

De Novo ◽

Transcriptome Assembly ◽

De Novo Transcriptome Assembly ◽

De Novo Transcriptome ◽

Differential Gene Expression Analysis ◽

Differential Gene

Gray whale transcriptome reveals longevity adaptations associated with DNA repair and ubiquitination

10.1101/754218 ◽

2019 ◽

Author(s):

Dmitri Toren ◽

Anton Kulaga ◽

Mineshbhai Jethva ◽

Eitan Rubin ◽

Anastasia V Snezhkina ◽

...

Keyword(s):

Gene Expression ◽

De Novo ◽

Transcriptome Assembly ◽

Expression Patterns ◽

Model Organisms ◽

Gray Whale ◽

Aging Research ◽

Arctic Water ◽

Gray Whales ◽

Longevity Genes

Abstract

One important question in aging research is how differences in genomics and transcriptomics determine maximum lifespan across species. Despite recent progress, much is still unclear on the topic, partly due to the lack of samples from non-model organisms and to the challenges of directly comparing transcriptomes from different species. Here we employ a novel ranking-based method to analyze gene expression in the gray whale and to compare its de novo assembled transcriptome with those of other long- and short-lived mammals. Gray whales are among the top 1% longest-lived mammals. Despite their extreme environment, or perhaps owing to a remarkable adaptation to their habitat (intermittent hypoxia, Arctic water, and high pressure), gray whales live to at least 77 years. In this work, we show that long-lived mammals share common gene expression patterns, including high expression of genes involved in DNA maintenance and repair, ubiquitination, apoptosis, and immune responses. Additionally, the expression levels of gray whale orthologs of pro- and anti-longevity genes found in model organisms support their proposed role and direction in lifespan determination. Remarkably, many of the highly expressed pro-longevity genes are stress-related, reflecting an adaptation to extreme environmental conditions. Our analysis suggests that the gray whale may possess high resistance to cancer and stress, which at least in part ensures its longevity. This new transcriptome assembly also provides an important resource to support efforts to maintain the endangered population of gray whales.

The neurotranscriptome of the Aedes aegypti mosquito

10.1101/026823 ◽

2015 ◽

Author(s):

Benjamin J Matthews ◽

Carolyn S McBride ◽

Matthew DeGennaro ◽

Orion Despo ◽

Leslie B Vosshall

Keyword(s):

Gene Expression ◽

Aedes Aegypti ◽

Vector Control ◽

De Novo ◽

Transcriptome Assembly ◽

Molecular Genetic ◽

Blood Feeding ◽

Model Organisms ◽

Protein Coding ◽

Male And Female

Background: A complete genome sequence and the advent of genome editing open up non-traditional model organisms to mechanistic genetic studies. The mosquito Aedes aegypti is an important vector of infectious diseases such as dengue, chikungunya, and yellow fever, and has a large and complex genome, which has slowed annotation efforts. We used comprehensive transcriptomic analysis of adult gene expression to improve the genome annotation and to provide a detailed tissue-specific catalogue of neural gene expression at different adult behavioral states.

Results: We carried out deep RNA sequencing across all major peripheral male and female sensory tissues, the brain, and the (female) ovary. Furthermore, we examined gene expression across three important phases of the female reproductive cycle, a remarkable example of behavioral switching in which a female mosquito alternates between obtaining blood meals from humans and laying eggs. Our re-annotation, based on genome-guided alignments and de novo transcriptome assembly, includes 572 new putative protein-coding genes and updates to 13.5% and 50.3% of existing transcripts within coding sequences and untranslated regions, respectively. Using this updated annotation, we detail gene expression in each tissue, identifying large numbers of transcripts regulated by blood-feeding as well as sexually dimorphic transcripts that may provide clues to the biology of male- and female-specific behaviors, such as mating and blood-feeding, which are areas of intensive study for those interested in vector control.

Conclusions: This neurotranscriptome forms a strong foundation for the study of genes in the mosquito nervous system and for the investigation of sensory-driven behaviors and their regulation. Furthermore, understanding the molecular genetic basis of mosquito chemosensory behavior has important implications for vector control.

Characterization of a de novo assembled transcriptome of the Common Blackbird (Turdus merula)

PeerJ ◽

10.7717/peerj.4045 ◽

2017 ◽

Vol 5 ◽

pp. e4045 ◽

Cited By ~ 2

Author(s):

Sven Koglin ◽

Daronja Trense ◽

Michael Wink ◽

Hedwig Sauer-Gürth ◽

Dieter Thomas Tietze

Keyword(s):

Gene Expression ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Reference Genome ◽

De Novo ◽

Next Generation ◽

Model Species ◽

Bioinformatics Tools ◽

The Common ◽

Common Blackbird

Background: In recent years, next-generation high-throughput sequencing technologies have proven to be useful tools for genomic and transcriptomic investigations of non-model species as well. Consequently, ornithologists have adopted these technologies and the respective bioinformatics tools to survey the genomes and transcriptomes of a few avian non-model species. The Common Blackbird is one of the most common bird species living in European cities; it has successfully colonized urban areas, yet no reference genome or transcriptome is publicly available for it. However, addressing questions such as genome-wide gene expression analysis requires a reference genome or transcriptome.

Methods: Therefore, in this study two Common Blackbirds were sacrificed, and their mRNA was isolated and analyzed by RNA-Seq to de novo assemble and characterize a transcriptome. Illumina reads (125 bp paired-end) and a Velvet/Oases pipeline yielded 162,158 transcripts. For the annotation (using BLAST+), an unfiltered protein database was used. SNPs were identified using SAMtools and BCFtools. Furthermore, mRNA from three single tissues (brain, heart and liver) of the same two Common Blackbirds was sequenced by Illumina (75 bp single-end reads). The draft transcriptome and the three single tissues were compared by their BLAST hits with the R package VennDiagram.

Results: Following the annotation against protein databases, we found evidence for 15,580 genes in the transcriptome (all well-characterized hits after annotation). A total of 144,742 SNPs were identified on 18% of the assembled transcripts, corresponding to 0.09% of all nucleotides in the assembled transcriptome. In the transcriptome and in the single tissues (brain, heart and liver), 10,182 shared genes were found.

Discussion: Using a next-generation technology and bioinformatics tools, we made a first step towards the genomic investigation of the Common Blackbird. The de novo assembled transcriptome is usable for downstream analyses such as differential gene expression analysis and SNP identification. This study shows the importance of sequencing single tissues to understand the functions of tissues, proteins and the phenotype.
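
As a quick consistency check on the reported figures, the SNP count and the 0.09% fraction together imply the approximate size of the assembled transcriptome, and dividing by the transcript count gives a mean transcript length. Because 0.09% is a rounded value, both results are only approximate (the underlying fraction could lie anywhere from 0.085% to 0.095%).

```latex
N_{\text{nt}} \approx \frac{144{,}742}{0.0009} \approx 1.6 \times 10^{8}\ \text{nt},
\qquad
\bar{L} \approx \frac{1.6 \times 10^{8}\ \text{nt}}{162{,}158\ \text{transcripts}} \approx 990\ \text{nt}
```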

SuperTranscript: a data driven reference for analysis and visualisation of transcriptomes

10.1101/077750 ◽

2016 ◽

Cited By ~ 3

Author(s):

Nadia M Davidson ◽

Anthony DK Hawkins ◽

Alicia Oshlack

Keyword(s):

Reference Genome ◽

De Novo ◽

Transcriptome Assembly ◽

Model Organisms ◽

Sequencing Data ◽

Expressed Sequence ◽

Model Sequencing ◽

Reference Genomes ◽

Exonic Sequence ◽

Single Sequence

Abstract

Numerous methods have been developed to analyse RNA sequencing data, but most rely on the availability of a reference genome, making them unsuitable for non-model organisms. De novo transcriptome assembly can build a reference transcriptome from non-model sequencing data, but it still does not allow most tools to be applied. Here we present superTranscripts, a simple but powerful solution to bridge that gap. SuperTranscripts are a substitute for a reference genome, consisting of all the unique exonic sequence in transcriptional order, such that each gene is represented by a single sequence. We demonstrate how superTranscripts allow visualization, variant detection and differential isoform detection in non-model organisms, using widely applied methods that are designed to work with reference genomes. SuperTranscripts can also be applied to model organisms to enhance visualization and discover novel expressed sequence. We describe Lace, software to construct superTranscripts from any set of transcripts, including de novo assembled transcriptomes. In addition, we used Lace to combine reference and assembled transcriptomes for chicken and recovered the sequence of hundreds of gaps in the reference genome.
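
The definition above (every unique exonic block once, in transcriptional order) amounts to a topological ordering of exon blocks. The toy Python sketch below makes that concrete. It is not Lace: it assumes the exon blocks of one gene have already been identified and given shared IDs across isoforms, whereas Lace has to derive the blocks from the transcript sequences themselves. The function name and input format are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def build_supertranscript(transcripts: list[list[str]], exon_seqs: dict[str, str]) -> str:
    """Toy superTranscript: each unique exon block once, in transcriptional order.

    transcripts: each isoform as an ordered list of exon-block IDs (hypothetical input).
    exon_seqs:   maps each exon-block ID to its nucleotide sequence.
    """
    # Ordering constraints: block -> set of blocks that must come before it.
    predecessors: dict[str, set[str]] = {}
    for exons in transcripts:
        for exon in exons:
            predecessors.setdefault(exon, set())
        # Consecutive blocks within an isoform fix their relative order.
        for upstream, downstream in zip(exons, exons[1:]):
            predecessors[downstream].add(upstream)
    # A topological order respects the block order of every isoform;
    # graphlib raises CycleError if the isoforms disagree on that order.
    order = TopologicalSorter(predecessors).static_order()
    return "".join(exon_seqs[exon] for exon in order)

isoforms = [["e1", "e2", "e4"], ["e1", "e3", "e4"], ["e2", "e3"]]
seqs = {"e1": "ATG", "e2": "CCC", "e3": "GGG", "e4": "TAA"}
print(build_supertranscript(isoforms, seqs))  # ATGCCCGGGTAA
```

Here the third isoform forces e2 before e3, so only one valid order exists. When several orders are valid, `static_order` returns one of them, and the placement of unconstrained blocks is then arbitrary.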

De Novo Transcriptome Assembly and Gene Expression Analysis in Response to Drought Stress in Narcissus pseudonarcissus

JP Journal of Biostatistics ◽

10.17654/jb017010209 ◽

2020 ◽

Vol 17 (1) ◽

pp. 209-223

Author(s):

Kamishirazi Maryamalsadat ◽

Majd Ahmad ◽

Arbabian Sedighe ◽

Tajadod Golnaz

Keyword(s):

Gene Expression ◽

Drought Stress ◽

Expression Analysis ◽

Gene Expression Analysis ◽

De Novo ◽

Transcriptome Assembly ◽

De Novo Transcriptome Assembly ◽

De Novo Transcriptome ◽

Narcissus Pseudonarcissus
