transcript reconstruction
Recently Published Documents


TOTAL DOCUMENTS

16
(FIVE YEARS 5)

H-INDEX

5
(FIVE YEARS 1)

2022 ◽  
Author(s):  
Michael A Schon ◽  
Stefan Lutzmayer ◽  
Falko Hofmann ◽  
Michael D Nodine

Accurate annotation of transcript isoforms is crucial for functional genomics research, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data are imprecise. We developed a generalized transcript assembly framework called Bookend that incorporates data from multiple modes of RNA-seq, with a focus on identifying, labeling, and deconvoluting RNA 5′ and 3′ ends. Through end-guided assembly with Bookend we demonstrate that correctly modeling transcript start and end sites is essential for precise transcript assembly. Furthermore, we discover that reads from full-length single-cell RNA-seq (scRNA-seq) methods are sparsely end-labeled, and that these ends are sufficient to dramatically improve precision of assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq in the model plant Arabidopsis and meta-assembly of single mouse embryonic stem cells (mESCs) are both capable of producing tissue-specific end-to-end transcript annotations of comparable or superior quality to existing reference isoforms.


2021 ◽  
Vol 12 ◽  
Author(s):  
Michelle M. Halstead ◽  
Alma Islas-Trejo ◽  
Daniel E. Goszczynski ◽  
Juan F. Medrano ◽  
Huaijun Zhou ◽  
...  

A comprehensive annotation of transcript isoforms in domesticated species is lacking. Especially considering that transcriptome complexity and splicing patterns are not well-conserved between species, this presents a substantial obstacle to genomic selection programs that seek to improve production, disease resistance, and reproduction. Recent advances in long-read sequencing technology have made it possible to directly extrapolate the structure of full-length transcripts without the need for transcript reconstruction. In this study, we demonstrate the power of long-read sequencing for transcriptome annotation by coupling Oxford Nanopore Technology (ONT) with large-scale multiplexing of 93 samples, comprising 32 tissues collected from adult male and female Hereford cattle. More than 30 million uniquely mapping full-length reads were obtained from a single ONT flow cell, and used to identify and characterize the expression dynamics of 99,044 transcript isoforms at 31,824 loci. Of these predicted transcripts, 21% exactly matched a reference transcript, and 61% were novel isoforms of reference genes, substantially increasing the ratio of transcript variants per gene, and suggesting that the complexity of the bovine transcriptome is comparable to that in humans. Over 7,000 transcript isoforms were extremely tissue-specific, and 61% of these were attributed to testis, which exhibited the most complex transcriptome of all interrogated tissues. Despite profiling over 30 tissues, transcription was only detected at about 60% of reference loci. Consequently, additional studies will be necessary to continue characterizing the bovine transcriptome in additional cell types, developmental stages, and physiological conditions. However, by here demonstrating the power of ONT sequencing coupled with large-scale multiplexing, the task of exhaustively annotating the bovine transcriptome – or any mammalian transcriptome – appears significantly more feasible.


2020 ◽  
Vol 36 (9) ◽  
pp. 2712-2717
Author(s):  
Ting Yu ◽  
Juntao Liu ◽  
Xin Gao ◽  
Guojun Li

Motivation: Full-length transcript reconstruction is important and challenging for the widely used RNA-seq data analysis. Currently available RNA-seq assemblers generally suffer from serious limitations in practical applications, such as low assembly accuracy and incompatibility with the latest alignment tools. Results: We introduce iPAC, a new genome-guided assembler for reconstruction of isoforms, which revolutionizes the usage of paired-end and sequencing depth information via phasing and combing paths over a newly designed phasing graph. Tested on both simulated and real datasets, it is to some extent superior to all the salient assemblers of the same kind. In particular, iPAC is notably effective at recovering lowly expressed transcripts, where the other assemblers are not. Availability and implementation: iPAC is freely available at http://sourceforge.net/projects/transassembly/files. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Dana Wyman ◽  
Gabriela Balderrama-Gutierrez ◽  
Fairlie Reese ◽  
Shan Jiang ◽  
Sorena Rahmanian ◽  
...  

Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we introduce TALON, the ENCODE4 pipeline for platform-independent analysis of long-read transcriptomes. We apply TALON to the GM12878 cell line and show that while both PacBio and ONT technologies perform well at full-transcript discovery and quantification, each displayed distinct technical artifacts. We further apply TALON to mouse hippocampus and cortex transcriptomes and find that 422 genes found in these regions have more reads associated with novel isoforms than with annotated ones. We demonstrate that TALON is capable of tracking both known and novel transcript models as well as their expression levels across datasets, both in simple studies and in larger projects. These properties will enable TALON users to move beyond the limitations of short-read data to perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.


2019 ◽  
Vol 35 (21) ◽  
pp. 4264-4271
Author(s):  
Juntao Liu ◽  
Xiangyu Liu ◽  
Xianwen Ren ◽  
Guojun Li

Motivation: Full-length transcript reconstruction is essential for single-cell RNA-seq data analysis, but dropout events, which can cause transcripts to be discarded completely or broken into pieces, pose great challenges for transcript assembly. Currently available RNA-seq assemblers are generally designed for bulk RNA sequencing. To fill the gap, we introduce single-cell RNA-seq assembler (scRNAss), a method that applies explicit strategies to impute lost information caused by dropout events and a combing strategy to infer transcripts using scRNA-seq. Results: Extensive evaluations on both simulated and biological datasets demonstrated its superiority over the state-of-the-art RNA-seq assemblers including StringTie, Cufflinks and CLASS2. In particular, it showed a remarkable capability of recovering unknown ‘novel’ isoforms and high computational efficiency compared to other tools. Availability and implementation: scRNAss is free, open-source software available from https://sourceforge.net/projects/single-cell-rna-seq-assembly/files/. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Kristoffer Sahlin ◽  
Paul Medvedev

Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates). We test isONclust on three simulated and five biological datasets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches, both in terms of overall accuracy and/or scalability to large datasets. Our tool is available at https://github.com/ksahlin/isONclust.
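
The greedy, quality-aware clustering idea described in this abstract can be illustrated with a minimal sketch. This is not isONclust's actual algorithm or API: the function names (`kmers`, `greedy_cluster`), the use of plain shared k-mer fractions, the single per-read quality score, and the parameters `k` and `min_shared` are all assumptions made for illustration. Reads are visited from highest to lowest quality, and each read either joins the best-matching existing cluster or starts a new one.

```python
from typing import List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq (hypothetical helper)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(
    reads: List[Tuple[str, float]],  # (sequence, assumed per-read quality score)
    k: int = 5,
    min_shared: float = 0.5,         # assumed similarity threshold
) -> List[List[int]]:
    """Greedy single-pass clustering; returns lists of read indices."""
    # Visit high-quality reads first so they become cluster representatives.
    order = sorted(range(len(reads)), key=lambda i: -reads[i][1])
    representatives: List[Set[str]] = []  # k-mer set of each cluster's first read
    clusters: List[List[int]] = []
    for i in order:
        read_kmers = kmers(reads[i][0], k)
        best, best_frac = None, 0.0
        for c, rep in enumerate(representatives):
            # Fraction of this read's k-mers found in the representative.
            frac = len(read_kmers & rep) / len(read_kmers) if read_kmers else 0.0
            if frac > best_frac:
                best, best_frac = c, frac
        if best is not None and best_frac >= min_shared:
            clusters[best].append(i)
        else:
            # No sufficiently similar cluster: this read founds a new one.
            representatives.append(read_kmers)
            clusters.append([i])
    return clusters


reads = [
    ("ACGTACGTAGCTAGCTAG", 0.99),
    ("ACGTACGTAGCTAGCTAA", 0.90),
    ("TTTTGGGGCCCCAAAATTTT", 0.95),
]
print(greedy_cluster(reads))
```

Because each read is compared only against cluster representatives and never re-assigned, the pass is cheap, at the cost of depending on the visiting order; processing high-quality reads first is one plausible way to make that order work in the method's favor.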


2017 ◽  
Author(s):  
Luca Venturini ◽  
Shabhonam Caim ◽  
Gemy G Kaithakottil ◽  
Daniel L Mapleson ◽  
David Swarbreck

The performance of RNA-Seq aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand. Here we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-Seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artefacts such as erroneous transcript chimerisms. We have implemented this method in an open-source Python3 and Cython program, Mikado, available at https://github.com/lucventurini/Mikado.


2016 ◽  
Author(s):  
Ruolin Liu ◽  
Julie Dickerson

We propose a novel method and computational tool, Strawberry, for transcript reconstruction and quantification from paired-end RNA-seq data under the guidance of genome alignment and independent of gene annotation. Strawberry achieves this by disentangling assembly and quantification in a sequential manner. The application of a fast flow network algorithm for assembly speeds up the construction of a parsimonious set of transcripts. The resulting reduced data representation improves the efficiency of expression-level quantification. Strawberry leverages the speed and accuracy of transcript assembly and quantification in such a way that processing 10 million simulated reads (after alignment) requires only 90 seconds using a single thread while achieving over 92% correlation with the ground truth, making it the state-of-the-art method. Strawberry outperforms Cufflinks and StringTie, the two other leading methods, in many aspects, including the number of correctly assembled transcripts and the correlation with the ground truth of simulated RNA-seq data. Availability: Strawberry is written in C++11, and is available as open source software at https://github.com/ruolin/Strawberry under the GPLv3 license.


2015 ◽  
Author(s):  
Stefan Canzar ◽  
Sandro Andreotti ◽  
David Weese ◽  
Knut Reinert ◽  
Gunnar W. Klau

We present CIDANE, a novel framework for genome-based transcript reconstruction and quantification from RNA-seq reads. CIDANE assembles transcripts with significantly higher sensitivity and precision than existing tools, while competing in speed with the fastest methods. In addition to reconstructing transcripts ab initio, the algorithm can also make use of the growing annotation of known splice sites, transcription start and end sites, or full-length transcripts, which are available for most model organisms. CIDANE supports the integrated analysis of RNA-seq and additional gene-boundary data and recovers splice junctions that are invisible to other methods. CIDANE is available at http://ccb.jhu.edu/software/cidane/.

