circtools—a one-stop software solution for circular RNA research

Tobias Jakobi; Alexey Uvarovskii; Christoph Dieterich

doi:10.1093/bioinformatics/bty948

circtools—a one-stop software solution for circular RNA research

Bioinformatics ◽

10.1093/bioinformatics/bty948 ◽

2018 ◽

Vol 35 (13) ◽

pp. 2326-2328 ◽

Cited By ~ 13

Author(s):

Tobias Jakobi ◽

Alexey Uvarovskii ◽

Christoph Dieterich

Keyword(s):

High Throughput Sequencing ◽

Circular Rna ◽

Statistical Testing ◽

Supplementary Information ◽

Circular Rnas ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Multi Stage ◽

Sequence Reconstruction ◽

One Stop

Abstract Motivation Circular RNAs (circRNAs) originate through back-splicing events from linear primary transcripts, are resistant to exonucleases, are not polyadenylated and have been shown to be highly specific for cell type and developmental stage. CircRNA detection starts from high-throughput sequencing data and is a multi-stage bioinformatics process yielding sets of potential circRNA candidates that require further analyses. While a number of tools for the prediction process already exist, publicly available analysis tools for further characterization are rare. Our work provides researchers with a harmonized workflow that covers different stages of in silico circRNA analyses, from prediction to first functional insights. Results Here, we present circtools, a modular, Python-based framework for computational circRNA analyses. The software includes modules for circRNA detection, internal sequence reconstruction, quality checking, statistical testing, screening for enrichment of RBP binding sites, differential exon RNase R resistance and circRNA-specific primer design. circtools supports researchers with visualization options and data export into commonly used formats. Availability and implementation circtools is available via https://github.com/dieterich-lab/circtools and http://circ.tools under GPLv3.0. Supplementary information Supplementary data are available at Bioinformatics online.

hypeR: An R Package for Geneset Enrichment Workflows

10.1101/656637 ◽

2019 ◽

Cited By ~ 1

Author(s):

Anthony Federico ◽

Stefano Monti

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

Supplementary Information ◽

Sequencing Data ◽

Wide Audience ◽

Popular Method ◽

Link Type ◽

High Throughput Sequencing Data ◽

One Stop ◽

Recent Version

Abstract Summary Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation The most recent version of the package is available at https://github.com/montilab/hypeR. Supplementary information Comprehensive documentation and tutorials are available at https://montilab.github.io/hypeR-docs.

hypeR: an R package for geneset enrichment workflows

Bioinformatics ◽

10.1093/bioinformatics/btz700 ◽

2019 ◽

Cited By ~ 3

Author(s):

Anthony Federico ◽

Stefano Monti

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

R Package ◽

Use Cases ◽

Sequencing Data ◽

Wide Audience ◽

Popular Method ◽

High Throughput Sequencing Data ◽

One Stop ◽

Recent Version

Abstract Summary Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation The most recent version of the package is available at https://github.com/montilab/hypeR.
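As an illustration of what a geneset enrichment test can look like, the sketch below computes a hypergeometric over-representation p-value in Python. The abstract does not state which tests hypeR implements or how, so this is only one common formulation; the function name and arguments are hypothetical, and both the signature and the geneset are assumed to be drawn from a background of `background_size` genes.

```python
from math import comb


def hypergeom_enrichment_pvalue(signature, geneset, background_size):
    """Return P(overlap >= observed) under the hypergeometric null.

    Assumes every gene in `signature` and `geneset` belongs to a
    background of `background_size` genes (an assumption of this sketch).
    """
    signature, geneset = set(signature), set(geneset)
    observed = len(signature & geneset)  # genes shared by signature and geneset
    drawn = len(signature)               # number of genes "drawn"
    annotated = len(geneset)             # genes annotated to the geneset
    total = comb(background_size, drawn)
    upper = min(drawn, annotated)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, i) * comb(background_size - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


p = hypergeom_enrichment_pvalue({"A", "B", "C"}, {"A", "B", "X"}, 10)
print(round(p, 4))
```

In practice the p-values from many genesets would also need multiple-testing correction, which this sketch leaves out.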

ADFinder: accurate detection of programmed DNA elimination using NGS high-throughput sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa226 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3632-3636 ◽

Cited By ~ 2

Author(s):

Weibo Zheng ◽

Jing Chen ◽

Thomas G Doak ◽

Weibo Song ◽

Ying Yan

Keyword(s):

High Throughput ◽

Large Scale ◽

High Throughput Sequencing ◽

Supplementary Information ◽

Sequencing Data ◽

Source Codes ◽

High Throughput Sequencing Data ◽

Dna Elimination ◽

Multiple Alternative ◽

Dna Splicing

Abstract Motivation Programmed DNA elimination (PDE) plays a crucial role in the transitions between germline and somatic genomes in diverse organisms ranging from unicellular ciliates to multicellular nematodes. However, software designed specifically for detecting DNA splicing events is scarce. In this paper, we describe Accurate Deletion Finder (ADFinder), an efficient detector of PDEs using high-throughput sequencing data. ADFinder can predict PDEs with relatively low sequencing coverage, detect multiple alternative splicing forms in the same genomic location and calculate the frequency for each splicing event. This software will facilitate research of PDEs and all downstream analyses. Results By analyzing genome-wide DNA splicing events in two micronuclear genomes of Oxytricha trifallax and Tetrahymena thermophila, we show that ADFinder is effective in predicting large-scale PDEs. Availability and implementation The source code and manual of ADFinder are available on GitHub: https://github.com/weibozheng/ADFinder. Supplementary information Supplementary data are available at Bioinformatics online.

Five Circular RNAs in Metabolism Pathways Related to Prostate Cancer

Frontiers in Genetics ◽

10.3389/fgene.2021.636419 ◽

2021 ◽

Vol 12 ◽

Author(s):

Lili Zhang ◽

Wei Zhang ◽

Hexin Li ◽

Xiaokun Tang ◽

Siyuan Xu ◽

...

Keyword(s):

Prostate Cancer ◽

Signaling Pathway ◽

High Throughput Sequencing ◽

Specific Antigen ◽

Mapk Signaling ◽

Circular Rnas ◽

Sequencing Data ◽

Clinical Indicators ◽

High Throughput Sequencing Data ◽

Differentially Expressed Mrna

Prostate cancer (PCa) is the most common malignant tumor in men, and its incidence increases with age. Serum prostate-specific antigen and tissue biopsy remain the standard for diagnosis of suspected PCa. However, these clinical indicators may lead to aggressive overtreatment in patients who have been treated sufficiently with active surveillance. Circular RNAs (circRNAs) have been recently recognized as a new type of regulatory RNA that is not easily degraded by RNases and other exonucleases because of their covalent closed cyclic structure. Thus, we utilized high-throughput sequencing data and bioinformatics analysis to identify specifically expressed circRNAs in PCa and filtered out five specific circRNAs for further analysis—hsa_circ_0006410, hsa_circ_0003970, hsa_circ_0006754, hsa_circ_0005848, and a novel circRNA, hsa_circ_AKAP7. We constructed a circRNA-miRNA regulatory network and used miRNA and differentially expressed mRNA interactions to predict the function of the selected circRNAs. Furthermore, survival analysis of their cognate genes and PCR verification of these five circRNAs revealed that they are closely related to well-known PCa pathways such as the MAPK signaling pathway, P53 pathway, androgen receptor signaling pathway, cell cycle, hormone-mediated signaling pathway, and cellular lipid metabolic process. By understanding the related metabolism of circRNAs, these circRNAs could act as metabolic biomarkers, and monitoring their levels could help diagnose PCa. Meanwhile, the exact regulatory mechanism for AR-related regulation in PCa is still unclear. The circRNAs we found can provide new solutions for research in this field.

seekCRIT: Detecting and characterizing differentially expressed circular RNAs using high-throughput sequencing data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008338 ◽

2020 ◽

Vol 16 (10) ◽

pp. e1008338

Author(s):

Mohamed Chaabane ◽

Kalina Andreeva ◽

Jae Yeon Hwang ◽

Tae Lim Kook ◽

Juw Won Park ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Differentially Expressed ◽

Circular Rnas ◽

Sequencing Data ◽

High Throughput Sequencing Data

Inference of viral quasispecies with a paired de Bruijn graph

Bioinformatics ◽

10.1093/bioinformatics/btaa782 ◽

2020 ◽

Author(s):

Borja Freire ◽

Susana Ladra ◽

Jose R Paramá ◽

Leena Salmela

Keyword(s):

High Throughput Sequencing ◽

De Novo ◽

Supplementary Information ◽

De Bruijn Graph ◽

Viral Quasispecies ◽

Sequencing Data ◽

De Bruijn Graphs ◽

Sequencing Errors ◽

High Throughput Sequencing Data ◽

De Bruijn

Abstract Motivation RNA viruses exhibit a high mutation rate and thus they exist in infected cells as a population of closely related strains called viral quasispecies. The viral quasispecies assembly problem asks to characterize the quasispecies present in a sample from high-throughput sequencing data. We study the de novo version of the problem, where reference sequences of the quasispecies are not available. Current methods for assembling viral quasispecies are either based on overlap graphs or on de Bruijn graphs. Overlap graph-based methods tend to be accurate but slow, whereas de Bruijn graph-based methods are fast but less accurate. Results We present viaDBG, which is a fast and accurate de Bruijn graph-based tool for de novo assembly of viral quasispecies. We first iteratively correct sequencing errors in the reads, which allows us to use large k-mers in the de Bruijn graph. To incorporate the paired-end information in the graph, we also adapt the paired de Bruijn graph for viral quasispecies assembly. These features enable the use of long-range information in contig construction without compromising the speed of de Bruijn graph-based approaches. Our experimental results show that viaDBG is both accurate and fast, whereas previous methods are either fast or accurate but not both. In particular, viaDBG has comparable or better accuracy than SAVAGE, while being at least nine times faster. Furthermore, the speed of viaDBG is comparable to that of PEHaplo, but viaDBG is also able to retrieve low-abundance quasispecies, which are often missed by PEHaplo. Availability and implementation viaDBG is implemented in C++ and is publicly available at https://bitbucket.org/bfreirec1/viadbg. All datasets used in this article are publicly available at https://bitbucket.org/bfreirec1/data-viadbg/. Supplementary information Supplementary data are available at Bioinformatics online.
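For readers unfamiliar with de Bruijn graphs, the following minimal Python sketch builds a plain (unpaired) de Bruijn graph from reads and extends a contig along unambiguous edges. It is not viaDBG's implementation, which is written in C++ and adds error correction and paired-end information; the function names here are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to its successor (k-1)-mers with k-mer counts."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def extend_unambiguous(graph, start):
    """Walk forward from `start` while the current node has one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, {})) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


graph = build_de_bruijn(["ACGTT", "ACGTA"], k=3)
print(extend_unambiguous(graph, "AC"))  # stops where the two reads diverge
```

A branch such as the one after `GT` in this toy input is where closely related strains, or sequencing errors, would have to be told apart; larger k-mers make such branches rarer, which is why error correction matters before graph construction.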

MitoFlex: an efficient, high-performance toolkit for animal mitogenome assembly, annotation, and visualization

Bioinformatics ◽

10.1093/bioinformatics/btab111 ◽

2021 ◽

Author(s):

Jun-Yu Li ◽

Wei-Xuan Li ◽

An-Tai Wang ◽

Zhang Yu

Keyword(s):

Mitochondrial Genome ◽

High Performance ◽

High Throughput Sequencing ◽

De Novo ◽

Supplementary Information ◽

Sequencing Data ◽

Protein Coding ◽

High Throughput Sequencing Data ◽

Genome Analysis Toolkit ◽

Overall Performance

Abstract Summary MitoFlex is a Linux-based mitochondrial genome analysis toolkit, which provides a complete workflow of raw data filtering, de novo assembly, mitochondrial genome identification and annotation for animal high-throughput sequencing data. The overall performance was compared between MitoFlex and its analogue MitoZ, in terms of protein coding gene recovery, memory consumption and processing speed. Availability MitoFlex is available at https://github.com/Prunoideae/MitoFlex under the GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.

Measuring reproducibility of virus metagenomics analyses using bootstrap samples from FASTQ-files

Bioinformatics ◽

10.1093/bioinformatics/btaa926 ◽

2020 ◽

Author(s):

Babak Saremi ◽

Moritz Kohls ◽

Pamela Liebig ◽

Ursula Siebert ◽

Klaus Jung

Keyword(s):

Mixture Model ◽

High Throughput Sequencing ◽

Supplementary Information ◽

Sequencing Data ◽

Bootstrap Sampling ◽

Technical Errors ◽

High Throughput Sequencing Data ◽

The Difference ◽

Real World Datasets ◽

Highly Correlated

Abstract Motivation High-throughput sequencing data can be affected by different technical errors, e.g. from probe preparation or false base calling. As a consequence, reproducibility of experiments can be weakened. In virus metagenomics, technical errors can result in falsely identified viruses in samples from infected hosts. We present a new resampling approach based on bootstrap sampling of sequencing reads from FASTQ-files in order to generate artificial replicates of sequencing runs which can help to judge the robustness of an analysis. In addition, we evaluate a mixture model on the distribution of read counts per virus to identify potentially false positive findings. Results The evaluation of our approach on an artificially generated dataset with known viral sequence content shows in general a high reproducibility of uncovering viruses in sequencing data, i.e. the original and mean bootstrap read counts were highly correlated. However, the bootstrap read counts can also indicate reduced or increased evidence for the presence of a virus in the biological sample. We also found that the mixture model fits the read counts well, and furthermore, it provides a higher accuracy on the original or on the bootstrap read counts than on the difference between both. The usefulness of our methods is further demonstrated on two freely available real-world datasets from harbor seals. Availability and implementation We provide a Python tool, called RESEQ, available from https://github.com/babaksaremi/RESEQ that allows efficient generation of bootstrap reads from an original FASTQ-file. Supplementary information Supplementary data are available at Bioinformatics online.
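The resampling idea can be sketched in a few lines of Python: draw reads with replacement from the original FASTQ records to produce an artificial replicate of the same size. This is not the RESEQ code; the function names are hypothetical, and the sketch assumes a simple four-line-per-record FASTQ layout held in memory.

```python
import random


def read_fastq(lines):
    """Group FASTQ lines into (header, sequence, plus, quality) records.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    stripped = [line.rstrip("\n") for line in lines]
    if len(stripped) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    return [tuple(stripped[i:i + 4]) for i in range(0, len(stripped), 4)]


def bootstrap_reads(records, seed=None):
    """Sample len(records) reads with replacement (one bootstrap replicate)."""
    rng = random.Random(seed)
    return rng.choices(records, k=len(records))


fastq_lines = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "TTGA", "+", "IIII",
    "@read3", "GGCC", "+", "IIII",
]
records = read_fastq(fastq_lines)
replicate = bootstrap_reads(records, seed=1)
print(len(replicate))
```

Running the downstream analysis on many such replicates and comparing the per-virus read counts with those of the original run gives a rough picture of how stable each finding is.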

OGRE: Overlap Graph-based metagenomic Read clustEring

Bioinformatics ◽

10.1093/bioinformatics/btaa760 ◽

2020 ◽

Author(s):

Marleen Balvert ◽

Xiao Luo ◽

Ernestina Hauptfeld ◽

Alexander Schönhuth ◽

Bas E Dutilh

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Supplementary Information ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Computationally Intensive ◽

Metagenome Sequencing ◽

Species Specific ◽

Cluster Purity

Abstract Motivation The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is by clustering reads into groups based on their overlaps. Clustering reads is valuable to facilitate downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. Results We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. Conclusion OGRE is the only method that can successfully cluster reads in species-specific clusters for large metagenomic datasets without running into computation time or memory issues. Availability and implementation Code is made available on GitHub (https://github.com/Marleen1/OGRE). Supplementary information Supplementary data are available at Bioinformatics online.
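To make overlap-based read clustering concrete, the toy Python sketch below links reads that share a sufficiently long suffix-prefix overlap and reports the connected components as clusters. It is not OGRE's algorithm: the all-pairs comparison is quadratic and would not scale to real metagenomes, and the function names and the `min_len` threshold are hypothetical.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def cluster_reads(reads, min_len):
    """Cluster read indices by connected components of the overlap graph."""
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if (suffix_prefix_overlap(reads[i], reads[j], min_len)
                    or suffix_prefix_overlap(reads[j], reads[i], min_len)):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


print(cluster_reads(["ACGTAC", "TACGGA", "GGATTT", "CCCCCC"], min_len=3))
```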

CircMiner: accurate and rapid detection of circular RNA through splice-aware pseudo-alignment scheme

Bioinformatics ◽

10.1093/bioinformatics/btaa232 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3703-3711 ◽

Cited By ~ 1

Author(s):

Hossein Asghari ◽

Yen-Yi Lin ◽

Yang Xu ◽

Ehsan Haghshenas ◽

Colin C Collins ◽

...

Keyword(s):

Cell Line ◽

Rapid Detection ◽

High Throughput Sequencing ◽

Circular Rna ◽

Supplementary Information ◽

Circular Rnas ◽

Alignment Technique ◽

Nucleotide Resolution ◽

High Degree ◽

Splice Junctions

Abstract Motivation The ubiquitous abundance of circular RNAs (circRNAs) has been revealed by performing high-throughput sequencing in a variety of eukaryotes. circRNAs are related to some diseases, such as cancer, in which they act as oncogenes or tumor suppressors and, therefore, have the potential to be used as biomarkers or therapeutic targets. Accurate and rapid detection of circRNAs from short reads remains computationally challenging because identifying chimeric reads, which is essential for finding back-splice junctions, is a complex process. The sensitivity of discovery methods relies, to a high degree, on the underlying mapper that is used for finding chimeric reads. Furthermore, all the available circRNA discovery pipelines are resource intensive. Results We introduce CircMiner, a novel stand-alone circRNA detection method that rapidly identifies and filters out linear RNA sequencing reads and detects back-splice junctions. CircMiner employs a rapid pseudo-alignment technique to identify linear reads that originate from transcripts, genes or the genome. CircMiner further processes the remaining reads to identify the back-splice junctions and detect circRNAs with single-nucleotide resolution. We evaluated the efficacy of CircMiner using simulated datasets generated from known back-splice junctions and showed that CircMiner has superior accuracy and speed compared to the existing circRNA detection tools. Additionally, on two RNase R treated cell line datasets, CircMiner was able to detect most of the consistent, high-confidence circRNAs compared to untreated samples of the same cell line. Availability and implementation CircMiner is implemented in C++ and is available online at https://github.com/vpc-ccg/circminer. Supplementary information Supplementary data are available at Bioinformatics online.
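The notion of a chimeric read supporting a back-splice junction can be illustrated with a small Python sketch. It is not CircMiner's method: the `Segment` type and `back_splice_span` function are hypothetical, and the sketch assumes that a read has already been split into two aligned segments whose `read_start` offsets are given in transcript 5' to 3' order, with `strand` denoting the transcript strand.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    chrom: str
    strand: str      # transcript strand, "+" or "-" (assumption of this sketch)
    read_start: int  # offset of the segment within the read
    ref_start: int   # leftmost reference coordinate (0-based)
    ref_end: int     # exclusive end coordinate


def back_splice_span(seg_a: Segment, seg_b: Segment) -> Optional[Tuple[int, int]]:
    """Return the genomic span of a putative circRNA, or None.

    A back-splice is called when the segment that comes later in the read
    maps upstream (in transcript orientation) of the earlier segment.
    """
    first, second = sorted((seg_a, seg_b), key=lambda s: s.read_start)
    if first.chrom != second.chrom or first.strand != second.strand:
        return None
    if first.strand == "+" and second.ref_end <= first.ref_start:
        return (second.ref_start, first.ref_end)
    if first.strand == "-" and second.ref_start >= first.ref_end:
        return (first.ref_start, second.ref_end)
    return None  # order consistent with linear splicing


donor_side = Segment("chr1", "+", 0, 1000, 1050)
acceptor_side = Segment("chr1", "+", 50, 400, 450)
print(back_splice_span(donor_side, acceptor_side))
```

Real detectors additionally check splice-site signals, annotation and mapping quality before reporting a junction; none of that is modeled here.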
