Exogene: A performant workflow for detecting viral integrations from paired-end next-generation sequencing data

Zachary Stephens; Daniel O’Brien; Mrunal Dehankar; Lewis R. Roberts; Ravishankar K. Iyer; Jean-Pierre Kocher

doi:10.1371/journal.pone.0250915

Exogene: A performant workflow for detecting viral integrations from paired-end next-generation sequencing data

PLoS ONE ◽

10.1371/journal.pone.0250915 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0250915

Author(s):

Zachary Stephens ◽

Daniel O’Brien ◽

Mrunal Dehankar ◽

Lewis R. Roberts ◽

Ravishankar K. Iyer ◽

...

Keyword(s):

Next Generation Sequencing ◽

Sequence Data ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Long Read ◽

Breakpoint Detection ◽

Targeted Capture ◽

Genome Heterogeneity ◽

Generation Sequencing

The integration of viruses into the human genome is known to be associated with tumorigenesis in many cancers, but the accurate detection of integration breakpoints from short read sequencing data is made difficult by human-viral homologies, viral genome heterogeneity, coverage limitations, and other factors. To address this, we present Exogene, a sensitive and efficient workflow for detecting viral integrations from paired-end next generation sequencing data. Exogene’s read filtering and breakpoint detection strategies yield integration coordinates that are highly concordant with long read validation. We demonstrate this concordance across 6 TCGA Hepatocellular carcinoma (HCC) tumor samples, identifying integrations of hepatitis B virus that are also supported by long reads. Additionally, we applied Exogene to targeted capture data from 426 previously studied HCC samples, achieving 98.9% concordance with existing methods and identifying 238 high-confidence integrations that were not previously reported. Exogene is applicable to multiple types of paired-end sequence data, including genome, exome, RNA-Seq and targeted capture.


Exogene: A performant workflow for detecting viral integrations from paired-end next-generation sequencing data

10.1101/2021.04.19.440427 ◽

2021 ◽

Author(s):

Jean-Pierre Kocher ◽

Zachary Stephens ◽

Daniel O'Brien ◽

Mrunal Dehankar ◽

Lewis Roberts ◽

...

Keyword(s):

Next Generation Sequencing ◽

Sequence Data ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Long Read ◽

Breakpoint Detection ◽

Targeted Capture ◽

Genome Heterogeneity ◽

Generation Sequencing

The integration of viruses into the human genome is known to be associated with tumorigenesis in many cancers, but the accurate detection of integration breakpoints from short read sequencing data is made difficult by human-viral homologies, viral genome heterogeneity, coverage limitations, and other factors. To address this, we present Exogene, a sensitive and efficient workflow for detecting viral integrations from paired-end next generation sequencing data. Exogene's read filtering and breakpoint detection strategies yield integration coordinates that are highly concordant with those found in long read validation sets. We demonstrate this concordance across 6 TCGA Hepatocellular carcinoma (HCC) tumor samples, identifying integrations of hepatitis B virus that are validated by long reads. Additionally, we applied Exogene to targeted capture data from 426 previously studied HCC samples, achieving 98.9% concordance with existing methods and identifying 238 high-confidence integrations that were not previously reported. Exogene is applicable to multiple types of paired-end sequence data, including genome, exome, RNA-Seq or targeted capture.


Methods for analyzing next-generation sequencing data VII. long-read assembly

Japanese Journal of Lactic Acid Bacteria ◽

10.4109/jslab.27.101 ◽

2016 ◽

Vol 27 (2) ◽

pp. 101-110

Author(s):

Yasuhiro Tanizawa ◽

Eli Kaminuma ◽

Yasukazu Nakamura ◽

Masanori Tohno ◽

Ken Osaki ◽

...

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Long Read ◽

Generation Sequencing


A Simple Deep Learning Approach for Detecting Duplications and Deletions in Next-Generation Sequencing Data

10.1101/657361 ◽

2019 ◽

Author(s):

Tom Hill ◽

Robert L. Unckless

Keyword(s):

Machine Learning ◽

Next Generation Sequencing ◽

Copy Number Variants ◽

Difficult Problem ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

High Coverage ◽

Long Read ◽

Generation Sequencing

Abstract Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low coverage data, machine learning appears to be more powerful in the detection of CNVs than the gold-standard methods or coverage estimation alone, and of equal power in high coverage data. We also demonstrate how replicating training sets allows a more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long read data. Available at: https://github.com/tomh1lll/dudeml


Faculty Opinions recommendation of VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718272765.793499663 ◽

2014 ◽

Author(s):

Gary Bader ◽

Mohamed Helmy

Keyword(s):

Next Generation Sequencing ◽

Network Analysis ◽

Next Generation Sequencing Data ◽

Cancer Genes ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing


Faculty Opinions recommendation of Bioinformatory-assisted analysis of next-generation sequencing data for precision medicine in pancreatic cancer.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.727775566.793536095 ◽

2017 ◽

Author(s):

Steve Pereira

Keyword(s):

Pancreatic Cancer ◽

Next Generation Sequencing ◽

Precision Medicine ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Assisted Analysis ◽

Generation Sequencing


NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab174 ◽

2021 ◽

Author(s):

Anne Krogh Nøhr ◽

Kristian Hanghøj ◽

Genis Garcia Erill ◽

Zilong Li ◽

Ida Moltke ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genetic Research ◽

Likelihood Estimation ◽

Software Tool ◽

Estimation Methods ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Ngs Data ◽

Generation Sequencing

Abstract Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub: https://github.com/KHanghoj/NGSremix.


recoup: flexible and versatile signal visualization from next generation sequencing

BMC Bioinformatics ◽

10.1186/s12859-020-03902-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Panagiotis Moulos

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Special Focus ◽

Next Generation ◽

Sequencing Data ◽

User Friendliness ◽

Computational Environment ◽

Level Data ◽

Data Signal ◽

Generation Sequencing

Abstract Background: The relentless continuing emergence of new genomic sequencing protocols and the resulting generation of ever larger datasets continue to challenge the meaningful summarization and visualization of the underlying signal generated to answer important qualitative and quantitative biological questions. As a result, the need for novel software able to reliably produce quick, comprehensive, and easily repeatable genomic signal visualizations in a user-friendly manner is rapidly re-emerging. Results: recoup is a Bioconductor package for quick, flexible, versatile, and accurate visualization of genomic coverage profiles generated from Next Generation Sequencing data. Coupled with a database of precalculated genomic regions for multiple organisms, recoup offers processing mechanisms for quick, efficient, and multi-level data interrogation with minimal effort, while at the same time creating publication-quality visualizations. Special focus is given to plot reusability, reproducibility, and real-time exploration and formatting options, operations rarely supported in depth by similar visualization tools. recoup was assessed using several qualitative user metrics and found to balance the tradeoff between important package features, including speed, visualization quality, overall friendliness, and the reusability of the results with minimal additional calculations. Conclusion: While some existing solutions for the comprehensive visualization of NGS data signal offer satisfying results, they are often compromised regarding issues such as effortless tracking of processing and preparation steps under a common computational environment, visualization quality, and user friendliness. recoup is a unique package presenting a balanced tradeoff for a combination of assessment criteria while remaining fast and friendly.


Clinical Implications of Copy Number Alteration Detection using Panel-Based Next-Generation Sequencing Data in Myelodysplastic Syndrome

Leukemia Research ◽

10.1016/j.leukres.2021.106540 ◽

2021 ◽

pp. 106540

Author(s):

Yoo-Jin Kim ◽

Seung-Hyun Jung ◽

Eun-Hye Hur ◽

Eun-Ji Choi ◽

Kyoo-Hyung Lee ◽

...

Keyword(s):

Next Generation Sequencing ◽

Myelodysplastic Syndrome ◽

Copy Number ◽

Copy Number Alteration ◽

Next Generation Sequencing Data ◽

Clinical Implications ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing


BIGpre: A Quality Assessment Package for Next-Generation Sequencing Data

Genomics Proteomics & Bioinformatics ◽

10.1016/s1672-0229(11)60027-2 ◽

2011 ◽

Vol 9 (6) ◽

pp. 238-244 ◽

Cited By ~ 21

Author(s):

Tongwu Zhang ◽

Yingfeng Luo ◽

Kan Liu ◽

Linlin Pan ◽

Bing Zhang ◽

...

Keyword(s):

Next Generation Sequencing ◽

Quality Assessment ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing


Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA

Genome Biology ◽

10.1186/gb-2010-11-10-r99 ◽

2010 ◽

Vol 11 (10) ◽

Cited By ~ 53

Author(s):

Nils Homer ◽

Stanley F Nelson

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Short Read ◽

Variant Discovery ◽

Generation Sequencing
