Improving long-read accuracy

Lei Tang

doi:10.1038/s41592-018-0204-y

LSCplus: a fast solution for improving long read accuracy by short read alignment

BMC Bioinformatics, 2016, Vol 17 (1). doi:10.1186/s12859-016-1316-y. Cited by ~12.

Author(s): Ruifeng Hu, Guibo Sun, Xiaobo Sun

Keyword(s): Short Read, Read Alignment, Short Read Alignment, Fast Solution, Long Read, Read Accuracy

Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs

Bioinformatics, 2019, Vol 36 (5), pp. 1374-1381. doi:10.1093/bioinformatics/btz102. Cited by ~9.

Author(s): Antoine Limasset, Jean-François Flot, Pierre Peterlongo

Keyword(s): Supplementary Information, De Bruijn Graph, Sequence Information, Short Read, De Bruijn Graphs, Short Reads, Sequencing Errors, Long Read, De Bruijn, Read Accuracy

Abstract

Motivation: Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well on large datasets or consider reads as mere suites of k-mers, without taking into account their full-length sequence information.

Results: We propose a new method to correct short reads using de Bruijn graphs and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered on the basis of k-mer abundance then of unitig abundance, thereby removing most sequencing errors. The cleaned graph is then used as a reference on which the reads are mapped to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while being scalable to human-size genomic datasets and beyond.

Availability and implementation: The implementation is open source, available at http://github.com/Malfoy/BCOOL under the Affero GPL license and as a Bioconda package.

Supplementary information: Supplementary data are available at Bioinformatics online.
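The two filtering passes described in the abstract (k-mer abundance first, then unitig abundance) can be illustrated with a small Python sketch. This is a toy under stated assumptions, not Bcool's implementation: it works on the forward strand only (no reverse complements), keeps every k-mer in an in-memory dictionary, and the function names `count_kmers`, `build_unitigs` and `clean_graph`, as well as the threshold values, are hypothetical choices made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only, a simplifying assumption)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def successors(kmer, graph):
    """Out-neighbours in the de Bruijn graph: k-mers overlapping `kmer` by k-1 bases on the right."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in graph]


def predecessors(kmer, graph):
    """In-neighbours in the de Bruijn graph: k-mers overlapping `kmer` by k-1 bases on the left."""
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in graph]


def _extend(path, graph, seen, forward):
    """Grow `path` while the walk stays on a non-branching stretch of the graph."""
    step, back = (successors, predecessors) if forward else (predecessors, successors)
    while True:
        nxt = step(path[-1], graph)
        if len(nxt) != 1:  # dead end or branching node: the unitig stops here
            return
        cand = nxt[0]
        if cand in seen or len(back(cand, graph)) != 1:  # cycle or merging node
            return
        seen.add(cand)
        path.append(cand)


def build_unitigs(graph):
    """Compact maximal non-branching paths; return (sequence, mean k-mer abundance) pairs."""
    seen, unitigs = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        right, left = [start], [start]
        _extend(right, graph, seen, forward=True)
        _extend(left, graph, seen, forward=False)
        path = left[::-1] + right[1:]
        seq = path[0] + "".join(km[-1] for km in path[1:])
        unitigs.append((seq, sum(graph[km] for km in path) / len(path)))
    return unitigs


def clean_graph(reads, k, min_kmer_abundance, min_unitig_abundance):
    """Filter k-mers by abundance, compact into unitigs, then filter unitigs by mean abundance."""
    counts = count_kmers(reads, k)
    graph = {km: c for km, c in counts.items() if c >= min_kmer_abundance}
    return [seq for seq, ab in build_unitigs(graph) if ab >= min_unitig_abundance]


if __name__ == "__main__":
    genome = "ATGCGTACCAGT"
    reads = [genome] * 3 + ["ATGCGAACCAGT"]  # the last read carries one substitution
    print(clean_graph(reads, k=4, min_kmer_abundance=2, min_unitig_abundance=1))
    # prints ['ATGCGTACCAGT']
```

With k-mer filtering disabled (`min_kmer_abundance=1`), the substitution instead creates a branch whose unitig `GCGAACC` has mean abundance 1, so a unitig threshold of 2 removes it while the three correct unitigs survive. This is the intuition behind filtering at both levels: errors that slip past the k-mer threshold can still stand out once k-mers are grouped into unitigs.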

Toward perfect reads: short reads correction via mapping on compacted de Bruijn graphs

2019. doi:10.1101/558395. Cited by ~3.

Author(s): Antoine Limasset, Jean-François Flot, Pierre Peterlongo

Keyword(s): Large Data, De Bruijn Graph, Data Sets, Short Read, De Bruijn Graphs, Short Reads, Sequencing Errors, Long Read, De Bruijn, Read Accuracy

Abstract

Motivations: Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well on large data sets or consider reads as mere suites of k-mers, without taking into account their full-length read information.

Results: We propose a new method to correct short reads using de Bruijn graphs, and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered on the basis of k-mer abundance then of unitig abundance, thereby removing most sequencing errors. The cleaned graph is then used as a reference on which the reads are mapped to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while being scalable to human-size genomic datasets and beyond.

Availability and Implementation: The implementation is open source and available at http://github.com/Malfoy/BCOOL under the Affero GPL license and as a Bioconda package.

Contact: Antoine Limasset, Jean-François Flot, Pierre Peterlongo
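The last step, mapping each read onto the cleaned graph and replacing it with the sequence of the path it aligns to, can be approximated by an exhaustive search. This is a deliberately simplified sketch, not Bcool's mapper: it assumes substitutions only (no insertions or deletions), forward strand only, and tries every graph k-mer as a starting point. The function name `correct_read` and the `max_mismatches` budget are hypothetical.

```python
from typing import Optional, Set, Tuple


def correct_read(read: str, kmers: Set[str], k: int, max_mismatches: int = 2) -> Optional[str]:
    """Return the graph path spelling closest to `read` (substitutions only), or None."""
    if len(read) < k:
        return None
    best: Optional[Tuple[int, str]] = None  # (mismatch count, corrected sequence)

    def dfs(kmer: str, seq: str, mismatches: int) -> None:
        nonlocal best
        # Prune paths over budget or no better than the best full-length path found so far
        if mismatches > max_mismatches or (best is not None and mismatches >= best[0]):
            return
        if len(seq) == len(read):
            best = (mismatches, seq)
            return
        base = read[len(seq)]
        for b in "ACGT":
            nxt = kmer[1:] + b
            if nxt in kmers:  # follow an edge of the de Bruijn graph
                dfs(nxt, seq + b, mismatches + (b != base))

    for start in kmers:
        dfs(start, start, sum(x != y for x, y in zip(start, read[:k])))
    return None if best is None else best[1]


if __name__ == "__main__":
    genome = "ATGCGTACCAGT"
    solid = {genome[i:i + 4] for i in range(len(genome) - 3)}
    print(correct_read("ATGCGAACCAGT", solid, k=4))  # prints ATGCGTACCAGT
```

The search grows exponentially with the number of branches it explores, so it only serves to show what "mapping a read on the graph" means; practical graph mappers anchor reads with indexed seeds instead. A read that fits no path within the mismatch budget is returned as `None` here, leaving it to the caller whether to keep it uncorrected or discard it.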

Improving PacBio Long Read Accuracy by Short Read Alignment

PLoS ONE, 2012, Vol 7 (10), pp. e46679. doi:10.1371/journal.pone.0046679. Cited by ~195.

Author(s): Kin Fai Au, Jason G. Underwood, Lawrence Lee, Wing Hung Wong

Keyword(s): Short Read, Read Alignment, Short Read Alignment, Long Read, Read Accuracy

The NCTC 3000 project: development and optimization of DNA extraction methods for 3000 different bacterial strains suitable for long-read PacBio SMRT sequencing

2016. doi:10.26226/morressier.56d5ba27d462b80296c95fcb.

Author(s): Mohammed Fazal

Keyword(s): DNA Extraction, Extraction Methods, Project Development, Bacterial Strains, SMRT Sequencing, Long Read, DNA Extraction Methods, PacBio SMRT Sequencing

Filling the gap of short-read next generation sequencing in PGD by long-read approach

2018. doi:10.26226/morressier.5af300b2738ab10027aa98ef.

Author(s): Dona Ngar Yin Ho

Keyword(s): Next Generation Sequencing, Next Generation, Short Read, Long Read, Generation Sequencing

Faculty Opinions recommendation of Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2016. doi:10.3410/f.725503882.793524932.

Author(s): Auinash Kalsotra

Keyword(s): Transcriptome Analysis, Long Read

Faculty Opinions recommendation of MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2018. doi:10.3410/f.732346961.793543720.

Author(s): Charles Baer

Keyword(s): Caenorhabditis elegans, Reference Genome, Long Read

Analysis of HLA-G long-read genomic sequences in mother–offspring pairs with preeclampsia

Scientific Reports, 2020, Vol 10 (1). doi:10.1038/s41598-020-77081-3.

Author(s): Ayako Nishizawa, Kazuki Kumada, Keiko Tateno, Maiko Wagata, Sakae Saito, ...

Keyword(s): Single Molecule, Gene Polymorphisms, Genomic DNA, Genomic Sequences, Genomic Sequencing, Public Database, Coding Sequences, PacBio RS II, Potential Association, Long Read

Abstract

Preeclampsia is a pregnancy-induced disorder that is characterized by hypertension and is a leading cause of perinatal and maternal–fetal morbidity and mortality. HLA-G is thought to play important roles in maternal–fetal immune tolerance, and the associations between HLA-G gene polymorphisms and the onset of pregnancy-related diseases have been explored extensively. Because contiguous genomic sequencing is difficult, the association between the HLA-G genotype and preeclampsia onset is controversial. In this study, genomic sequences of the HLA-G region (5.2 kb) from 31 pairs of mother–offspring genomic DNA samples (18 pairs from normal pregnancies/births and 13 from preeclampsia births) were obtained by single-molecule real-time sequencing using the PacBio RS II platform. The HLA-G alleles identified in our cohort matched seven known HLA-G alleles, but we also identified two new HLA-G alleles at the fourth-field resolution and compared them with nucleotide sequences from a public database that consisted of coding sequences that cover the 3.1-kb HLA-G gene span. Intriguingly, a potential association between preeclampsia onset and the poly T stretch within the downstream region of the HLA-G*01:01:01:01 allele was found. Our study suggests that long-read sequencing of HLA-G will provide clues for characterizing HLA-G variants that are involved in the pathophysiology of preeclampsia.

Long-read genome sequencing for the molecular diagnosis of neurodevelopmental disorders

Human Genetics and Genomics Advances, 2021, Vol 2 (2), pp. 100023. doi:10.1016/j.xhgg.2021.100023.

Author(s): Susan M. Hiatt, James M.J. Lawlor, Lori H. Handley, Ryne C. Ramaker, Brianne B. Rogers, ...

Keyword(s): Genome Sequencing, Molecular Diagnosis, Neurodevelopmental Disorders, Long Read
