Numt identification and removal with RtN!

August E Woerner; Jennifer Churchill Cihlar; Utpal Smart; Bruce Budowle

doi:10.1093/bioinformatics/btaa642

Bioinformatics ◽

10.1093/bioinformatics/btaa642 ◽

2020 ◽

Vol 36 (20) ◽

pp. 5115-5116 ◽

Cited By ~ 2

Author(s):

August E Woerner ◽

Jennifer Churchill Cihlar ◽

Utpal Smart ◽

Bruce Budowle

Keyword(s):

Mitochondrial Genome ◽

Massively Parallel Sequencing ◽

Sequence Similarity ◽

Variant Calling ◽

Supplementary Information ◽

Mitochondrial Genomes ◽

Sequencing Data ◽

Read Mapping ◽

Genome Data ◽

Mitochondrial Sequences

Abstract

Motivation: Assays in mitochondrial genomics rely on accurate read mapping and variant calling. However, there are known and unknown nuclear paralogs that have fundamentally different genetic properties than those of the mitochondrial genome. Such paralogs complicate the interpretation of mitochondrial genome data and confound variant calling.

Results: Remove the Numts! (RtN!) was developed to categorize reads from massively parallel sequencing data not based on the expected properties and sequence identities of paralogous nuclear encoded mitochondrial sequences, but instead using sequence similarity to a large database of publicly available mitochondrial genomes. RtN! removes low-level sequencing noise and mitochondrial paralogs without impacting variant calling, whereas competing methods were shown to remove true variants from mitochondrial mixtures.

Availability and implementation: https://github.com/Ahhgust/RtN

Supplementary information: Supplementary data are available at Bioinformatics online.
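
The idea of categorizing reads by their similarity to a database of mitochondrial genomes can be illustrated with a toy sketch. This is not the algorithm RtN! implements; it assumes, purely for illustration, that similarity is measured as the fraction of a read's k-mers found in the database. The function names, the toy sequences, the k-mer size and the 0.9 threshold are all hypothetical, and reverse-complement reads are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(genomes, k):
    """Pool the k-mers of every genome in the (toy) mitochondrial database."""
    index = set()
    for genome in genomes:
        index |= kmers(genome, k)
    return index


def similarity(read, index, k):
    """Fraction of the read's distinct k-mers that occur in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k
        return 0.0
    return len(read_kmers & index) / len(read_kmers)


def keep_read(read, index, k=5, min_fraction=0.9):
    """Keep a read only if it closely resembles a known mitochondrial genome.

    The threshold is an arbitrary illustrative value, not one taken from RtN!.
    """
    return similarity(read, index, k) >= min_fraction


# Toy database: two mitochondrial haplotypes differing at one position.
MITO_DB = ["ACGTTGCAAGGCTTACGGAT", "ACGTTGCAAGGCTAACGGAT"]
INDEX = build_index(MITO_DB, k=5)

mito_read = "TTGCAAGGCTTACG"  # exact substring of the first haplotype
numt_like = "TTGCTAGGCATACG"  # diverged copy with two substitutions

kept = [r for r in (mito_read, numt_like) if keep_read(r, INDEX)]
```

Because the database holds many real haplotypes, a read carrying a genuine mitochondrial variant can still match one of them, while a diverged nuclear copy matches none; that is the intuition the sketch is meant to convey.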

Whisper: Read sorting allows robust mapping of sequencing data

10.1101/240358 ◽

2017 ◽

Author(s):

Sebastian Deorowicz ◽

Agnieszka Debudaj-Grabysz ◽

Adam Gudyś ◽

Szymon Grabowski

Keyword(s):

Reference Genome ◽

Variant Calling ◽

Real Data ◽

Supplementary Information ◽

Sequencing Data ◽

Suffix Arrays ◽

Link Type ◽

Mapping Tool ◽

Reverse Complement ◽

Comparable Accuracy

Abstract

Motivation: Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. Mistakes made at this computationally challenging stage cannot be recovered easily.

Results: We present Whisper, an accurate and high-performant mapping tool, based on the idea of sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. Employing task and data parallelism as well as storing temporary data on disk result in superior time efficiency at reasonable memory requirements. Whisper excels at large NGS read collections, in particular Illumina reads with typical WGS coverage. The experiments with real data indicate that our solution works in about 15% of the time needed by the well-known Bowtie2 and BWA-MEM tools at a comparable accuracy (validated in a variant calling pipeline).

Availability: Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/

Contact: [email protected]

Supplementary information: Supplementary data are available at the publisher's Web site.
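
The approach described in this abstract (sort the reads, then look them up in suffix arrays built for the reference and for its reverse complement) can be sketched in a few lines. This is a minimal illustration, not Whisper's implementation: it handles exact matches only, builds the suffix arrays naively, and does not exploit the shared prefixes of consecutive sorted reads. All function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string over the alphabet A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(read, text, sa):
    """Return sorted start positions where read occurs exactly in text."""
    m = len(read)
    lo, hi = 0, len(sa)
    # Binary search for the first suffix whose length-m prefix is >= read.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < read:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # All matching suffixes are contiguous in the suffix array.
    while lo < len(sa) and text[sa[lo]:sa[lo] + m] == read:
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)


def map_reads(reads, reference):
    """Map each distinct read to (forward-strand position, strand) pairs."""
    rc = reverse_complement(reference)
    sa_fwd = build_suffix_array(reference)
    sa_rev = build_suffix_array(rc)
    results = {}
    # Sorting places reads with a common prefix next to each other; a real
    # mapper can reuse work between neighbours, which this sketch does not do.
    for read in sorted(set(reads)):
        fwd = [(pos, "+") for pos in find_exact(read, reference, sa_fwd)]
        # Translate a hit on the reverse complement to forward coordinates.
        rev = sorted(
            (len(reference) - pos - len(read), "-")
            for pos in find_exact(read, rc, sa_rev)
        )
        results[read] = fwd + rev
    return results


hits = map_reads(["TCC", "ACG", "TTT", "ACG"], "ACGTACGGA")
```

Searching the reverse complement separately means each read is looked up as given, rather than being reverse-complemented once per lookup.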

Pisces: An Accurate and Versatile Variant Caller for Somatic and Germline Next-Generation Sequencing Data

10.1101/291641 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tamsen Dunn ◽

Gwenn Berry ◽

Dorothea Emig-Agius ◽

Yu Jiang ◽

Serena Lei ◽

...

Keyword(s):

Next Generation Sequencing ◽

Gene Mutations ◽

Variant Calling ◽

Amplicon Sequencing ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

RAS Gene ◽

Generation Sequencing

Abstract

Motivation: Next-Generation Sequencing (NGS) technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a method for accurately detecting somatic and germline variants, suitable for the clinic.

Results: We have developed Pisces, a rapid, versatile and accurate small variant calling suite designed for somatic and germline amplicon sequencing applications. Pisces accuracy is achieved by four distinct modules, the Pisces Read Stitcher, Pisces Variant Caller, the Pisces Variant Quality Recalibrator, and the Pisces Variant Phaser. Each module incorporates a number of novel algorithmic strategies aimed at reducing noise or increasing the likelihood of detecting a true variant.

Availability: Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces. Pisces is available on the BaseSpace™ SequenceHub as part of the TruSeq Amplicon workflow and the Illumina Ampliseq Workflow. Pisces is distributed on Illumina sequencing platforms such as the MiSeq™, and is included in the Praxis™ Extended RAS Panel test which was recently approved by the FDA for the detection of multiple RAS gene mutations.

Contact: [email protected]

Supplementary information: Supplementary data are available online.

NGSEP3: accurate variant calling across species and sequencing protocols

Bioinformatics ◽

10.1093/bioinformatics/btz275 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4716-4723 ◽

Cited By ~ 7

Author(s):

Daniel Tello ◽

Juanita Gil ◽

Cristian D Loaiza ◽

John J Riascos ◽

Nicolás Cardozo ◽

...

Keyword(s):

Short Tandem Repeats ◽

Tandem Repeats ◽

High Throughput Sequencing ◽

Variant Calling ◽

Real Data ◽

Supplementary Information ◽

Sequencing Data ◽

Comparative Accuracy ◽

Downstream Analysis ◽

Short Tandem

Abstract

Motivation: Accurate detection, genotyping and downstream analysis of genomic variants from high-throughput sequencing data are fundamental features in modern production pipelines for genetic-based diagnosis in medicine or genomic selection in plant and animal breeding. Our research group maintains the Next-Generation Sequencing Experience Platform (NGSEP) as a precise, efficient and easy-to-use software solution for these features.

Results: Understanding that incorrect alignments around short tandem repeats are an important source of genotyping errors, we implemented in NGSEP new algorithms for realignment and haplotype clustering of reads spanning indels and short tandem repeats. We performed extensive benchmark experiments comparing NGSEP to state-of-the-art software using real data from three sequencing protocols and four species with different distributions of repetitive elements. NGSEP consistently shows comparative accuracy and better efficiency compared to the existing solutions. We expect that this work will contribute to the continuous improvement of quality in variant calling needed for modern applications in medicine and agriculture.

Availability and implementation: NGSEP is available as open source software at http://ngsep.sf.net.

Supplementary information: Supplementary data are available at Bioinformatics online.

A Comparison of Three Circular Mitochondrial Genomes of Fagus sylvatica from Germany and Poland Reveals Low Variation and Complete Identity of the Gene Space

Forests ◽

10.3390/f12050571 ◽

2021 ◽

Vol 12 (5) ◽

pp. 571

Author(s):

Bagdevi Mishra ◽

Bartosz Ulaszewski ◽

Joanna Meger ◽

Sebastian Ploch ◽

Jaroslaw Burczyk ◽

...

Keyword(s):

Mitochondrial Genome ◽

Central Europe ◽

Glacial Refugia ◽

The Other ◽

Mitochondrial Genomes ◽

Organelle DNA ◽

Phylogenetic Studies ◽

Low Degree ◽

Multiple Copies ◽

Mitochondrial Sequences

Similar to chloroplast loci, mitochondrial markers are frequently used for genotyping, phylogenetic studies, and population genetics, as they are easily amplified due to their multiple copies per cell. In a recent study, it was revealed that the chloroplast offers little variation for this purpose in central European populations of beech. Thus, the aim of this study was to elucidate whether mitochondrial sequences might offer an alternative or whether they are similarly conserved in central Europe. For this purpose, a circular mitochondrial genome sequence from the more than 300-year-old beech reference individual Bhaga from the German National Park Kellerwald-Edersee was assembled using long and short reads and compared to an individual from the Jamy Nature Reserve in Poland and a recently published mitochondrial genome from eastern Germany. The mitochondrial genome of Bhaga was 504,730 bp, while the mitochondrial genomes of the other two individuals were 15 bases shorter, due to seven indel locations, with four having more bases in Bhaga and three locations having one base less in Bhaga. In addition, 19 SNP locations were found, none of which were inside genes. At 17 of these SNP locations, the base in Bhaga differed from that in the other two genomes, while at 2 SNP locations Bhaga and the Polish individual had the same base. While these figures are slightly higher than for the chloroplast genome, the comparison confirms the low degree of genetic divergence in organelle DNA of beech in central Europe, suggesting the colonisation from a common gene pool after the Weichsel Glaciation. The mitochondrial genome might have limited use for population studies in central Europe, but once mitochondrial genomes from glacial refugia become available, it might be suitable to pinpoint the origin of migration for the re-colonising beech population.
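
The counts reported in this abstract can be cross-checked with a little arithmetic. The sketch below assumes, as the abstract states, that the 15-base length difference is explained entirely by the seven indel locations; the total number of extra bases at the four Bhaga-longer locations is not given in the abstract and is derived here under that assumption. Variable names are illustrative.

```python
# Figures taken from the abstract.
BHAGA_LENGTH_BP = 504_730
LENGTH_DIFFERENCE_BP = 15        # the other two genomes are this much shorter
SITES_LONGER_IN_BHAGA = 4
SITES_SHORTER_IN_BHAGA = 3       # each of these is one base shorter in Bhaga
SNP_BHAGA_UNIQUE = 17            # Bhaga differs from both other genomes
SNP_SHARED_WITH_POLISH = 2       # Bhaga and the Polish individual agree

# Length of the Polish and eastern German genomes.
other_length_bp = BHAGA_LENGTH_BP - LENGTH_DIFFERENCE_BP

# The two indel classes should add up to the seven reported locations.
total_indel_sites = SITES_LONGER_IN_BHAGA + SITES_SHORTER_IN_BHAGA

# Net difference = extra bases in Bhaga - bases missing in Bhaga.
# Derived value (an inference, not a figure from the abstract):
bases_missing_in_bhaga = SITES_SHORTER_IN_BHAGA * 1
extra_bases_in_bhaga = LENGTH_DIFFERENCE_BP + bases_missing_in_bhaga

# The two SNP classes should add up to the 19 reported SNP locations.
total_snp_sites = SNP_BHAGA_UNIQUE + SNP_SHARED_WITH_POLISH

# Variable sites as a fraction of the Bhaga genome length.
variable_site_fraction = (total_indel_sites + total_snp_sites) / BHAGA_LENGTH_BP
```

Under that assumption the other two genomes are 504,715 bp long and the four Bhaga-longer locations together carry 18 extra bases, which is consistent with the low divergence the authors describe (26 variable locations in roughly half a million bases).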

DEUS: an R package for accurate small RNA profiling based on differential expression of unique sequences

Bioinformatics ◽

10.1093/bioinformatics/btz495 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4834-4836

Author(s):

Tim Jeske ◽

Peter Huypens ◽

Laura Stirm ◽

Selina Höckele ◽

Christine M Wurmser ◽

...

Keyword(s):

Differential Expression ◽

Small RNA ◽

Sequence Similarity ◽

Differential Expression Analysis ◽

R Package ◽

Supplementary Information ◽

Small RNA Sequencing ◽

Sequencing Data ◽

RNA Sequences ◽

RNA Profiling

Abstract

Summary: Despite their fundamental role in various biological processes, the analysis of small RNA sequencing data remains a challenging task. Major obstacles arise when short RNA sequences map to multiple locations in the genome, align to regions that are not annotated or underwent post-transcriptional changes which hamper accurate mapping. In order to tackle these issues, we present a novel profiling strategy that circumvents the need for read mapping to a reference genome by utilizing the actual read sequences to determine expression intensities. After differential expression analysis of individual sequence counts, significant sequences are annotated against user defined feature databases and clustered by sequence similarity. This strategy enables a more comprehensive and concise representation of small RNA populations without any data loss or data distortion.

Availability and implementation: Code and documentation of our R package at http://ibis.helmholtz-muenchen.de/deus/.

Supplementary information: Supplementary data are available at Bioinformatics online.
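
The core idea in this summary, using the read sequences themselves as the features to count instead of mapping them to a genome, can be sketched as follows. DEUS itself is an R package; this Python snippet is only an illustration of mapping-free counting and does not reproduce its API. The function name and the toy reads are hypothetical, and the differential expression, annotation and clustering steps are not shown.

```python
from collections import Counter


def count_matrix(samples):
    """Build a sequence-by-sample count table without any read mapping.

    samples: dict mapping sample name -> list of read sequences.
    Returns (sample_names, table) where table maps each distinct sequence
    to its counts, ordered like sample_names.
    """
    counts = {name: Counter(reads) for name, reads in samples.items()}
    sample_names = sorted(counts)
    # Every distinct sequence seen in any sample becomes one row.
    sequences = sorted(set().union(*counts.values()))
    table = {
        seq: [counts[name][seq] for name in sample_names]
        for seq in sequences
    }
    return sample_names, table


toy_samples = {
    "ctrl": ["TGAGGTAG", "TGAGGTAG", "ACCGT"],
    "case": ["TGAGGTAG", "GGGTTT"],
}
names, table = count_matrix(toy_samples)
```

A table of this shape (one row per unique sequence, one column per sample) is the kind of input a count-based differential expression test expects, which is why no reference genome is needed up to that point.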

Standard operating procedure for somatic variant refinement of tumor sequencing data

10.1101/266262 ◽

2018 ◽

Cited By ~ 1

Author(s):

Erica K. Barnell ◽

Peter Ronning ◽

Katie M. Campbell ◽

Kilannin Krysiak ◽

Benjamin J. Ainscough ◽

...

Keyword(s):

Massively Parallel Sequencing ◽

Variant Calling ◽

Standard Operating Procedure ◽

Sequencing Data ◽

Optimal Method ◽

Somatic Variant ◽

Standard Operating ◽

Variant Detection ◽

Manual Review

Abstract

Purpose: Manual review of aligned sequencing reads is required to develop a high-quality list of somatic variants from massively parallel sequencing (MPS) data. Despite widespread use in analyzing MPS data, there has been little attempt to describe methods for manual review, resulting in high inter- and intra-lab variability in somatic variant detection and characterization of tumors.

Methods: Open source software was used to develop an optimal method for manual review setup. We also developed a systemic approach to visually inspect each variant during manual review.

Results: We present a standard operating procedure for somatic variant refinement for use by manual reviewers. The approach is enhanced through representative examples of 4 different manual review categories that indicate a reviewer’s confidence in the somatic variant call and 19 annotation tags that contextualize commonly observed sequencing patterns during manual review. Representative examples provide detailed instructions on how to classify variants during manual review to rectify lack of confidence in automated somatic variant detection.

Conclusion: Standardization of somatic variant refinement through systematization of manual review will improve the consistency and reproducibility of identifying true somatic variants after automated variant calling.

xAtlas: Scalable small variant calling across heterogeneous next-generation sequencing experiments

10.1101/295071 ◽

2018 ◽

Cited By ~ 7

Author(s):

Jesse Farek ◽

Daniel Hughes ◽

Adam Mansfield ◽

Olga Krasheninina ◽

Waleed Nasser ◽

...

Keyword(s):

Next Generation Sequencing ◽

Rapid Development ◽

Variant Calling ◽

Supplementary Information ◽

Data Generation ◽

Next Generation ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Homogeneous Sample ◽

Generation Sequencing

Abstract

Motivation: The rapid development of next-generation sequencing (NGS) technologies has lowered the barriers to genomic data generation, resulting in millions of samples sequenced across diverse experimental designs. The growing volume and heterogeneity of these sequencing data complicate the further optimization of methods for identifying DNA variation, especially considering that curated high-confidence variant call sets commonly used to evaluate these methods are generally developed by reference to results from the analysis of comparatively small and homogeneous sample sets.

Results: We have developed xAtlas, an application for the identification of single nucleotide variants (SNV) and small insertions and deletions (indels) in NGS data. xAtlas is easily scalable and enables execution and retraining with rapid development cycles. Generation of variant calls in VCF or gVCF format from BAM or CRAM alignments is accomplished in less than one CPU-hour per 30× short-read human whole-genome. The retraining capabilities of xAtlas allow its core variant evaluation models to be optimized on new sample data and user-defined truth sets. Obtaining SNV and indel calls from xAtlas can be achieved more than 40 times faster than established methods while retaining the same accuracy.

Availability: Freely available under a BSD 3-clause license at https://github.com/jfarek/[email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Primer designing strategy for amplification and sequencing of the complete mitochondrial genome of Semnopithecus hypoleucos

10.21203/rs.3.rs-811077/v1 ◽

2021 ◽

Author(s):

Vipin Hiremath ◽

Chandrakant Jadhav ◽

Gulab Khedkar

Keyword(s):

Mitochondrial Genome ◽

Complete Mitochondrial Genome ◽

Primate Species ◽

Mitochondrial Genomes ◽

Evolutionary Analysis ◽

Closely Related Species ◽

Design And Optimization ◽

Genome Data ◽

Phylogenetic Studies ◽

Conserved Gene

Abstract

The mitochondrial genome is highly informative for evolutionary analysis of organism lineages and phylogenetic studies. The availability of robust primers for amplifying complete mitochondrial genomes is a crucial step in current mitogenome studies. However, organism-specific characteristics such as variable transition to transversion substitution ratios seen in some groups pose challenges for the development of universal, or at least broadly applicable, primer pairs for this purpose. This study reports on a strategy of primer design and optimization (PDO) where regions of known mtDNA genes can be used for choosing primers for amplification, sequencing and assembly of entire mitochondrial genomes of several closely-related species. In brief, taking advantage of the circular organization of mtDNA, primers are first designed for amplification of “long” products using the 5’ region of one conserved gene and a 3’ region from another conserved gene. Additional primers are then used to amplify “short” regions to fill in gaps to allow for complete assembly of the genome. We show how we were able to use this approach to successfully amplify entire mitochondrial genomes from a non-human primate species (Semnopithecus hypoleucos), and also how this provided data useful for annotation of the assembled genome data.

The mitochondrial genomes of the acoelomorph worms Paratomella rubra and Isodiametra pulchra

10.1101/103556 ◽

2017 ◽

Cited By ~ 1

Author(s):

Helen E Robertson ◽

François Lapraz ◽

Bernhard Egger ◽

Maximilian J Telford ◽

Philipp H. Schiffer

Keyword(s):

Mitochondrial Genome ◽

Phylogenetic Trees ◽

Large Degree ◽

Mitochondrial Genes ◽

Mitochondrial Genomes ◽

Long Branch Attraction ◽

Protein Coding ◽

Simple Body ◽

Mitochondrial Sequences ◽

Marine Worms

Abstract

Acoels are small, ubiquitous, but understudied, marine worms with a very simple body plan. Their internal phylogeny is still in parts unresolved, and the position of their proposed phylum Xenacoelomorpha (Xenoturbella+Acoela) is still debated.

Here we describe mitochondrial genome sequences from two acoel species: Paratomella rubra and Isodiametra pulchra. The 14,954 nucleotide-long P. rubra sequence is typical for metazoans in size and gene content. The larger I. pulchra mitochondrial genome contains both ribosomal genes, 21 tRNAs, but only 11 protein-coding genes. We find evidence suggesting a duplicated sequence in the I. pulchra mitochondrial genome.

Mitochondrial sequences for both P. rubra and I. pulchra have a unique genome organisation in comparison to other published metazoan mitochondrial genomes. We found a large degree of protein-coding gene and tRNA overlap in P. rubra, with little non-coding sequence making the genome compact. Conversely, the I. pulchra mitochondrial genome has many long non-coding sequences between genes, likely driving the genome size expansion. Phylogenetic trees inferred from concatenated alignments of mitochondrial genes grouped the fast-evolving Acoela and Tunicata, almost certainly due to the systematic error of long branch attraction: a reconstruction artefact that is probably compounded by the fast substitution rate of mitochondrial genes in this taxon.

MitoFlex: an efficient, high-performance toolkit for animal mitogenome assembly, annotation, and visualization

Bioinformatics ◽

10.1093/bioinformatics/btab111 ◽

2021 ◽

Author(s):

Jun-Yu Li ◽

Wei-Xuan Li ◽

An-Tai Wang ◽

Zhang Yu

Keyword(s):

Mitochondrial Genome ◽

High Performance ◽

High Throughput Sequencing ◽

De Novo ◽

Supplementary Information ◽

Sequencing Data ◽

Protein Coding ◽

High Throughput Sequencing Data ◽

Genome Analysis Toolkit ◽

Overall Performance

Abstract

Summary: MitoFlex is a Linux-based mitochondrial genome analysis toolkit, which provides a complete workflow of raw data filtering, de novo assembly, mitochondrial genome identification and annotation for animal high throughput sequencing data. The overall performance was compared between MitoFlex and its analogue MitoZ, in terms of protein coding gene recovery, memory consumption and processing speed.

Availability: MitoFlex is available at https://github.com/Prunoideae/MitoFlex under GPLv3 license.

Supplementary information: Supplementary data are available at Bioinformatics online.
