Species-Specific Quality Control, Assembly and Contamination Detection in Microbial Isolate Sequences with AQUAMIS

Carlus Deneke; Holger Brendebach; Laura Uelze; Maria Borowiak; Burkhard Malorny; Simon H. Tausch

doi:10.3390/genes12050644

Genes, 2021, Vol 12 (5), pp. 644

Keyword(s): Quality Control; Data Exchange; De Novo; Standard Procedure; Source Tracking; Whole Genome Sequencing Data; Sequencing Data; Control Assembly; Species Specific

Sequencing of whole microbial genomes has become a standard procedure for cluster detection, source tracking, outbreak investigation and surveillance of many microorganisms. An increasing number of laboratories are currently in a transition phase from classical methods towards next-generation sequencing, generating unprecedented amounts of data. Since the precision of downstream analyses depends significantly on the quality of the raw data generated on the sequencing instrument, a comprehensive, meaningful primary quality control is indispensable. Here, we present AQUAMIS, a Snakemake workflow for extensive quality control and assembly of raw Illumina sequencing data, allowing laboratories to automate the initial analysis of their microbial whole-genome sequencing data. AQUAMIS performs all steps of primary sequence analysis: read trimming, read quality control (QC), taxonomic classification, de novo assembly, reference identification, assembly QC and contamination detection, both at the read and at the assembly level. The results are visualized in an interactive HTML report that includes species-specific QC thresholds, allowing non-bioinformaticians to assess the quality of sequencing experiments at a glance. All results are also available as a standard-compliant JSON file, facilitating downstream analyses and data exchange. We have applied AQUAMIS to analyze ~13,000 microbial isolates as well as ~1000 in-silico contaminated datasets, demonstrating the workflow's ability to perform in high-throughput routine sequencing environments and to reliably predict contamination. We found that intergenus and intragenus contamination can be detected most accurately using a combination of different QC metrics available within AQUAMIS.
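To make the idea of species-specific QC thresholds concrete, here is a minimal Python sketch of how per-sample metrics might be checked against species-dependent ranges and exported as JSON. The species table, the metric names (`assembly_length`, `gc_content`, `n50`), the numeric ranges and the JSON layout are all hypothetical placeholders chosen for illustration; they are not AQUAMIS's actual reference values, code or output schema.

```python
import json

# Hypothetical species-specific thresholds: metric -> (min, max).
# The numbers are illustrative placeholders, NOT AQUAMIS reference values.
THRESHOLDS = {
    "Escherichia coli": {
        "assembly_length": (4.5e6, 5.9e6),
        "gc_content": (50.0, 51.5),
        "n50": (50_000, float("inf")),
    },
}


def evaluate_sample(sample_id, species, metrics, thresholds=THRESHOLDS):
    """Compare each QC metric with its species-specific range.

    Returns a dict with a per-metric verdict and an overall PASS/FAIL,
    ready to be serialized as JSON for downstream tools.
    """
    ranges = thresholds.get(species)
    if ranges is None:
        # No reference ranges for this species: report that instead of guessing.
        return {"sample": sample_id, "species": species, "status": "NO_THRESHOLDS"}

    checks = {}
    for metric, (low, high) in ranges.items():
        value = metrics.get(metric)
        # A missing metric counts as a failed check.
        passed = value is not None and low <= value <= high
        checks[metric] = {"value": value, "pass": passed}

    status = "PASS" if all(c["pass"] for c in checks.values()) else "FAIL"
    return {"sample": sample_id, "species": species, "checks": checks, "status": status}


if __name__ == "__main__":
    report = evaluate_sample(
        "isolate_001",
        "Escherichia coli",
        {"assembly_length": 5.1e6, "gc_content": 50.6, "n50": 120_000},
    )
    print(json.dumps(report, indent=2))
```

Keeping the thresholds in a per-species table, rather than hard-coding one global cutoff, is what lets the same check flag a value (for example an assembly length) that is normal for one species but suspicious for another.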

Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

BMC Bioinformatics, 2017, Vol 18 (1)
doi:10.1186/s12859-017-1927-y
Cited By ~ 21

Author(s): Kosai Al-Nakeeb, Thomas Nordahl Petersen, Thomas Sicheritz-Pontén

Keyword(s): Mitochondrial DNA; Whole Genome Sequencing; Genome Sequencing; De Novo Assembly; De Novo; Whole Genome Sequencing Data; Whole Genome; Sequencing Data

Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

International Journal of Genomics, 2014, Vol 2014, pp. 1-8
doi:10.1155/2014/434575

Author(s): Momchilo Vuyisich, Ayesha Arefin, Karen Davenport, Shihai Feng, Cheryl Gleasner, ...

Keyword(s): Genomic DNA; De Novo; GC Content; Library Preparation; Sequencing Data; Bacterial Genomes; DNA Amount; High Quality; Preparation Methods

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). Few studies have examined the effects of the input DNA amount or the library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes, assessing their utility for resequencing and de novo assembly of four bacterial genomes and comparing their performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high-quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.
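The input amounts quoted above (1 μg, 100 ng, 10 ng, 1 ng) can be translated into approximate numbers of genome copies with the standard conversion copies = mass / (genome length × mean base-pair molar mass / Avogadro's number). The sketch below assumes a mean molar mass of 650 g/mol per double-stranded base pair (a common rule of thumb) and uses illustrative genome sizes of 5 Mb and 7 Mb; these values are assumptions, not figures taken from the study.

```python
# Avogadro constant (exact SI value, mol^-1).
AVOGADRO = 6.02214076e23
# Assumed mean molar mass of one double-stranded base pair (g/mol);
# 650 is a common rule of thumb, some protocols use 660 instead.
BP_MOLAR_MASS = 650.0


def genome_copies(input_ng, genome_length_bp, bp_mass=BP_MOLAR_MASS):
    """Estimate how many haploid genome copies are contained in input_ng of DNA."""
    grams = input_ng * 1e-9
    grams_per_genome = genome_length_bp * bp_mass / AVOGADRO
    return grams / grams_per_genome


if __name__ == "__main__":
    # Illustrative (hypothetical) genome sizes of 5 Mb and 7 Mb.
    for size in (5_000_000, 7_000_000):
        for ng in (1000, 100, 10, 1):
            copies = genome_copies(ng, size)
            print(f"{size / 1e6:.0f} Mb genome, {ng:>4} ng -> {copies:.2e} copies")
```

For a 5 Mb genome, 10 ng still corresponds to roughly 1.9 × 10^6 genome copies, which gives a sense of scale for the low-input conditions compared above.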

TranscriptomeReconstructoR, A Data-Driven Annotation of Complex Transcriptomes

2020
doi:10.21203/rs.3.rs-131404/v1

Author(s): Maxim Ivanov, Albin Sandelin, Sebastian Marquardt

Keyword(s): De Novo; Gene Annotation; R Package; Sequence Information; RNA-Seq; Sequencing Data; Gene Model; Preparation Methods; Downstream Analysis

Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data.

Results: We developed TranscriptomeReconstructoR, an R package that implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: (i) full-length RNA-seq for detection of splicing patterns and (ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. As a proof of principle, we reconstructed de novo the transcriptional landscape of wild-type Arabidopsis thaliana seedlings. A comparison with existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research.

Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline requires prior knowledge only of the reference genome sequence, not of the transcriptome. The package integrates seamlessly with Bioconductor packages for downstream analysis.

A de novo frameshift mutation in ZEB2 causes polledness, abnormal skull shape, small body stature and subfertility in Fleckvieh cattle

Scientific Reports, 2020, Vol 10 (1)
doi:10.1038/s41598-020-73807-5

Author(s): Lilian J. Gehrke, Maulik Upadhyay, Kristin Heidrich, Elisabeth Kunz, Daniela Klaus-Halla, ...

Keyword(s): De Novo; Small Body; Protein Product; Whole Genome Sequencing Data; Chromosome 2; Sequencing Data; Autosomal Dominant Trait; Intergenic Regions; Body Stature; Growth Of Animals

Polledness in cattle is an autosomal dominant trait. Previous studies have revealed allelic heterogeneity at the polled locus, and four different variants, all located in intergenic regions, have been identified. In this study, we report the case of a polled bull (FV-Polled1) born to horned parents, indicating a de novo origin of this polled condition. Using 50K genotyping and whole-genome sequencing data, we identified on chromosome 2 an 11-bp deletion (AC_000159.1:g.52364063_52364073del; Del11) in the second exon of the ZEB2 gene as the causal mutation for this de novo polled condition. We predicted that the deletion would shorten the protein product of ZEB2 by almost 91%. Moreover, we showed that all animals carrying the Del11 mutation displayed symptoms similar to Mowat-Wilson syndrome (MWS) in humans, which is also associated with genetic variations in ZEB2. The symptoms in cattle include delayed maturity, small body stature and an abnormal skull shape. This is the first report of a de novo dominant mutation affecting only ZEB2 and associated with a genetic absence of horns. Our results therefore clearly demonstrate that ZEB2 plays an important role in horn ontogenesis as well as in the regulation of the overall development and growth of animals.
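The deletion notation above can be checked with simple arithmetic: HGVS range positions are 1-based and inclusive, so g.52364063_52364073del removes 52364073 - 52364063 + 1 = 11 bp, and because 11 is not a multiple of 3, an exonic deletion of this size shifts the reading frame, consistent with the frameshift named in the title. The minimal Python sketch below uses hypothetical helper names and handles only this simple start_end deletion form, not the full HGVS grammar.

```python
import re


def deletion_length(hgvs):
    """Return the length of a simple HGVS range deletion such as 'g.100_110del'.

    Only the 'start_enddel' form is handled; other HGVS variant types
    are out of scope for this sketch.
    """
    match = re.search(r"[gcn]\.(\d+)_(\d+)del$", hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS expression: {hgvs}")
    start, end = int(match.group(1)), int(match.group(2))
    # HGVS positions are 1-based and inclusive at both ends.
    return end - start + 1


def is_frameshift(length_bp):
    """A coding indel shifts the reading frame unless its length is a multiple of 3."""
    return length_bp % 3 != 0


if __name__ == "__main__":
    variant = "AC_000159.1:g.52364063_52364073del"
    n = deletion_length(variant)
    print(n, is_frameshift(n))  # 11 True
```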

PaSD-qc: quality control for single cell whole-genome sequencing data using power spectral density estimation

Nucleic Acids Research, 2017, Vol 46 (4), p. e20
doi:10.1093/nar/gkx1195
Cited By ~ 7

Author(s): Maxwell A Sherman, Alison R Barton, Michael A Lodato, Carl Vitzthum, Michael E Coulter, ...

Keyword(s): Quality Control; Spectral Density; Density Estimation; Genome Sequencing; Whole Genome Sequencing Data; Whole Genome; Sequencing Data; Spectral Density Estimation; Power Spectral; Power Spectral Density Estimation

A de novo 3.8-Mb inversion affecting the EDA and XIST genes in a heterozygous female calf with generalized hypohidrotic ectodermal dysplasia

BMC Genomics, 2019, Vol 20 (1)
doi:10.1186/s12864-019-6087-1
Cited By ~ 1

Author(s): Clémentine Escouflaire, Emmanuelle Rebours, Mathieu Charles, Sébastien Orellana, Margarita Cano, ...

Keyword(s): Ectodermal Dysplasia; De Novo; Genetic Disorder; Hair Follicles; Whole Genome Sequencing Data; Recessive Mutation; Sources Of Information; Affected Animal; Sequencing Data; Hypohidrotic Ectodermal Dysplasia

Background: In mammals, hypohidrotic ectodermal dysplasia (HED) is a genetic disorder characterized by sparse hair, tooth abnormalities and defects in cutaneous glands. Only four genes (EDA, EDAR, EDARADD and WNT10A) account for more than 90% of HED cases, and EDA, on chromosome X, is involved in 50% of cases. In this study, we explored an isolated case of a female Holstein calf with symptoms similar to HED.

Results: Clinical examination confirmed the diagnosis. The affected female showed homogeneous hypotrichosis and oligodontia, as previously observed in bovine EDAR homozygous and EDA hemizygous mutants. Under light microscopy, the hair follicles of the frontal skin were thinner and located higher in the dermis in the affected animal than in the control. Moreover, the affected animal showed a five-fold increase in the number of hair follicles and a four-fold decrease in the diameter of the pilary canals. Pedigree analysis revealed that the inbreeding coefficient of the affected calf (4.58%) was not higher than the average population inbreeding coefficient (4.59%). This animal had ten ancestors common to its paternal and maternal lineages. By estimating the number of affected cases that would be expected if any of these common ancestors carried a recessive mutation, we concluded that, had such a mutation existed, other cases of HED should have been reported in France, which is not the case. We therefore assumed that the causal mutation was dominant and de novo. By analyzing whole-genome sequencing data, we identified a large chromosomal inversion with breakpoints located in the first introns of the EDA and XIST genes. Genotyping the case and its parents by PCR-electrophoresis allowed us to demonstrate the de novo origin of this inversion. Finally, using various sources of information, we present a body of evidence supporting the hypothesis that this mutation is responsible for a skewed X inactivation in which only the normal X can be inactivated.

Conclusions: In this article, we report a unique case of an X-linked HED-affected Holstein female calf with an assumed full inactivation of the normal X chromosome, leading to a severe phenotype similar to that of hemizygous males.

De novo indels within introns contribute to ASD incidence

2017
doi:10.1101/137471
Cited By ~ 2

Author(s): Adriana Munoz, Boris Yamrom, Yoon-ha Lee, Peter Andrews, Steven Marks, ...

Keyword(s): Whole Genome Sequencing; Genome Sequencing; Target Genes; De Novo; Whole Genome Sequencing Data; P Value; Whole Genome; Sequencing Data; Control Sets; The Difference

Copy number profiling and whole-exome sequencing have allowed remarkable progress in our understanding of the genetics of autism over the past ten years, but major aspects of the genetics remain unresolved. Whole-genome sequencing reveals additional types of genetic variants. These variants are abundant, and determining which of them are functional is challenging. We analyzed whole-genome sequencing data from 510 quad families of the Simons Simplex Collection and focused our attention on intronic variants. Within the introns of 546 high-quality autism target genes, we identified 63 de novo indels in the affected siblings and only 37 in the unaffected siblings. The difference of 26 events is significantly larger than expected (p = 0.01), and reasonable extrapolation shows that de novo intronic indels can contribute to at least 10% of simplex autism. The significance increases if we restrict the analysis to the half of the autism targets that are intolerant to damaging variants in the normal human population, a half we expect to be even more enriched for autism genes. For these 273 targets, we observe 43 and 20 events in affected and unaffected siblings, respectively (p = 0.005). There was no significant signal in the number of de novo intronic indels in any of the control gene sets analyzed, and we see no signal from de novo substitutions in the introns of target genes.
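As a rough plausibility check on the reported p-values, one can ask how surprising the observed split of de novo events would be if each event were equally likely to occur in an affected or an unaffected sibling. The sketch below applies an exact two-sided binomial test under that 50/50 assumption; the abstract does not state which test the authors used, so this is an illustration, not a reproduction of their analysis.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under Binomial(n, p).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so exact ties are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Counts from the abstract: (affected, unaffected) de novo intronic indels.
    for affected, unaffected in [(63, 37), (43, 20)]:
        n = affected + unaffected
        p_value = binomial_two_sided_p(affected, n)
        print(f"{affected} vs {unaffected}: p = {p_value:.4f}")
```

Under this simple model, both splits (63 vs 37 and 43 vs 20) give p-values of the same order as those reported in the abstract.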

When Less is More: "Slicing" Sequencing Data Improves Read Decoding Accuracy and De Novo Assembly Quality

2015
doi:10.1101/013425
Cited By ~ 1

Author(s): Stefano Lonardi, Hamid Mirebrahim, Steve Wanamaker, Matthew Alpert, Gianfranco Ciardo, ...

Keyword(s): Deep Sequencing; De Novo Assembly; De Novo; Optimal Size; Sequencing Data; Less Is More; BAC Clones; Deep Sequencing Data; First Time

Since the invention of DNA sequencing in the seventies, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) sequencing depth. In this work, we investigate for the first time the opposite problem: the challenge of dealing with excessive sequencing depth. Specifically, we explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to BAC clones (in the context of the combinatorial pooling design proposed by our group) and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the sequencing depth increases beyond a certain threshold, sequencing errors make these two problems progressively harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades as more data are added. For the first problem, we propose an effective solution based on "divide and conquer": we "slice" a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15,000 barley BACs and over 4,000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
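The divide-and-conquer strategy described for the first problem (slice, decode each slice independently, merge) can be sketched generically in Python. The decoder is passed in as a function because the group's combinatorial-pooling decoder is not described here; `toy_decoder`, which simply groups reads by their first base, is a hypothetical stand-in, and the slice size is an input rather than the empirically optimal value the authors determined.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Decoder = Callable[[Sequence[str]], Dict[str, List[str]]]


def make_slices(reads: Sequence[str], slice_size: int) -> List[Sequence[str]]:
    """Partition the read set into consecutive slices of at most slice_size reads."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [reads[i:i + slice_size] for i in range(0, len(reads), slice_size)]


def decode_sliced(reads: Sequence[str], slice_size: int, decode_slice: Decoder) -> Dict[str, List[str]]:
    """Decode each slice independently, then merge the per-slice assignments.

    decode_slice is a caller-supplied (hypothetical) function mapping a slice
    of reads to {clone_id: [reads assigned to that clone]}.
    """
    merged: Dict[str, List[str]] = defaultdict(list)
    for chunk in make_slices(reads, slice_size):
        for clone, assigned in decode_slice(chunk).items():
            merged[clone].extend(assigned)
    return dict(merged)


def toy_decoder(chunk: Sequence[str]) -> Dict[str, List[str]]:
    """Hypothetical stand-in decoder: assigns each read to a 'clone' named by its first base."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in chunk:
        groups[read[0]].append(read)
    return groups


if __name__ == "__main__":
    reads = ["AAT", "CGA", "ATT", "CCG", "AGG"]
    print(decode_sliced(reads, 2, toy_decoder))
```

Because each slice is decoded on its own, the slices could also be processed in parallel; the merge step here is a plain union of assignments, which is an assumption of this sketch rather than the paper's merging rule.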

Leaf form diversification in an heirloom tomato results from alterations in two different HOMEOBOX genes

2020
doi:10.1101/2020.09.08.287011

Author(s): Hokuto Nakayama, Steven D. Rowland, Zizhang Cheng, Kristina Zumstein, Julie Kang, ...

Keyword(s): Gene Network; Natural Variation; De Novo; Whole Genome Sequencing Data; Phenotypic Traits; Comparative Genome Analysis; Sequencing Data; Single Nucleotide; Leaf Phenotype; Gene Network Analysis

Domesticated plants and animals display tremendous diversity in various phenotypic traits, and this diversity is often seen within the same species. Tomato (Solanum lycopersicum; Solanaceae) cultivars show wide variation in leaf morphology, but the influence of breeding efforts in sculpting this diversity is not known. Here, we demonstrate that a single-nucleotide deletion in the homeobox motif of BIPINNATA, a BEL-LIKE HOMEODOMAIN gene, led to a highly complex leaf phenotype in an heirloom tomato, Silvery Fir Tree (SiFT). Additionally, a comparative gene network analysis revealed that reduced expression of the ortholog of WUSCHEL RELATED HOMEOBOX 1 also contributes to the narrow-leaflet phenotype seen in SiFT. Phylogenetic and comparative genome analyses using whole-genome sequencing data suggest that the bip mutation in SiFT is likely a de novo mutation rather than standing genetic variation. These results provide new insights into natural variation in phenotypic traits introduced into crops during improvement processes after domestication.

Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads

2015
doi:10.1101/030437
Cited By ~ 3

Author(s): Ivan Sovic, Kresimir Krizanovic, Karolj Skala, Mile Sikic

Keyword(s): De Novo Assembly; De Novo; Hybrid Methods; Bacterial Genome; Error Rates; Sequencing Data; E. coli; Recent Emergence; K-12

The recent emergence of nanopore sequencing technology has challenged established assembly methods, which are not optimized for the combination of long read lengths and high error rates found in nanopore reads. In this work, we assessed how existing de novo assembly methods perform on these reads. We benchmarked three non-hybrid assembly pipelines (non-hybrid in terms of both error correction and scaffolding) as well as two hybrid assemblers that use third-generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of E. coli K-12, using nanopore coverages of 20x, 30x, 40x and 50x. We aimed to assess the quality of assembly at each of these coverages in order to estimate the requirements for closed bacterial genome assembly. Results show that hybrid methods are highly dependent on the quality of the NGS data but much less on the quality and coverage of the nanopore data, and they perform relatively well at lower nanopore coverages. Furthermore, when coverage is above 40x, all non-hybrid methods correctly assemble the E. coli genome, even a non-hybrid method tailored for Pacific Biosciences reads. Although that method requires higher coverage than a method designed specifically for nanopore reads, its running time is significantly lower.
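The coverage levels tested (20x to 50x) follow from the usual relation coverage = total sequenced bases / genome length. The sketch below uses the 4,641,652 bp chromosome of E. coli K-12 substrain MG1655 as an illustrative reference length; the abstract names only K-12, so the exact genome length used in the study is an assumption here.

```python
# Assumed reference length: E. coli K-12 MG1655 chromosome (bp).
# The abstract only says "K-12", so this substrain choice is illustrative.
ECOLI_K12_GENOME_BP = 4_641_652


def bases_needed(coverage, genome_bp=ECOLI_K12_GENOME_BP):
    """Total sequenced bases needed for a target mean coverage (coverage x genome length)."""
    return coverage * genome_bp


def achieved_coverage(read_lengths, genome_bp=ECOLI_K12_GENOME_BP):
    """Mean coverage obtained from reads with the given lengths."""
    return sum(read_lengths) / genome_bp


if __name__ == "__main__":
    for cov in (20, 30, 40, 50):
        print(f"{cov}x -> {bases_needed(cov) / 1e6:.1f} Mb of nanopore reads")
```

At 40x, the coverage above which all non-hybrid methods assembled the genome correctly, this corresponds to roughly 186 Mb of nanopore sequence.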
