Concatenation of paired-end reads improves taxonomic classification of amplicons for profiling microbial communities

Daniel P. Dacey; Frédéric J. J. Chain

doi:10.1186/s12859-021-04410-2

BMC Bioinformatics ◽

10.1186/s12859-021-04410-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Daniel P. Dacey ◽

Frédéric J. J. Chain

Keyword(s):

Read Depth ◽

Taxonomic Composition ◽

Taxonomic Classification ◽

Read Length ◽

Reference Database ◽

Reference Databases ◽

Sequence Quality ◽

First Time ◽

Mock Communities

Background: Taxonomic classification of genetic markers for microbiome analysis is affected by the numerous choices made from sample preparation to bioinformatics analysis. Paired-end read merging is routinely used to capture the entire amplicon sequence when the read ends overlap. However, the exclusion of unmerged reads from further analysis can result in underestimating the diversity in the sequenced microbial community and is influenced by bioinformatic processes such as read trimming and the choice of reference database. A potential solution to overcome this is to concatenate (join) reads that do not overlap and keep them for taxonomic classification. The use of concatenated reads can outperform taxonomic recovery from single-end reads, but it remains unclear how their performance compares to merged reads. Using various sequenced mock communities with different amplicons, read length, read depth, taxonomic composition, and sequence quality, we tested how merging and concatenating reads performed for genus recall and precision in bioinformatic pipelines combining different parameters for read trimming and taxonomic classification using different reference databases.

Results: The addition of concatenated reads to merged reads always increased pipeline performance. The top two performing pipelines both included read concatenation, with variable strengths depending on the mock community. The pipeline that combined merged and concatenated reads that were quality-trimmed performed best for mock communities with larger amplicons and higher average quality sequences. The pipeline that used length-trimmed concatenated reads outperformed quality trimming in mock communities with lower quality sequences but lost a significant amount of input sequences for taxonomic classification during processing. Genus level classification was more accurate using the SILVA reference database compared to Greengenes.

Conclusions: Merged sequences with the addition of concatenated sequences that were unable to be merged increased performance of taxonomic classifications. This was especially beneficial in mock communities with larger amplicons. We have shown for the first time, using an in-depth comparison of pipelines containing merged vs concatenated reads combined with different trimming parameters and reference databases, the potential advantages of concatenating sequences in improving resolution in microbiome investigations.
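As a rough illustration of the two ideas in this abstract, the sketch below joins a non-overlapping read pair and scores genus-level recall and precision against the known composition of a mock community. The function names, the `N` spacer, and the example genera are assumptions made for illustration only; the abstract does not specify how the authors joined reads or computed their metrics.

```python
from typing import Set, Tuple

# Illustrative sketch only: names and the N spacer are assumptions,
# not the pipeline described in the paper.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def concatenate_pair(forward: str, reverse: str, spacer: str = "NNNNNNNNNN") -> str:
    """Join a read pair that does not overlap.

    The reverse read is reverse-complemented so both halves are on the same
    strand, and a spacer of Ns (an assumed convention) marks the unsequenced gap.
    """
    return forward.upper() + spacer + reverse_complement(reverse)


def genus_recall_precision(expected: Set[str], observed: Set[str]) -> Tuple[float, float]:
    """Recall = expected genera that were recovered; precision = observed genera that were expected."""
    true_positives = len(expected & observed)
    recall = true_positives / len(expected) if expected else 0.0
    precision = true_positives / len(observed) if observed else 0.0
    return recall, precision


if __name__ == "__main__":
    print(concatenate_pair("ACGT", "AAAC", spacer="NN"))
    mock = {"Bacillus", "Escherichia", "Listeria", "Pseudomonas"}
    found = {"Bacillus", "Escherichia", "Clostridium"}
    print(genus_recall_precision(mock, found))
```

Comparing such scores between a merged-only read set and a merged-plus-concatenated read set is one simple way to frame the comparison the abstract describes.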

Adaptability of Ultrasonic Lamb Wave Touchscreen to the Variations in Touch Force and Touch Area

Sensors ◽

10.3390/s21051736 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1736

Author(s):

Zengchong Yang ◽

Xiucheng Liu ◽

Bin Wu ◽

Ren Liu

Keyword(s):

Lamb Wave ◽

Weight Coefficient ◽

The Self ◽

Reference Database ◽

Learning Method ◽

Improved Method ◽

Large Area ◽

Localization Model ◽

Reference Databases ◽

First Time

Previous studies on the Lamb wave touchscreen (LWT) assumed that the unknown touch had parameters consistent with the acoustic fingerprints in the reference database. The adaptability of the LWT to variations in touch force and touch area was investigated in this study for the first time. The databases of acoustic fingerprints were collected automatically with an experimental LWT prototype employing three transmitter–receiver pairs. A self-adaptive, updated weight coefficient for the transmitter–receiver pairs in use was employed to improve the accuracy of the localization model, which was established with a learning method. The performance of the improved method in locating single- and two-touch actions with reference databases of different parameters was carefully evaluated. The robustness of the LWT to variation in touch force varied with the touch area. Moreover, it was feasible to locate large-area touch actions with reference databases of small touch areas, as long as the unknown touch and the reference databases met the condition of equivalent averaged stress.

First report on cyanobacterial flora from Masirah Island, Sultanate of Oman

Algologia ◽

10.15407/alg30.04.440 ◽

2020 ◽

Vol 30 (4) ◽

pp. 440-451

Author(s):

M. Shamina ◽

Keyword(s):

Environmental Conditions ◽

Food Industry ◽

Vital Role ◽

Taxonomic Composition ◽

Climatic Conditions ◽

Taxonomic Classification ◽

Biofuel Production ◽

Sultanate Of Oman ◽

First Report ◽

First Time

Cyanobacteria play a vital role in various molecular and biotechnological applications in the food industry, agriculture, pharmaceuticals, nutraceuticals, biofuel production, etc., so it is necessary to understand their adaptability to various environmental conditions. It is equally important to discover new cyanobacterial taxa, which occasionally leads to changes in taxonomic classification; the author therefore set out to study cyanobacteria in the extreme climatic conditions of the desert, where temperatures are mostly above 45 °C. The taxonomic composition of cyanobacteria of Masirah Island, Sultanate of Oman, was studied for the first time. The studied samples were collected during the period of 2017–2019. The ten samples belonged to two orders: Oscillatoriales Schaffner and Synechococcales L.Hoffmann, Komárek & J.Kastovsky. All of them were filamentous non-heterocyst forms. Three species belonged to the genus Leptolyngbya Anagn. & Komárek; the genera Oscillatoria Vaucher ex Gomont and Lyngbya C.Agardh ex Gomont were represented by two species each, while the genera Pseudanabaena Lauterborn, Planktolyngbya Anagn. & Komárek and Geitlerinema (Anagn. & Komárek) Anagn. were represented by one species each.

Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT

Genome Biology ◽

10.1186/s13059-019-1817-x ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 26

Author(s):

F. A. Bastiaan von Meijenfeldt ◽

Ksenia Arkhipova ◽

Diego D. Cambuy ◽

Felipe H. Coutinho ◽

Bas E. Dutilh

Keyword(s):

DNA Sequences ◽

De Novo ◽

Taxonomic Classification ◽

Classification Method ◽

Reference Database ◽

Annotation Tool ◽

Multiple Signals

Current-day metagenomics analyses increasingly involve de novo taxonomic classification of long DNA sequences and metagenome-assembled genomes. Here, we show that the conventional best-hit approach often leads to classifications that are too specific, especially when the sequences represent novel deep lineages. We present a classification method that integrates multiple signals to classify sequences (Contig Annotation Tool, CAT) and metagenome-assembled genomes (Bin Annotation Tool, BAT). Classifications are automatically made at low taxonomic ranks if closely related organisms are present in the reference database and at higher ranks otherwise. The result is a high classification precision even for sequences from considerably unknown organisms.

Kaiju: Fast and sensitive taxonomic classification for metagenomics

10.1101/031229 ◽

2015 ◽

Cited By ~ 7

Author(s):

Peter Menzel ◽

Kim Lee Ng ◽

Anders Krogh

Keyword(s):

Large Scale ◽

Taxonomic Classification ◽

Greedy Heuristic ◽

Reference Database ◽

The Novel ◽

Sequencing Technologies ◽

A Genome ◽

Reference Databases ◽

Genome Exclusion ◽

Higher Sensitivity

The constantly decreasing cost and increasing output of current sequencing technologies enable large-scale metagenomic studies of microbial communities from diverse habitats. Therefore, fast and accurate methods for taxonomic classification are needed, which can operate on increasingly larger datasets and reference databases. Recently, several fast metagenomic classifiers have been developed, which are based on comparison of genomic k-mers. However, nucleotide comparison using a fixed k-mer length often lacks the sensitivity to overcome the evolutionary distance between sampled species and genomes in the reference database. Here, we present the novel metagenome classifier Kaiju for fast assignment of reads to taxa. Kaiju finds maximum exact matches on the protein level using the Burrows-Wheeler transform, and can optionally allow amino acid substitutions in the search using a greedy heuristic. We show in a genome exclusion study that Kaiju can classify more reads with higher sensitivity and similar precision compared to fast k-mer based classifiers, especially in genera that are underrepresented in reference databases. We also demonstrate that Kaiju classifies more than twice as many reads in ten real metagenomes compared to programs based on genomic k-mers. Kaiju can process up to millions of reads per minute, and its memory footprint is below 6 GB of RAM, allowing the analysis on a standard PC. The program is available under the GPL3 license at: http://bioinformatics-centre.github.io/kaiju
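To make the exact-matching idea concrete, the following toy sketch builds a Burrows-Wheeler transform of a short string and counts exact occurrences of a pattern with backward search. It is not Kaiju's code: the function names are hypothetical, the transform is built naively from sorted rotations, and a real tool would use a compressed index over a full protein database rather than rescanning the string.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    s = text + sentinel  # sentinel must sort before every character in text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_exact_matches(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text using backward search."""
    # Index of the first row starting with each character in the sorted first column.
    first_index = {}
    for i, char in enumerate(sorted(bwt_str)):
        first_index.setdefault(char, i)

    def occ(char: str, k: int) -> int:
        # Occurrences of char in bwt_str[:k]; a real index precomputes this.
        return bwt_str[:k].count(char)

    lo, hi = 0, len(bwt_str)  # current half-open range of matching rows
    for char in reversed(pattern):
        if char not in first_index:
            return 0
        lo = first_index[char] + occ(char, lo)
        hi = first_index[char] + occ(char, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)
    print(count_exact_matches(transformed, "ana"))
```

Backward search extends the match one character at a time from the end of the pattern, which is what makes it natural to stop at the longest prefix or suffix that still matches.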

Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT

10.1101/530188 ◽

2019 ◽

Cited By ~ 9

Author(s):

F.A. Bastiaan von Meijenfeldt ◽

Ksenia Arkhipova ◽

Diego D. Cambuy ◽

Felipe H. Coutinho ◽

Bas E. Dutilh

Keyword(s):

DNA Sequences ◽

Real World ◽

Taxonomic Classification ◽

Reference Database ◽

Annotation Tool ◽

High Quality

Current-day metagenomics increasingly requires taxonomic classification of long DNA sequences and metagenome-assembled genomes (MAGs) of unknown microorganisms. We show that the standard best-hit approach often leads to classifications that are too specific. We present tools to classify high-quality metagenomic contigs (Contig Annotation Tool, CAT) and MAGs (Bin Annotation Tool, BAT) and thoroughly benchmark them with simulated metagenomic sequences that are classified against a reference database from which related sequences are increasingly removed, thereby simulating increasingly unknown queries. We find that the query sequences are correctly classified at low taxonomic ranks if closely related organisms are present in the reference database, while classifications are made higher in the taxonomy when closely related organisms are absent, thus avoiding spurious classification specificity. In a real-world challenge, we apply BAT to over 900 MAGs from a recent rumen metagenomics study and classify 97% consistently with prior phylogeny-based classifications, but in a fully automated fashion.

NGS read classification using AI

PLoS ONE ◽

10.1371/journal.pone.0261548 ◽

2021 ◽

Vol 16 (12) ◽

pp. e0261548

Author(s):

Benjamin Voigt ◽

Oliver Fischer ◽

Christian Krumnow ◽

Christian Herta ◽

Piotr Wojciech Dabrowski

Keyword(s):

Neural Network ◽

Reference Database ◽

Metagenomic Sequencing ◽

Huge Amount ◽

Coding Sequences ◽

The Past ◽

Novel Approach ◽

Reference Databases ◽

Powerful Diagnostic Tool

Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient’s sample. This allows the detection of pathogens that would slip through the cracks of classical specific assays. However, due to this unspecific nature of metagenomic sequencing, a huge amount of unspecific data is generated during the sequencing itself, and the diagnosis only takes place at the data analysis stage, where the relevant sequences are filtered out. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well to detect pathogens that are represented in the databases used, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: how can one determine whether there is truly no evidence of a pathogen in the data, or whether the pathogen’s genome is simply absent from the database and the sequences in the dataset could thus not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on coding sequences, labeled by taxonomic domain, and use this neural network to predict the taxonomic classification of sequences that cannot be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.

Construction & assessment of a unified curated reference database for improving the taxonomic classification of bacteria using 16S rRNA sequence data

The Indian Journal of Medical Research ◽

10.4103/ijmr.ijmr_220_18 ◽

2020 ◽

Vol 151 (1) ◽

pp. 93

Author(s):

Rakesh Aggarwal ◽

Shikha Agnihotry ◽

Aditya N. Sarangi

Keyword(s):

16S rRNA ◽

Sequence Data ◽

Taxonomic Classification ◽

Reference Database ◽

rRNA Sequence ◽

16S rRNA Sequence

SprayNPray: user-friendly taxonomic profiling of genome and metagenome contigs

10.1101/2021.07.17.452725 ◽

2021 ◽

Author(s):

Arkadiy I Garber ◽

Catherine R Armbruster ◽

Stella E Lee ◽

Vaughn S Cooper ◽

Jennifer M Bomberger ◽

...

Keyword(s):

GC Content ◽

Taxonomic Classification ◽

Reference Database ◽

Taxonomic Profiling ◽

Spot Check ◽

Domains Of Life ◽

Multiple Domains ◽

Multiple Metrics ◽

User Friendly

Shotgun sequencing of cultured microbial isolates/individual eukaryotes (whole-genome sequencing) and microbial communities (metagenomics) has become commonplace in biology. Very often, sequenced samples encompass organisms spanning multiple domains of life, necessitating increasingly elaborate software for accurate taxonomic classification of assembled sequences. While many software tools for taxonomic classification exist, SprayNPray offers a quick and user-friendly, semi-automated approach, allowing users to separate contigs by taxonomy (and other metrics) of interest. Easy installation and usage, together with intuitive output that is amenable to visual inspection and/or further computational parsing, will reduce barriers for biologists beginning to analyze genomes and metagenomes. This approach can be used for broad-level overviews, preliminary analyses, or as a supplement to other taxonomic classification or binning software. SprayNPray profiles contigs using multiple metrics, including closest homologs from a user-specified reference database, gene density, read coverage, GC content, tetranucleotide frequency, and codon-usage bias. The output from this software is designed to allow users to spot-check metagenome-assembled genomes, identify and remove contigs from putative contaminants in isolate assemblies, identify bacteria in eukaryotic assemblies (and vice versa), and identify possible horizontal gene transfer events.

Social Conflictology: Theoretical and Historical Aspect

Konfliktologia ◽

10.31312/2310-6085-2020-15-2-68-80 ◽

2020 ◽

Vol 15 (2) ◽

pp. 68

Author(s):

А. Н. Сухов

Keyword(s):

Theory And Practice ◽

Important Task ◽

Social Conflicts ◽

Psychological Approach ◽

Structure Dynamics ◽

History Of ◽

The Subject ◽

First Time

This article reveals the topicality not only of destructive conflicts, but also of constructive and hybrid ones. This has been done practically for the first time. It also describes the history of the formation of both foreign and domestic social conflictology. The chronology of the development of the latter is restored and presented objectively and in full, taking into account the contribution of those researchers who actually stood at its origins. The article deals with the essence of the socio-psychological approach to understanding conflicts. The subject of social conflictology includes the regularities of their occurrence and manifestation at various levels, in various spheres, and under various conditions, including normal, complicated, and extreme ones. Social conflictology includes the theory and practice of diagnosing, settling, and resolving social conflicts. The article analyzes the difficulties that arise in defining the concept, structure, dynamics, and classification of social conflicts. It is therefore no accident that the most important task is to create a full-fledged theory of social conflicts. Without this, it is impossible to talk about effective settlement and resolution of social conflicts. Social conflictology is an integral part of conflictology. There is still a lot of work to be done, both in theory and in application, for it to take complete shape. At present, there is an urgent need to develop the conflict-related competence not only of professionals, but also of various groups of the population.

Morphological and Ultrastructural Characterization of Hemocytes in an Insect Model, the Hematophagous Dipetalogaster maxima (Hemiptera: Reduviidae)

Insects ◽

10.3390/insects12070640 ◽

2021 ◽

Vol 12 (7) ◽

pp. 640

Author(s):

Natalia R. Moyetta ◽

Fabián O. Ramos ◽

Jimena Leyria ◽

Lilián E. Canavoso ◽

Leonardo L. Fruttero

Keyword(s):

Cell Biology ◽

Cell Types ◽

Transmission Electron ◽

Ultrastructural Characterization ◽

Current Classification ◽

Contrast Microscopy ◽

First Time ◽

Controversial Aspect

Hemocytes, the cells present in the hemolymph of insects and other invertebrates, perform several physiological functions, including innate immunity. The current classification of hemocyte types is based mostly on morphological features; however, divergences have emerged among specialists in triatomines, the insect vectors of Chagas’ disease (Hemiptera: Reduviidae). Here, we have combined technical approaches in order to characterize the hemocytes from fifth instar nymphs of the triatomine Dipetalogaster maxima. Moreover, in this work we describe, for the first time, the ultrastructural features of D. maxima hemocytes. Using phase contrast microscopy of fresh preparations, five hemocyte populations were identified and further characterized by immunofluorescence, flow cytometry and transmission electron microscopy. The plasmatocytes and the granulocytes were the most abundant cell types, although prohemocytes, adipohemocytes and oenocytes were also found. This work sheds light on a controversial aspect of triatomine cell biology and physiology, setting the basis for future in-depth studies aimed at hemocyte classification using non-microscopy-based markers.
