DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

Ecological and conservation genetic studies often use noninvasive sampling, especially with elusive or endangered species. Because microsatellites are generally short, they can be amplified from low-quality samples such as feces. Microsatellites are also highly polymorphic, so a few markers are enough for reliable individual identification, kinship determination, or population characterization. However, genotyping from feces is expensive and time-consuming. With next-generation sequencing (NGS) and recent software developments, automated microsatellite genotyping from NGS data may now be possible. These software packages infer genotypes directly from sequence reads, which increases throughput. Here we evaluate the performance of four software packages for genotyping microsatellite loci from Iberian wolf (Canis lupus) feces using NGS. We initially combined 46 markers in a single multiplex reaction for the first time, of which 19 were included in the final analyses. Of the four packages, Megasat produced the genotypes with the fewest errors. Coverage above 100X provided little additional information, but a relatively high number of PCR replicates was necessary to obtain a high-quality genotype from highly unoptimized, multiplexed reactions (10 replicates for 18 of the 19 loci analyzed here). This number could be reduced through optimization. Using new bioinformatic tools and NGS data to genotype these highly informative markers may increase throughput at a reasonable cost and with less laboratory work. Thus, high-throughput sequencing approaches could facilitate the use of microsatellites with fecal DNA to address ecological and conservation questions.
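
The abstract reports that many PCR replicates were needed per locus but does not say how replicate calls were merged. As a purely illustrative sketch (not the authors' method and not how Megasat works), one simple consensus rule accepts an allele only when it appears in at least a minimum number of replicates. The function name `consensus_genotype`, the `min_support` threshold, and the allele lengths in the usage example are all hypothetical.

```python
from collections import Counter


def consensus_genotype(replicates, min_support=2):
    """Merge per-replicate microsatellite calls into one diploid genotype.

    replicates: list of calls, one per PCR replicate. Each call is a tuple
        of one or two allele lengths, or None if the replicate failed.
    min_support: hypothetical threshold; an allele is accepted only if it
        was observed in at least this many replicates.

    Returns a sorted 2-tuple of alleles, or None when zero or more than two
    alleles pass the threshold (no reliable diploid call).
    """
    counts = Counter()
    for call in replicates:
        if call is None:
            continue  # failed amplification contributes nothing
        for allele in set(call):  # count each allele once per replicate
            counts[allele] += 1

    accepted = sorted(a for a, n in counts.items() if n >= min_support)
    if len(accepted) == 1:
        return (accepted[0], accepted[0])  # homozygote
    if len(accepted) == 2:
        return (accepted[0], accepted[1])  # heterozygote
    return None  # ambiguous or unsupported


# Usage with made-up allele lengths for a single locus:
calls = [(150, 154), (150,), None, (150, 154), (150, 158)]
genotype = consensus_genotype(calls, min_support=2)
```

In this sketch, an allele seen in a single replicate (such as 158 above) is treated as a likely artifact, while a replicate showing only one allele of a heterozygote (allelic dropout) still contributes its evidence. Real pipelines usually apply locus-specific thresholds and read-depth filters as well.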

Y-LineageTracker: a high-throughput analysis framework for Y-chromosomal next-generation sequencing data

BMC Bioinformatics ◽ DOI: 10.1186/s12859-021-04057-z ◽ 2021 ◽ Vol 22 (1)

Author(s): Hao Chen ◽ Yan Lu ◽ Dongsheng Lu ◽ Shuhua Xu

Keyword(s): Next Generation Sequencing ◽ Y Chromosome ◽ High Throughput ◽ DNA Analysis ◽ Analysis Framework ◽ Next Generation ◽ High Throughput Analysis ◽ Throughput Analysis ◽ NGS Data ◽ Generation Sequencing

Abstract

Background: Y-chromosome DNA (Y-DNA) has been used for tracing paternal lineages and offers a clear path from an individual to a known, or likely, direct paternal ancestor. Advances in next-generation sequencing (NGS) technologies have steadily improved the resolution of the non-recombining region of the Y-chromosome (NRY). However, a lack of suitable computational tools limits the use of NGS data in Y-DNA studies.

Results: We developed Y-LineageTracker, a high-throughput analysis framework that not only uses state-of-the-art methodologies to automatically determine NRY haplogroups and identify microsatellite variants of the Y-chromosome on a fine scale, but also optimizes comprehensive Y-DNA analysis methods for NGS data. Notably, Y-LineageTracker integrates the NRY haplogroup and Y-STR analysis modules with recognized strategies to robustly suggest an interpretation for paternal genetics and evolution. The NRY haplogroup module mainly covers haplogroup classification, clustering analysis, phylogeny construction, and divergence time estimation of NRY haplogroups. The Y-STR module mainly includes Y-STR genotyping, statistical calculation, network analysis, and estimation of the time to the most recent common ancestor (TMRCA) based on Y-STR haplotypes. A performance comparison indicated that Y-LineageTracker compared favorably with existing Y-DNA analysis tools in both performance and visualization quality.

Conclusions: Y-LineageTracker is an open-source and user-friendly command-line tool that provides multiple functions to efficiently analyze Y-DNA from NGS data at both the Y-SNP and Y-STR levels. Additionally, Y-LineageTracker supports various input data formats and produces high-quality figures suitable for publication. Y-LineageTracker is written in Python3, supports Windows, Linux, and macOS platforms, and can be installed manually or via the Python Package Index (PyPI). The source code, examples, and manual of Y-LineageTracker are freely available at https://www.picb.ac.cn/PGG/resource.php or CodeOcean (https://codeocean.com/capsule/7424381/tree).
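
The abstract mentions TMRCA estimation from Y-STR haplotypes without naming the estimator. As a hedged illustration only, the sketch below uses the textbook average squared distance (ASD) approach under a single-step stepwise mutation model, where the expected squared difference in repeat count from the founder grows as mutation rate times generations. This is not necessarily what Y-LineageTracker implements. The function name `asd_tmrca`, the mutation rate, the generation time, and the haplotypes are all hypothetical.

```python
def asd_tmrca(haplotypes, founder, mutation_rate, generation_years=30.0):
    """Estimate TMRCA from Y-STR haplotypes with the ASD estimator.

    haplotypes: list of tuples of repeat counts, one tuple per sample.
    founder: tuple of repeat counts for the assumed ancestral haplotype.
    mutation_rate: assumed per-locus, per-generation mutation rate.
    generation_years: assumed generation time in years.

    Returns (generations, years).
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")

    n_loci = len(founder)
    total = 0
    for hap in haplotypes:
        if len(hap) != n_loci:
            raise ValueError("haplotype and founder lengths differ")
        # Squared repeat-count difference from the founder at each locus
        total += sum((a - f) ** 2 for a, f in zip(hap, founder))

    # Average over samples and loci, then scale by the mutation rate
    asd = total / (len(haplotypes) * n_loci)
    generations = asd / mutation_rate
    return generations, generations * generation_years


# Usage with made-up two-locus haplotypes and an assumed mutation rate:
samples = [(14, 13), (15, 13), (14, 12), (14, 13)]
gens, years = asd_tmrca(samples, founder=(14, 13), mutation_rate=0.002)
```

The estimate is sensitive to the choice of founder haplotype and mutation rate, so published analyses typically report it alongside confidence intervals and alternative estimators.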
