SNP calling
Recently Published Documents


2021 ◽  
Author(s):  
Zhenxian Zheng ◽  
Shumin Li ◽  
Junhao Su ◽  
Amy Wing-Sze Leung ◽  
Tak-Wah Lam ◽  
...  

Deep learning-based variant callers are becoming the standard and have achieved superior SNP calling performance using long reads. In this paper, we present Clair3, which makes the best of two major method categories: pile-up calling handles most variant candidates with speed, and full-alignment tackles complicated candidates to maximize precision and recall. Clair3 ran faster than any of the other state-of-the-art variant callers and performed the best, especially at lower coverage.


Author(s):  
Merlijn H.I. van Haren ◽  
Theun de Groot ◽  
Bram Spruijtenburg ◽  
Kusum Jain ◽  
Anuradha Chowdhary ◽  
...  

Candida krusei is a human pathogenic yeast that can cause candidemia with the lowest 90-day survival rate in comparison to other Candida species. Infections occur frequently in immunocompromised patients and several C. krusei outbreaks in health care facilities have been described. Here, we developed a short tandem repeat (STR) typing scheme for C. krusei to allow for fast and cost-effective genotyping of an outbreak and compared identified relatedness of ten isolates to SNP calling from whole-genome sequencing (WGS). From a selection of 14 novel STR markers, six were used to develop two multiplex PCRs. Additionally, three previously reported markers were selected for a third multiplex PCR. In total, 119 C. krusei isolates were typed using these nine markers and 79 different genotypes were found. STR typing correlated well with WGS SNP typing, as isolates with the same STR genotype varied by 8 and 19 SNPs, while isolates that differed in all STR markers varied at least tens of thousands of SNPs. The STR typing assay was found to be specific for C. krusei, stable in 100 subcloned generations, and comparable to SNP calling by WGS. In summary, this newly developed C. krusei STR typing scheme is a fast, reliable, easy-to-interpret and cost-effective method compared to other typing methods. Moreover, the two newly developed multiplexes showed the same discriminatory power as all nine markers combined, indicating that multiplexes M3-1 and M9 are sufficient to type C. krusei.
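The comparison between STR genotypes and WGS SNP typing rests on counting SNP differences between pairs of isolates. A minimal sketch of that count, assuming each isolate is represented as a mapping from reference position to called base; the isolate names, positions, and the functions `snp_distance` and `pairwise_distances` are hypothetical and not taken from the study:

```python
from itertools import combinations


def snp_distance(calls_a, calls_b):
    """Count positions called in both isolates where the bases differ."""
    shared = calls_a.keys() & calls_b.keys()
    return sum(1 for pos in shared if calls_a[pos] != calls_b[pos])


def pairwise_distances(isolates):
    """Return {(name_a, name_b): SNP distance} for every pair of isolates."""
    return {
        (a, b): snp_distance(isolates[a], isolates[b])
        for a, b in combinations(sorted(isolates), 2)
    }


# Toy data (hypothetical): position -> called base for three isolates.
isolates = {
    "iso1": {101: "A", 202: "C", 303: "G", 404: "T"},
    "iso2": {101: "A", 202: "C", 303: "G", 404: "C"},
    "iso3": {101: "T", 202: "G", 303: "A", 404: "C"},
}
distances = pairwise_distances(isolates)
```

Positions missing from either isolate are skipped rather than counted as differences, which is one possible convention; a real pipeline would also need to decide how to treat low-quality or ambiguous calls.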


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S497-S498
Author(s):  
Mohamad Sater ◽  
Remy Schwab ◽  
Ian Herriott ◽  
Tim Farrell ◽  
Miriam Huntley

Abstract

Background: Healthcare-associated infections (HAIs) are a major contributor to patient morbidity and mortality worldwide. HAIs are increasingly important due to the rise of multidrug-resistant pathogens, which can lead to deadly nosocomial outbreaks. Current methods for investigating transmissions are slow, costly, or have poor detection resolution. A rapid, cost-effective and high-resolution method to identify transmission events is imperative to guide infection control. Whole genome sequencing of infecting pathogens paired with a single nucleotide polymorphism (SNP) analysis can provide high-resolution clonality determination, yet these methods typically have long turnaround times. Here we examined the utility of the Oxford Nanopore Technologies (ONT) platform, a rapid sequencing technology, for whole genome sequencing based transmission analysis.

Methods: We developed a SNP calling pipeline customized for ONT data, which exhibit higher sequencing error rates and can therefore be challenging for transmission analysis. The pipeline leverages the latest basecalling tools as well as a suite of custom variant calling and filtering algorithms to achieve the highest accuracy in clonality calls compared to Illumina-based sequencing. We also capitalize on ONT long reads by assembling outbreak-specific genomes in order to overcome the need for an external reference genome.

Results: We examined 20 bacterial isolates from 5 HAI investigations previously performed at Day Zero Diagnostics as part of epiXact®, our commercialized Illumina-based HAI sequencing and analysis service. Using the ONT data and pipeline, we achieved greater than 90% SNP-calling sensitivity and precision, allowing 100% accuracy of clonality classification compared to Illumina-based results across common HAI species. We demonstrate the validity and increased resolution of our SNP analysis pipeline using assembled genomes from each outbreak. We also demonstrate that this ONT-based workflow can produce isolate-to-transmission determination (i.e., including WGS and analysis) in less than 24 hours.

Figure: SNP calling performance. ONT-based SNP calling sensitivity and precision compared to the Illumina-based pipeline.

Conclusion: We demonstrate the utility of ONT for HAI investigation, establishing the potential to transform healthcare epidemiology with same-day high-resolution transmission determination.

Disclosures: Mohamad Sater, PhD, Day Zero Diagnostics (Employee, Shareholder); Remy Schwab, MSc, Day Zero Diagnostics (Employee, Shareholder); Ian Herriott, BS, Day Zero Diagnostics (Employee, Shareholder); Tim Farrell, MS, Day Zero Diagnostics, Inc. (Employee, Shareholder); Miriam Huntley, PhD, Day Zero Diagnostics (Employee, Shareholder)
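For readers unfamiliar with the two metrics reported here, the following is a minimal sketch of how SNP-calling sensitivity and precision can be computed when one call set is treated as the reference. It assumes each SNP is a `(contig, position, alt_base)` tuple; the function name `snp_metrics` and the toy call sets are hypothetical and do not reproduce the authors' pipeline:

```python
def snp_metrics(reference_calls, test_calls):
    """Return (sensitivity, precision) of test_calls against reference_calls.

    Each call is a (contig, position, alt_base) tuple.
    """
    reference_calls, test_calls = set(reference_calls), set(test_calls)
    true_positives = len(reference_calls & test_calls)
    # Sensitivity: fraction of reference SNPs recovered by the test set.
    sensitivity = (
        true_positives / len(reference_calls) if reference_calls else float("nan")
    )
    # Precision: fraction of test SNPs that are also in the reference set.
    precision = true_positives / len(test_calls) if test_calls else float("nan")
    return sensitivity, precision


# Toy call sets (hypothetical): Illumina calls treated as the reference.
illumina = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr1", 900, "G"), ("chr2", 40, "C")}
ont = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr1", 900, "G"), ("chr2", 77, "A")}
sensitivity, precision = snp_metrics(illumina, ont)
```

In this toy case three of the four calls in each set are shared, so both metrics are 0.75.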


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dan Wang ◽  
Liu Yang ◽  
Chao Ning ◽  
Jian-Feng Liu ◽  
Xingbo Zhao

Abstract

Background: Reference sequences play a vital role in next-generation sequencing (NGS), impacting mapping quality during genome analyses. However, reference genomes usually do not represent the full range of genetic diversity of a species as a result of geographical divergence and independent demographic events of different populations. For the mitochondrial genome (mitogenome), which occurs in high copy numbers in cells and is strictly maternally inherited, an optimal reference sequence has the potential to make mitogenome alignment both more accurate and more efficient. In this study, we used three different types of reference sequences for mitogenome mapping, i.e., the commonly used reference sequence (CU-ref), the breed-specific reference sequence (BS-ref) and the sample-specific reference sequence (SS-ref), and compared the accuracy of mitogenome alignment and SNP calling among them, with the aim of proposing the optimal reference sequence for mitochondrial DNA (mtDNA) analyses of specific populations.

Results: Four pigs, representing three different breeds, were sequenced at high throughput, and the reads were mapped to the reference sequences mentioned above. Aligning reads to a BS-ref gave the largest mapping ratio and the deepest coverage without increased running time. Next, single nucleotide polymorphism (SNP) calling was carried out by 18 detection strategies with the three tools SAMtools, VarScan and GATK with different parameters, using the BAM files from mapping to the BS-ref. All eighteen strategies achieved the same high specificity and sensitivity, which suggests that mitogenome alignment to the BS-ref is accurate enough that the outcome depends little on the choice of SNP calling tool or parameters.

Conclusions: This study showed that different reference sequences representing different genetic relationships to sample reads influenced mitogenome alignment, with the breed-specific reference sequences being optimal for mitogenome analyses, which provides a refined processing perspective for NGS data.


Author(s):  
Russ Jasper ◽  
Tegan Krista McDonald ◽  
Pooja Singh ◽  
Mengmeng Lu ◽  
Clément Rougeux ◽  
...  

The use of NGS datasets has increased dramatically over the last decade; however, there have been few systematic analyses quantifying the accuracy of the commonly used variant caller programs. Here we used a familial design consisting of diploid tissue from a single Pinus contorta parent and the maternally derived haploid tissue from 106 full-sibling offspring, where mismatches could only arise due to mutation or bioinformatic error. Given the rarity of mutation, we used the rate of mismatches between parent and offspring genotype calls to infer the SNP genotyping error rates of FreeBayes, HaplotypeCaller, SAMtools, UnifiedGenotyper, and VarScan. With baseline filtering, HaplotypeCaller and UnifiedGenotyper yielded one to two orders of magnitude larger numbers of SNPs and error rates, whereas FreeBayes, SAMtools and VarScan yielded lower numbers of SNPs and more modest error rates. To facilitate comparison between variant callers, we standardized each SNP set to the same number of SNPs using additional filtering, where UnifiedGenotyper consistently produced the smallest proportion of genotype errors, followed by HaplotypeCaller, VarScan, SAMtools, and FreeBayes. Additionally, we found that error rates were minimized for SNPs called by more than one variant caller. Finally, we evaluated the performance of various commonly used filtering metrics on SNP calling. Our analysis provides a quantitative assessment of the accuracy of five widely used variant calling programs and offers valuable insights into both the choice of variant caller program and the choice of filtering metrics, especially for researchers using non-model study systems.
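The two ideas in this abstract, inferring error from parent-offspring mismatches and keeping SNPs called by more than one caller, can be illustrated roughly as follows. This is a sketch under simplifying assumptions, not the authors' method: a haploid offspring call counts as a mismatch when its allele is absent from the diploid parent's genotype, and the site names, data layout, and functions `mismatch_rate` and `consensus_sites` are hypothetical.

```python
from collections import Counter


def mismatch_rate(parent_genotypes, offspring_calls):
    """Fraction of haploid offspring calls not explained by the parent genotype.

    parent_genotypes: {site: (allele1, allele2)} for the diploid parent.
    offspring_calls: list of {site: allele} dicts, one per haploid offspring.
    """
    compared = mismatched = 0
    for calls in offspring_calls:
        for site, allele in calls.items():
            parent = parent_genotypes.get(site)
            if parent is None or allele is None:
                continue  # skip sites without a call in parent or offspring
            compared += 1
            if allele not in parent:
                mismatched += 1
    return mismatched / compared if compared else float("nan")


def consensus_sites(call_sets, min_callers=2):
    """Return sites reported by at least min_callers of the variant callers."""
    counts = Counter(site for sites in call_sets.values() for site in set(sites))
    return {site for site, n in counts.items() if n >= min_callers}


# Toy data (hypothetical sites and calls).
parent = {"s1": ("A", "G"), "s2": ("C", "C")}
offspring = [{"s1": "A", "s2": "C"}, {"s1": "T", "s2": "C"}]
rate = mismatch_rate(parent, offspring)

callers = {"callerA": ["s1", "s2"], "callerB": ["s2", "s3"], "callerC": ["s2"]}
shared = consensus_sites(callers)
```

Here one of four compared calls is inconsistent with the parent, and only `s2` is reported by more than one caller.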


2021 ◽  
Vol 22 (19) ◽  
pp. 10300
Author(s):  
Tomasz Mamos ◽  
Michał Grabowski ◽  
Tomasz Rewicz ◽  
Jamie Bojko ◽  
Dominik Strapagiel ◽  
...  

The Ponto-Caspian region is the main donor of invasive amphipods to freshwater ecosystems, with at least 13 species successfully established in European inland waters. Dikerogammarus spp. and Pontogammarus robustoides are among the most successful, due to their strong invasive impact on local biota. However, genomic knowledge about these invaders is scarce, while phylogeography and population genetics have been based on short fragments of mitochondrial markers or nuclear microsatellites. In this study, we provide: (i) a reconstruction of six mitogenomes for four invasive gammarids (D. villosus, D. haemobaphes, D. bispinosus, and P. robustoides); (ii) a comparison between the structure of the newly obtained mitogenomes and those from the literature; (iii) SNP calling rates for individual D. villosus and D. haemobaphes from different invasion sites across Europe; and (iv) the first time-calibrated full mitogenome phylogeny reconstruction of several Ponto-Caspian taxa. We found that, in comparison to other gammarids, the mitogenomes of Ponto-Caspian species show a translocation between the tRNA-E and tRNA-R positions. Phylogenetic reconstruction using the mitogenomes identified that Ponto-Caspian gammarids form a well-supported group that originated in the Miocene. Our study supports paraphyly in the family Gammaridae. These provided mitogenomes will serve as vital genetic resources for the development of new markers for PCR-based identification methods and demographic studies.


2021 ◽  
Author(s):  
Maria Luigi-Sierra ◽  
Joaquim Casellas ◽  
Amparo Martinez ◽  
Juan Vicente Delgado ◽  
Javier Fernandez Alvarez ◽  
...  

Transmission ratio distortion (TRD) is the preferential transmission of one specific allele to offspring at the expense of the other one. The existence of TRD is mostly explained by the segregation of genetic variants with deleterious effects on the developmental processes that go from the formation of gametes to fecundation and birth. A few years ago, a statistical methodology was implemented in order to detect TRD signals on a genome-wide scale as a first step to uncover the biological basis of TRD and reproductive success in domestic species. In the current work, we have analyzed the impact of SNP calling quality on the detection of TRD signals in a population of Murciano-Granadina goats. Seventeen bucks and their offspring (N=288) were typed with the Goat SNP50 BeadChip, while the genotypes of the dams were lacking. Performance of a genome-wide scan revealed the existence of 36 SNPs showing significant evidence of TRD. When we calculated GenTrain scores for each one of the SNPs, we observed that 25 SNPs showed scores below 0.8. The allele frequencies of these SNPs in the offspring were not correlated with the allele frequencies estimated in the dams with statistical methods, thus evidencing that flawed SNP calling quality might lead to the detection of spurious TRD signals. We conclude that, when performing TRD scans, the GenTrain scores of markers should be taken into account to discriminate SNPs that are truly under TRD from those yielding spurious signals due to technical problems.
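The recommendation to check GenTrain scores before trusting a TRD signal can be sketched as a simple partition of candidate SNPs. The 0.8 cutoff is the value mentioned in the abstract; the SNP identifiers, scores, and the function `split_by_gentrain` are hypothetical:

```python
def split_by_gentrain(scores, threshold=0.8):
    """Split SNPs into (reliable, suspect) sets by GenTrain score.

    scores: {snp_id: GenTrain score}. SNPs scoring below the threshold are
    treated as suspect, i.e. possibly spurious TRD signals.
    """
    reliable = {snp for snp, score in scores.items() if score >= threshold}
    suspect = set(scores) - reliable
    return reliable, suspect


# Toy scores (hypothetical) for SNPs flagged by a TRD scan.
trd_candidates = {"snp_a": 0.91, "snp_b": 0.62, "snp_c": 0.80, "snp_d": 0.45}
reliable, suspect = split_by_gentrain(trd_candidates)
```

Whether a score exactly at the threshold should pass is a design choice; this sketch keeps it, since the abstract only describes scores below 0.8 as problematic.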


Author(s):  
Fereshteh Shahoveisi ◽  
Atena Oladzad ◽  
Luis E. del Rio Mendoza ◽  
Seyedali Hosseinirad ◽  
Susan Ruud ◽  
...  

The polyploid nature of canola (Brassica napus) represents a challenge for the accurate identification of single nucleotide polymorphisms (SNPs) and the detection of quantitative trait loci (QTL). In this study, combinations of eight phenotyping scoring systems and six SNP calling and filtering parameters were evaluated for their efficiency in detection of QTL associated with response to Sclerotinia stem rot, caused by Sclerotinia sclerotiorum, in two doubled haploid (DH) canola mapping populations. Most QTL were detected in lesion length, relative areas under the disease progress curve (rAUDPC) for lesion length, and binomial-plant mortality data sets. Binomial data derived from lesion size were less efficient in QTL detection. Inclusion of additional phenotypic sets to the analysis increased the numbers of significant QTL by 2.3-fold; however, the continuous data sets were more efficient. Between two filtering parameters used to analyze genotyping by sequencing (GBS) data, imputation of missing data increased QTL detection in one population with a high level of missing data but not in the other. Inclusion of segregation-distorted SNPs increased QTL detection but did not impact their R² values significantly. Twelve of the 16 detected QTL were on chromosomes A02 and C01, and the rest were on A07, A09, and C03. Marker A02-7594120, associated with a QTL on chromosome A02, was detected in both populations. Results of this study suggest that the impact of genotypic variant calling and filtering parameters may be population dependent, while deriving additional phenotyping scoring systems such as rAUDPC and binary mortality data sets may improve QTL detection efficiency.

