Customized de novo mutation detection for any variant calling pipeline: SynthDNM

Author(s):  
Aojie Lian ◽  
James Guevara ◽  
Kun Xia ◽  
Jonathan Sebat

Abstract Motivation As sequencing technologies and analysis pipelines evolve, de novo mutation (DNM) calling tools must be adapted. Therefore, a flexible approach is needed that can accurately identify DNMs from genome or exome sequences from a variety of datasets and variant calling pipelines. Results Here, we describe SynthDNM, a random-forest based classifier that can be readily adapted to new sequencing or variant-calling pipelines by applying a flexible approach to constructing simulated training examples from real data. The optimized SynthDNM classifiers predict de novo SNPs and indels with robust accuracy across multiple methods of variant calling. Availability SynthDNM is freely available on GitHub (https://github.com/james-guevara/synthdnm). Supplementary information Supplementary data are available at Bioinformatics online.
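To make the notion of a DNM candidate concrete, here is a minimal sketch of the textbook Mendelian-inconsistency check on a parent-offspring trio: the child carries an alternate allele that neither parent carries. This is not taken from the SynthDNM code base. The `TrioGenotype` type, the 0/1/2 genotype coding and the function names are assumptions made for illustration. A classifier such as the random forest described above would then score candidates like these; that scoring step is not shown.

```python
from typing import NamedTuple


class TrioGenotype(NamedTuple):
    """Genotypes at one site, coded as alternate-allele counts (0, 1 or 2).

    Hypothetical container for illustration; not a SynthDNM data structure.
    """
    site: str
    child: int
    father: int
    mother: int


def is_candidate_dnm(g: TrioGenotype) -> bool:
    """True when the child has an alternate allele absent from both parents."""
    return g.child >= 1 and g.father == 0 and g.mother == 0


def candidate_dnms(genotypes):
    """Return the sites whose trio genotypes are Mendelian-inconsistent."""
    return [g.site for g in genotypes if is_candidate_dnm(g)]


calls = [
    TrioGenotype("chr1:1000", child=1, father=0, mother=0),  # candidate DNM
    TrioGenotype("chr1:2000", child=1, father=1, mother=0),  # inherited from father
    TrioGenotype("chr2:500", child=0, father=0, mother=0),   # reference in all three
]
candidates = candidate_dnms(calls)
print(candidates)
```

Most raw candidates of this kind are genotyping errors rather than true DNMs, which is why a trained classifier is applied on top of the simple genotype rule.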


2020 ◽  
Vol 36 (10) ◽  
pp. 3242-3243 ◽  
Author(s):  
Samuel O’Donnell ◽  
Gilles Fischer

Abstract Summary MUM&Co is a single bash script to detect structural variations (SVs) utilizing whole-genome alignment (WGA). Using MUMmer’s nucmer alignment, MUM&Co can detect insertions, deletions, tandem duplications, inversions and translocations greater than 50 bp. Its versatility depends upon the WGA and therefore benefits from contiguous de novo assemblies generated by third-generation sequencing technologies. Benchmarked against five WGA SV-calling tools, MUM&Co outperforms all tools on simulated SVs in yeast, plant and human genomes and performs similarly in two real human datasets. Additionally, MUM&Co is notable for its ability to find inversions in both simulated and real datasets. Lastly, MUM&Co’s primary output is an intuitive tabulated file containing a list of SVs with only necessary genomic details. Availability and implementation https://github.com/SAMtoBAM/MUMandCo. Supplementary information Supplementary data are available at Bioinformatics online.


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Krisztian Buza ◽  
Bartek Wilczynski ◽  
Norbert Dojer

Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and test its implementation, called RECORD. We evaluate RECORD on both simulated and real data. We made our software publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.


2018 ◽  
Author(s):  
Adrian Fritz ◽  
Peter Hofmann ◽  
Stephan Majda ◽  
Eik Dahms ◽  
Johannes Dröge ◽  
...  

Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. Here, we describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series and differential abundance studies, includes real and simulated strain-level diversity, and generates second and third generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with truth standards for method evaluation. All data sets and the software are freely available at: https://github.com/CAMI-challenge/CAMISIM


Brain ◽  
2020 ◽  
Vol 143 (8) ◽  
pp. 2380-2387 ◽  
Author(s):  
Alisdair McNeill ◽  
Emanuela Iovino ◽  
Luke Mansard ◽  
Christel Vache ◽  
David Baux ◽  
...  

Abstract The SLC12 gene family consists of SLC12A1–SLC12A9, encoding electroneutral cation-coupled chloride co-transporters. SLC12A2 has been shown to play a role in corticogenesis and therefore represents a strong candidate neurodevelopmental disorder gene. Through trio exome sequencing we identified de novo mutations in SLC12A2 in six children with neurodevelopmental disorders. All had developmental delay or intellectual disability ranging from mild to severe. Two had sensorineural deafness. We also identified SLC12A2 variants in three individuals with non-syndromic bilateral sensorineural hearing loss and vestibular areflexia. The SLC12A2 de novo mutation rate was demonstrated to be significantly elevated in the Deciphering Developmental Disorders cohort. All tested variants were shown to reduce co-transporter function in Xenopus laevis oocytes. Analysis of SLC12A2 expression in foetal brain at 16–18 weeks post-conception revealed high expression in radial glial cells, compatible with a role in neurogenesis. Gene co-expression analysis in cells robustly expressing SLC12A2 at 16–18 weeks post-conception identified a transcriptomic programme associated with active neurogenesis. We identify SLC12A2 de novo mutations as the cause of a novel neurodevelopmental disorder and bilateral non-syndromic sensorineural hearing loss and provide further data supporting a role for this gene in human neurodevelopment.


Weed Science ◽  
2019 ◽  
Vol 67 (4) ◽  
pp. 361-368 ◽  
Author(s):  
Federico A. Casale ◽  
Darci A. Giacomini ◽  
Patrick J. Tranel

Abstract In a predictable natural selection process, herbicides select for adaptive alleles that allow weed populations to survive. These resistance alleles may be available immediately from the standing genetic variation within the population or may arise from immigration via pollen or seeds from other populations. Moreover, because all populations are constantly generating new mutant genotypes by de novo mutations, resistant mutants may arise spontaneously in any herbicide-sensitive weed population. Recognizing that the relative contribution of each of these three sources of resistance alleles influences what strategies should be applied to counteract herbicide-resistance evolution, we aimed to add experimental information to the resistance evolutionary framework. Specifically, the objectives of this experiment were to determine the de novo mutation rate conferring herbicide resistance in a natural plant population and to test the hypothesis that the mutation rate increases when plants are stressed by sublethal herbicide exposure. We used grain amaranth (Amaranthus hypochondriacus L.) and resistance to acetolactate synthase (ALS)-inhibiting herbicides as a model system to discover spontaneous herbicide-resistant mutants. After screening 70.8 million plants, however, we detected no spontaneous resistant genotypes, indicating the probability of finding a spontaneous ALS-resistant mutant in a given sensitive population is lower than 1.4 × 10⁻⁸. This empirically determined upper limit is lower than expected from theoretical calculations based on previous studies. We found no evidence that herbicide stress increased the mutation rate, but were not able to robustly test this hypothesis. The results found in this study indicate that de novo mutations conferring herbicide resistance might occur at lower frequencies than previously expected.
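The upper limit quoted in this abstract is consistent with a simple calculation: if no mutant is seen among N screened plants, the per-plant frequency is taken to be below 1/N. The short sketch below reproduces that arithmetic. The variable names are illustrative, and the 1/N reading is an assumption about how the figure was derived, since the abstract does not spell out the method.

```python
# Numbers taken from the abstract: 70.8 million plants screened, no mutants found.
plants_screened = 70.8e6
resistant_found = 0

# Assumed reading of the bound: with zero hits in N plants, the frequency of
# spontaneous resistant mutants is taken to be lower than 1 / N.
upper_limit = 1 / plants_screened

print(f"{upper_limit:.2e}")  # 1.41e-08
```

Rounded to two significant figures this gives 1.4 × 10⁻⁸, matching the value reported in the abstract.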


Author(s):  
Yuansheng Liu ◽  
Xiaocai Zhang ◽  
Quan Zou ◽  
Xiangxiang Zeng

Abstract Summary Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies can reduce the computational resources needed in downstream applications. Here we develop minirmd, a de novo tool to remove duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand. Availability and implementation https://github.com/yuansliu/minirmd. Supplementary information Supplementary data are available at Bioinformatics online.
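As an illustration of minimizer-based clustering, the sketch below buckets reads by a strand-independent minimizer (the lexicographically smallest k-mer over a read and its reverse complement) and removes exact duplicates within each bucket, including duplicates on the opposite strand. It is a simplified stand-in, not minirmd's algorithm: it does a single round with one k, it does not detect near-duplicates with mismatches, and all function names are assumptions.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def minimizer(seq: str, k: int) -> str:
    """Smallest k-mer over the read and its reverse complement."""
    rc = reverse_complement(seq)
    kmers = [s[i:i + k] for s in (seq, rc) for i in range(len(s) - k + 1)]
    return min(kmers)


def remove_duplicates(reads, k=4):
    """Keep one read per group of exact duplicates, ignoring strand."""
    # Reads that are identical up to strand share a minimizer, so each read
    # only needs to be compared with the reads in its own bucket.
    buckets = defaultdict(list)
    for read in reads:
        buckets[minimizer(read, k)].append(read)

    kept = []
    for bucket in buckets.values():
        seen = set()
        for read in bucket:
            canonical = min(read, reverse_complement(read))
            if canonical not in seen:
                seen.add(canonical)
                kept.append(read)
    return kept


reads = ["AAACCCGG", "CCGGGTTT", "AAACCCGG", "TTTTGGGA"]
unique_reads = remove_duplicates(reads)
print(unique_reads)
```

In this toy input the second read is the reverse complement of the first and the third is an exact copy, so only the first and fourth reads are kept.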


2016 ◽  
Vol 96 (2) ◽  
pp. 179-185 ◽  
Author(s):  
K.D. Khandelwal ◽  
N. Ishorst ◽  
H. Zhou ◽  
K.U. Ludwig ◽  
H. Venselaar ◽  
...  

Common variants in interferon regulatory factor 6 (IRF6) have been associated with nonsyndromic cleft lip with or without cleft palate (NSCL/P) as well as with tooth agenesis (TA). These variants contribute a small risk towards the 2 congenital conditions and explain only a small percentage of heritability. On the other hand, many IRF6 mutations are known to be a monogenic cause of disease for syndromic orofacial clefting (OFC). We hypothesize that IRF6 mutations in some rare instances could also cause nonsyndromic OFC. To find novel rare variants in IRF6 responsible for nonsyndromic OFC and TA, we performed targeted multiplex sequencing using molecular inversion probes (MIPs) in 1,072 OFC patients, 67 TA patients, and 706 controls. We identified 3 potentially pathogenic de novo mutations in OFC patients. In addition, 3 rare missense variants were identified, for which pathogenicity could not unequivocally be shown, as all variants were either inherited from an unaffected parent or the parental DNA was not available. Retrospective investigation of the patients with these variants revealed the presence of lip pits in one of the patients with a de novo mutation suggesting a Van der Woude syndrome (VWS) phenotype, whereas, in other patients, no lip pits were identified.


2017 ◽  
Author(s):  
Sebastian Deorowicz ◽  
Agnieszka Debudaj-Grabysz ◽  
Adam Gudyś ◽  
Szymon Grabowski

Abstract Motivation Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. Mistakes made at this computationally challenging stage cannot be recovered easily. Results We present Whisper, an accurate, high-performance mapping tool based on the idea of sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. Employing task and data parallelism as well as storing temporary data on disk result in superior time efficiency at reasonable memory requirements. Whisper excels at large NGS read collections, in particular Illumina reads with typical WGS coverage. The experiments with real data indicate that our solution works in about 15% of the time needed by the well-known Bowtie2 and BWA-MEM tools at a comparable accuracy (validated in a variant calling pipeline). Availability Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/. Supplementary information Supplementary data are available at the publisher's Web site.
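The idea of mapping against suffix arrays for a reference and its reverse complement can be illustrated with a small exact-match sketch. It builds a naive suffix array (sorting all suffixes, which is only practical for toy inputs) and locates a read with two binary searches. This is not Whisper's implementation: read sorting, mismatch handling, parallelism and on-disk storage are all omitted, and the function names are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_suffix_array(text):
    """Naive construction: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of exact matches of pattern in text."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


reference = "GATTACAGATTACA"
reverse_ref = reverse_complement(reference)

forward_sa = build_suffix_array(reference)
reverse_sa = build_suffix_array(reverse_ref)

forward_hits = find_occurrences(reference, forward_sa, "TTAC")
reverse_hits = find_occurrences(reverse_ref, reverse_sa, "GTAA")
print(forward_hits, reverse_hits)
```

Because suffixes are stored in sorted order, all matches of a read occupy one contiguous block of the array, so each lookup costs a logarithmic number of string comparisons rather than a scan of the reference.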


2020 ◽  
Author(s):  
Colin M Brand ◽  
Frances J White ◽  
Nelson Ting ◽  
Timothy H Webster

Two modes of positive selection have been recognized: 1) hard sweeps that result in the rapid fixation of a beneficial allele typically from a de novo mutation and 2) soft sweeps that are characterized by intermediate frequencies of at least two haplotypes that stem from standing genetic variation or recurrent de novo mutations. While many populations exhibit both hard and soft sweeps throughout the genome, there is increasing evidence that soft sweeps, rather than hard sweeps, are the predominant mode of adaptation in many species, including humans. Here, we use a supervised machine learning approach to assess the extent of hard and soft sweeps in the closest living relatives of humans: bonobos and chimpanzees (genus Pan). We trained convolutional neural network classifiers using simulated data and applied these classifiers to population genomic data for 71 individuals representing all five extant Pan lineages, of which we successfully analyzed 60 individuals from four lineages. We found that recent adaptation in Pan is largely the result of soft sweeps, ranging from 73.1 to 97.7% of all identified sweeps. While few hard sweeps were shared among lineages, we found that between 19 and 267 soft sweep windows were shared by at least two lineages. We also identify novel candidate genes subject to recent positive selection. This study emphasizes the importance of shifts in the physical and social environment, rather than novel mutation, in shaping recent adaptations in bonobos and chimpanzees.

