Genetic Variation and the Distribution of Variant Types in the Horse

Frontiers in Genetics ◽

10.3389/fgene.2021.758366 ◽

2021 ◽

Vol 12 ◽

Author(s):

S. A. Durward-Akhurst ◽

R. J. Schaefer ◽

B. Grantham ◽

W. K. Carey ◽

J. R. Mickelson ◽

...

Keyword(s):

Genetic Variation ◽

Reference Genome ◽

Variant Prioritization ◽

Major Goal ◽

Variant Discovery ◽

Health And Disease ◽

Equine Diseases ◽

And Performance ◽

Disease Understanding ◽

Molecular Nature

Genetic variation is a key contributor to health and disease. Understanding the link between an individual’s genotype and the corresponding phenotype is a major goal of medical genetics. Whole genome sequencing (WGS) within and across populations enables highly efficient variant discovery and elucidation of the molecular nature of virtually all genetic variation. Here, we report the largest catalog of genetic variation for the horse, a species of importance as a model for human athletic and performance-related traits, using WGS of 534 horses. We show the extent of agreement between two commonly used variant callers. In data from ten target breeds that represent major breed clusters in the domestic horse, we describe the distribution of variants and their allele frequencies across breeds, and identify variants that are unique to a single breed. We investigate variants with no homozygotes that may be embryonic lethal, as well as variants present in all individuals that likely represent regions of the genome with errors or poor annotation, or sites where the reference genome itself carries a variant. Finally, we show regions of the genome that have higher or lower levels of genetic variation compared to the genome average. This catalog can be used for variant prioritization for important equine diseases and traits, and to provide key information about regions of the genome where the assembly and/or annotation need to be improved.
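
The three variant classes named in this abstract (no homozygotes, present in all individuals, unique to a single breed) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the genotype encoding (0, 1, 2 alternate alleles, -1 for missing), the function name `classify_variants`, and the breed labels are all assumptions made for illustration, and "no homozygotes" is read here as "no individuals homozygous for the alternate allele".

```python
from typing import Dict, List, Optional

MISSING = -1  # assumed code for a missing genotype call


def classify_variants(
    genotypes: Dict[str, List[int]], breeds: List[str]
) -> Dict[str, Dict[str, Optional[object]]]:
    """Flag each variant using alternate-allele counts per individual.

    genotypes maps a variant ID to one call per individual
    (0 = hom ref, 1 = het, 2 = hom alt, -1 = missing);
    breeds gives the breed label of each individual, in the same order.
    """
    result = {}
    for variant_id, calls in genotypes.items():
        # Drop missing calls, keeping each call paired with its breed.
        called = [(g, b) for g, b in zip(calls, breeds) if g != MISSING]
        # Carriers have at least one copy of the alternate allele.
        carriers = [(g, b) for g, b in called if g > 0]
        carrier_breeds = {b for _, b in carriers}
        result[variant_id] = {
            # Alternate allele seen, but never in the homozygous state.
            "no_homozygotes": bool(carriers) and all(g == 1 for g, _ in carriers),
            # Every called individual carries the alternate allele.
            "in_all_individuals": bool(called) and len(carriers) == len(called),
            # Breed label if all carriers come from one breed, else None.
            "unique_breed": next(iter(carrier_breeds))
            if len(carrier_breeds) == 1
            else None,
        }
    return result


# Hypothetical toy data: four horses from two made-up breed labels.
example_breeds = ["breed_A", "breed_A", "breed_B", "breed_B"]
example_genotypes = {
    "var1": [1, 0, 1, 0],
    "var2": [2, 1, 2, 2],
    "var3": [0, 0, 1, 2],
    "var4": [1, MISSING, 0, 0],
}
example_flags = classify_variants(example_genotypes, example_breeds)
```

In a real cohort, a variant with no homozygotes is only a lethality candidate when the allele is frequent enough that homozygotes would be expected, so a frequency or sample-size filter would normally accompany these flags.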

Faculty Opinions recommendation of Genetic variation in the social environment contributes to health and disease.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.727239588.793551200 ◽

2018 ◽

Author(s):

Joël Bockaert

Keyword(s):

Genetic Variation ◽

Social Environment ◽

The Social ◽

Health And Disease

Whole-Genome Sequencing and Characterization of Buffalo Genetic Resources: Recent Advances and Future Challenges

Animals ◽

10.3390/ani11030904 ◽

2021 ◽

Vol 11 (3) ◽

pp. 904

Author(s):

Saif ur Rehman ◽

Faiz-ul Hassan ◽

Xier Luo ◽

Zhipeng Li ◽

Qingyou Liu

Keyword(s):

Selective Breeding ◽

Reference Genome ◽

De Novo ◽

Phenotypic Diversity ◽

Molecular Data ◽

Genomic Diversity ◽

Production Performance ◽

Phylogeographic Structure ◽

Economic Significance ◽

And Performance

The buffalo was domesticated around 3000–6000 years ago and has substantial economic significance as a meat, dairy, and draught animal. Nevertheless, the buffalo has lagged behind in the development of a well-annotated, de novo assembled reference genome. Exploring the genetic architecture of a species is essential to understanding its biology and managing its genetic variability, which ultimately underpins selective breeding and genomic selection. Morphological and molecular data have revealed that the swamp buffalo population has strong geographical genomic diversity with low gene flow but strong phenotypic consistency, while the river buffalo population has higher phenotypic diversity with a weak phylogeographic structure. The recent availability of a high-quality reference genome and genotyping marker panels has invigorated many genome-based studies on evolutionary history, genetic diversity, functional elements, and performance traits. Increasing molecular knowledge, combined with selective breeding, should pave the way for genetic improvement in the climatic resilience, disease resistance, and production performance of water buffalo populations globally.

Reference flow: reducing reference bias using multiple population genomes

Genome Biology ◽

10.1186/s13059-020-02229-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Nae-Chyun Chen ◽

Brad Solomon ◽

Taher Mun ◽

Sheila Iyer ◽

Ben Langmead

Keyword(s):

Genetic Variation ◽

Reference Genome ◽

Alignment Method ◽

Sequencing Data ◽

Computational Overhead ◽

Reference Flow ◽

Multiple Population ◽

Reference Bias ◽

Flow Alignment ◽

Reference Genomes

Abstract: Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
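
To make the notion of reference bias in this abstract concrete, the sketch below shows one common way to quantify it, which is not necessarily the measure used by the authors: at sites known to be heterozygous, an unbiased aligner should place roughly half of the reads on the reference allele, so a mean reference-allele fraction above 0.5 suggests bias toward the reference. The function names and the read counts are hypothetical.

```python
from typing import Iterable, Tuple


def reference_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at one site that support the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads covering either allele")
    return ref_reads / total


def mean_reference_bias(site_counts: Iterable[Tuple[int, int]]) -> float:
    """Mean deviation of the reference-allele fraction from 0.5.

    site_counts holds (ref_reads, alt_reads) pairs for heterozygous sites.
    A positive result indicates over-representation of the reference allele.
    """
    fractions = [reference_allele_fraction(r, a) for r, a in site_counts]
    if not fractions:
        raise ValueError("no sites supplied")
    return sum(fractions) / len(fractions) - 0.5


# Hypothetical read counts at three heterozygous sites.
example_sites = [(6, 4), (5, 5), (7, 3)]
example_bias = mean_reference_bias(example_sites)
```

With these made-up counts the per-site fractions are 0.6, 0.5 and 0.7, giving a mean excess of about 0.1 toward the reference allele.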

Enabling multiscale variation analysis with genome graphs

10.1101/2021.02.03.429603 ◽

2021 ◽

Author(s):

Brice Letcher ◽

Martin Hunt ◽

Zamin Iqbal

Keyword(s):

Genetic Variation ◽

Directed Acyclic Graph ◽

Structural Variation ◽

Reference Genome ◽

Multiple Scales ◽

State Of The Art ◽

Variant Calling ◽

Variation Analysis ◽

New Algorithms ◽

Genome Graph

Abstract

Background: Standard approaches to characterising genetic variation revolve around mapping reads to a reference genome and describing variants in terms of differences from the reference; this is based on the assumption that these differences will be small and provides a simple coordinate system. However, this fails, and the coordinates break down, when there are diverged haplotypes at a locus (e.g. one haplotype contains a multi-kilobase deletion, a second contains a few SNPs, and a third is highly diverged with hundreds of SNPs). To handle these, we need to model genetic variation that occurs at different length-scales (SNPs to large structural variants) and that occurs on alternate backgrounds. We refer to these together as multiscale variation.

Results: We model the genome as a directed acyclic graph consisting of successive hierarchical subgraphs (“sites”) that naturally incorporate multiscale variation, and introduce an algorithm for genotyping, implemented in the software gramtools. This enables variant calling on different sequence backgrounds. In addition to producing regular VCF files, we introduce a JSON file format based on VCF, which records variant site relationships and alternate sequence backgrounds. We show two applications. First, we benchmark gramtools against existing state-of-the-art methods in joint-genotyping 17 M. tuberculosis samples at long deletions and the overlapping small variants that segregate in a cohort of 1,017 genomes. Second, in 706 African and SE Asian P. falciparum genomes, we analyse a dimorphic surface antigen gene which possesses variation on two diverged backgrounds which appeared to not recombine. This generates the first map of variation on both backgrounds, revealing patterns of recombination that were previously unknown.

Conclusions: We need new approaches to be able to jointly analyse SNP and structural variation in cohorts, and even more to handle variants on different genetic backgrounds. We have demonstrated that by modelling with a directed, acyclic and locally hierarchical genome graph, we can apply new algorithms to accurately genotype dense variation at multiple scales. We also propose a generalisation of VCF for accessing multiscale variation in genome graphs, which we hope will be of wide utility.

Exploring the Role of Endothelial Cell Resilience in Cardiovascular Health and Disease

Arteriosclerosis Thrombosis and Vascular Biology ◽

10.1161/atvbaha.120.314346 ◽

2020 ◽

Author(s):

Yunling Gao ◽

Zorina S. Galis

Keyword(s):

Risk Factors ◽

Paradigm Shift ◽

Cardiovascular Health ◽

Research Effort ◽

Future Research ◽

Therapeutic Approaches ◽

New Knowledge ◽

Health And Disease ◽

Disease Understanding

Traditionally, much research effort has been invested in studying disease: understanding pathogenic mechanisms, identifying risk factors, and developing effective treatments. A few recent studies unraveling the basis for the absence of disease, including cardiovascular disease, despite existing risk factors, a phenomenon commonly known as resilience, are adding new knowledge and suggesting novel therapeutic approaches. Given the central role of endothelial function in cardiovascular health, we herein provide a number of considerations that warrant future research and a paradigm shift toward identifying the molecular underpinnings of endothelial resilience.

Genetic variation and performance of the alpine plant species Dianthus callizonus differ in two elevational zones of the Carpathians

Alpine Botany ◽

10.1007/s00035-016-0177-3 ◽

2016 ◽

Vol 127 (1) ◽

pp. 65-74 ◽

Cited By ~ 6

Author(s):

Anna-Rita Gabel ◽

Julia Sattler ◽

Christoph Reisch

Keyword(s):

Genetic Variation ◽

Plant Species ◽

Alpine Plant ◽

The Carpathians ◽

And Performance

Genetic Variation in the Social Environment Contributes to Health and Disease

PLoS Genetics ◽

10.1371/journal.pgen.1006498 ◽

2017 ◽

Vol 13 (1) ◽

pp. e1006498 ◽

Cited By ~ 50

Author(s):

Amelie Baud ◽

Megan K. Mulligan ◽

Francesco Paolo Casale ◽

Jesse F. Ingels ◽

Casey J. Bohl ◽

...

Keyword(s):

Genetic Variation ◽

Social Environment ◽

The Social ◽

Health And Disease

High-throughput single nucleotide variant discovery in E14 mouse embryonic stem cells provides a new reference genome assembly

Genomics ◽

10.1016/j.ygeno.2014.06.007 ◽

2014 ◽

Vol 104 (2) ◽

pp. 121-127 ◽

Cited By ~ 8

Author(s):

Danny Incarnato ◽

Anna Krepelova ◽

Francesco Neri

Keyword(s):

Stem Cells ◽

Embryonic Stem Cells ◽

High Throughput ◽

Genome Assembly ◽

Reference Genome ◽

Embryonic Stem ◽

Mouse Embryonic Stem Cells ◽

Single Nucleotide ◽

Variant Discovery ◽

Reference Genome Assembly

Rapid low-cost assembly of the Drosophila melanogaster reference genome using low-coverage, long-read sequencing

10.1101/267401 ◽

2018 ◽

Cited By ~ 6

Author(s):

Edwin A. Solares ◽

Mahul Chakraborty ◽

Danny E. Miller ◽

Shannon Kalsow ◽

Kate Hall ◽

...

Keyword(s):

Drosophila Melanogaster ◽

Genetic Variation ◽

Large Scale ◽

Reference Genome ◽

De Novo ◽

Low Cost ◽

Nucleotide Polymorphisms ◽

Structural Variants ◽

High Coverage ◽

Reference Assembly

Abstract: Accurate and comprehensive characterization of genetic variation is essential for deciphering the genetic basis of diseases and other phenotypes. A vast amount of genetic variation stems from large-scale sequence changes arising from the duplication, deletion, inversion, and translocation of sequences. In the past 10 years, high-throughput short reads have greatly expanded our ability to assay sequence variation due to single nucleotide polymorphisms. However, a recent de novo assembly of a second Drosophila melanogaster reference genome has revealed that short read genotyping methods miss hundreds of structural variants, including those affecting phenotypes. While genomes assembled using high-coverage long reads can achieve high levels of contiguity and completeness, concerns about cost, errors, and low yield have limited widespread adoption of such sequencing approaches. Here we resequenced the reference strain of D. melanogaster (ISO1) on a single Oxford Nanopore MinION flow cell run for 24 hours. Using only reads longer than 1 kb or with at least 30x coverage, we assembled a highly contiguous de novo genome. The addition of inexpensive paired reads and subsequent scaffolding using an optical map technology achieved an assembly with completeness and contiguity comparable to the D. melanogaster reference assembly. Comparison of our assembly to the reference assembly of ISO1 uncovered a number of structural variants (SVs), including novel LTR transposable element insertions and duplications affecting genes with developmental, behavioral, and metabolic functions. Collectively, these SVs provide a snapshot of the dynamics of genome evolution. Furthermore, our assembly and comparison to the D. melanogaster reference genome demonstrates that high-quality de novo assembly of reference genomes and comprehensive variant discovery using such assemblies are now possible by a single lab for under $1,000 (USD).

Reducing reference bias using multiple population reference genomes

10.1101/2020.03.03.975219 ◽

2020 ◽

Cited By ~ 1

Author(s):

Nae-Chyun Chen ◽

Brad Solomon ◽

Taher Mun ◽

Sheila Iyer ◽

Ben Langmead

Keyword(s):

Genetic Variation ◽

Reference Genome ◽

Alignment Method ◽

Sequencing Data ◽

Computational Overhead ◽

Reference Flow ◽

Multiple Population ◽

Reference Bias ◽

Flow Alignment ◽

Reference Genomes

Abstract: Most sequencing data analyses start by aligning sequencing reads to a linear reference genome. But failure to account for genetic variation causes reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the “reference flow” alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance, but with 14% of the memory footprint and 5.5 times the speed.
