Reference transcriptomes and comparative analyses of six species in the threatened rosewood genus Dalbergia

Abstract Dalbergia is a pantropical genus with more than 250 species, many of which are highly threatened due to overexploitation for their rosewood timber, along with general deforestation. Many Dalbergia species have received international attention for conservation, but the lack of genomic resources for Dalbergia hinders evolutionary studies and conservation applications, which are important for adaptive management. This study produced the first reference transcriptomes for 6 Dalbergia species with different geographical origins and predicted ~32 to 49 K unique genes. We showed the utility of these transcriptomes by phylogenomic analyses with other Fabaceae species, estimating the divergence time of extant Dalbergia species at ~14.78 MYA. We detected over-representation in 13 Pfam terms, including HSP, ALDH and ubiquitin families, in Dalbergia. We also compared the gene families of geographically co-occurring D. cochinchinensis and D. oliveri and observed that more genes underwent positive selection and there were more diverged disease resistance proteins in the more widely distributed D. oliveri, consistent with reports that it occupies a wider ecological niche and has higher genetic diversity. We anticipate that the reference transcriptomes will facilitate future population genomics and gene-environment association studies on Dalbergia, as well as contributing to the genomic databases, where plants, particularly threatened ones, are currently underrepresented.

Taxus yunnanensis genome offers insights into gymnosperm phylogeny and taxol production

Communications Biology ◽

10.1038/s42003-021-02697-8 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Chi Song ◽

Fangfang Fu ◽

Lulu Yang ◽

Yan Niu ◽

Zhaoyang Tian ◽

...

Keyword(s):

Biosynthetic Pathway ◽

Repetitive Sequences ◽

Divergence Time ◽

Gene Families ◽

Long Terminal Repeat Retrotransposons ◽

Taxus Yunnanensis ◽

Sequoiadendron Giganteum ◽

Large Genome Size ◽

Taxol Production ◽

Phylogenomic Analyses

Abstract Taxol, a natural product derived from Taxus, is one of the most effective natural anticancer drugs, and the biosynthetic pathway of Taxol is the basis of heterologous bio-production. Here, we report a high-quality genome assembly and annotation of Taxus yunnanensis based on 10.7 Gb sequences assembled into 12 chromosomes with contig N50 and scaffold N50 of 2.89 Mb and 966.80 Mb, respectively. Phylogenomic analyses show that T. yunnanensis is most closely related to Sequoiadendron giganteum among the sampled taxa, with an estimated divergence time of 133.4–213.0 MYA. As with most gymnosperms, and unlike most angiosperms, there is no evidence of a recent whole-genome duplication in T. yunnanensis. Repetitive sequences, especially long terminal repeat retrotransposons, are prevalent in the T. yunnanensis genome, contributing to its large genome size. We further integrated genomic and transcriptomic data to unveil clusters of genes involved in Taxol synthesis, located on chromosome 12, while gene families encoding hydroxylases in the Taxol pathway exhibited significant expansion. Our study contributes to the further elucidation of gymnosperm relationships and the Taxol biosynthetic pathway.
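The contig and scaffold N50 values quoted in the abstract are length-weighted medians of the assembly. As a minimal sketch of how such a statistic is usually computed (the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the paper):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not taken from the paper.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # total is 54; 10 + 9 + 8 = 27 reaches half, so N50 is 8
```

Because the statistic is weighted by length, a few very long scaffolds can dominate it, which is why contig N50 and scaffold N50 are typically reported side by side.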

Estimating the effective sample size in association studies of quantitative traits

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab057 ◽

2021 ◽

Author(s):

Andrey Ziyatdinov ◽

Jihye Kim ◽

Dmitry Prokopenko ◽

Florian Privé ◽

Fabien Laporte ◽

...

Keyword(s):

Statistical Power ◽

Quantitative Traits ◽

Mixed Model ◽

Association Studies ◽

Effective Sample Size ◽

Environment Interaction ◽

UK Biobank ◽

Gene Environment Interaction ◽

Gene Environment ◽

The UK

Abstract The effective sample size (ESS) is a metric used to summarize in a single term the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to empirical estimators recently proposed. Using our framework, we derived approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. We notably found that the expected power drop due to family relatedness in the UK Biobank is negligible.
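To make the idea of an effective sample size concrete, the sketch below uses the textbook ESS for a sample mean under a known correlation matrix. This is a deliberately simpler quantity than the analytical form the authors derive for mixed-model GWAS, and it is not their estimator. The helper names and the within-pair correlation of 0.25 are hypothetical.

```python
def effective_sample_size(corr):
    """ESS of the sample mean for observations with correlation matrix `corr`.

    Var(mean) = sigma^2 * sum(corr) / n^2, so the number of independent
    observations giving the same variance is n^2 / sum(corr).
    """
    n = len(corr)
    if any(len(row) != n for row in corr):
        raise ValueError("corr must be square")
    total = sum(sum(row) for row in corr)
    return n * n / total


def paired_corr(n_pairs, rho):
    """Block-diagonal correlation matrix for independent pairs of relatives
    whose within-pair correlation is rho (an illustrative family structure)."""
    n = 2 * n_pairs
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        corr[i][i] = 1.0
        partner = i + 1 if i % 2 == 0 else i - 1
        corr[i][partner] = rho
    return corr


# 50 pairs (100 individuals) with a hypothetical within-pair correlation of 0.25.
print(effective_sample_size(paired_corr(50, 0.25)))  # 10000 / 125 = 80.0
print(effective_sample_size(paired_corr(50, 0.0)))   # unrelated samples: 100.0
```

In this toy setting, relatedness reduces 100 observations to the equivalent of 80 independent ones. The abstract's analysis addresses the analogous question for association test statistics rather than for a mean.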

Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits

Scientific Reports ◽

10.1038/s41598-021-86871-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chao-Yu Guo ◽

Reng-Hong Wang ◽

Hsin-Chou Yang

Keyword(s):

Complex Traits ◽

Association Studies ◽

Association Test ◽

Whole Genome Sequence ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequence Kernel Association Test ◽

Gene Environment ◽

Family Based

Abstract After the genome-wide association studies (GWAS) era, whole-genome sequencing is widely used to identify associations between complex traits and rare variants. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such a kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve the effects of genetic and environmental factors and the complex interactions between genes and the environment. Therefore, in this research, a novel method is proposed to detect gene and gene-environment interactions in a complex family-based association study with various correlated structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and its superior statistical power. The FGE-SKAT was applied to the whole genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and discovered concordant and discordant regions compared to methods that do not consider gene-by-environment interactions.

Investigation of gene–environment interactions in relation to tic severity

Journal of Neural Transmission ◽

10.1007/s00702-021-02396-y ◽

2021 ◽

Author(s):

Mohamed Abdulkadir ◽

Dongmei Yu ◽

Lisa Osiecki ◽

Robert A. King ◽

Thomas V. Fernandez ◽

...

Keyword(s):

Tourette Syndrome ◽

Association Studies ◽

Autism Spectrum ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Linear Regression Models ◽

Compulsive Disorder ◽

Gene Environment ◽

Tic Severity

Abstract Tourette syndrome (TS) is a neuropsychiatric disorder with involvement of genetic and environmental factors. We investigated genetic loci previously implicated in Tourette syndrome and associated disorders in interaction with pre- and perinatal adversity in relation to tic severity using a case-only (N = 518) design. We assessed 98 single-nucleotide polymorphisms (SNPs) selected from (I) top SNPs from genome-wide association studies (GWASs) of TS; (II) top SNPs from GWASs of obsessive–compulsive disorder (OCD), attention-deficit/hyperactivity disorder (ADHD), and autism spectrum disorder (ASD); (III) SNPs previously implicated in candidate-gene studies of TS; (IV) SNPs previously implicated in OCD or ASD; and (V) tagging SNPs in neurotransmitter-related candidate genes. Linear regression models were used to examine the main effects of the SNPs on tic severity, and the interaction effect of these SNPs with a cumulative pre- and perinatal adversity score. Replication was sought for SNPs that met the threshold of significance (after correcting for multiple testing) in a replication sample (N = 678). One SNP (rs7123010), previously implicated in a TS meta-analysis, was significantly related to higher tic severity. We found a gene–environment interaction for rs6539267, another top TS GWAS SNP. These findings were not independently replicated. Our study highlights the future potential of TS GWAS top hits in gene–environment studies.
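The regression design described in the abstract, with main effects of a SNP and an adversity score plus their product term, can be sketched as an ordinary least-squares fit. The code below is an illustration only. The variable names, the additive 0/1/2 genotype coding, and the noise-free synthetic data are assumptions, and the covariates and significance testing used in the study are omitted.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_interaction_model(genotype, exposure, outcome):
    """OLS fit of outcome ~ 1 + genotype + exposure + genotype * exposure.

    Returns [intercept, beta_G, beta_E, beta_GxE] via the normal equations.
    """
    rows = [[1.0, g, e, g * e] for g, e in zip(genotype, exposure)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic, noise-free data: every combination of genotype 0..2 and score 0..3.
genotype = [g for g in range(3) for _ in range(4)]
exposure = [e for _ in range(3) for e in range(4)]
severity = [10 + 1.0 * g + 2.0 * e + 0.5 * g * e for g, e in zip(genotype, exposure)]

coefficients = fit_interaction_model(genotype, exposure, severity)
print([round(c, 6) for c in coefficients])
```

The last coefficient is the gene-environment interaction term. With real data it would be accompanied by a standard error and a multiple-testing correction, as the abstract notes.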

Comparative Analysis of SNP Discovery and Genotyping in Fagus sylvatica L. and Quercus robur L. Using RADseq, GBS, and ddRAD Methods

Forests ◽

10.3390/f12020222 ◽

2021 ◽

Vol 12 (2) ◽

pp. 222

Author(s):

Bartosz Ulaszewski ◽

Joanna Meger ◽

Jaroslaw Burczyk

Keyword(s):

Population Genomics ◽

De Novo ◽

Genetic Studies ◽

Genomic Libraries ◽

Reduced Representation ◽

Large Numbers ◽

Broadleaved Tree Species ◽

Fagus Sylvatica L ◽

Reference Genomes ◽

Future Population

Next-generation sequencing of reduced representation genomic libraries (RRL) is capable of providing large numbers of genetic markers for population genetic studies at relatively low costs. However, one major concern with these types of markers is the precision of genotyping, which is related to the common problem of missing data and appears to be particularly important in association and genomic selection studies. We evaluated three RRL approaches (GBS, RADseq, ddRAD) and different SNP identification methods (de novo or based on a reference genome) to find the best solutions for future population genomics studies in two economically and ecologically important broadleaved tree species, namely F. sylvatica and Q. robur. We found that the use of the ddRAD method coupled with SNP calling based on reference genomes provided the largest numbers of markers (28 k and 36 k for beech and oak, respectively), given standard filtering criteria. Using technical replicates of samples, we demonstrated that more than 80% of SNP loci should be considered as reliable markers in GBS and ddRAD, but not in RADseq data. According to the reference genomes’ annotations, more than 30% of the identified ddRAD loci appeared to be related to genes. Our findings provide solid support for using ddRAD-based SNPs for future population genomics studies in beech and oak.
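One simple way to use technical replicates for judging marker reliability is to measure how often two replicates of the same sample receive the same genotype call. The sketch below assumes genotypes coded as 0/1/2 with None for missing calls. The function name and the toy calls are hypothetical, and the abstract does not state the exact reliability criterion the authors applied.

```python
def replicate_concordance(rep_a, rep_b):
    """Fraction of loci with identical genotype calls in two technical replicates.

    Loci that are missing (None) in either replicate are skipped.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same loci")
    compared = 0
    matches = 0
    for a, b in zip(rep_a, rep_b):
        if a is None or b is None:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no loci called in both replicates")
    return matches / compared


# Hypothetical calls at five loci for two replicates of one sample.
rep_a = [0, 1, 2, None, 1]
rep_b = [0, 1, 1, 2, 1]
print(replicate_concordance(rep_a, rep_b))  # 3 of 4 comparable loci agree: 0.75
```

Skipping loci with missing calls keeps the concordance estimate separate from the missing-data rate, which the abstract treats as a related but distinct concern.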

Population Genomics of American Mink Using Whole Genome Sequencing Data

Genes ◽

10.3390/genes12020258 ◽

2021 ◽

Vol 12 (2) ◽

pp. 258

Author(s):

Karim Karimi ◽

Duy Ngoc Do ◽

Mehdi Sargolzaei ◽

Younes Miar

Keyword(s):

Population Genomics ◽

Association Studies ◽

American Mink ◽

Population History ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Effective Population ◽

Cross Validation Error

Characterizing the genetic structure and population history can facilitate the development of genomic breeding strategies for the American mink. In this study, we used the whole genome sequences of 100 mink from the Canadian Centre for Fur Animal Research (CCFAR) at the Dalhousie Faculty of Agriculture (Truro, NS, Canada) and Millbank Fur Farm (Rockwood, ON, Canada) to investigate their population structure, genetic diversity and linkage disequilibrium (LD) patterns. Analysis of molecular variance (AMOVA) indicated that the variation among color-types was significant (p < 0.001) and accounted for 18% of the total variation. The admixture analysis revealed that assuming three ancestral populations (K = 3) provided the lowest cross-validation error (0.49). The effective population size (Ne) five generations ago was estimated to be 99 and 50 for CCFAR and Millbank Fur Farm, respectively. The LD patterns revealed that the average r² declined to <0.2 at genomic distances of >20 kb and >100 kb in CCFAR and Millbank Fur Farm, respectively, suggesting that densities of 120,000 and 24,000 single nucleotide polymorphisms (SNPs) would provide adequate accuracy of genomic evaluation in these populations, respectively. These results indicated that accounting for admixture is critical for designing the SNP panels for genotype-phenotype association studies of American mink.
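The suggested panel densities follow from dividing the genome length by the distance over which LD decays. The sketch below reproduces that arithmetic. The genome size of 2.4 Gb is an assumption back-calculated from the abstract's own figures (120,000 × 20 kb = 24,000 × 100 kb = 2.4 Gb), not a value stated in the text, and the function name is illustrative.

```python
def markers_for_even_coverage(genome_size_bp, ld_decay_distance_bp):
    """Number of evenly spaced SNPs needed for one marker per LD-decay distance.

    Uses integer ceiling division so a partial final interval still gets a marker.
    """
    if genome_size_bp <= 0 or ld_decay_distance_bp <= 0:
        raise ValueError("both arguments must be positive")
    return -(-genome_size_bp // ld_decay_distance_bp)


ASSUMED_GENOME_SIZE_BP = 2_400_000_000  # assumption implied by the abstract's numbers

print(markers_for_even_coverage(ASSUMED_GENOME_SIZE_BP, 20_000))   # CCFAR: 120000
print(markers_for_even_coverage(ASSUMED_GENOME_SIZE_BP, 100_000))  # Millbank: 24000
```

The population with slower LD decay needs five times fewer markers for the same coverage, which matches the two densities given in the abstract.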

Gene, Environment and Methylation (GEM): a tool suite to efficiently navigate large scale epigenome wide association studies and integrate genotype and interaction between genotype and environment

BMC Bioinformatics ◽

10.1186/s12859-016-1161-z ◽

2016 ◽

Vol 17 (1) ◽

Cited By ~ 12

Author(s):

Hong Pan ◽

Joanna D. Holbrook ◽

Neerja Karnani ◽

Chee Keong Kwoh

Keyword(s):

Large Scale ◽

Association Studies ◽

Gene Environment

Gene-Gene and Gene-Environment Interactions in Meta-Analysis of Genetic Association Studies

PLoS ONE ◽

10.1371/journal.pone.0124967 ◽

2015 ◽

Vol 10 (4) ◽

pp. e0124967 ◽

Cited By ~ 8

Author(s):

Chin Lin ◽

Chi-Ming Chu ◽

John Lin ◽

Hsin-Yi Yang ◽

Sui-Lung Su

Keyword(s):

Genetic Association ◽

Association Studies ◽

Meta Analysis ◽

Genetic Association Studies ◽

Gene Environment

Complex Traits in Natural Populations

A Primer of Population Genetics and Genomics ◽

10.1093/oso/9780198862291.003.0009 ◽

2020 ◽

pp. 263-290

Author(s):

Daniel L. Hartl

Keyword(s):

Complex Traits ◽

Population Genomics ◽

Association Studies ◽

Natural Populations ◽

Complex Diseases ◽

Stabilizing Selection ◽

Phenotypic Evolution ◽

Genetic Changes ◽

Number Of Genes ◽

Almost All

This chapter could as well be titled “Population Genomics,” although many aspects of population genomics are integrated throughout the other chapters. It includes estimates of mutational variance and standing variance, phenotypic evolution under directional selection as measured by the linear selection gradient, and phenotypic evolution under stabilizing selection. It explores the strengths and limitations of genome-wide association studies of quantitative trait loci (QTLs) and expression QTLs (eQTLs) to detect genetic factors influencing complex traits in natural populations and genetic risk factors for complex diseases such as heart disease or diabetes. The number of genes affecting complex traits is considered, as well as evidence bearing on the issue of whether complex diseases are primarily affected by a very large number of genes, almost all of small effect, and how this bears on direct-to-consumer and over-the-counter genetic testing. The population genomics of adaptation is considered, including drug resistance, domestication, and local selection versus gene flow. The chapter concludes with the population genomics of speciation as illustrated by reinforcement of mating barriers, the reproducibility of phenotypic and genetic changes, and the accumulation of genetic incompatibilities.

The First Transcriptome Assembly of Yenyuan Stream Salamander (Batrachuperus yenyuanensis) Provides Novel Insights into Its Molecular Evolution

International Journal of Molecular Sciences ◽

10.3390/ijms20071529 ◽

2019 ◽

Vol 20 (7) ◽

pp. 1529 ◽

Cited By ~ 2

Author(s):

Jianli Xiong ◽

Yunyun Lv ◽

Yong Huang ◽

Qiangqiang Liu

Keyword(s):

Sequence Similarity ◽

Transcriptome Assembly ◽

Repetitive Sequences ◽

Divergence Time ◽

Gene Families ◽

Single Copy ◽

Phylogenetic Position ◽

Interesting Insight ◽

Chinese Giant Salamander ◽

Lineage Divergence

The Yenyuan stream salamander (Batrachuperus yenyuanensis) has been previously evaluated with regard to phylogeny, population genetics, and hematology, but genomic information is sparse because salamanders have giant genomes containing highly repetitive sequences, which has so far prevented the assembly of a complete reference genome. This study evaluates the coding sequences and provides the first transcriptome assembly of the Yenyuan stream salamander based on mixed samples from the liver, spermary, muscle and spleen tissues. Using this transcriptome assembly and available coding sequences from other vertebrates, the gene families, phylogenetic status, and species divergence time were compared or estimated. A total of 13,750 coding sequences were successfully obtained from the transcriptome assembly of the Yenyuan stream salamander, estimated to contain 40.1% of the unigenes represented in tetrapod databases. A total of 88.79% of these genes could be annotated to a biological function by current databases. Through gene family clustering, we found multiple possible isoforms of the Scribble gene, whose function is related to regeneration, based on sequence similarity. Meanwhile, we constructed a robust phylogenetic tree based on 56 single-copy orthologues, which indicates that, of the investigated vertebrates, the Yenyuan stream salamander is most closely related to the Chinese giant salamander (Andrias davidianus). Based on the fossil-calibrated phylogeny, we estimated that the lineage divergence between the ancestral Yenyuan stream salamander and the Chinese giant salamander may have occurred during the Cretaceous period (~78.4 million years ago). In conclusion, this study not only provides a candidate gene that is valuable for exploring the remarkable capacity of regeneration in the future, but also offers insight into the Yenyuan stream salamander through this first transcriptome assembly.
