phenotype data
Recently Published Documents


TOTAL DOCUMENTS

212
(FIVE YEARS 79)

H-INDEX

27
(FIVE YEARS 5)

2022 ◽  
Author(s):  
Yuening Wang ◽  
Rodrigo Benavides ◽  
Luda Diatchenko ◽  
Audrey Grant ◽  
Yue Li

Large biobank repositories of clinical condition and medication data open opportunities to investigate the phenotypic disease network. To enable systematic investigation of entire structured phenomes, we present the graph embedded topic model (GETM). We offer two main methodological contributions in GETM. First, to aid topic inference, we integrate existing biomedical knowledge graph information, in the form of pre-trained graph embeddings, into the embedded topic model. Second, leveraging deep learning techniques, we developed a variational autoencoder framework to infer patient phenotypic mixtures. For interpretability, we use a linear decoder to simultaneously infer the bi-modal distributions of the disease conditions and medications. We applied GETM to UK Biobank (UKB) self-reported clinical phenotype data, which contain conditions and medications for 457,461 individuals. Compared to existing methods, GETM demonstrates overall superior performance in imputing missing conditions and medications. Here, we focused on characterizing pain phenotypes recorded in the questionnaire of the UKB individuals. GETM accurately predicts the status of chronic musculoskeletal (CMK) pain, chronic pain by body-site, and non-specific chronic pain using past conditions and medications. Our analyses revealed not only the known pain-related topics but also the surprising predominance of medications and conditions in the cardiovascular category among the most predictive topics across chronic pain phenotypes.
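The linear decoder idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the usual embedded-topic-model form, in which each topic's distribution over codes is a softmax of inner products between a topic embedding and fixed code embeddings, and a patient's expected code probabilities are the topic mixture times those distributions. All names, sizes, and the random stand-ins for pre-trained graph embeddings are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def linear_decoder(theta, alpha, rho):
    """Map topic mixtures to code probabilities.

    theta: (n_patients, n_topics) patient topic mixtures, rows sum to 1
    alpha: (n_topics, emb_dim)    topic embeddings (learned in a real model)
    rho:   (n_codes, emb_dim)     code embeddings (assumed pre-trained and fixed)
    """
    beta = softmax(alpha @ rho.T, axis=1)  # (n_topics, n_codes), one distribution per topic
    return theta @ beta                    # (n_patients, n_codes)

rng = np.random.default_rng(0)
n_patients, n_topics, emb_dim = 4, 3, 5
n_conditions, n_medications = 6, 8

# Random stand-ins for graph embeddings of the two modalities (assumption).
rho_conditions = rng.normal(size=(n_conditions, emb_dim))
rho_medications = rng.normal(size=(n_medications, emb_dim))
alpha_conditions = rng.normal(size=(n_topics, emb_dim))
alpha_medications = rng.normal(size=(n_topics, emb_dim))

# In a variational autoencoder, theta would come from the encoder; here it is random.
theta = softmax(rng.normal(size=(n_patients, n_topics)), axis=1)

# One shared mixture per patient, one decoder per modality.
probs_conditions = linear_decoder(theta, alpha_conditions, rho_conditions)
probs_medications = linear_decoder(theta, alpha_medications, rho_medications)
```

Because the decoder is linear in the topic mixture, each row of `beta` can be read directly as a ranked list of codes for that topic, which is the interpretability benefit the abstract refers to.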


2021 ◽  
Author(s):  
Tianqing Zheng ◽  
Yinghui Li ◽  
Yanfei Li ◽  
Shengrui Zhang ◽  
Chunchao Wang ◽  
...  

In the Chinese National Soybean GeneBank (CNSGB), we have collected more than 30,000 soybean accessions. However, data sharing for soybean remains an especially sensitive question, and how to share genome variation data within the applicable rules has long troubled soybean germplasm workers. Here we release a big data source named the Soybean Functional Genomics & Breeding database (SoyFGB v2.0) (https://sfgb.rmbreeding.cn/), which embeds a core collection of 2,214 resequenced soybean genomes (2K-SG) from the CNSGB germplasm. This source provides three major functions for mining multiple genome data that are of general interest to plant researchers. 1) On-line analysis tools are provided by the Analysis module for haplotype mining in high-throughput genotyped germplasms with different methods. 2) Variations for 2K-SG are provided by the Browse module, which contains two functions, SNP and InDel. Together with the Gene (SNP & InDel) function embedded in the Search module, these make the genotypic information of 2K-SG for a target gene or region accessible. 3) Scaled phenotype data for 42 traits, including 9 quality and 33 quantitative traits, are provided. With the scaled-phenotype data search and seed request tools under a control list, germplasm information can be shared without directly downloading the unpublished phenotypic data or information on sensitive germplasms. In short, the mode of data mining and sharing underlying SoyFGB v2.0 may inspire more ideas for work on genome resources, not only for soybean but also for other plants.


2021 ◽  
Author(s):  
Sarah M Alghamdi ◽  
Paul N Schofield ◽  
Robert Hoehndorf

Computing phenotypic similarity has been shown to be useful in identification of new disease genes and for rare disease diagnostic support. Genotype-phenotype data from orthologous genes in model organisms can compensate for lack of human data to greatly increase genome coverage. Work over the past decade has demonstrated the power of cross-species phenotype comparisons, and several cross-species phenotype ontologies have been developed for this purpose. The relative contribution of different model organisms to identifying disease-associated genes using computational approaches is not yet fully explored. We use methods based on phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in different model organisms to disease-associated phenotypes in humans. Semantic machine learning methods are used to measure how much different model organisms contribute to the identification of known human gene-disease associations. We find that only mouse phenotypes can accurately predict human gene-disease associations. Our work has implications for the future development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation.
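As a rough illustration of ontology-based phenotypic similarity, the sketch below scores two phenotype annotation sets by the Jaccard overlap of their ancestor closures. This is a deliberately simple stand-in, not the semantic machine learning method used in the paper, and the tiny ontology and its term names are invented for the example.

```python
# Hypothetical toy ontology: each term maps to its direct parent terms.
PARENTS = {
    "abnormal heart morphology": {"abnormal cardiovascular system"},
    "abnormal heart rate": {"abnormal cardiovascular system"},
    "abnormal cardiovascular system": {"phenotypic abnormality"},
    "abnormal limb morphology": {"phenotypic abnormality"},
    "phenotypic abnormality": set(),
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def closure(terms):
    """Union of the ancestor sets of every annotated term."""
    result = set()
    for term in terms:
        result |= ancestors(term)
    return result

def jaccard_similarity(terms_a, terms_b):
    """Jaccard index of the two ancestor closures (0 = unrelated, 1 = identical)."""
    closure_a, closure_b = closure(terms_a), closure(terms_b)
    return len(closure_a & closure_b) / len(closure_a | closure_b)

# Phenotypes of two hypothetical model-organism genes versus a human disease.
disease = {"abnormal heart rate"}
gene_cardiac = {"abnormal heart morphology"}
gene_limb = {"abnormal limb morphology"}

score_cardiac = jaccard_similarity(gene_cardiac, disease)  # 0.5
score_limb = jaccard_similarity(gene_limb, disease)        # 0.25
```

Ranking candidate genes by such a score against a disease's phenotype profile is the basic pattern; cross-species use additionally requires an ontology that places model-organism and human phenotype terms in one hierarchy.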


Plants ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 2805
Author(s):  
Jing Yu ◽  
Sook Jung ◽  
Chun-Huai Cheng ◽  
Taein Lee ◽  
Ping Zheng ◽  
...  

Over the last eight years, the volume of whole genome, gene expression, SNP genotyping, and phenotype data generated by the cotton research community has exponentially increased. The efficient utilization/re-utilization of these complex and large datasets for knowledge discovery, translation, and application in crop improvement requires them to be curated, integrated with other types of data, and made available for access and analysis through efficient online search tools. Initiated in 2012, CottonGen is an online community database providing access to integrated peer-reviewed cotton genomic, genetic, and breeding data, and analysis tools. Used by cotton researchers worldwide, and managed by experts with crop-specific knowledge, it continues to be the logical choice to integrate new data and provide necessary interfaces for information retrieval. The repository in CottonGen contains colleague, gene, genome, genotype, germplasm, map, marker, metabolite, phenotype, publication, QTL, species, transcriptome, and trait data curated by the CottonGen team. The number of data entries housed in CottonGen has increased dramatically; for example, since 2014 there has been an 18-fold increase in genes/mRNAs, a 23-fold increase in whole genomes, and a 372-fold increase in genotype data. New tools include a genetic map viewer, a genome browser, a synteny viewer, a metabolite pathways browser, sequence retrieval, BLAST, and a breeding information management system (BIMS), as well as various search pages for new data types. CottonGen serves as the home to the International Cotton Genome Initiative, managing its elections and serving as a communication and coordination hub for the community. With its extensive curation and integration of data and online tools, CottonGen will continue to facilitate utilization of its critical resources to empower research for cotton crop improvement.


2021 ◽  
Author(s):  
Malcolm E Fisher ◽  
Erik J Segerdell ◽  
Nicolas Matentzoglu ◽  
Mardi J Nenni ◽  
Joshua D Fortriede ◽  
...  

Background: Ontologies of precisely defined, controlled vocabularies are essential to curate the results of biological experiments such that the data are machine searchable, can be computationally analyzed, and are interoperable across the biomedical research continuum. There is also an increasing need for methods to interrelate phenotypic data easily and accurately from experiments in animal models with human development and disease. Results: Here we present the Xenopus Phenotype Ontology (XPO) to annotate phenotypic data from experiments in Xenopus, one of the major vertebrate model organisms used to study gene function in development and disease. The XPO implements design patterns from the Unified Phenotype Ontology (uPheno), and the principles outlined by the Open Biological and Biomedical Ontologies (OBO Foundry) to maximize interoperability with other species and facilitate ongoing ontology management. Constructed in the Web Ontology Language (OWL), the XPO combines the existing uPheno library of ontology design patterns with additional terms from the Xenopus Anatomy Ontology (XAO), the Phenotype and Trait Ontology (PATO) and the Gene Ontology (GO). The integration of these different ontologies into the XPO enables rich phenotypic curation, whilst the uPheno bridging axioms allow phenotypic data from Xenopus experiments to be related to phenotype data from other model organisms and human disease. Moreover, the simple post-composed uPheno design patterns facilitate ongoing XPO development as the generation of new terms and classes of terms can be substantially automated. Conclusions: The XPO serves as an example of current best practices to help overcome many of the inherent challenges in harmonizing phenotype data between different species.
The XPO currently consists of approximately 22,000 terms and is being used to curate phenotypes by Xenbase, the Xenopus Model Organism Knowledgebase, forming a standardized corpus of genotype-phenotype data that can be directly related to other uPheno compliant resources.


2021 ◽  
Author(s):  
Nuovella Williams

The advent of new technology for extracting genetic information from tissue samples has increased the availability of suitable data for finding genes controlling complex traits in plants, animals and humans. Quantitative trait locus (QTL) analysis relies on statistical methods to interpret genetic data in the presence of phenotype data and possibly other factors such as environmental factors. The goal is to both detect the presence of QTL with significant effects on trait value as well as to estimate their locations on the genome relative to those of known markers. This thesis reviews commonly used statistical techniques for QTL mapping in experimental populations. Regression and likelihood methods are discussed. The mixture-modelling approach to QTL mapping is explored in some detail. This thesis presents new matrix formulas for exact and convenient calculation of both the Observed and Fisher information matrices in the context of Multinomial mixtures of Univariate Normal distributions. An extension to Composite Interval mapping is proposed, together with a hypothesis testing strategy which is robust enough to detect existing QTL in the presence of slight deviations from model assumptions while reducing false detections.
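The mixture-modelling approach mentioned in this abstract can be sketched as follows. This is a generic textbook-style EM fit of a normal mixture with known per-individual genotype probabilities, not the thesis's information-matrix formulas or its composite interval mapping extension. It assumes a single putative QTL, a common residual variance, and genotype probabilities already computed from flanking markers; the simulated data and all names are hypothetical.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def em_qtl_mixture(y, prior, n_iter=50):
    """Fit genotype means and a common standard deviation by EM.

    y:     (n,) trait values
    prior: (n, G) probability of each QTL genotype given flanking markers
    Returns the genotype means, the common sigma, and the mixture log-likelihood.
    """
    n, n_genotypes = prior.shape
    mu = np.linspace(y.min(), y.max(), n_genotypes)  # crude starting values
    sigma = y.std()
    for _ in range(n_iter):
        # E-step: posterior genotype probabilities for each individual.
        dens = prior * normal_pdf(y[:, None], mu[None, :], sigma)
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted means and pooled variance.
        mu = (post * y[:, None]).sum(axis=0) / post.sum(axis=0)
        sigma = np.sqrt((post * (y[:, None] - mu[None, :]) ** 2).sum() / n)
    dens = prior * normal_pdf(y[:, None], mu[None, :], sigma)
    loglik = np.log(dens.sum(axis=1)).sum()
    return mu, sigma, loglik

# Simulated backcross-like data: two genotypes, markers fairly informative.
rng = np.random.default_rng(1)
n = 200
p_second = rng.choice([0.1, 0.9], size=n)          # P(second genotype | markers)
prior = np.column_stack([1.0 - p_second, p_second])
genotype = rng.random(n) < p_second                # true (unobserved) genotype
y = 2.0 * genotype + rng.normal(size=n)            # QTL effect of 2, unit noise

mu, sigma, loglik = em_qtl_mixture(y, prior)
```

In interval mapping this fit would be repeated at each candidate position along the genome, with the log-likelihood compared against a no-QTL model to locate the most likely position.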


Plant Methods ◽  
2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Chen Chen ◽  
Huimin Liu ◽  
Ningning Gou ◽  
Mengzhen Huang ◽  
Wanyu Xu ◽  
...  

Background: Apricot is cultivated worldwide because of its high nutritive content and strong adaptability. Its flesh is delicious and has a unique and pleasant aroma. Apricot kernels are also consumed as nuts. The genome of apricot has been sequenced, and transcriptome, resequencing, and phenotype data are increasingly being generated. However, as new information emerges, these data need to be integrated and disseminated. Results: To better manage the continuous addition of new data and increase convenience, we constructed the apricot genomic and phenotypic database (AprGPD, http://apricotgpd.com). At present, AprGPD contains three reference genomes, 1692 germplasms, 306 genome resequencing datasets, and 90 RNA sequencing datasets. A set of user-friendly query, analysis, and visualization tools has been implemented in AprGPD. We have also performed a detailed analysis of 59 transcription factor families for the three genomes of apricot. Conclusion: Six modules are displayed in AprGPD: species, germplasm, genome, variation, product, and tools. The data integrated by AprGPD will be helpful for the molecular breeding of apricot.


2021 ◽  
Vol 5 ◽  
pp. 264
Author(s):  
Kurt Taylor ◽  
Nancy McBride ◽  
Neil J Goulding ◽  
Kimberley Burrows ◽  
Dan Mason ◽  
...  

Metabolomics is the quantification of small molecules, commonly known as metabolites. Collectively, these metabolites and their interactions within a biological system are known as the metabolome. The metabolome is a unique area of study, capturing influences from both genotype and environment. The availability of high-throughput technologies for quantifying large numbers of metabolites, as well as lipids and lipoprotein particles, has enabled detailed investigation of human metabolism in large-scale epidemiological studies. The Born in Bradford (BiB) cohort includes 12,453 women who experienced 13,776 pregnancies, recruited between 2007 and 2011, together with their partners and their offspring. In this data note, we describe the metabolomic data available in BiB, profiled during pregnancy, in cord blood and during early life in the offspring. These include two platforms of metabolomic profiling: nuclear magnetic resonance and mass spectrometry. The maternal measures, taken at 26-28 weeks’ gestation, can provide insight into the metabolome during pregnancy and how it relates to maternal and offspring health. The offspring cord blood measurements provide information on the fetal metabolome. These measures, alongside maternal pregnancy measures, can be used to explore how they may influence outcomes. The infant measures (taken around ages 12 and 24 months) provide a snapshot of the early life metabolome during a key phase of nutrition, environmental exposures, growth, and development. These metabolomic data can be examined alongside the BiB cohort’s extensive phenotype data from questionnaires, medical, educational and social record linkage, and other ‘omics data.

