Estimation of dynamic SNP-heritability with Bayesian Gaussian process models

2020 ◽  
Vol 36 (12) ◽  
pp. 3795-3802
Author(s):  
Arttu Arjas ◽  
Andreas Hauptmann ◽  
Mikko J Sillanpää

Abstract Motivation Improved DNA technology has made it practical to estimate single-nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For growth- and development-related traits, it is meaningful to base SNP-heritability estimation on longitudinal data due to the time-dependency of the process. However, only a few statistical methods have been developed so far for estimating dynamic SNP-heritability and quantifying its full uncertainty. Results We introduce a completely tuning-free Bayesian Gaussian process (GP)-based approach for estimating dynamic variance components and heritability as their function. For parameter estimation, we use a modern Markov Chain Monte Carlo method which allows full uncertainty quantification. Several datasets are analysed and our results clearly illustrate that the 95% credible intervals of the proposed joint estimation method (which ‘borrows strength’ from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model using MTG2 and BLUPF90 software and quantitative measures indicate superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate scalability of the proposed method for simulated data with tens of thousands of individuals. Availability and implementation The C++ implementation dynBGP and simulated data are available on GitHub: https://github.com/aarjas/dynBGP. The programmes can be run in R. Real datasets are available in the QTL archive: https://phenome.jax.org/centers/QTLA. Supplementary information Supplementary data are available at Bioinformatics online.

2021 ◽  
Author(s):  
Yanwen Xu ◽  
Pingfeng Wang

Abstract The Gaussian Process (GP) model has become one of the most popular methods and exhibits superior performance among surrogate models in many engineering design applications. However, the standard Gaussian process model cannot handle high-dimensional applications. The root of the problem is the similarity measure of the GP model, which relies on the Euclidean distance; this distance becomes uninformative in high-dimensional cases and causes accuracy and efficiency issues. Few studies have explored this issue. In this study, we therefore propose an enhanced squared exponential kernel using Manhattan distance, which is more effective at preserving the meaningfulness of proximity measures and is preferable for use in the GP model in high-dimensional cases. The experiments show that the proposed approach achieves superior performance in high-dimensional problems. Based on the analysis and experimental results of similarity metrics, this paper provides a guide to choosing the similarity measures that yield the most accurate and efficient results for the Kriging model with respect to different sample sizes and dimension levels.
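
The abstract above does not give the exact form of the proposed kernel. As a minimal sketch of the general idea only, the following Python snippet computes a squared exponential kernel in which the distance can be either Euclidean or Manhattan. The function name `se_kernel`, the `length_scale` parameter and the formula `exp(-0.5 * (d / length_scale)^2)` are assumptions made for illustration and are not taken from the paper; positive semi-definiteness of the Manhattan variant is not checked here.

```python
import numpy as np

def se_kernel(X1, X2, length_scale=1.0, metric="euclidean"):
    """Squared-exponential-style kernel with a selectable distance (illustrative only)."""
    # Pairwise coordinate differences, shape (n1, n2, d), via broadcasting.
    diff = X1[:, None, :] - X2[None, :, :]
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=-1))
    elif metric == "manhattan":
        dist = np.abs(diff).sum(axis=-1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    # Assumed kernel form; the paper's 'enhanced' kernel may differ.
    return np.exp(-0.5 * (dist / length_scale) ** 2)

# Two points in 2-D: Euclidean distance 5, Manhattan distance 7.
X = np.array([[0.0, 0.0], [3.0, 4.0]])
K_euclid = se_kernel(X, X, metric="euclidean")
K_manhattan = se_kernel(X, X, metric="manhattan")
```

Because the Manhattan distance is never smaller than the Euclidean distance, the Manhattan variant in this sketch assigns equal or lower similarity to the same pair of points at a given length scale.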


2019 ◽  
Vol 35 (23) ◽  
pp. 4955-4961
Author(s):  
Yongzhuang Liu ◽  
Jian Liu ◽  
Yadong Wang

Abstract Motivation Whole-genome sequencing (WGS) of tumor–normal sample pairs is a powerful approach for comprehensively characterizing germline copy number variations (CNVs) and somatic copy number alterations (SCNAs) in cancer research and clinical practice. Existing computational approaches for detecting copy number events cannot detect germline CNVs and SCNAs simultaneously, and yield low accuracy for SCNAs. Results In this study, we developed TumorCNV, a novel approach for jointly detecting germline CNVs and SCNAs from WGS data of the matched tumor–normal sample pair. We compared TumorCNV with existing copy number event detection approaches using simulated data and real data for the COLO-829 melanoma cell line. The experimental results showed that TumorCNV outperformed existing approaches. Availability and implementation The software TumorCNV is implemented using a combination of Java and R, and it is freely available from the website at https://github.com/yongzhuang/TumorCNV. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Adrian L Hauber ◽  
Raphael Engesser ◽  
Joep Vanlier ◽  
Jens Timmer

Abstract Motivation Apparent time delays in partly observed biochemical reaction networks can be modeled by lumping a more complex reaction into a series of linear reactions, often referred to as the linear chain trick. Since most delays in biochemical reactions are not true, hard delays but a consequence of complex unobserved processes, this approach often represents the true system more closely than delay differential equations do. In this paper, we address the question of how to select the optimal number of additional equations, i.e. the chain length. Results We derive a criterion based on parameter identifiability to infer chain lengths and compare this method to choosing the model with a chain length that leads to the best fit in a maximum likelihood sense, which corresponds to optimising the Bayesian information criterion. We evaluate performance with simulated data as well as with measured biological data for a model of JAK2/STAT5 signalling and assess the influence of different model structures and data characteristics. Our analysis revealed that the proposed method features a superior performance when applied to biological models and data compared to choosing the model that maximises the likelihood. Availability Models and data used for simulations are available at https://github.com/Data2Dynamics/d2d and http://jeti.uni-freiburg.de/PNAS_Swameye_Data. Supplementary information Supplementary data are available at Bioinformatics online.
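
To illustrate the linear chain trick mentioned in the abstract above, the following Python sketch passes a unit step input through a chain of identical first-order linear reactions and records the output of the last link. It is a generic textbook construction, not the authors' model or their chain-length criterion; the function name `simulate_chain`, the forward-Euler integrator, the step size and the choice of rate `k = n_links / mean_delay` (so that the mean delay stays fixed as the chain grows) are all assumptions made for illustration.

```python
import numpy as np

def simulate_chain(n_links, mean_delay, t_end=10.0, dt=0.001):
    """Forward-Euler simulation of a unit step input passed through a linear chain."""
    k = n_links / mean_delay          # per-link rate; total mean delay is n_links / k
    x = np.zeros(n_links)             # state of each link, initially empty
    n_steps = int(round(t_end / dt))
    out = np.empty(n_steps)
    for step in range(n_steps):
        # Each link is driven by the one before it; the step input drives the first link.
        upstream = np.concatenate(([1.0], x[:-1]))
        x = x + dt * k * (upstream - x)
        out[step] = x[-1]             # output of the last link after this step
    return out

# Same mean delay (2 time units), different chain lengths.
short_chain = simulate_chain(n_links=1, mean_delay=2.0)
long_chain = simulate_chain(n_links=10, mean_delay=2.0)
```

In this sketch a longer chain with the same mean delay produces a sharper, more switch-like response: early on, the output of the ten-link chain lags well behind that of the single link, while both eventually approach the input level.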


2007 ◽  
Vol 145 (5) ◽  
pp. 501-508 ◽  
Author(s):  
D. VAGENAS ◽  
I. M. S. WHITE ◽  
M. J. STEAR ◽  
S. C. BISHOP

SUMMARY The development of the genetic control of nematode resistance in growing lambs is of biological interest, as well as being important in terms of designing practical strategies to breed for increased nematode resistance. The current paper demonstrates the use of random regression techniques for quantifying the development of the heritability of faecal egg count (Fec), the indicator of nematode resistance, in growing lambs and predicted inter-age genetic and phenotypic correlations for Fec. Fec data from 732 lambs, collected at 4-week intervals from c. 8–24 weeks of age, were analysed using random regression techniques. Random effects fitted in the model included genetic, individual animal environmental, litter and residual random effects. Output (co)variance components were interpolated to weekly time points. Individual variance components showed complex patterns of change over time; however, the estimated heritability increased smoothly with age, from 0·10 to 0·38, and showed more stable time trends than were obtained from univariate analyses of Fec at individual time points. Inter-age correlations decreased as the time interval between measurements increased. Genetic correlations were always positive, with 0·6 of all possible inter-age correlations being greater than 0·80. Phenotypic correlations were lower, and decreased more quickly as the time interval between measurements increased. The results presented confirm biological understanding of the development of immunity to nematode infections in growing lambs. Additionally, they provide a tool to determine optimal sampling ages when assessing lambs' relative resistance to nematode infections.


Genetics ◽  
2000 ◽  
Vol 156 (2) ◽  
pp. 913-922
Author(s):  
Florence Jaffrézic ◽  
Scott D Pletcher

Abstract The genetic analysis of characters that are best considered as functions of some independent and continuous variable, such as age, can be a complicated matter, and a simple and efficient procedure is desirable. Three methods are common in the literature: random regression, orthogonal polynomial approximation, and character process models. The goals of this article are (i) to clarify the relationships between these methods; (ii) to develop a general extension of the character process model that relaxes correlation stationarity, its most stringent assumption; and (iii) to compare and contrast the techniques and evaluate their performance across a range of actual and simulated data. We find that the character process model, as described in 1999 by Pletcher and Geyer, is the most successful method of analysis for the range of data examined in this study. It provides a reasonable description of a wide range of different covariance structures, and it results in the best models for actual data. Our analysis suggests genetic variance for Drosophila mortality declines with age, while genetic variance is constant at all ages for reproductive output. For growth in beef cattle, however, genetic variance increases linearly from birth, and genetic correlations are high across all observed ages.


2014 ◽  
Vol 134 (11) ◽  
pp. 1708-1715
Author(s):  
Tomohiro Hachino ◽  
Kazuhiro Matsushita ◽  
Hitoshi Takata ◽  
Seiji Fukushima ◽  
Yasutaka Igarashi

Author(s):  
M D MacNeil ◽  
J W Buchanan ◽  
M L Spangler ◽  
E Hay

Abstract The objective of this study was to evaluate the effects of various data structures on the genetic evaluation for the binary phenotype of reproductive success. The data were simulated based on an existing pedigree and an underlying fertility phenotype with a heritability of 0.10. A data set of complete observations was generated for all cows. This data set was then modified mimicking the culling of cows when they first failed to reproduce, cows having a missing observation at either their second or fifth opportunity to reproduce as if they had been selected as donors for embryo transfer, and censoring records following the sixth opportunity to reproduce as in a cull-for-age strategy. The data were analyzed using a third order polynomial random regression model. The EBV of interest for each animal was the sum of the age-specific EBV over the first 10 observations (reproductive success at ages 2-11). Thus, the EBV might be interpreted as the genetic expectation of number of calves produced when a female is given ten opportunities to calve. Culling open cows resulted in the EBV for 3 year-old cows being reduced from 8.27 ± 0.03 when open cows were retained to 7.60 ± 0.02 when they were culled. The magnitude of this effect decreased as cows grew older when they first failed to reproduce and were subsequently culled. Cows that did not fail over the 11 years of simulated data had an EBV of 9.43 ± 0.01 and 9.35 ± 0.01 based on analyses of the complete data and the data in which cows that failed to reproduce were culled, respectively. Cows that had a missing observation for their second record had a significantly reduced EBV, but the corresponding effect at the fifth record was negligible. The current study illustrates that culling and management decisions, and particularly those that impact the beginning of the trajectory of sustained reproductive success, can influence both the magnitude and accuracy of resulting EBV.


2019 ◽  
Vol 29 (1) ◽  
pp. 265-274
Author(s):  
Ali Kiadaliri ◽  
Monica Hernández Alava ◽  
Ewa M. Roos ◽  
Martin Englund

Abstract Purpose To develop a mapping model to estimate EQ-5D-3L from the Knee Injury and Osteoarthritis Outcome Score (KOOS). Methods The responses to EQ-5D-3L and KOOS questionnaires (n = 40,459 observations) were obtained from the Swedish National anterior cruciate ligament (ACL) Register for patients ≥ 18 years with a knee ACL injury. We used linear regression (LR) and beta-mixture (BM) for direct mapping and the generalized ordered probit model for response mapping (RM). We compared the distribution of the original data to the distributions of the data generated using the estimated models. Results Models with individual KOOS subscales performed better than those with the average of KOOS subscale scores (KOOS5, KOOS4). LR had the poorest performance overall and across the range of disease severity, particularly at the extremes of the distribution of severity. Compared with the RM, the BM performed better across the entire range of disease severity except the most severe range (KOOS5 < 25). Moving from the most severe to the least severe disease state was associated with a gain of 0.785 in the observed EQ-5D-3L. The corresponding values were 0.743, 0.772 and 0.782 for LR, BM and RM, respectively. LR generated simulated EQ-5D-3L values outside the feasible range. The distribution of simulated data generated from the BM model was almost identical to the original data. Conclusions We developed mapping models to estimate EQ-5D-3L from KOOS, facilitating application of KOOS in cost-utility analyses. The BM showed superior performance for estimating EQ-5D-3L from KOOS. Further validation of the estimated models in different independent samples is warranted.


Author(s):  
Yufei Li ◽  
Xiaoyong Ma ◽  
Xiangyu Zhou ◽  
Pengzhen Cheng ◽  
Kai He ◽  
...  

Abstract Motivation Bio-entity Coreference Resolution focuses on identifying the coreferential links in biomedical texts, which is crucial for completing bio-events’ attributes and interconnecting events into bio-networks. Previously, deep neural network-based general domain systems, among the most powerful tools available, have been applied to the biomedical domain with domain-specific information integrated. However, such methods may introduce considerable noise because they combine context and complex domain-specific information inadequately. Results In this paper, we explore how to leverage the external knowledge base in a fine-grained way to better resolve coreference by introducing a knowledge-enhanced Long Short Term Memory network (LSTM), which encodes the knowledge information inside the LSTM more flexibly. Moreover, we propose a knowledge attention module to extract informative knowledge effectively based on contexts. On the BioNLP and CRAFT datasets, the method achieves state-of-the-art performance, with a gain of 7.5 F1 on BioNLP and 10.6 F1 on CRAFT. Additional experiments also demonstrate superior performance on cross-sentence coreferences. Supplementary information Supplementary data are available at Bioinformatics online.

