Bayesian Inference of Species Trees using Diffusion Models

2020 ◽  
Vol 70 (1) ◽  
pp. 145-161 ◽  
Author(s):  
Marnus Stoltz ◽  
Boris Baeumer ◽  
Remco Bouckaert ◽  
Colin Fox ◽  
Gordon Hiscott ◽  
...  

Abstract We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy in recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 freshwater turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]
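The allele-frequency dynamics referred to above can be pictured with a short forward simulation. The sketch below (numpy, with made-up parameter values) integrates the neutral Wright–Fisher diffusion by Euler–Maruyama; note that Snapper evaluates likelihoods under such diffusion models with spectral methods, not by simulation.

```python
import numpy as np

def simulate_wf_diffusion(x0=0.5, N=1000, T=500, n_paths=200, seed=0):
    """Euler-Maruyama paths of the neutral Wright-Fisher diffusion,
    dX_t = sqrt(X_t (1 - X_t) / (2N)) dW_t, one step per generation.
    Illustrative only; parameter values are made up."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0)
    for _ in range(T):
        step_var = np.clip(x * (1 - x), 0.0, None) / (2 * N)
        # frequencies stay in [0, 1]; 0 and 1 are absorbing in the true model
        x = np.clip(x + np.sqrt(step_var) * rng.standard_normal(n_paths), 0.0, 1.0)
    return x

freqs = simulate_wf_diffusion()  # allele frequencies after T generations
```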

Author(s):  
Diego F Morales-Briones ◽  
Gudrun Kadereit ◽  
Delphine T Tefarikis ◽  
Michael J Moore ◽  
Stephen A Smith ◽  
...  

Abstract Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification whose backbone remains, and probably will remain, unresolved. This work highlights the potential identifiability problems associated with the sources of gene tree discordance, in particular for phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.
[Amaranthaceae; gene tree discordance; hybridization; incomplete lineage sorting; phylogenomics; species network; species tree; transcriptomics.]


2019 ◽  
Vol 491 (4) ◽  
pp. 5238-5247 ◽  
Author(s):  
X Saad-Olivera ◽  
C F Martinez ◽  
A Costa de Souza ◽  
F Roig ◽  
D Nesvorný

ABSTRACT We characterize the radii and masses of the star and planets in the Kepler-59 system, as well as their orbital parameters. The star parameters are determined through a standard spectroscopic analysis, resulting in a mass of $1.359\pm 0.155\, \mathrm{M}_\odot$ and a radius of $1.367\pm 0.078\, \mathrm{R}_\odot$. The obtained planetary radii are $1.5\pm 0.1\, R_\oplus$ for the inner and $2.2\pm 0.1\, R_\oplus$ for the outer planet. The orbital parameters and the planetary masses are determined by the inversion of Transit Timing Variations (TTV) signals. We consider two different data sets: one provided by Holczer et al. (2016), with TTVs only for Kepler-59c, and the other provided by Rowe et al. (2015), with TTVs for both planets. The inversion method applies an algorithm of Bayesian inference (MultiNest) combined with an efficient N-body integrator (Swift). For each data set, we found two possible solutions, both having the same probability according to their corresponding Bayesian evidences. All four solutions appear to be indistinguishable within their 2-σ uncertainties. However, statistical analyses show that the solutions from the Rowe et al. (2015) data set provide a better characterization. The first solution infers masses of $5.3_{-2.1}^{+4.0}~M_{\mathrm{\oplus }}$ and $4.6_{-2.0}^{+3.6}~M_{\mathrm{\oplus }}$ for the inner and outer planet, respectively, while the second solution gives masses of $3.0^{+0.8}_{-0.8}~M_{\mathrm{\oplus }}$ and $2.6^{+0.9}_{-0.8}~M_{\mathrm{\oplus }}$. These values point to a system with an inner super-Earth and an outer mini-Neptune. A dynamical study shows that the planets have almost co-planar orbits with small eccentricities (e < 0.1), close to the 3:2 mean motion resonance. A stability analysis indicates that this configuration is stable over millions of years of evolution.
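The TTV signals inverted above are simply the deviations of observed transit times from a best-fit linear ephemeris. A minimal sketch with made-up numbers (not the Kepler-59 measurements):

```python
import numpy as np

# TTVs = observed transit times minus the linear ephemeris t_n = t0 + n * P.
# All values below are hypothetical, for illustration only.
n = np.arange(20)
P_true, t0_true = 17.98, 133.2                  # period (days) and epoch
ttv_true = 0.01 * np.sin(2 * np.pi * n / 7.5)   # ~14-minute sinusoidal TTV
t_obs = t0_true + n * P_true + ttv_true

# least-squares fit of the linear ephemeris, residuals are the TTV signal
A = np.column_stack([np.ones_like(n), n])
t0_fit, P_fit = np.linalg.lstsq(A, t_obs, rcond=None)[0]
ttv = t_obs - (t0_fit + n * P_fit)
```

In the paper this residual signal, not the raw times, is what MultiNest inverts against N-body-predicted transit times.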


Mathematics ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 1942
Author(s):  
Andrés R. Masegosa ◽  
Darío Ramos-López ◽  
Antonio Salmerón ◽  
Helge Langseth ◽  
Thomas D. Nielsen

In many modern data analysis problems, the available data is not static but, instead, comes in a streaming fashion. Performing Bayesian inference on a data stream is challenging for several reasons. First, it requires continuous model updating and the ability to handle a posterior distribution conditioned on an unbounded data set. Second, the underlying data distribution may drift from one time step to another, and the classic i.i.d. (independent and identically distributed), or data exchangeability, assumption no longer holds. In this paper, we present an approximate Bayesian inference approach using variational methods that addresses these issues for conjugate exponential family models with latent variables. Our proposal makes use of a novel scheme based on hierarchical priors to explicitly model temporal changes of the model parameters. We show how this approach induces an exponential forgetting mechanism with adaptive forgetting rates. The method is able to capture the smoothness of the concept drift, ranging from no drift to abrupt drift. The proposed variational inference scheme maintains the computational efficiency of variational methods over conjugate models, which is critical in streaming settings. The approach is validated on four different domains (energy, finance, geolocation, and text) using four real-world data sets.
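The exponential forgetting mechanism described above can be illustrated with a toy conjugate model. The sketch below uses a Beta-Bernoulli stream with a fixed forgetting factor rho; the paper's scheme instead adapts the forgetting rate via hierarchical priors, so this is only a simplified illustration.

```python
import numpy as np

def streaming_beta_update(stream, rho=0.9, a0=1.0, b0=1.0):
    """Streaming Beta-Bernoulli posterior with exponential forgetting.

    Each step, past pseudo-counts are discounted toward the prior by a
    fixed factor rho before the conjugate update, so the posterior can
    track a drifting Bernoulli rate. (Illustrative only: the paper's
    method adapts the forgetting rate rather than fixing it.)
    """
    a, b = a0, b0
    means = []
    for batch in stream:
        a = rho * a + (1 - rho) * a0       # forget old successes
        b = rho * b + (1 - rho) * b0       # forget old failures
        a += np.sum(batch)                 # conjugate update with new batch
        b += len(batch) - np.sum(batch)
        means.append(a / (a + b))
    return means

rng = np.random.default_rng(0)
# simulate abrupt concept drift: the rate jumps from 0.2 to 0.8
stream = [rng.binomial(1, 0.2, 50) for _ in range(20)] + \
         [rng.binomial(1, 0.8, 50) for _ in range(20)]
means = streaming_beta_update(stream)
```

Without the discounting step the posterior would average over the whole stream and converge near 0.5 instead of tracking the jump.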


2021 ◽  
Author(s):  
Petya Kindalova ◽  
Ioannis Kosmidis ◽  
Thomas E. Nichols

Abstract
Objectives: White matter lesions are a very common finding on MRI in older adults and their presence increases the risk of stroke and dementia. Accurate and computationally efficient modelling methods are necessary to map the association of lesion incidence with risk factors, such as hypertension. However, there is no consensus in the brain mapping literature on whether a voxel-wise modelling approach is better for binary lesion data than a more computationally intensive spatial modelling approach that accounts for voxel dependence.
Methods: We review three regression approaches for modelling binary lesion masks: mass-univariate probit regression with maximum likelihood estimates; mass-univariate probit regression with mean bias-reduced estimates; and spatial Bayesian modelling, where the regression coefficients have a conditional autoregressive model prior to account for local spatial dependence. We design a novel simulation framework of artificial lesion maps to compare the three alternative lesion mapping methods. The age effect on lesion probability estimated from a reference data set (13,680 individuals from the UK Biobank) is used to simulate a realistic voxel-wise distribution of lesions across age. To mimic the real features of lesion masks, we suggest matching brain lesion summaries (total lesion volume, average lesion size, and lesion count) across the reference data set and the simulated data sets. Thus, we allow for a fair comparison between the modelling approaches under a realistic simulation setting.
Results: Our findings suggest that bias-reduced estimates for voxel-wise binary-response generalized linear models (GLMs) overcome the drawbacks of infinite and biased maximum likelihood estimates and scale well for large data sets because voxel-wise estimation can be performed in parallel across voxels. Contrary to the assumption that spatial dependence is key in lesion mapping, our results show that voxel-wise bias reduction and spatial modelling yield largely similar estimates.
Conclusion: Bias-reduced estimates for voxel-wise GLMs are not only accurate but also computationally efficient, which will become increasingly important as more biobank-scale neuroimaging data sets become available.
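For concreteness, a voxel-wise probit fit of the kind reviewed above can be sketched as follows. This shows only the plain maximum likelihood estimate for a single voxel on synthetic data (the bias-reduced estimates favored in the paper require an adjusted score function not shown here); all names and numbers are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_mle(X, y):
    """Maximum-likelihood probit fit for one voxel's binary lesion data."""
    def nll(beta):
        p = norm.cdf(X @ beta)
        p = np.clip(p, 1e-10, 1 - 1e-10)   # guard against log(0)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(45, 80, n)
X = np.column_stack([np.ones(n), (age - age.mean()) / age.std()])
true_beta = np.array([-1.5, 0.6])          # lesion probability rises with age
y = (rng.standard_normal(n) < X @ true_beta).astype(float)

beta_hat = probit_mle(X, y)                # [intercept, age effect]
```

Because each voxel is fit independently, this loop parallelizes trivially across voxels, which is the scaling argument made in the Results above.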


2020 ◽  
Vol 69 (5) ◽  
pp. 973-986 ◽  
Author(s):  
Joëlle Barido-Sottani ◽  
Timothy G Vaughan ◽  
Tanja Stadler

Abstract Heterogeneous populations can lead to important differences in birth and death rates across a phylogeny. Taking this heterogeneity into account is necessary to obtain accurate estimates of the underlying population dynamics. We present a new multitype birth–death model (MTBD) that can estimate lineage-specific birth and death rates. This corresponds to estimating lineage-dependent speciation and extinction rates for species phylogenies, and lineage-dependent transmission and recovery rates for pathogen transmission trees. In contrast with previous models, we do not presume to know the trait driving the rate differences, nor do we prohibit the same rates from appearing in different parts of the phylogeny. Using simulated data sets, we show that the MTBD model can reliably infer the presence of multiple evolutionary regimes, their positions in the tree, and the birth and death rates associated with each. We also present a reanalysis of two empirical data sets and compare the results obtained by MTBD and by the existing software BAMM. We compare two implementations of the model, one exact and one approximate (assuming that no rate changes occur in the extinct parts of the tree), and show that the approximation only slightly affects results. The MTBD model is implemented as a package in the Bayesian inference software BEAST 2 and allows joint inference of the phylogeny and the model parameters. [Birth–death; lineage-specific rates; multi-type model.]
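A toy forward simulation helps convey what a multitype birth-death process describes. The sketch below runs a Gillespie simulation of lineage counts under two types with type-specific birth and death rates plus type switching; it tracks counts only, not tree topology, and all rate values are made up.

```python
import numpy as np

def simulate_mtbd_counts(t_max=4.0, birth=(0.8, 1.5), death=(0.2, 0.2),
                         switch=0.1, seed=6):
    """Gillespie simulation of lineage counts under a two-type birth-death
    process. A toy analogue of the MTBD model: each lineage carries a type
    with its own birth/death rates and can switch type. Rates are made up."""
    rng = np.random.default_rng(seed)
    counts = np.array([1, 0])                # one type-0 lineage at time 0
    t = 0.0
    while t < t_max and counts.sum() > 0:
        b = np.asarray(birth) * counts       # per-type total birth rates
        d = np.asarray(death) * counts       # per-type total death rates
        s = switch * counts                  # per-type total switch rates
        total = b.sum() + d.sum() + s.sum()
        t += rng.exponential(1.0 / total)    # waiting time to next event
        if t >= t_max:
            break
        event = rng.choice(6, p=np.concatenate([b, d, s]) / total)
        kind, typ = divmod(event, 2)
        if kind == 0:
            counts[typ] += 1                 # birth
        elif kind == 1:
            counts[typ] -= 1                 # death
        else:
            counts[typ] -= 1                 # type switch
            counts[1 - typ] += 1
    return counts

final_counts = simulate_mtbd_counts()        # lineages of each type at t_max
```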


2017 ◽  
Vol 5 (4) ◽  
pp. 1
Author(s):  
I. E. Okorie ◽  
A. C. Akpanta ◽  
J. Ohakwe ◽  
D. C. Chikezie ◽  
C. U. Onyemachi ◽  
...  

This paper introduces a new generator of probability distributions, the adjusted log-logistic generalized (ALLoG) distribution, and a new extension of the standard one-parameter exponential distribution called the adjusted log-logistic generalized exponential (ALLoGExp) distribution. The ALLoGExp distribution is a special case of the ALLoG distribution, and we provide some of its statistical and reliability properties. Notably, the failure rate can be monotonically decreasing, increasing, or upside-down bathtub shaped depending on the values of the parameters $\delta$ and $\theta$. The method of maximum likelihood estimation is proposed to estimate the model parameters. The importance and flexibility of the ALLoGExp distribution are demonstrated with a real and uncensored lifetime data set, and its fit is compared with five other exponential-related distributions. The results obtained from the model fittings show that the ALLoGExp distribution provides a reasonably better fit than the other fitted distributions. The ALLoGExp distribution is therefore recommended for effective modelling of lifetime data sets.
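The model-comparison exercise described above, fitting several exponential-related distributions by maximum likelihood and ranking the fits, can be sketched generically. The ALLoGExp density itself is not reproduced here; the example uses stock scipy distributions on synthetic lifetime data and ranks them by AIC.

```python
import numpy as np
from scipy import stats

# Fit candidate lifetime distributions by maximum likelihood and rank by
# AIC. Synthetic data stand in for the real lifetime data set in the paper.
rng = np.random.default_rng(5)
data = stats.weibull_min.rvs(1.5, scale=2.0, size=300, random_state=rng)

candidates = {
    "exponential": stats.expon,
    "weibull": stats.weibull_min,
    "gamma": stats.gamma,
}
aic = {}
for name, dist in candidates.items():
    params = dist.fit(data, floc=0)          # ML fit, location fixed at 0
    ll = np.sum(dist.logpdf(data, *params))
    # len(params) counts the fixed loc too, inflating every AIC by the
    # same constant, so the ranking is unaffected
    aic[name] = 2 * len(params) - 2 * ll

best = min(aic, key=aic.get)                 # lowest AIC wins
```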


2021 ◽  
Vol 37 (3) ◽  
pp. 481-490
Author(s):  
Chenyong Song ◽  
Dongwei Wang ◽  
Haoran Bai ◽  
Weihao Sun

Highlights: The proposed data enhancement method can be used for small-scale data sets with rich sample image features. The accuracy of the new model reaches 98.5%, which is better than the traditional CNN method.
Abstract: GoogLeNet offers far better performance in identifying apple disease than traditional methods. However, its complexity is relatively high, and for small volumes of data it does not achieve the same performance as with large-scale data. We propose a new apple disease identification model based on GoogLeNet's inception module. The model adopts a variety of methods to improve its generalization ability. First, data enhancement methods based on geometric transformation and image modification (rotation, scaling, noise interference, random erasing, and color-space enhancement), applied with random probabilities and in appropriate combinations, are used to amplify the data set. Second, we employ a deep convolution generative adversarial network (DCGAN) to enhance the richness of generated images by increasing the diversity of the generator's noise distribution. Finally, we optimize the GoogLeNet model structure to reduce its complexity and parameter count, making it more suitable for identifying apple tree diseases. The experimental results show that our approach quickly detects and classifies apple diseases, including rust, spotted leaf disease, and anthrax. It outperforms the original GoogLeNet in recognition accuracy and model size, with identification accuracy reaching 98.5%, making it a feasible method for apple disease classification.
Keywords: Apple disease identification, Data enhancement, DCGAN, GoogLeNet.
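A few of the data enhancement strategies named in this abstract can be sketched in plain numpy. This toy version covers rotation, noise interference, and random erasing (elimination) on a synthetic image; the paper's full pipeline also includes scaling, color-space enhancement, and probability-weighted combinations of strategies.

```python
import numpy as np

def augment(img, rng):
    """Apply one randomly chosen transform to an HxWx3 float image.

    A minimal numpy-only sketch of three of the augmentation strategies
    from the abstract; not the authors' implementation.
    """
    choice = rng.integers(3)
    out = img.copy()
    if choice == 0:                          # rotation by a multiple of 90°
        out = np.rot90(out, k=rng.integers(1, 4))
    elif choice == 1:                        # additive Gaussian noise
        out = np.clip(out + rng.normal(0, 10, out.shape), 0, 255)
    else:                                    # random erasing of a patch
        h, w = out.shape[:2]
        y0, x0 = rng.integers(h // 2), rng.integers(w // 2)
        out[y0:y0 + h // 4, x0:x0 + w // 4] = 0
    return out

rng = np.random.default_rng(0)
leaf = rng.integers(0, 256, (64, 64, 3)).astype(float)  # stand-in leaf image
batch = [augment(leaf, rng) for _ in range(8)]          # amplified data set
```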


Author(s):  
Rajendra Prasad ◽  
Lalit Kumar Gupta ◽  
A. Beesham ◽  
G. K. Goswami ◽  
Anil Kumar Yadav

In this paper, we investigate a Bianchi type I exact Universe by taking into account the cosmological constant as the source of energy at the present epoch. We have performed a [Formula: see text] test to obtain the best fit values of the model parameters of the Universe in the derived model. We have used two types of data sets, viz., (i) 31 values of the Hubble parameter and (ii) the 1048 Pantheon data set of various supernovae distance moduli and apparent magnitudes. From both data sets, we have estimated the current values of the Hubble constant and the density parameters [Formula: see text] and [Formula: see text]. The dynamics of the deceleration parameter show that the Universe was in a decelerating phase for redshift [Formula: see text]. At a transition redshift [Formula: see text], the present Universe entered an accelerating phase of expansion. The current age of the Universe is obtained as [Formula: see text] Gyr. This is in good agreement with the value of [Formula: see text] calculated from the Planck collaboration results and WMAP observations.
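The fit to the 31 Hubble-parameter values can be illustrated with a generic chi-square minimization. The sketch below assumes a flat ΛCDM form for H(z) rather than the paper's Bianchi type I model, and fits synthetic data, so all numbers are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

def hubble(z, H0, Om):
    """Flat-LCDM expansion rate H(z), with Omega_Lambda = 1 - Om."""
    return H0 * np.sqrt(Om * (1 + z) ** 3 + (1 - Om))

def chi2(theta, z, Hobs, sigma):
    H0, Om = theta
    return np.sum(((Hobs - hubble(z, H0, Om)) / sigma) ** 2)

# synthetic stand-in for the 31 H(z) measurements used in the paper
rng = np.random.default_rng(2)
z = np.linspace(0.05, 2.0, 31)
sigma = np.full_like(z, 5.0)                 # assumed measurement errors
Hobs = hubble(z, 70.0, 0.3) + rng.normal(0, sigma)

fit = minimize(chi2, x0=[60.0, 0.5], args=(z, Hobs, sigma),
               bounds=[(50, 90), (0.05, 0.95)])
H0_best, Om_best = fit.x                     # best-fit H0 and Omega_m
```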


2021 ◽  
Author(s):  
Gah-Yi Ban ◽  
N. Bora Keskin

We consider a seller who can dynamically adjust the price of a product at the individual customer level, by utilizing information about customers’ characteristics encoded as a d-dimensional feature vector. We assume a personalized demand model, parameters of which depend on s out of the d features. The seller initially does not know the relationship between the customer features and the product demand but learns this through sales observations over a selling horizon of T periods. We prove that the seller’s expected regret, that is, the revenue loss against a clairvoyant who knows the underlying demand relationship, is at least of order [Formula: see text] under any admissible policy. We then design a near-optimal pricing policy for a semiclairvoyant seller (who knows which s of the d features are in the demand model) who achieves an expected regret of order [Formula: see text]. We extend this policy to a more realistic setting, where the seller does not know the true demand predictors, and show that this policy has an expected regret of order [Formula: see text], which is also near-optimal. Finally, we test our theory on simulated data and on a data set from an online auto loan company in the United States. On both data sets, our experimentation-based pricing policy is superior to intuitive and/or widely-practiced customized pricing methods, such as myopic pricing and segment-then-optimize policies. Furthermore, our policy improves upon the loan company’s historical pricing decisions by 47% in expected revenue over a six-month period. This paper was accepted by Noah Gans, stochastic models and simulation.
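The value of price experimentation discussed above can be illustrated with a toy explore-then-exploit policy for a one-dimensional linear demand model (no customer features and no sparsity, unlike the paper's setting); all parameter values are made up.

```python
import numpy as np

def price_with_experimentation(T=2000, a=10.0, b=2.0, sigma=0.5, seed=4):
    """Explore-then-exploit pricing for an unknown linear demand a - b*p.

    Forced price dispersion for ~2*sqrt(T) periods, then pricing at the
    estimated revenue maximizer a_hat / (2 * b_hat). A toy stand-in for
    the experimentation-based policies in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_explore = 2 * int(np.sqrt(T))
    prices, demands, revenue = [], [], 0.0
    for t in range(T):
        if t < n_explore:
            p = rng.uniform(1.0, 4.0)        # exploration: dispersed prices
        else:
            X = np.column_stack([np.ones(len(prices)), prices])
            a_hat, slope_hat = np.linalg.lstsq(X, np.array(demands),
                                               rcond=None)[0]
            # revenue maximizer for estimated demand a_hat + slope_hat * p
            p = float(np.clip(a_hat / (-2.0 * slope_hat), 0.5, 4.5))
        d = max(a - b * p + sigma * rng.standard_normal(), 0.0)
        prices.append(p)
        demands.append(d)
        revenue += p * d
    return revenue / T

avg_rev = price_with_experimentation()   # clairvoyant optimum is 12.5/period
```

A fully myopic seller would skip the exploration phase; without price dispersion the regression cannot separate the intercept from the slope, which is the failure mode the paper's lower bound formalizes.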


2016 ◽  
Vol 5 (2) ◽  
pp. 1
Author(s):  
Yao Luo ◽  
Eunji Lim

When estimating an unknown function from a data set of n observations, the function is often known to be convex. For example, the long-run average waiting time of a customer in a single server queue is known to be convex in the service rate (Weber 1983) even though there is no closed-form formula for the mean waiting time, and hence, it needs to be estimated from a data set. A computationally efficient way of finding the best fit of a convex function to the data set is to compute the least absolute deviations estimator, which minimizes the sum of absolute deviations over the set of convex functions. This estimator exhibits numerically preferred behavior since it can be computed faster and for larger data sets than other existing methods (Lim & Luo 2014). In this paper, we establish the validity of the least absolute deviations estimator by proving that it converges almost surely to the true function as n increases to infinity, under modest assumptions.
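The least absolute deviations estimator over convex functions can be computed as a linear program. The sketch below is one standard formulation, with fitted values constrained to have nondecreasing slopes between consecutive design points; it illustrates the estimator class discussed in the abstract, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def lad_convex_fit(x, y):
    """Least absolute deviations fit over convex functions, as an LP.

    Minimizes sum_i |y_i - g_i| over fitted values g_i subject to the
    convexity constraint that slopes between consecutive sorted x's are
    nondecreasing. Variables: [g (n), u (n), v (n)], y_i - g_i = u_i - v_i.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(x)
    d = np.diff(x)                           # spacings x_{i+1} - x_i
    c = np.concatenate([np.zeros(n), np.ones(2 * n)])   # minimize sum(u + v)
    A_eq = np.hstack([np.eye(n), np.eye(n), -np.eye(n)])
    # convexity: d[i-1]*g[i+1] - (d[i-1]+d[i])*g[i] + d[i]*g[i-1] >= 0
    A_ub = np.zeros((n - 2, 3 * n))
    for i in range(1, n - 1):
        A_ub[i - 1, i - 1] = -d[i]
        A_ub[i - 1, i] = d[i - 1] + d[i]
        A_ub[i - 1, i + 1] = -d[i - 1]
    bounds = [(None, None)] * n + [(0, None)] * (2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n - 2),
                  A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:n]

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(-2, 2, 60))
y = x ** 2 + rng.normal(0, 0.2, 60)          # noisy convex ground truth
g = lad_convex_fit(x, y)                     # convex LAD fitted values
```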

