A $C_p$ criterion for semiparametric causal inference

Biometrika ◽  
2017 ◽  
Vol 104 (4) ◽  
pp. 845-861 ◽  
Author(s):  
Takamichi Baba ◽  
Takayuki Kanemori ◽  
Yoshiyuki Ninomiya

For marginal structural models, which play an important role in causal inference, we consider a model selection problem within a semiparametric framework using inverse-probability-weighted estimation or doubly robust estimation. In this framework, the modelling target is a potential outcome that may be missing, so no classical information criterion applies. We define a mean squared error for the potential outcome and derive an asymptotically unbiased estimator of it as a $C_{p}$ criterion using an ignorable treatment assignment condition. Simulations show that the proposed criterion outperforms a conventional one, providing smaller squared errors and higher frequencies of selecting the true model in all settings considered. Moreover, in a real-data analysis we find a clear difference between the two criteria.
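As a minimal sketch of the inverse-probability weighting that underlies this framework (the data-generating process, sample size, and effect size below are illustrative assumptions, not taken from the paper), weighting each treated outcome by the inverse of its propensity score recovers the mean of a potential outcome that is only partially observed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical data-generating process (not from the paper): a confounder x
# raises both the chance of treatment and the outcome under treatment.
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-x))          # true propensity score
a = rng.binomial(1, p)                # treatment indicator
y = 2.0 + x + rng.normal(size=n)      # outcome under treatment; E[Y(1)] = 2

# Inverse-probability-weighted estimate of E[Y(1)]: each treated outcome is
# weighted by 1/p so the treated units stand in for the whole population.
ipw_estimate = np.mean(a * y / p)

# Naive mean over treated units only, which is confounded by x.
naive_estimate = y[a == 1].mean()
print(ipw_estimate, naive_estimate)
```

The naive treated-only mean is biased upward because high-x units are both more likely to be treated and have larger outcomes; the weighted mean corrects this under the ignorable-assignment condition.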

2020 ◽  
Author(s):  
Ali Ghazizadeh ◽  
Frederic Ambroggi

Peri-event time histograms (PETHs) are widely used to study correlations between experimental events and neuronal firing. The accuracy of the firing rate estimate from a PETH depends on the choice of binsize. We show that the optimal binsize for a PETH depends on factors such as the number of trials and the temporal dynamics of the firing rate. These factors argue against the use of a one-size-fits-all binsize when making PETHs for an inhomogeneous population of neurons. Here we propose a binsize selection method by adapting the Akaike Information Criterion (AIC). Simulations show that optimal binsizes estimated by AIC closely match those obtained by minimizing the mean squared error (MSE). Furthermore, using real data, we find that optimal binning improves the detection of responses and their dynamics. Together, our analysis strongly supports optimal binning of PETHs and proposes a computationally efficient method for this optimization based on the AIC approach to model selection.
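The dependence of the optimal binsize on trial count and rate dynamics can be illustrated with a small simulation. The sketch below is not the authors' estimator: it scores each candidate binsize with a generic AIC for a piecewise-constant Poisson intensity (one rate parameter per bin), and the firing profile, trial count, and candidate binsizes are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_trials = 1.0, 200

# Hypothetical firing profile (not from the paper): a constant 5 Hz
# background plus a Gaussian response bump centred at t = 0.5 s.
def rate(t):
    return 5.0 + 40.0 * np.exp(-0.5 * ((t - 0.5) / 0.05) ** 2)

def simulate_trial():
    # Thinning: sample from a homogeneous process at the peak rate and
    # keep each spike with probability rate(t) / peak rate.
    rmax = 45.0
    n = rng.poisson(rmax * T)
    t = rng.uniform(0.0, T, n)
    return t[rng.uniform(0.0, rmax, n) < rate(t)]

spikes = np.concatenate([simulate_trial() for _ in range(n_trials)])

def aic_for_binsize(binsize):
    counts, _ = np.histogram(spikes, bins=int(round(T / binsize)), range=(0.0, T))
    lam = counts / (n_trials * binsize)   # rate estimate per bin (Hz)
    nz = counts > 0
    # Log-likelihood of the pooled spikes under a piecewise-constant
    # intensity: sum of log-rates at the spike times minus the integrated
    # intensity, which here equals the total spike count.
    loglik = np.sum(counts[nz] * np.log(n_trials * lam[nz])) - spikes.size
    k = counts.size                       # one rate parameter per bin
    return 2 * k - 2 * loglik

binsizes = [0.005, 0.01, 0.02, 0.05, 0.1, 0.25]
aics = {b: aic_for_binsize(b) for b in binsizes}
best = min(aics, key=aics.get)
print(best)
```

Very fine bins are penalized for their many parameters, while very coarse bins smear the response bump and lose likelihood, so an intermediate binsize matched to the bump's width wins.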


2007 ◽  
Vol 15 (3) ◽  
pp. 199-236 ◽  
Author(s):  
Daniel E. Ho ◽  
Kosuke Imai ◽  
Gary King ◽  
Elizabeth A. Stuart

Although published works rarely include causal estimates from more than a few model specifications, authors usually choose the presented estimates from numerous trial runs readers never see. Given the often large variation in estimates across choices of control variables, functional forms, and other modeling assumptions, how can researchers ensure that the few estimates presented are accurate or representative? How do readers know that publications are not merely demonstrations that it is possible to find a specification that fits the author's favorite hypothesis? And how do we evaluate or even define statistical properties like unbiasedness or mean squared error when no unique model or estimator even exists? Matching methods, which offer the promise of causal inference with fewer assumptions, constitute one possible way forward, but crucial results in this fast-growing methodological literature are often grossly misinterpreted. We explain how to avoid these misinterpretations and propose a unified approach that makes it possible for researchers to preprocess data with matching (such as with the easy-to-use software we offer) and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.
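A minimal sketch of the preprocessing idea, with an assumed one-dimensional confounder and a deliberately misspecified parametric model (none of the numbers come from the article): greedy 1:1 nearest-neighbour matching balances the covariate before an ordinary least-squares fit, which makes the parametric estimate far less sensitive to the omitted nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical setup (not the article's data): one confounder x drives both
# treatment assignment and the outcome; the true treatment effect is 1.0.
x = rng.normal(size=n)
treat = rng.binomial(1, 1.0 / (1.0 + np.exp(-(2 * x - 2))))
y = 1.0 * treat + 3 * x ** 2 + rng.normal(size=n)   # outcome nonlinear in x

def matched_sample(x, treat):
    # Greedy 1:1 nearest-neighbour matching on x without replacement:
    # each treated unit keeps its closest still-unused control.
    treated = np.flatnonzero(treat == 1)
    controls = list(np.flatnonzero(treat == 0))
    keep = []
    for i in treated:
        j = min(controls, key=lambda c: abs(x[c] - x[i]))
        controls.remove(j)
        keep.extend([i, j])
    return np.array(keep)

def ols_treatment_coef(y, treat, x):
    # Parametric model y ~ 1 + treat + x; the x**2 term is deliberately
    # omitted, so the model is misspecified.
    X = np.column_stack([np.ones_like(x), treat, x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

idx = matched_sample(x, treat)
raw_effect = ols_treatment_coef(y, treat, x)
matched_effect = ols_treatment_coef(y[idx], treat[idx], x[idx])
print(raw_effect, matched_effect)
```

On the full sample the misspecified regression is badly biased; after matching, treated and control covariate distributions overlap, so the same misspecified model lands much closer to the true effect.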


Author(s):  
Hanna Unterauer ◽  
Norbert Brunner ◽  
Manfred Kühleitner

The scientific growth literature often uses the models of Brody, Gompertz, Verhulst, and von Bertalanffy. The versatile five-parameter Bertalanffy-Pütter (BP) model generalizes them. Using the least-squares method, we fitted the BP model to mass-at-age data of 161 calves, cows, bulls, and oxen of cattle breeds that are common in Austria and Southern Germany. We used three measures to assess the goodness of fit: R-squared, the normalized root-mean-squared error, and the Akaike information criterion with a correction for sample size. Although the BP model considerably improved on the fit of the linear growth model in terms of R-squared, the better fit did not in general justify the use of its additional parameters, because most of the data had a non-sigmoidal character. In terms of the Akaike criterion, we could identify only a small core of the data (15%) for which sigmoidal models were indispensable.
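The sample-size-corrected Akaike comparison behind this kind of verdict can be sketched for a two-parameter linear model versus a three-parameter logistic (Verhulst) curve. The synthetic mass-at-age data and the particular AICc convention below are assumptions for illustration, and the five-parameter BP model itself is not fitted here:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

# Synthetic mass-at-age data from a logistic (Verhulst) growth curve; the
# parameter values are illustrative, not taken from the cattle data set.
t = np.linspace(0, 10, 40)
y = 500 / (1 + np.exp(-1.0 * (t - 5))) + rng.normal(0, 15, t.size)

def logistic(t, A, k, t0):
    return A / (1 + np.exp(-k * (t - t0)))

def aicc(rss, n, k):
    # Least-squares AIC plus the small-sample correction term.
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Linear model: 2 parameters.
slope, intercept = np.polyfit(t, y, 1)
rss_lin = np.sum((y - (slope * t + intercept)) ** 2)

# Logistic model: 3 parameters.
popt, _ = curve_fit(logistic, t, y, p0=[y.max(), 1.0, t.mean()])
rss_log = np.sum((y - logistic(t, *popt)) ** 2)

n = t.size
print(aicc(rss_lin, n, 2), aicc(rss_log, n, 3))
```

For genuinely sigmoidal data like this, the likelihood gain swamps the extra-parameter penalty, so AICc selects the logistic curve; for near-linear mass-at-age trajectories the penalty dominates and the simpler model wins, which is the pattern reported for most of the cattle data.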


Entropy ◽  
2019 ◽  
Vol 21 (4) ◽  
pp. 394 ◽  
Author(s):  
Andrea Murari ◽  
Emmanuele Peluso ◽  
Francesco Cianfrani ◽  
Pasquale Gaudio ◽  
Michele Lungaroni

The most widely used model selection criteria, the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC), are expressed in terms of synthetic indicators of the residual distribution: the variance and the mean-squared error of the residuals, respectively. In many scientific applications, the noise affecting the data can be expected to have a Gaussian distribution. Therefore, at the same level of variance and mean-squared error, models whose residuals are more uniformly distributed should be favoured. The degree of uniformity of the residuals can be quantified by the Shannon entropy. Including the Shannon entropy in the BIC and AIC expressions significantly improves these criteria. The improved performance has been demonstrated empirically with a series of simulations for various classes of functions and for different levels and statistics of the noise. In the presence of outliers, a better treatment of the errors, using the Geodesic Distance, has proved essential.
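The idea that equal mean-squared error can hide very different residual structure is easy to demonstrate. The sketch below (with an assumed bin count and synthetic residuals, not the authors' exact entropy-augmented criterion) compares the Shannon entropy of binned residuals for unstructured Gaussian noise and for residuals with leftover bimodal structure at the same MSE:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10000

# Two synthetic residual vectors normalised to the same mean-squared error:
# unstructured Gaussian noise versus residuals with leftover bimodal
# structure (as if the model had missed a systematic component).
gauss = rng.normal(0.0, 1.0, n)
bimodal = rng.choice([-1.0, 1.0], n) + rng.normal(0.0, 0.1, n)
for r in (gauss, bimodal):
    r -= r.mean()
    r /= np.sqrt(np.mean(r ** 2))

def shannon_entropy(res, bins=50):
    # Entropy (in nats) of the binned residual distribution, using a
    # common bin range so the two vectors are directly comparable.
    counts, _ = np.histogram(res, bins=bins, range=(-5.0, 5.0))
    p = counts[counts > 0] / res.size
    return -np.sum(p * np.log(p))

print(shannon_entropy(gauss), shannon_entropy(bimodal))
```

Both vectors have unit MSE, so variance-based AIC and BIC cannot distinguish them; the structured residuals have markedly lower entropy, which is the signal an entropy-augmented criterion can exploit.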


2016 ◽  
Vol 2 (11) ◽  
Author(s):  
William Stewart

For modern linkage studies involving many small families, Stewart et al. (2009) [1] introduced an efficient estimator of disease gene location (denoted ) that averages location estimates from random subsamples of the dense SNP data. Their estimator has lower mean squared error than competing estimators and yields narrower confidence intervals (CIs) as well. However, when the number of families is small and the pedigree structure is large (possibly extended), the computational feasibility and statistical properties of  are not known. We use simulation and real data to show that (1) for this extremely important but often overlooked study design, CIs based on  are narrower than CIs based on a single subsample, and (2) the reduction in CI length is proportional to the square root of the expected Monte Carlo error. As a proof of principle, we applied  to the dense SNP data of four large, extended, specific language impairment (SLI) pedigrees, and reduced the single-subsample CI by 18%. In summary, confidence intervals based on  should minimize re-sequencing costs beneath linkage peaks and reduce the number of candidate genes to investigate.
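A stylised Monte Carlo sketch of why averaging over subsamples narrows the interval (the location, error magnitudes, and subsample count below are hypothetical, not from the SLI analysis): each subsample estimate shares the data set's statistical error but carries independent Monte Carlo error, so averaging shrinks only the latter component:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical error model: a subsample estimate equals the true location
# plus a statistical error common to all subsamples of the same data set,
# plus independent Monte Carlo error per subsample.
theta, stat_sd, mc_sd = 42.0, 1.0, 1.5   # location in cM (illustrative)
reps, n_subsamples = 5000, 25

stat = rng.normal(0.0, stat_sd, reps)                 # per-data-set error
single = theta + stat + rng.normal(0.0, mc_sd, reps)  # one subsample
averaged = theta + stat + rng.normal(0.0, mc_sd, (reps, n_subsamples)).mean(axis=1)

# Widths of 95% intervals estimated from the replicates.
w_single = np.diff(np.percentile(single, [2.5, 97.5]))[0]
w_avg = np.diff(np.percentile(averaged, [2.5, 97.5]))[0]
print(w_single, w_avg)
```

Averaging 25 subsamples divides the Monte Carlo variance by 25 while leaving the shared statistical error untouched, so the interval narrows toward the floor set by the statistical component alone.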

