Testing equality of means in partially paired data with incompleteness in single response

2018 · Vol 28 (5) · pp. 1508-1522
Author(s): Qianya Qi, Li Yan, Lili Tian

In testing for differentially expressed genes between tumor and healthy tissues, data are usually collected in paired form. However, incomplete paired data often occur. While extensive statistical research exists for paired data with incompleteness in both arms, hardly any recent work can be found on paired data with incompleteness in a single arm. This paper aims to fill this gap by proposing several new methods, namely p-value pooling methods and a nonparametric combination test. Simulation studies are conducted to investigate the performance of the proposed methods in terms of type I error and power at small to moderate sample sizes. A real data set from The Cancer Genome Atlas (TCGA) breast cancer study is analyzed using the proposed methods.
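A minimal sketch of the p-value pooling idea, assuming a simple setting with n complete (tumor, normal) pairs plus m tumor samples whose matched normal tissue is missing: run a paired t-test on the complete pairs and a Welch t-test involving the unpaired values, then pool the two p-values by a weighted inverse-normal (Stouffer) rule. This treats the two component tests as approximately independent, which is not exactly true here since the normal values enter both tests; the paper's pooling methods are designed to handle such details properly. All data-generating values below are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated example: n complete (tumor, normal) pairs plus m extra tumor
# samples whose matched normal tissue is missing (single-arm incompleteness).
n, m = 30, 12
normal = rng.normal(5.0, 1.0, size=n)
tumor_paired = normal + rng.normal(0.4, 1.0, size=n)  # paired tumor values
tumor_extra = rng.normal(5.4, 1.0, size=m)            # unpaired tumor values

# Component test 1: paired t-test on the complete pairs.
t1, p1 = stats.ttest_rel(tumor_paired, normal)

# Component test 2: Welch two-sample t-test of the unpaired tumor values
# against the normal values.
t2, p2 = stats.ttest_ind(tumor_extra, normal, equal_var=False)

# Weighted inverse-normal (Stouffer) pooling; two-sided p-values are
# converted to signed z-scores so the direction of effect is preserved.
z1 = np.sign(t1) * stats.norm.isf(p1 / 2)
z2 = np.sign(t2) * stats.norm.isf(p2 / 2)
w1, w2 = np.sqrt(2 * n), np.sqrt(n + m)               # sample-size weights
z_pooled = (w1 * z1 + w2 * z2) / np.sqrt(w1**2 + w2**2)
p_pooled = 2 * stats.norm.sf(abs(z_pooled))
print(f"pooled z = {z_pooled:.3f}, two-sided p = {p_pooled:.4f}")
```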

2018 · Vol 28 (9) · pp. 2868-2875
Author(s): Zhongxue Chen, Qingzhong Liu, Kai Wang

Several gene- or set-based association tests have been proposed recently in the literature. Powerful statistical approaches are still highly desirable in this area. In this paper we propose a novel statistical association test that uses information from the burden component and its complement in the genotypes. The new test statistic has a simple null distribution, a special and simplified variance-gamma distribution, and its p-value can be easily calculated. Through a comprehensive simulation study, we show that the new test controls the type I error rate and has superior detection power compared with some popular existing methods. We also apply the new approach to a real data set; the results demonstrate that this test is promising.
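The abstract does not give the exact decomposition, so the following is only a generic sketch of jointly testing a burden component together with a complementary component, assuming a simulated rare-variant genotype matrix and a likelihood ratio test in a logistic model; the "complement" used here is invented for illustration, and the paper's statistic and its variance-gamma null distribution differ.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Simulated rare-variant genotypes: n subjects x p variants, coded 0/1/2.
n, p = 500, 20
maf = rng.uniform(0.005, 0.05, size=p)
G = rng.binomial(2, maf, size=(n, p))

# Burden component: aggregate minor-allele count per subject.
burden = G.sum(axis=1)
# A crude "complement": per-subject squared deviation from the variant
# means, meant only to illustrate using information the plain burden
# score discards; the paper's actual decomposition differs.
complement = ((G - G.mean(axis=0)) ** 2).sum(axis=1)

# Binary phenotype weakly driven by the burden score.
y = rng.binomial(1, 1 / (1 + np.exp(2.0 - 0.15 * burden)))

# Joint logistic regression on both components; an LR test of the two
# coefficients plays the role of the association test.
X = sm.add_constant(np.column_stack([burden, complement]))
full = sm.Logit(y, X).fit(disp=0)
null = sm.Logit(y, np.ones((n, 1))).fit(disp=0)
lr = 2 * (full.llf - null.llf)  # ~ chi-square(2) under the null
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=2):.4g}")
```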


2020
Author(s): Marton Soskuthy

Generalised additive mixed models (GAMMs) are increasingly popular in dynamic speech analysis, where the focus is on measurements with temporal or spatial structure such as formant, pitch or tongue contours. GAMMs provide a range of tools for dealing with the non-linear contour shapes and complex hierarchical organisation characteristic of such data sets. This, however, means that analysts are faced with non-trivial choices, many of which have a serious impact on the statistical validity of their analyses. This paper presents type I and type II error simulations to help researchers make informed decisions about modelling strategies when using GAMMs to analyse phonetic data. The simulations are based on two real data sets containing F2 and pitch contours, and a simulated data set modelled after the F2 data. They reflect typical scenarios in dynamic speech analysis. The main emphasis is on (i) dealing with dependencies within contours and higher-level units using random structures and other tools, and (ii) strategies for significance testing using GAMMs. The paper concludes with a small set of recommendations for fitting GAMMs, and provides advice on diagnosing issues and tailoring GAMMs to specific data sets. It is also accompanied by a GitHub repository including a tutorial on running type I error simulations for existing data sets: https://github.com/soskuthy/gamm_strategies.
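As a rough illustration of the type I error simulation logic (the paper itself, and the linked repository, use mgcv GAMMs in R), the sketch below repeatedly simulates grouped contour data with no true group difference, fits a linear mixed model as a stand-in for a full GAMM via statsmodels, and records how often a likelihood ratio test for the group effect rejects at alpha = 0.05. All data-generating values are invented, and the stand-in omits the smooth terms and autocorrelation structure the paper focuses on.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(7)

def simulate_null_pvalue(n_speakers=20, n_points=11):
    """One null simulation: two groups with NO true difference, F2-like
    contours with speaker-level random offsets plus within-contour noise."""
    rows = []
    for s in range(n_speakers):
        offset = rng.normal(0, 50)                # speaker random intercept
        group = s % 2                             # group label, no true effect
        t = np.linspace(0, 1, n_points)
        shape = 1500 + 200 * np.sin(np.pi * t)    # shared underlying contour
        y = shape + offset + rng.normal(0, 30, n_points)
        rows += [(s, group, ti, yi) for ti, yi in zip(t, y)]
    speaker, group, t, y = map(np.asarray, zip(*rows))
    X0 = sm.add_constant(t)                            # null model
    X1 = sm.add_constant(np.column_stack([t, group]))  # + group effect
    m0 = sm.MixedLM(y, X0, groups=speaker).fit(reml=False)
    m1 = sm.MixedLM(y, X1, groups=speaker).fit(reml=False)
    lr = max(2 * (m1.llf - m0.llf), 0.0)
    return chi2.sf(lr, df=1)

pvals = np.array([simulate_null_pvalue() for _ in range(200)])
print("empirical type I error at alpha = 0.05:", (pvals < 0.05).mean())
```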


Genetics · 1998 · Vol 150 (2) · pp. 931-943
Author(s): Claude M Lebreton, Peter M Visscher, Christopher S Haley, Andrei Semikhodskii, Steve A Quarrie

A novel method using the nonparametric bootstrap is proposed for testing whether a quantitative trait locus (QTL) at one chromosomal position could explain effects on two separate traits. If the single-QTL hypothesis is accepted, pleiotropy could explain the effect on two traits. If it is rejected, then the effects on two traits are due to linked QTLs. The method can be used in conjunction with several QTL mapping methods as long as they provide a straightforward estimate of the number of QTLs detectable from the data set. A selection step was introduced in the bootstrap procedure to reduce the conservativeness of the test of close linkage vs. pleiotropy, so that the erroneous rejection of the null hypothesis of pleiotropy only happens at a frequency equal to the nominal type I error risk specified by the user. The approach was assessed using computer simulations and proved to be relatively unbiased and robust over the range of genetic situations tested. An example of its application on a real data set from a saline stress experiment performed on a recombinant population of wheat (Triticum aestivum L.) doubled haploid lines is also provided.
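A hedged sketch of the naive bootstrap comparison (without the paper's selection step that calibrates the test): estimate each trait's QTL position with a crude single-marker scan, bootstrap the lines, and check whether the bootstrap interval for the difference in positions covers zero. The toy markers here are simulated as unlinked, unlike a real linkage map, and the marker scan is a stand-in for a proper QTL mapping method.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy mapping population: n lines genotyped at markers along one chromosome.
n, n_markers = 200, 51
positions = np.linspace(0, 100, n_markers)         # marker positions in cM
geno = rng.binomial(1, 0.5, size=(n, n_markers))   # 0/1 genotypes (toy, unlinked)

# Two traits controlled here by the SAME marker at 50 cM, i.e. pleiotropy.
q = n_markers // 2
trait1 = geno[:, q] * 1.0 + rng.normal(0, 1, n)
trait2 = geno[:, q] * 0.8 + rng.normal(0, 1, n)

def qtl_position(g, y):
    """Crude single-QTL scan: the marker most correlated with the trait."""
    r = [abs(np.corrcoef(g[:, j], y)[0, 1]) for j in range(g.shape[1])]
    return positions[int(np.argmax(r))]

# Nonparametric bootstrap of the difference in estimated QTL positions.
diffs = np.empty(500)
for b in range(500):
    idx = rng.integers(0, n, size=n)
    diffs[b] = (qtl_position(geno[idx], trait1[idx])
                - qtl_position(geno[idx], trait2[idx]))

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap CI for the position difference: ({lo:.1f}, {hi:.1f}) cM")
print("linked QTLs" if not (lo <= 0.0 <= hi) else "pleiotropy not rejected")
```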


2021 · pp. 001316442199489
Author(s): Luyao Peng, Sandip Sinharay

Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the linear mixed-effects model literature (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has greater power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.
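A small sketch of the EBLUP shrinkage idea behind such a group-level index, under a balanced one-way random-effects model with invented erasure-count data: estimate variance components by the method of moments, shrink raw group deviations toward zero, and flag groups with large standardized predicted effects. The actual EDIs condition on item responses and model-based expectations, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy data: wrong-to-right erasure counts for G groups of n examinees each.
G, n_per = 40, 25
group_effect = rng.normal(0, 0.3, G)                    # latent group effects
counts = rng.poisson(np.exp(0.5 + group_effect)[:, None], size=(G, n_per))
y = np.log1p(counts)                                    # crude variance stabilisation

grand = y.mean()
group_means = y.mean(axis=1)

# Balanced one-way random-effects ANOVA, method-of-moments variance components.
msw = y.var(axis=1, ddof=1).mean()                          # within-group
msb = n_per * ((group_means - grand) ** 2).sum() / (G - 1)  # between-group
sigma2_b = max((msb - msw) / n_per, 0.0)

# EBLUP of each group effect: shrink raw group deviations toward zero.
shrink = sigma2_b / (sigma2_b + msw / n_per)
eblup = shrink * (group_means - grand)

# Aggregate-level index: standardized EBLUP; large values flag groups
# with unusually many erasures.
index = eblup / (eblup.std(ddof=1) + 1e-12)
print("most suspicious groups:", np.argsort(index)[-3:][::-1])
```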


Entropy · 2021 · Vol 23 (8) · pp. 934
Author(s): Yuxuan Zhang, Kaiwei Liu, Wenhao Gui

To improve the statistical efficiency of estimators in life-testing experiments, generalized Type-I hybrid censoring has lately been implemented by guaranteeing that experiments terminate only after a certain number of failures occur. Given the wide application of bathtub-shaped distributions in engineering and the recent introduction of the generalized Type-I hybrid censoring scheme, and since no existing work combines this type of censoring model with a bathtub-shaped distribution, we consider parameter inference under generalized Type-I hybrid censoring. First, estimates of the unknown scale parameter and the reliability function are obtained by the Bayesian method based on LINEX and squared error loss functions with a conjugate gamma prior. Estimates under the E-Bayesian method are then compared across different prior distributions and loss functions. Additionally, Bayesian and E-Bayesian estimation with two unknown parameters is introduced. Furthermore, to assess the robustness of the estimates above, a Monte Carlo simulation study is conducted. Finally, the application of the discussed inference in practice is illustrated by analyzing a real data set.
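The LINEX-versus-squared-error comparison admits a clean closed form when the posterior of the parameter is gamma, which is the situation a conjugate gamma prior produces. The sketch below assumes a Gamma(shape, rate) posterior with invented values: the squared error Bayes estimate is the posterior mean, and the LINEX estimate follows from the gamma moment generating function.

```python
import numpy as np

def bayes_estimates_gamma_posterior(a, b, c=1.0):
    """Bayes estimators of a parameter theta with a Gamma(shape=a, rate=b)
    posterior. Under squared error loss the estimator is the posterior
    mean a/b; under LINEX loss L(d, t) = exp(c(d - t)) - c(d - t) - 1 it is
    -(1/c) * log E[exp(-c * theta)] = (a/c) * log(1 + c/b), using the gamma
    moment generating function (valid for b + c > 0)."""
    if b + c <= 0:
        raise ValueError("LINEX estimator requires rate + c > 0")
    sel = a / b
    linex = (a / c) * np.log1p(c / b)
    return sel, linex

# Illustrative posterior only: with a conjugate gamma prior, the posterior
# shape grows with the number of observed failures and the rate with the
# accumulated (transformed) time on test; these numbers are invented.
a_post, b_post = 17.0, 10.3
for c in (-0.5, 0.5, 2.0):
    sel, linex = bayes_estimates_gamma_posterior(a_post, b_post, c)
    print(f"c = {c:+.1f}: SEL estimate = {sel:.4f}, LINEX estimate = {linex:.4f}")
```

For c > 0, LINEX penalizes overestimation more heavily, so the LINEX estimate sits below the posterior mean; for c < 0 the ordering reverses.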


2017 · Vol 7 (1) · pp. 72
Author(s): Lamya A Baharith

The truncated type I generalized logistic distribution has been used in a variety of applications. In this article, new bivariate truncated type I generalized logistic (BTTGL) distributional models derived from three different copula functions are introduced. Some of their properties are studied. Parametric and semiparametric methods are used to estimate the parameters of the BTTGL models. Maximum likelihood and inference-function-for-margins estimates of the BTTGL parameters are compared with semiparametric estimates using a real data set. Further, a comparison between the BTTGL, bivariate generalized exponential and bivariate exponentiated Weibull models is conducted using the Akaike information criterion and the maximized log-likelihood. An extensive Monte Carlo simulation study is carried out for different parameter values and sample sizes to compare the performance of the parametric and semiparametric estimators based on relative mean square error.
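As a sketch of how one such copula-driven bivariate model can be simulated, the code below pairs type I generalized logistic margins, left-truncated at zero (an assumed truncation point), with a Clayton copula sampled by the conditional method. The paper builds three BTTGL variants from different copulas; Clayton is used here only as a representative, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

ALPHA1, ALPHA2 = 2.0, 1.5   # marginal shape parameters (illustrative)
THETA = 2.0                 # Clayton dependence parameter (illustrative)

def ttgl_ppf(u, alpha):
    """Inverse CDF of a type I generalized logistic distribution,
    F(x) = (1 + exp(-x)) ** (-alpha), left-truncated at 0 (assumed)."""
    f0 = 2.0 ** (-alpha)            # F(0), the truncated mass
    v = f0 + u * (1.0 - f0)         # map (0, 1) onto (F(0), 1)
    return -np.log(v ** (-1.0 / alpha) - 1.0)

def clayton_sample(n, theta):
    """Sample (u1, u2) from a Clayton copula via the conditional method."""
    u1 = rng.uniform(size=n)
    w = rng.uniform(size=n)
    u2 = (u1 ** (-theta) * (w ** (-theta / (theta + 1.0)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u1, u2

# Bivariate sample with Clayton dependence and truncated GL margins.
u1, u2 = clayton_sample(5000, THETA)
x, y = ttgl_ppf(u1, ALPHA1), ttgl_ppf(u2, ALPHA2)
print("theoretical Kendall tau:", THETA / (THETA + 2))  # Clayton identity
print("empirical means:", round(x.mean(), 3), round(y.mean(), 3))
```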


2017
Author(s): František Váša, Edward T. Bullmore, Ameera X. Patel

Functional connectomes are commonly analysed as sparse graphs, constructed by thresholding cross-correlations between regional neurophysiological signals. Thresholding generally retains the strongest edges (correlations), either by retaining edges surpassing a given absolute weight, or by constraining the edge density. The latter (more widely used) method risks inclusion of false positive edges at high edge densities and exclusion of true positive edges at low edge densities. Here we apply new wavelet-based methods, which enable construction of probabilistically thresholded graphs controlled for type I error, to a dataset of resting-state fMRI scans of 56 patients with schizophrenia and 71 healthy controls. By thresholding connectomes to a fixed edge-specific P value, we found that functional connectomes of patients with schizophrenia were more dysconnected than those of healthy controls, exhibiting a lower edge density and a higher number of (dis)connected components. Furthermore, many participants' connectomes could not be built up to the fixed edge densities commonly studied in the literature (~5-30%) while controlling for type I error. Additionally, we showed that the topological randomisation previously reported in the schizophrenia literature is likely attributable to "non-significant" edges added when thresholding connectomes to a fixed density based on correlation. Finally, by explicitly comparing connectomes thresholded by increasing P value and decreasing correlation, we showed that probabilistically thresholded connectomes show decreased randomness and increased consistency across participants. Our results have implications for future analyses of functional connectivity using graph theory, especially within datasets exhibiting heterogeneous distributions of edge weights (correlations) between groups or across participants.
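A simplified sketch of probabilistic (P value based) thresholding, using Fisher z-transform p-values and FDR correction as stand-ins for the paper's wavelet-based edge inference; the time series and module structure are simulated for illustration.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(9)

# Toy "regional time series": T time points x N regions, with one module
# of 10 regions sharing a common signal so some true edges exist.
T, N = 240, 30
ts = rng.normal(size=(T, N))
ts[:, :10] += rng.normal(size=(T, 1))

r = np.corrcoef(ts, rowvar=False)
iu = np.triu_indices(N, k=1)      # unique edges (upper triangle)

# Edge-wise p-values from the Fisher z-transform, a simple stand-in for
# the paper's wavelet-based edge variance estimates.
z = np.arctanh(r[iu]) * np.sqrt(T - 3)
p = 2 * stats.norm.sf(np.abs(z))

# Probabilistic thresholding: keep edges significant after FDR correction
# instead of forcing every connectome to a fixed edge density.
keep = multipletests(p, alpha=0.01, method="fdr_bh")[0]
adj = np.zeros((N, N), dtype=bool)
adj[iu] = keep
adj |= adj.T

density = adj.sum() / (N * (N - 1))
print(f"edge density at FDR 0.01: {density:.3f}")
```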


2019
Author(s): Leili Tapak, Omid Hamidi, Majid Sadeghifar, Hassan Doosti, Ghobad Moradi

Objectives: Zero-inflated proportion or rate data nested in clusters due to the sampling structure can be found in many disciplines. Sometimes the rate response may not be observed for some study units because of limitations such as failures in recording data (false negatives), so zeros are observed instead of the actual rates/proportions (low incidence). In this study, we propose a multilevel zero-inflated censored Beta regression model that can address zero-inflated rate data with low incidence.
Methods: We assume that the random effects are independent and normally distributed. The performance of the proposed approach was evaluated through application to a three-level real data set and a simulation study. We applied the proposed model to analyze brucellosis diagnosis rate data and to investigate the effects of climatic factors and geographical position. For comparison, we also applied the standard zero-inflated censored Beta regression model, which does not account for correlation.
Results: The proposed model performed better than the zero-inflated censored Beta model based on the AIC criterion. Height (p-value < 0.0001), temperature (p-value < 0.0001) and precipitation (p-value = 0.0006) significantly affected brucellosis rates, whereas precipitation was not statistically significant in the ZICBETA model (p-value = 0.385). The simulation study also showed that the maximum likelihood estimates were reasonable in terms of mean square error.
Conclusions: The results showed that the proposed method can capture the correlations in the real data set and yields accurate parameter estimates.
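A single-level sketch of the zero-inflated Beta likelihood at the heart of such models (the paper's version adds censoring and multilevel random effects, omitted here), fitted by maximum likelihood with unconstrained reparameterizations; all simulation values are invented.

```python
import numpy as np
from scipy import optimize, special, stats

rng = np.random.default_rng(13)

# Simulated zero-inflated proportions: structural zeros with probability
# pi, otherwise Beta(mu * phi, (1 - mu) * phi) on (0, 1).
n, pi0, mu0, phi0 = 800, 0.3, 0.2, 15.0
zero = rng.uniform(size=n) < pi0
y = np.where(zero, 0.0, rng.beta(mu0 * phi0, (1 - mu0) * phi0, size=n))

def negloglik(params):
    """Negative log-likelihood; parameters on unconstrained scales:
    logit(pi), logit(mu), log(phi)."""
    pi = special.expit(params[0])
    mu = special.expit(params[1])
    phi = np.exp(params[2])
    is0 = y == 0
    ll = is0.sum() * np.log(pi) + (~is0).sum() * np.log1p(-pi)
    ll += stats.beta.logpdf(y[~is0], mu * phi, (1 - mu) * phi).sum()
    return -ll

res = optimize.minimize(negloglik, x0=np.zeros(3), method="BFGS")
pi_hat, mu_hat = special.expit(res.x[:2])
phi_hat = np.exp(res.x[2])
print(f"pi = {pi_hat:.3f}, mu = {mu_hat:.3f}, phi = {phi_hat:.2f}")
```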


2020 · Vol 45 (1) · pp. 37-53
Author(s): Wenchao Ma, Ragip Terzi, Jimmy de la Torre

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A real data set was also analyzed to illustrate the use of these DIF detection procedures.
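To illustrate the LR and Wald testing logic in a simpler stand-in model (a logistic regression DIF screen rather than the paper's multiple-group cognitive diagnosis model), the sketch below simulates one item with uniform DIF and compares a reduced model against one with a group effect; variable names and data-generating values are invented.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(17)

# Toy data for ONE studied item: an ability proxy, two groups, and a
# response simulated WITH uniform DIF (a group shift in difficulty).
n = 1000
group = rng.integers(0, 2, n)
ability = rng.normal(0, 1, n)
p_correct = 1 / (1 + np.exp(-(1.2 * ability - 0.3 - 0.6 * group)))
resp = rng.binomial(1, p_correct)

# Reduced model: ability proxy only. Full model: adds the group effect,
# whose significance indicates uniform DIF for this item.
m_red = sm.Logit(resp, sm.add_constant(ability)).fit(disp=0)
m_full = sm.Logit(resp, sm.add_constant(np.column_stack([ability, group]))).fit(disp=0)

# Likelihood ratio test (df = number of extra group-specific parameters).
lr = 2 * (m_full.llf - m_red.llf)
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=1):.4g}")
# Wald test of the same coefficient:
print(f"Wald z for group: {m_full.tvalues[-1]:.2f}")
```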


2016 · Vol 5 (5) · pp. 16
Author(s): Guolong Zhao

To evaluate a drug, statistical significance alone is insufficient; clinical significance is also necessary. This paper explains how to analyze clinical data while considering both statistical and clinical significance. The analysis combines a confidence interval under the null hypothesis with one under a non-null hypothesis. The combination yields one of four possible results: (i) both significant, (ii) only significant in the former, (iii) only significant in the latter, or (iv) neither significant. The four results constitute a quadripartite procedure. Corresponding tests are described in terms of Type I error rates and power. The empirical coverage is exhibited by Monte Carlo simulations. In superiority trials, the four results are interpreted as clinical superiority, statistical superiority, non-superiority and indeterminate, respectively; the interpretation is reversed in inferiority trials. The combination entails a deflated Type I error rate, decreased power and an increased sample size. The four results may be helpful for a meticulous evaluation of drugs. Of these, non-superiority is another profile of equivalence, so it can also be used to interpret equivalence. This approach may be convenient for interpreting discordant cases, though a larger data set is usually needed. An example is taken from a real trial in naturally acquired influenza.
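A sketch of one plausible reading of the quadripartite classification, assuming both hypotheses are assessed through a single two-sided t-interval for the mean difference read against 0 (statistical significance) and against a clinical margin delta > 0 (clinical significance); the paper's exact construction combines two separate confidence intervals, so this is illustrative only.

```python
import numpy as np
from scipy import stats

def quadripartite(x, y, delta, alpha=0.05):
    """Classify a two-arm comparison by one two-sided t-interval for the
    mean difference, read against 0 and against a clinical margin delta."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    se = np.sqrt(np.var(x, ddof=1) / nx + np.var(y, ddof=1) / ny)
    half = stats.t.ppf(1 - alpha / 2, nx + ny - 2) * se   # crude pooled df
    lo, hi = diff - half, diff + half
    if lo > delta:
        return "clinical superiority (both significant)", (lo, hi)
    if lo > 0:
        return "statistical superiority (significant vs 0 only)", (lo, hi)
    if hi < delta:
        return "non-superiority (significant vs margin only)", (lo, hi)
    return "indeterminate (neither significant)", (lo, hi)

rng = np.random.default_rng(21)
treat = rng.normal(1.0, 2.0, 120)
control = rng.normal(0.0, 2.0, 120)
print(quadripartite(treat, control, delta=0.5))
```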

