Statistical testing and power analysis for brain-wide association study

2016 ◽  
Author(s):  
Weikang Gong ◽  
Lin Wan ◽  
Wenlian Lu ◽  
Liang Ma ◽  
Fan Cheng ◽  
...  

Abstract: The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression [Cheng et al., 2015a,b, 2016], multiple-comparison correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. We therefore report the development of a rigorous statistical framework for connexel-wise significance testing based on Gaussian random field theory. It controls the family-wise error rate (FWER) of multiple hypothesis tests using topological inference methods, and calculates power and sample size for a connexel-wise study. Our theoretical framework controls the false-positive rate accurately, as validated empirically on two resting-state fMRI datasets. Compared with Bonferroni correction and the false discovery rate (FDR), it reduces the false-positive rate and increases statistical power by appropriately exploiting the spatial information of fMRI data. Importantly, our method considerably reduces the computational complexity of a permutation- or simulation-based approach, so it can efficiently handle large datasets with ultra-high-resolution images. The utility of our method is shown in a case-control study: it identifies altered functional connectivities in a major depressive disorder dataset where existing methods fail. A software package is available at https://github.com/weikanggong/BWAS.
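The two generic corrections this abstract benchmarks against can be made concrete with a minimal sketch; the Gaussian-random-field topological inference itself is not reproduced here, only plain Bonferroni FWER control and Benjamini-Hochberg FDR control over a vector of p-values.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with
    p_(k) <= (k/m) * alpha (controls the FDR; less conservative than Bonferroni)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject

pvals = [0.001, 0.010, 0.020, 0.900]
print(bonferroni(pvals))          # [True, True, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, False]
```

On the same p-values, BH rejects a third hypothesis that Bonferroni misses, which is the power gap the paper's spatially informed method widens further.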

2018 ◽  
Author(s):  
Cox Lwaka Tamba ◽  
Yuan-Ming Zhang

Abstract
Background: Recent developments in technology have resulted in the generation of big data. In genome-wide association studies (GWAS), tens of millions of SNPs may need to be tested for association with a trait of interest, which poses a great computational challenge. Fast algorithms are therefore needed in GWAS methodologies, and they must ensure high power in QTN detection, high accuracy in QTN estimation and a low false-positive rate.
Results: Here, we accelerated the mrMLM algorithm using ideas from GEMMA together with matrix transformations and identities. The target functions and derivatives in vector/matrix form for each marker scan are transformed into simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are then evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM: it reduces the running time of mrMLM by more than 50%, and shows high statistical power in QTN detection, high accuracy in QTN estimation and a low false-positive rate compared with GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM detects more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM.
Conclusions: FASTmrMLM is a fast and reliable algorithm for multi-locus GWAS that ensures high statistical power, high accuracy of estimates and a low false-positive rate.
Author Summary: Current developments in technology result in the generation of vast amounts of data. In genome-wide association studies, tens of millions of markers may need to be tested for association with a trait of interest. To address this computational challenge, we developed a fast two-stage algorithm for genome-wide association studies. In the first stage, we use matrix transformations and identities to speed up the testing of each random marker effect: the target functions and derivatives in vector/matrix form for each marker scan are transformed into simple forms that are easy and efficient to evaluate during each optimization step. In the second stage, all potentially associated SNPs are selected and evaluated in a multi-locus model. In simulation studies, the algorithm significantly reduces computing time while showing high statistical power in detecting significant markers, high accuracy in marker-effect estimation and a low false-positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend the approach as a fast and reliable method for multi-locus genome-wide association studies.
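The two-stage shape described here (a fast marginal scan, then a joint multi-locus refit of the survivors) can be sketched in miniature. This is not the FASTmrMLM mathematics: there is no mixed model, and ordinary least squares stands in for the LARS / EM-Empirical Bayes step; the Fisher-z p-value is a normal approximation used only for illustration.

```python
import math
import random
from statistics import NormalDist, mean

def corr(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

def marginal_pvalues(X, y):
    """Stage 1: single-marker scan. Approximate two-sided p-value for each
    marker from the Fisher z-transform of its correlation with the trait."""
    n = len(y)
    pvals = []
    for j in range(len(X[0])):
        r = corr([row[j] for row in X], y)
        z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
        pvals.append(2 * (1 - NormalDist().cdf(abs(z))))
    return pvals

def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x

def joint_refit(X, y, keep):
    """Stage 2: refit the pre-selected markers jointly via least squares."""
    A = [[1.0] + [row[j] for j in keep] for row in X]
    p = len(keep) + 1
    AtA = [[sum(a[i] * a[k] for a in A) for k in range(p)] for i in range(p)]
    Aty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    return solve(AtA, Aty)  # [intercept, effects of kept markers...]

random.seed(1)
n, m = 200, 50
X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]
y = [row[0] * 1.0 + row[5] * 0.8 + random.gauss(0, 1) for row in X]  # markers 0 and 5 causal

pvals = marginal_pvalues(X, y)
keep = [j for j in range(m) if pvals[j] <= 0.01]  # stage-1 candidates at the paper's cut-off
betas = joint_refit(X, y, keep)
print(keep, [round(b, 2) for b in betas])
```

With the 0.01 pre-selection threshold, a handful of noise markers typically survive stage 1, and the joint refit is what shrinks their apparent effects while recovering the two causal ones.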


2018 ◽  
Author(s):  
Qianying Wang ◽  
Jing Liao ◽  
Kaitlyn Hair ◽  
Alexandra Bannach-Brown ◽  
Zsanett Bahor ◽  
...  

Abstract
Background: Meta-analysis is increasingly used to summarise the findings of systematic reviews of animal studies modelling human disease. Such reviews typically identify a large number of individually small studies testing efficacy under a variety of conditions. This leads to substantial heterogeneity, and identifying potential sources of this heterogeneity is an important function of such analyses. However, the statistical performance of the different approaches (normalised compared with standardised mean difference estimates of effect size; stratified meta-analysis compared with meta-regression) is not known.
Methods: We used data from 3116 experiments in focal cerebral ischaemia to construct a linear model predicting observed improvement in outcome contingent on 25 independent variables, and used stochastic simulation to attribute these variables to simulated studies according to their prevalence. To ascertain the ability to detect an effect of a given variable, we additionally introduced a "variable of interest" of given prevalence and effect. To establish the impact of a latent variable on the apparent influence of the variable of interest, we also introduced a "latent confounding variable" with given prevalence and effect, and allowed the prevalence of the variable of interest to differ in the presence and absence of the latent variable.
Results: Generally, the normalised mean difference (NMD) approach had higher statistical power than the standardised mean difference (SMD) approach. Even when the effect size and the number of studies contributing to the meta-analysis were small, there was good statistical power to detect the overall effect, with a low false-positive rate. For detecting an effect of the variable of interest, stratified meta-analysis was associated with a substantial false-positive rate with NMD estimates of effect size, while using an SMD estimate of effect size gave very low statistical power. Univariate and multivariable meta-regression performed substantially better, with a low false-positive rate for both NMD and SMD approaches; power was higher for NMD than for SMD. The latent confounding variable only introduced an apparent effect of the variable of interest when there was substantial asymmetry in the prevalence of the variable of interest in the presence or absence of the confounding variable.
Conclusions: In meta-analysis of data from animal studies, NMD estimates of effect size should be used in preference to SMD estimates, and meta-regression should, where possible, be chosen over stratified meta-analysis. The power to detect the influence of the variable of interest depends on its effect and prevalence, but unless effects are very large, adequate power is only achieved once at least 100 experiments are included in the meta-analysis.
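The two effect-size estimates compared above differ only in their denominator, which a short sketch makes explicit. These are the textbook definitions; published SMD meta-analyses often add Hedges' small-sample correction, which is omitted here.

```python
import math

def normalised_md(mean_ctrl, mean_trt):
    """NMD: treatment effect expressed as a percentage of the control-group outcome."""
    return 100.0 * (mean_ctrl - mean_trt) / mean_ctrl

def standardised_md(mean_ctrl, mean_trt, sd_ctrl, sd_trt, n_ctrl, n_trt):
    """SMD (Cohen's d): difference in means over the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_ctrl - 1) * sd_ctrl ** 2 + (n_trt - 1) * sd_trt ** 2)
                          / (n_ctrl + n_trt - 2))
    return (mean_ctrl - mean_trt) / pooled_sd

# hypothetical infarct volumes: control mean 50, treated mean 30, SD 10, n = 10 per group
print(normalised_md(50, 30))                     # 40.0 (% improvement)
print(standardised_md(50, 30, 10, 10, 10, 10))   # 2.0
```

Because NMD is anchored to the control mean rather than the within-study SD, it does not inherit the noisy SD estimates of small studies, which is one intuition for its higher power in this setting.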


2009 ◽  
Vol 53 (7) ◽  
pp. 2949-2954 ◽  
Author(s):  
Isabel Cuesta ◽  
Concha Bielza ◽  
Pedro Larrañaga ◽  
Manuel Cuenca-Estrella ◽  
Fernando Laguna ◽  
...  

ABSTRACT European Committee on Antimicrobial Susceptibility Testing (EUCAST) breakpoints classify Candida strains with a fluconazole MIC ≤ 2 mg/liter as susceptible, those with a fluconazole MIC of 4 mg/liter as of intermediate susceptibility, and those with a fluconazole MIC > 4 mg/liter as resistant. Machine-learning models are supported by statistical analyses that assess whether their results are statistically meaningful. The aim of this work was to use supervised classification algorithms to analyze the clinical data used to produce the EUCAST fluconazole breakpoints. Five supervised classifiers (J48, Classification and Regression Trees [CART], OneR, Naïve Bayes, and Simple Logistic) were used to analyze two cohorts of patients with oropharyngeal candidosis and candidemia. The target variable was the outcome of the infections, and the predictor variables were the MIC or the ratio between the dose administered and the MIC of the isolate (dose/MIC). Statistical power was assessed via sensitivity and specificity, the false-positive rate, the area under the receiver operating characteristic (ROC) curve, and the Matthews correlation coefficient (MCC). CART obtained the best statistical power for a MIC > 4 mg/liter for detecting failures (sensitivity, 87%; false-positive rate, 8%; area under the ROC curve, 0.89; MCC, 0.80). For dose/MIC determinations, the target was >75, with a sensitivity of 91%, a false-positive rate of 10%, an area under the ROC curve of 0.90, and an MCC of 0.80. The other classifiers gave similar breakpoints with lower statistical power. The EUCAST fluconazole breakpoints have thus been validated by machine-learning methods. Such computational tools should be incorporated into the breakpoint-development process to avoid researcher bias and enhance the statistical power of the model.
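The evaluation metrics quoted above all derive from a 2x2 confusion matrix; a minimal sketch, with illustrative counts chosen only to mimic the reported CART operating point (not the study's actual patient data):

```python
import math

def classifier_metrics(tp, fp, tn, fn):
    """Sensitivity, false-positive rate and Matthews correlation coefficient
    from the counts of a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)
    fpr = fp / (fp + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sensitivity, fpr, mcc

# hypothetical counts giving roughly the reported operating point
sens, fpr, mcc = classifier_metrics(tp=87, fp=8, tn=92, fn=13)
print(round(sens, 2), round(fpr, 2), round(mcc, 2))  # 0.87 0.08 0.79
```

MCC is the strictest of the three: unlike sensitivity or FPR alone, it only approaches 1 when both error types are simultaneously small, which is why it is a useful single summary for breakpoint selection.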


2021 ◽  
Author(s):  
Baptiste Couvy-Duchesne ◽  
Futao Zhang ◽  
Kathryn E. Kemper ◽  
Julia Sidorenko ◽  
Naomi R. Wray ◽  
...  

Abstract: Covariance between grey-matter measurements can reflect structural or functional brain networks, though it has also been shown to be influenced by confounding factors (e.g. age, head size, scanner), which can lower mapping precision (increase the size of associated clusters) and create distal false-positive associations in mass-univariate vertex-wise analyses. We evaluated this concern by performing state-of-the-art mass-univariate analyses (general linear model, GLM) on traits simulated from real vertex-wise grey-matter data (including cortical and subcortical thickness and surface area). We contrasted the results with those from linear mixed models (LMMs), which have been shown to overcome similar issues in omics association studies. When performed on a large sample (N=8,662, UK Biobank), GLMs yielded large spatial clusters of significant vertices and a greatly inflated false-positive rate (family-wise error rate: FWER=1; cluster false discovery rate: FDR>0.6). LMMs gave more parsimonious results: smaller clusters and a reduced false-positive rate (yet FWER>5% after Bonferroni correction), at the cost of increased computation. In practice, the parsimony of LMMs comes from controlling for the joint effect of all vertices, which prevents local and distal redundant associations from reaching significance. Next, we performed mass-univariate association analyses on five real UKB traits (age, sex, BMI, fluid intelligence and smoking status); the LMM yielded fewer and more localised associations. We identified 19 significant clusters displaying small associations with age, sex and BMI, suggesting a complex architecture of at least dozens of areas associated with these phenotypes.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Wenhan Chen ◽  
Yang Wu ◽  
Zhili Zheng ◽  
Ting Qi ◽  
Peter M. Visscher ◽  
...  

Abstract: Summary statistics from genome-wide association studies (GWAS) have facilitated the development of various summary-data-based methods, which typically require a reference sample for linkage disequilibrium (LD) estimation. Analyses using these methods may be biased by errors in the GWAS summary data or the LD reference, or by heterogeneity between the two. Here we propose a quality-control method, DENTIST, that leverages LD among genetic variants to detect and eliminate errors in the GWAS data or LD reference, as well as heterogeneity between the two. Through simulations, we demonstrate that DENTIST substantially reduces the false-positive rate in detecting secondary signals in summary-data-based conditional and joint association analysis, especially for imputed rare variants (false-positive rate reduced from >28% to <2% in the presence of heterogeneity between GWAS and LD reference). We further show that DENTIST can improve other summary-data-based analyses such as fine-mapping.
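The intuition behind an LD-based consistency check can be shown in a deliberately reduced two-variant form. This is not the DENTIST statistic itself (which predicts each variant's z-score from many neighbours through the full LD matrix and iterates over a region); it is only the core idea that, when summary data and LD reference agree, E[z_i | z_j] = r·z_j.

```python
from statistics import NormalDist

def ld_consistency_stat(z_i, z_j, r):
    """Two-variant toy of the LD-consistency idea: the standardised residual
    of z_i around its LD-predicted value r * z_j follows chi-square(1)
    when the summary data and the LD reference are mutually consistent."""
    return (z_i - r * z_j) ** 2 / (1 - r ** 2)

def is_suspect(z_i, z_j, r, alpha=5e-8):
    """Flag variant i when its z-score is implausible given its LD partner."""
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # chi-square(1) critical value
    return ld_consistency_stat(z_i, z_j, r) > crit

print(is_suspect(2.0, 2.1, 0.9))   # concordant pair -> False
print(is_suspect(12.0, 0.1, 0.9))  # z out of line with the LD -> True
```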


2020 ◽  
Author(s):  
Kodi Taraszka ◽  
Noah Zaitlen ◽  
Eleazar Eskin

Abstract: We introduce the pleiotropic association test (PAT) for joint analysis of multiple traits using GWAS summary statistics. The method uses the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood-ratio test statistic for each genetic variant. Although PAT does not directly identify which trait(s) drive an association, a per-trait interpretation of the omnibus p-value is provided through an extension of the meta-analysis framework of m-values. In simulations, we show that PAT controls the false-positive rate, increases statistical power, and is robust to misspecification of the genetic-effect model. Additionally, simulations comparing PAT with two multi-trait methods, HIPO and MTAG, show PAT yielding a 43.0% increase in the number of omnibus associations over the other methods. When these associations are interpreted at the per-trait level using m-values, PAT has 52.2% more per-trait interpretations with a 0.57% false-positive assignment rate. When analyzing four traits from the UK Biobank, PAT identifies 22,095 novel associated variants. Through the m-value interpretation framework, the total number of per-trait associations is almost tripled for two traits and nearly doubled for another, relative to the original single-trait GWAS.


2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Farid Rajabli ◽  
Gul Inan ◽  
Ozlem Ilk

In family-based genetic association studies, it is possible to encounter missing genotype information for one of the parents. This leads to a study consisting of both case-parent trios and case-parent pairs. One approach to this problem is the permutation-based combined transmission disequilibrium test (TDT) statistic. However, it is still unknown how powerful this test statistic is with small sample sizes. In this paper, a simulation study is carried out to estimate the power and false-positive rate of this test across different sample sizes for a family-based genome-wide association study. A statistical power of over 80% and a reasonable false-positive-rate estimate can be achieved even with a combination of 50 trios and 30 pairs when 2% of the SNPs are assumed to be associated. Moreover, even smaller samples provide high power when smaller percentages of SNPs are associated with the disease.
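The simulation-based power estimation described here can be sketched for the plain TDT (not the combined trio-plus-pair statistic of the paper): draw transmissions from heterozygous parents under a given transmission probability, apply the classic TDT chi-square, and count rejections. Sample sizes and probabilities below are illustrative.

```python
import random
from statistics import NormalDist

def tdt_statistic(b, c):
    """Classic TDT chi-square: b and c are counts of transmitted and
    untransmitted copies of the risk allele from heterozygous parents."""
    return (b - c) ** 2 / (b + c)

def simulated_power(n_het, transmit_prob, n_sims=2000, alpha=0.05, seed=7):
    """Monte Carlo power of the TDT; with transmit_prob=0.5 this instead
    estimates the false-positive rate under Mendelian (null) transmission."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # chi-square(1) critical value
    hits = 0
    for _ in range(n_sims):
        b = sum(rng.random() < transmit_prob for _ in range(n_het))
        if tdt_statistic(b, n_het - b) > crit:
            hits += 1
    return hits / n_sims

print(simulated_power(100, 0.50))  # ~0.05: null transmission, i.e. false-positive rate
print(simulated_power(100, 0.65))  # power under distorted transmission
```

The same loop, with the permutation-based combined statistic substituted for `tdt_statistic`, is the shape of the power study the abstract reports.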


2002 ◽  
Vol 41 (01) ◽  
pp. 37-41 ◽  
Author(s):  
S. Shung-Shung ◽  
S. Yu-Chien ◽  
Y. Mei-Due ◽  
W. Hwei-Chung ◽  
A. Kao

Summary
Aim: Even with careful observation, the overall false-positive rate of laparotomy remains 10-15% when acute appendicitis is suspected. We therefore assessed the clinical efficacy of the Tc-99m HMPAO-labeled leukocyte (TC-WBC) scan for the diagnosis of acute appendicitis in patients presenting with atypical clinical findings.
Patients and Methods: Eighty patients presenting with acute abdominal pain and possible acute appendicitis but atypical findings were included in this study. After intravenous injection of TC-WBC, serial anterior abdominal/pelvic images at 30, 60, 120 and 240 min with 800k counts were obtained with a gamma camera. Any abnormal localization of radioactivity in the right lower quadrant of the abdomen, equal to or greater than bone-marrow activity, was considered a positive scan.
Results: 36 of the 49 patients with positive TC-WBC scans underwent appendectomy; all proved to have positive pathological findings. Five positive TC-WBC scans were unrelated to acute appendicitis, being caused by other pathological lesions. Eight patients were not operated on, and clinical follow-up after one month revealed no acute abdominal condition. Three of the 31 patients with negative TC-WBC scans underwent appendectomy; they also had positive pathological findings. The remaining 28 patients did not undergo surgery and showed no evidence of appendicitis after at least one month of follow-up. The overall sensitivity, specificity, accuracy, and positive and negative predictive values of the TC-WBC scan for diagnosing acute appendicitis were 92, 78, 86, 82 and 90%, respectively.
Conclusion: The TC-WBC scan provides a rapid and highly accurate method for the diagnosis of acute appendicitis in patients with equivocal clinical findings. It proved useful in reducing the false-positive rate of laparotomy and in shortening the time necessary for clinical observation.
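Most of the reported performance figures can be re-derived from a 2x2 table. One plausible reading of the counts above (an assumption, not stated explicitly in the abstract) is TP=36 (scan-positive, pathology-confirmed), FN=3 (scan-negative but appendicitis at surgery), FP=8 (scan-positive, no appendicitis at follow-up) and TN=28, treating the five positives explained by other pathology as outside the appendicitis comparison.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV (as percentages) from a 2x2 table."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }

m = diagnostic_metrics(tp=36, fp=8, tn=28, fn=3)
print({k: round(v) for k, v in m.items()})
# rounds to 92 / 78 / 82 / 90 %, matching the reported values
```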


1993 ◽  
Vol 32 (02) ◽  
pp. 175-179 ◽  
Author(s):  
B. Brambati ◽  
T. Chard ◽  
J. G. Grudzinskas ◽  
M. C. M. Macintosh

Abstract: The analysis of the clinical efficiency of a biochemical parameter in the prediction of chromosome anomalies is described, using a database of 475 cases including 30 abnormalities. Two approaches to the statistical analysis were compared: the use of Gaussian frequency distributions and likelihood ratios, and logistic regression. Both methods computed that, for a 5% false-positive rate, approximately 60% of anomalies are detected on the basis of maternal age and serum PAPP-A. Logistic regression is appropriate where the outcome variable (chromosome anomaly) is binary, and its detection rates refer to the original data only. The likelihood-ratio method is used to predict the outcome in the general population, but depends on the data, or some transformation of the data, fitting a known frequency distribution (Gaussian in this case). The precision of the predicted detection rates is limited by the small sample of abnormal cases (30). Varying the means and standard deviations of the fitted log-Gaussian distributions to the limits of their 95% confidence intervals resulted in a detection rate varying between 42% and 79% for a 5% false-positive rate. Thus, although the likelihood-ratio method is potentially the better method for determining the usefulness of a test in the general population, larger numbers of abnormal cases are required to stabilise the means and standard deviations of the fitted log-Gaussian distributions.
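The Gaussian likelihood-ratio approach amounts to placing a cut-off at the quantile of the normal-pregnancy distribution that yields the chosen false-positive rate (low PAPP-A flags risk, so the cut is in the lower tail), then asking what fraction of the abnormal distribution falls beyond it. A sketch with purely illustrative log-Gaussian parameters, not the paper's fitted values:

```python
from statistics import NormalDist

def detection_rate(normal, abnormal, fpr=0.05):
    """Fraction of abnormal cases below the cut-off that mislabels exactly
    `fpr` of normal cases (lower marker values indicate higher risk)."""
    cutoff = normal.inv_cdf(fpr)
    return abnormal.cdf(cutoff)

# hypothetical log(PAPP-A MoM) distributions for illustration only
normal = NormalDist(mu=0.0, sigma=1.0)
abnormal = NormalDist(mu=-2.0, sigma=1.0)
print(round(detection_rate(normal, abnormal), 3))  # ~0.64 at a 5% FPR
```

The paper's 42-79% sensitivity band corresponds to sliding the abnormal distribution's mean and SD across their confidence limits in exactly this calculation, which is why so few abnormal cases make the estimate unstable.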

