Testing for baseline differences in clinical trials

Henian Chen; Yuanyuan Lu; Nicole Slye

doi:10.18203/2349-3259.ijct20201720

Testing for baseline differences in clinical trials

International Journal of Clinical Trials ◽

10.18203/2349-3259.ijct20201720 ◽

2020 ◽

Vol 7 (2) ◽

pp. 150

Author(s):

Henian Chen ◽

Yuanyuan Lu ◽

Nicole Slye

Keyword(s):

Clinical Trials ◽

Sample Size ◽

Sofa Score ◽

Statistical Tests ◽

Statistical Significance ◽

P Value ◽

Large Trial ◽

P Values ◽

Failure Assessment ◽

The Relationship

Reporting statistical tests of baseline measures in clinical trials does not make sense, because statistical significance depends on sample size: a large trial can find statistically significant the same difference that a small trial did not. We use 3 published trials with the same baseline measures to show the relationship between trial sample size and p value. In trial 1, the sequential organ failure assessment (SOFA) score was 10.4±3.4 vs. 9.6±3.2 (difference=0.8, p=0.01) and vasopressor use was 83.0% vs. 72.6% (p=0.007). In trial 2, the SOFA score was 11±3 vs. 12±3 (difference=1, p=0.42). In trial 3, vasopressor use was 73% vs. 83% (p=0.21). Based on trial 2, the supine group has a mean SOFA score of 12 with an SD of 3, while the prone group has a mean of 11 with an SD of 3. The p values are 0.29850, 0.09877, 0.01940, 0.00094, 0.00005, and <0.00001 when n (per arm) is 20, 50, 100, 200, 300 and 400, respectively. Based on trial 3, vasopressor use is 73.0% in the supine group vs. 83.0% in the prone group. The p values are 0.4452, 0.2274, 0.0878, 0.0158, 0.0031, and 0.0006 when n (per arm) is 20, 50, 100, 200, 300 and 400, respectively. Small trials yield larger p values than large trials for the same baseline differences. Imbalance in baseline measures cannot be defined from these p values alone. There is no statistical basis for advocating tests of baseline differences.
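The dependence of the p value on sample size described in this abstract can be sketched in a few lines of Python. The abstract does not name the tests used, so this sketch assumes a pooled two-sample t-test for the SOFA score and a pooled two-proportion z-test (without continuity correction) for vasopressor use. The function names are illustrative, and the t-distribution tail is obtained by simple numerical integration so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_t_p(t, df, steps=2000):
    """Two-sided p value for a t statistic, via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def sofa_p(n, mean_a=12.0, mean_b=11.0, sd=3.0):
    """Assumed pooled t-test: equal SD and n patients in each arm."""
    se = sd * math.sqrt(2.0 / n)
    return two_sided_t_p((mean_a - mean_b) / se, 2 * n - 2)


def vasopressor_p(n, p_a=0.73, p_b=0.83):
    """Assumed pooled two-proportion z-test with n patients in each arm."""
    pooled = (p_a + p_b) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2.0 / n)
    z = abs(p_a - p_b) / se
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail


for n in (20, 50, 100, 200, 300, 400):
    print(f"n per arm = {n:3d}  SOFA p = {sofa_p(n):.5f}  "
          f"vasopressor p = {vasopressor_p(n):.4f}")
```

Under these assumptions the observed differences are held fixed and only n changes, so the falling p values reflect sample size alone rather than any change in baseline imbalance.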


Bayesian interpretation of p values in clinical trials

BMJ evidence-based medicine ◽

10.1136/bmjebm-2020-111603 ◽

2021 ◽

pp. bmjebm-2020-111603

Author(s):

John Ferguson

Keyword(s):

Clinical Trial ◽

Clinical Trials ◽

Sample Size ◽

Confidence Intervals ◽

Statistical Significance ◽

Large Sample Size ◽

P Values ◽

Clinical Trial Results ◽

Sound Treatment ◽

Counterintuitive Result

Commonly accepted statistical advice dictates that large-sample size and highly powered clinical trials generate more reliable evidence than trials with smaller sample sizes. This advice is generally sound: treatment effect estimates from larger trials tend to be more accurate, as witnessed by tighter confidence intervals in addition to reduced publication biases. Consider then two clinical trials testing the same treatment which result in the same p values, the trials being identical apart from differences in sample size. Assuming statistical significance, one might at first suspect that the larger trial offers stronger evidence that the treatment in question is truly effective. Yet, often precisely the opposite will be true. Here, we illustrate and explain this somewhat counterintuitive result and suggest some ramifications regarding interpretation and analysis of clinical trial results.
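One way to see why equal p values can mean weaker evidence in the larger trial is to compute a Bayes factor while holding the test statistic fixed. The sketch below is not taken from the article. It assumes a simplified single-sample setting with known outcome SD `sigma`, a point null of zero effect, and a normal prior N(0, `tau`²) on the true effect under the alternative. All names and default values are illustrative.

```python
import math


def bayes_factor_null(z, n, tau=0.5, sigma=1.0):
    """Bayes factor in favour of the null (BF01) for a normal mean.

    Assumptions (hypothetical, for illustration only):
      H0: effect = 0
      H1: effect ~ Normal(0, tau^2)
      sample mean ~ Normal(effect, sigma^2 / n), with z = mean * sqrt(n) / sigma
    """
    r = n * tau ** 2 / sigma ** 2  # prior variance relative to sampling variance
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z ** 2 * r / (1.0 + r))


# Same z statistic (and so the same p value) at increasing sample sizes.
for n in (25, 100, 1000, 10000):
    print(f"n = {n:5d}  BF01 = {bayes_factor_null(1.96, n):.3f}")
```

With z fixed, the Bayes factor in favour of the null grows roughly like the square root of n in this model, so the same p value lends less support to a real treatment effect as the trial gets larger. The size of the effect depends on the assumed prior SD `tau`.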


Cavalier Use of Inferential Statistics Is a Major Source of False and Irreproducible Scientific Findings

Mathematics ◽

10.3390/math9060603 ◽

2021 ◽

Vol 9 (6) ◽

pp. 603

Author(s):

Leonid Hanin

Keyword(s):

Sample Size ◽

Gaussian Approximation ◽

Statistical Significance ◽

Statistical Analyses ◽

Random Sample Size ◽

P Values ◽

The Central Limit Theorem ◽

Fixed Sample ◽

Large Numbers ◽

Significance Levels

I uncover previously underappreciated systematic sources of false and irreproducible results in natural, biomedical and social sciences that are rooted in statistical methodology. They include the inevitably occurring deviations from basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and thus are generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when random sample size and observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance essentially making pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously in the case of one-sided Z test. This article could serve as a cautionary guidance to scientists and practitioners employing statistical methods in their work.


Abstract MP11: Circulating Plasma Biomarkers Associated With Brain Arteriovenous Malformations

Stroke ◽

10.1161/str.52.suppl_1.mp11 ◽

2021 ◽

Vol 52 (Suppl_1) ◽

Author(s):

Sarah E Wetzel-Strong ◽

Shantel M Weinsheimer ◽

Jeffrey Nelson ◽

Ludmila Pawlikowska ◽

Dewi Clark ◽

...

Keyword(s):

Multiple Testing ◽

Statistical Significance ◽

Protein Profiling ◽

P Value ◽

P Values ◽

Plasma Biomarkers ◽

Standard Curve ◽

Disease States ◽

Heparin Plasma ◽

Circulating Levels

Objective: Circulating plasma protein profiling may aid in the identification of cerebrovascular disease signatures. This study aimed to identify circulating angiogenic and inflammatory biomarkers that may differentiate sporadic brain arteriovenous malformation (bAVM) patients from patients with other conditions involving brain AVMs, including hereditary hemorrhagic telangiectasia (HHT). Methods: The Quantibody Human Angiogenesis Array 1000 (Raybiotech), an ELISA multiplex panel, was used to assess the levels of 60 proteins related to angiogenesis and inflammation in heparin plasma samples from 13 sporadic unruptured bAVM patients (69% male, mean age 51 years) and 37 patients with HHT (40% male, mean age 47 years, n=19 (51%) with bAVM). The Quantibody Q-Analyzer tool was used to calculate biomarker concentrations based on the standard curve for each marker, and log-transformed marker levels were evaluated for associations between disease states using a multivariable interval regression model adjusted for age, sex, ethnicity and collection site. Statistical significance was based on Bonferroni correction for multiple testing of 60 biomarkers (P < 8.3×10^-4). Results: Circulating levels of two plasma proteins differed significantly between sporadic bAVM and HHT patients: PDGF-BB (P = 2.6×10^-4, PI = 3.37, 95% CI: 1.76-6.46) and CCL5 (P = 6.0×10^-6, PI = 3.50, 95% CI: 2.04-6.03). When considering markers with a nominal p-value of less than 0.01, MMP1 and angiostatin levels also differed between patients with sporadic bAVM and HHT. Markers with nominal p-values less than 0.05 when comparing sporadic bAVM and HHT patients also included angiostatin, IL2, VEGF, GRO, CXCL16, ITAC, and TGFB3. Among HHT patients, the circulating levels of UPAR and IL6 were elevated in patients with documented bAVMs when considering markers with nominal p-values less than 0.05. Conclusions: This study identified differential expression of two promising plasma biomarkers that differentiate patients with sporadic bAVMs from patients with HHT. Furthermore, this study allowed us to evaluate markers that are associated with the presence of bAVMs in HHT patients, which may offer insight into mechanisms underlying bAVM pathophysiology.
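The Bonferroni threshold quoted in this abstract can be checked with a short calculation. The sketch below assumes a family-wise significance level of 0.05, which the abstract does not state explicitly but which matches the reported cut-off when divided across the 60 biomarkers. The variable and function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05    # assumed family-wise level (not stated in the abstract)
N_MARKERS = 60  # number of biomarkers tested, from the abstract

threshold = bonferroni_threshold(ALPHA, N_MARKERS)

# P values reported in the abstract for the two significant proteins.
reported = {"PDGF-BB": 2.6e-4, "CCL5": 6.0e-6}
significant = {name: p < threshold for name, p in reported.items()}

print(f"threshold = {threshold:.2e}")
for name, passed in significant.items():
    print(f"{name}: P = {reported[name]:.1e}, passes Bonferroni: {passed}")
```

Markers described only as having nominal p-values below 0.01 or 0.05 may not meet this corrected threshold, which is why the abstract reports them separately.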


Statistical Principles for Clinical Trials

Journal of International Medical Research ◽

10.1177/030006059802600201 ◽

1998 ◽

Vol 26 (2) ◽

pp. 57-65 ◽

Cited By ~ 2

Author(s):

R Kay

Keyword(s):

Clinical Trials ◽

Confidence Interval ◽

Sample Size ◽

Sample Size Calculation ◽

P Value ◽

Clinical Value ◽

Treatment Groups ◽

Correct Calculation ◽

Intent To Treat ◽

Comparative Trials

If a trial is to be well designed, and the conclusions drawn from it valid, a thorough understanding of the benefits and pitfalls of basic statistical principles is required. When setting up a trial, appropriate sample-size calculation is vital. If initial calculations are inaccurate, trial results will be unreliable. The principle of intent-to-treat in comparative trials is examined. Randomization as a method of allocating patients to treatment is essential to avoid biased allocation and to balance the mix of patients across treatment groups. Once trial results are available, the correct calculation and interpretation of the P-value is important. Its limitations are examined, and the use of the confidence interval to help draw valid conclusions regarding the clinical value of treatments is explored.


Kejadian Hiperbilirubinemia Ditinjau dari Berat Badan Lahir (Incidence of Hyperbilirubinemia in Relation to Birth Weight)

Cendekia Medika ◽

10.52235/cendekiamedika.v5i1.18 ◽

2020 ◽

Vol 5 (1) ◽

pp. 36-42

Author(s):

Yeviza Puspitasari

Keyword(s):

Random Sampling ◽

Statistical Tests ◽

Univariate Analysis ◽

Bivariate Analysis ◽

P Value ◽

Chi Square ◽

Cross Sectional ◽

Study Population ◽

Relationship Of ◽

The Relationship

Hyperbilirubinemia is one of the clinical phenomena most often found in neonates in the first week of life and is also one of the factors causing infant death. It is influenced by the baby's immature liver function in processing erythrocytes (red blood cells), which results in the accumulation of bilirubin. The purpose of this study was to determine the relationship between infant birth weight and the incidence of hyperbilirubinemia at RSUD dr. Ibnu Soetowo Baturaja, Ogan Komering Ulu Regency, in 2019. This study uses analytic methods with a cross-sectional approach. The study population was all infants aged 0-7 days in the neonatal room at RSUD dr. Ibnu Soetowo Baturaja, Ogan Komering Ulu Regency, in 2019, selected by random sampling. Data analysis uses univariate and bivariate analysis with distribution tables and Chi-Square statistical tests at a 95% confidence level. In the univariate analysis of 203 respondents, 26.5% had hyperbilirubinemia and 72.5% did not; 24.6% of infants had low birth weight (LBW) and 75.4% did not. Bivariate analysis showed a relationship between LBW and the incidence of hyperbilirubinemia (p-value 0.000).


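Hubungan Faktor Maternal terhadap Posisi pada Waktu Persalinan Kala II dengan Kejadian Ruptur Perineum (The Relationship of Maternal Factors and Position During the Second Stage of Labor with the Incidence of Perineal Rupture)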

Jurnal Bidan Cerdas (JBC) ◽

10.33860/jbc.v2i1.80 ◽

2020 ◽

Vol 2 (1) ◽

pp. 31-36

Author(s):

Wa Ode Hajrah ◽

Niken Purbowati ◽

Novia Nuraini

Keyword(s):

Maternal Age ◽

Statistical Tests ◽

Reproductive Organs ◽

P Value ◽

Chi Square ◽

Cross Sectional ◽

Second Stage ◽

Second Stage Labor ◽

Relationship Of ◽

The Relationship

Perineal rupture needs attention because it can cause dysfunction of the female reproductive organs and can be a source of bleeding and a route of infection, which can lead to death from bleeding or sepsis. About 85% of women who deliver vaginally experience perineal rupture: 24% in the 25-30 years age group and 62% at maternal age 32-39 years. Perineal rupture is also a problem in Asia, where 50% of cases worldwide occur. The study aims to determine the relationship of maternal factors and position during the second stage of labor with the occurrence of perineal rupture. This research applied a descriptive-analytic method using a cross-sectional research design. The research sample was 102 respondents, obtained by accidental random sampling, comprising all labors with perineal rupture from July to November 2018. Statistical tests used chi-square. The result for perineal rupture with maternal age was p-value 0.042 (p <0.05), and for perineal rupture with maternal parity it was p-value 0.01 (p <0.05). It is suggested that various maternal positions be taught in maternity and ANC classes to prevent perineal rupture.


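HUBUNGAN MASA KERJA DAN USIA DENGAN TINGKAT HIDRASI PEKERJA PERAJIN MANIK-MANIK DI KABUPATEN JOMBANG (The Relationship of Working Period and Age with the Hydration Level of Bead Craft Workers in Jombang Regency)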

Jurnal Kesehatan Terpadu (Integrated Health Journal) ◽

10.32695/jkt.v2i9.14 ◽

2018 ◽

Vol 9 (2) ◽

pp. 1-9

Author(s):

Neffrety Nilamsari ◽

Ratih Damayanti ◽

Erwin Dyah Nawawinetu

Keyword(s):

Statistical Tests ◽

Hydration Status ◽

P Value ◽

Design Data ◽

Cross Sectional ◽

Hydration Level ◽

Physical Hazards ◽

Hours Of Work ◽

Cross Sectional Design ◽

The Relationship

Every workplace has potential hazards. The potential hazards most often found in manufacturing industries are physical hazards that can affect labor productivity. The purpose of this study was to analyze the relationship of working period and age with the hydration level of bead craftsmen. Respondents in this study were 19 workers at PT X, Jombang Regency. This research is an observational study with a cross-sectional design. Data collection was conducted from April to July 2018. A correlation test was used, with urine color indications used to determine dehydration levels. The results showed a relationship between working period and hydration level (p-value 0.000), while age had no relationship with hydration level (p-value 0.087); the temperature in the workspace averaged 34.1°C. There is a relationship between length of work and the hydration level of bead craftsmen, and there is no relationship between age and the hydration level of bead craftsmen. To reduce dehydration, it is recommended that every bead craftsman drink 0.5 liters of water every 2 hours, so that the fluid requirement of approximately 2 liters in 8 hours of work is met. Companies are advised to add ventilation in the workspace to reduce exposure to hot temperatures. Keywords: Hydration level, working period, age


Heteroskedasticity between GNSS time-series repeatabilities and noise magnitudes

10.5194/egusphere-egu21-4702 ◽

2021 ◽

Author(s):

Huseyin Duman ◽

Doğan Uğur Şanlı

Keyword(s):

Time Series ◽

Linear Equations ◽

Test Methods ◽

P Value ◽

P Values ◽

Modeling Errors ◽

Alternative Hypotheses ◽

The Relationship ◽

Baseline Component ◽

Gnss Time Series

In the analysis of GNSS time series, when the sampling frequency and time-series lengths are almost identical, it is possible to highlight a linear relationship between the series repeatabilities (i.e. WRMS) and noise magnitudes. In the literature, linear equations as a function of WRMS have allowed many researchers to estimate the noise magnitudes. However, this was built upon homoskedasticity. We found that the higher the WRMS, the more erroneous the analysis results obtained using noise magnitudes from these linear equations. We hence studied whether or not homoskedasticity adequately describes the modeling errors. To test that, we used the published results of GPS baseline components from previous work in the literature and realized that each component forms part of the totality. We introduced all baseline component results as a whole into statistical analysis to check for heteroskedasticity. We established null and alternative hypotheses that the residuals are homoskedastic (H0) or heteroskedastic (HA). We adopted both the Breusch-Pagan test and the Goldfeld-Quandt test to test for heteroskedasticity and obtained p-values for both methods. The p-value is almost zero for both test methods; that is, we reject the null hypothesis. Consequently, we can confidently state that the relationship between the WRMSs and the noise magnitudes is heteroskedastic. Keywords: Noise magnitudes, repeatabilities, heteroskedasticity, time-series analysis


Sequential Organ Failure Assessment Component Score Prediction of In-hospital Mortality From Sepsis

Journal of Intensive Care Medicine ◽

10.1177/0885066618795400 ◽

2018 ◽

Vol 35 (8) ◽

pp. 810-817 ◽

Cited By ~ 3

Author(s):

Tushar Gupta ◽

Michael A. Puskarich ◽

Elizabeth DeVos ◽

Adnan Javed ◽

Carmen Smotherman ◽

...

Keyword(s):

Hospital Mortality ◽

Sequential Organ Failure Assessment ◽

Organ Failure ◽

Sofa Score ◽

Statistical Significance ◽

Predictive Ability ◽

Hospital Death ◽

Icu Stay ◽

Relative Contribution ◽

Failure Assessment

Objectives: Early organ dysfunction in sepsis confers a high risk of in-hospital mortality, but the relative contribution of specific types of organ failure to overall mortality is unclear. The objective of this study was to assess the ability of individual types of organ failure to predict in-hospital mortality or prolonged intensive care. Methods: Retrospective cohort study of adult emergency department patients with sepsis from October 1, 2013, to November 10, 2015. Multivariable regression was used to assess the odds ratios of individual organ failure types for the outcomes of in-hospital death (primary) and in-hospital death or ICU stay ≥ 3 days (secondary). Results: Of 2796 patients, 283 (10%) experienced in-hospital mortality, and 748 (27%) experienced in-hospital mortality or an ICU stay ≥ 3 days. The following components of the Sequential Organ Failure Assessment (SOFA) score were most predictive of in-hospital mortality (descending order): coagulation (odds ratio [OR]: 1.60, 95% confidence interval [CI]: 1.32-1.93), hepatic (OR: 1.58, 95% CI: 1.32-1.90), respiratory (OR: 1.33, 95% CI: 1.21-1.47), neurologic (OR: 1.20, 95% CI: 1.07-1.35), renal (OR: 1.14, 95% CI: 1.02-1.27), and cardiovascular (OR: 1.13, 95% CI: 1.01-1.25). For mortality or ICU stay ≥ 3 days, the most predictive SOFA components were respiratory (OR: 1.97, 95% CI: 1.79-2.16), neurologic (OR: 1.72, 95% CI: 1.54-1.92), cardiovascular (OR: 1.38, 95% CI: 1.23-1.54), coagulation (OR: 1.31, 95% CI: 1.10-1.55), and renal (OR: 1.19, 95% CI: 1.08-1.30), while hepatic SOFA (OR: 1.16, 95% CI: 0.98-1.37) did not reach statistical significance (P = .092). Conclusion: In this retrospective study, SOFA score components demonstrated varying predictive abilities for mortality in sepsis. Elevated coagulation or hepatic SOFA scores were most predictive of in-hospital death, while an elevated respiratory SOFA was most predictive of death or ICU stay >3 days.

