Testing for baseline differences in clinical trials

2020 ◽  
Vol 7 (2) ◽  
pp. 150
Author(s):  
Henian Chen ◽  
Yuanyuan Lu ◽  
Nicole Slye

Reporting statistical tests for baseline measures in clinical trials does not make sense, because statistical significance depends on sample size: a large trial can find significance in the same difference that a small trial finds non-significant. We use 3 published trials that report the same baseline measures to show the relationship between trial sample size and p value. In trial 1, the sequential organ failure assessment (SOFA) score was 10.4±3.4 vs. 9.6±3.2 (difference=0.8, p=0.01) and vasopressor use was 83.0% vs. 72.6% (p=0.007). In trial 2, the SOFA score was 11±3 vs. 12±3 (difference=1, p=0.42). In trial 3, vasopressor use was 73% vs. 83% (p=0.21). Based on trial 2, the supine group has a mean SOFA score of 12 with an SD of 3, while the prone group has a mean SOFA score of 11 with an SD of 3. The p values are 0.29850, 0.09877, 0.01940, 0.00094, 0.00005, and <0.00001 when n (per arm) is 20, 50, 100, 200, 300 and 400, respectively. Based on trial 3, vasopressor use is 73.0% in the supine group vs. 83.0% in the prone group. The p values are 0.4452, 0.2274, 0.0878, 0.0158, 0.0031, and 0.0006 when n (per arm) is 20, 50, 100, 200, 300 and 400, respectively. Small trials yield larger p values than big trials for the same baseline differences, so imbalance in baseline measures cannot be defined on the basis of these p values alone. There is no statistical basis for advocating baseline difference tests.
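The dependence of the p value on sample size described in this abstract can be sketched in a few lines. The abstract does not name the tests used, so the following assumes a pooled two-sample t-test for the SOFA scores and a pooled two-proportion z-test (equivalent to a chi-square test without continuity correction) for vasopressor use. The function names are illustrative, and the t distribution is integrated numerically only to keep the sketch free of third-party packages.

```python
import math
from statistics import NormalDist


def t_two_sided_p(t_stat, df, steps=4000):
    """Two-sided Student's t p-value via Simpson integration of the density."""
    t_abs = abs(t_stat)
    # Log of the normalising constant; lgamma keeps this stable for large df.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t_abs / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t_abs)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def sofa_p(n_per_arm, mean_diff=1.0, sd=3.0):
    """Assumed pooled t-test: equal SD and equal n in both arms."""
    t_stat = mean_diff / (sd * math.sqrt(2.0 / n_per_arm))
    return t_two_sided_p(t_stat, df=2 * n_per_arm - 2)


def vasopressor_p(n_per_arm, p1=0.73, p2=0.83):
    """Assumed pooled two-proportion z-test with equal n in both arms."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2.0 / n_per_arm)
    z = abs(p1 - p2) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))


for n in (20, 50, 100, 200, 300, 400):
    print(n, round(sofa_p(n), 5), round(vasopressor_p(n), 4))
```

Under these assumptions the printed values agree, to rounding, with the p values quoted in the abstract, while the difference between the arms is held fixed throughout.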

2021 ◽  
pp. bmjebm-2020-111603
Author(s):  
John Ferguson

Commonly accepted statistical advice dictates that large, highly powered clinical trials generate more reliable evidence than trials with smaller sample sizes. This advice is generally sound: treatment effect estimates from larger trials tend to be more accurate, as witnessed by tighter confidence intervals in addition to reduced publication biases. Consider then two clinical trials testing the same treatment which result in the same p values, the trials being identical apart from differences in sample size. Assuming statistical significance, one might at first suspect that the larger trial offers stronger evidence that the treatment in question is truly effective. Yet, often precisely the opposite will be true. Here, we illustrate and explain this somewhat counterintuitive result and suggest some ramifications regarding interpretation and analysis of clinical trial results.
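One ingredient of this result can be shown with simple arithmetic, without claiming it is the argument the paper itself makes: at a fixed p value, the observed effect must be smaller when the trial is larger. The sketch below assumes a two-arm z-test with equal arm sizes and a common known SD, and the function name `implied_effect` is illustrative.

```python
import math
from statistics import NormalDist


def implied_effect(p_two_sided, n_per_arm):
    """Standardized mean difference that yields the given two-sided p value."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    # For two equal arms, z = d / sqrt(2 / n), so d = z * sqrt(2 / n).
    return z * math.sqrt(2 / n_per_arm)


for n in (50, 200, 1000, 5000):
    print(n, round(implied_effect(0.05, n), 3))
```

The same p value therefore corresponds to a progressively smaller observed effect as n grows.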


2018 ◽  
Vol 13 (7) ◽  
pp. 669-672 ◽  
Author(s):  
Mayank Goyal ◽  
Aravind Ganesh ◽  
Scott Brown ◽  
Bijoy K Menon ◽  
Michael D Hill

The modified Rankin Scale (mRS) at 90 days after stroke onset has become the preferred outcome measure in acute stroke trials, including recent trials of interventional therapies. Reporting the range of modified Rankin Scale scores as a paired horizontal stacked bar graph (colloquially known as “Grotta bars”) has become the conventional method of visualizing modified Rankin Scale results. Grotta bars readily illustrate the levels of the ordinal modified Rankin Scale in which benefit may have occurred. However, complementing the available graphical information by including additional features to convey statistical significance may be advantageous. We propose a modification of the horizontal stacked bar graph with illustrative examples. In this suggested modification, the line joining the segments of the bar graph (e.g. modified Rankin Scale 1–2 in treatment arm to modified Rankin Scale 1–2 in control arm) is given a color and thickness based on the p-value of the result at that level (in this example, the p-value of modified Rankin Scale 0–1 vs. 2–6): a thick green line for p-values <0.01, thin green for p-values of 0.01 to <0.05, gray for 0.05 to <0.10, thin red for 0.10 to <0.90, and thick red for p-values ≥0.90 or outcome favoring the control group. Illustrative examples from four recent trials (ESCAPE, SWIFT-PRIME, IST-3, ASTER) are shown to demonstrate the range of significant and non-significant effects that can be captured using this proposed method. By formalizing a display of outcomes which includes statistical tests of all possible dichotomizations of the Rankin scale, this approach also encourages pre-specification of such hypotheses. Prespecifying tests of all six dichotomizations of the Rankin scale provides all possible statistical information in an a priori fashion. Since the result of our proposed approach is six distinct dichotomized tests in addition to a primary test, e.g. of the ordinal Rankin shift, it may be prudent to account for multiplicity in testing by using dichotomized p-values only after adjustment, such as by the Bonferroni or Hochberg-Holm methods. Whether p-values are nominal or adjusted may be left to the discretion of the presenter as long as the presence or absence of adjustment is clearly stated in the statistical methods. Our proposed modification results in a visually intuitive summary of both the size of the effect (represented by the matched bars and their connecting segments) and its statistical relevance.
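The color rule and the optional multiplicity adjustment described above can be sketched as follows. The thresholds are taken from the abstract. The function names are illustrative, and the adjustment shown is the Holm step-down procedure, used here as one plausible reading of the "Hochberg-Holm" option rather than the authors' exact method.

```python
def connector_style(p_value, favors_control=False):
    """Map a dichotomized p-value to the connecting-line style in the abstract."""
    if favors_control or p_value >= 0.90:
        return "thick red"
    if p_value < 0.01:
        return "thick green"
    if p_value < 0.05:
        return "thin green"
    if p_value < 0.10:
        return "gray"
    return "thin red"


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for the six dichotomizations (not from any trial).
nominal = [0.004, 0.012, 0.030, 0.080, 0.400, 0.950]
for p_nom, p_adj in zip(nominal, holm_adjust(nominal)):
    print(p_nom, connector_style(p_nom), round(p_adj, 3), connector_style(p_adj))
```

Printing the nominal and adjusted styles side by side makes it easy to state which of the two a figure uses.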


Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 603
Author(s):  
Leonid Hanin

I uncover previously underappreciated systematic sources of false and irreproducible results in natural, biomedical and social sciences that are rooted in statistical methodology. They include the inevitably occurring deviations from basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and thus are generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when random sample size and observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance essentially making pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously in the case of one-sided Z test. This article could serve as a cautionary guidance to scientists and practitioners employing statistical methods in their work.


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Sarah E Wetzel-Strong ◽  
Shantel M Weinsheimer ◽  
Jeffrey Nelson ◽  
Ludmila Pawlikowska ◽  
Dewi Clark ◽  
...  

Objective: Circulating plasma protein profiling may aid in the identification of cerebrovascular disease signatures. This study aimed to identify circulating angiogenic and inflammatory proteins that may serve as biomarkers to differentiate sporadic brain arteriovenous malformation (bAVM) patients from other conditions with brain AVMs, including hereditary hemorrhagic telangiectasia (HHT) patients. Methods: The Quantibody Human Angiogenesis Array 1000 (Raybiotech) is an ELISA multiplex panel that was used to assess the levels of 60 proteins related to angiogenesis and inflammation in heparin plasma samples from 13 sporadic unruptured bAVM patients (69% male, mean age 51 years) and 37 patients with HHT (40% male, mean age 47 years, n=19 (51%) with bAVM). The Quantibody Q-Analyzer tool was used to calculate biomarker concentrations based on the standard curve for each marker, and log-transformed marker levels were evaluated for associations between disease states using a multivariable interval regression model adjusted for age, sex, ethnicity and collection site. Statistical significance was based on Bonferroni correction for multiple testing of 60 biomarkers (P < 8.3×10⁻⁴). Results: Circulating levels of two plasma proteins differed significantly between sporadic bAVM and HHT patients: PDGF-BB (P=2.6×10⁻⁴, PI=3.37, 95% CI: 1.76-6.46) and CCL5 (P=6.0×10⁻⁶, PI=3.50, 95% CI: 2.04-6.03). When considering markers with a nominal p-value of less than 0.01, MMP1 and angiostatin levels also differed between patients with sporadic bAVM and HHT. Markers with nominal p-values less than 0.05 when comparing sporadic brain AVM and HHT patients also included angiostatin, IL2, VEGF, GRO, CXCL16, ITAC, and TGFB3. Among HHT patients, the circulating levels of UPAR and IL6 were elevated in patients with documented bAVMs when considering markers with nominal p-values less than 0.05.
Conclusions: This study identified differential expression of two promising plasma biomarkers that differentiate sporadic bAVMs from patients with HHT. Furthermore, this study allowed us to evaluate markers that are associated with the presence of bAVMs in HHT patients, which may offer insight into mechanisms underlying bAVM pathophysiology.
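The Bonferroni threshold quoted in the Methods follows from dividing the significance level by the number of markers tested. The sketch below assumes a family-wise level of 0.05, which the abstract does not state explicitly but which matches the reported threshold, and checks the two reported p-values against it.

```python
alpha = 0.05    # assumed family-wise significance level
n_tests = 60    # biomarkers on the panel, from the abstract
threshold = alpha / n_tests

# P values as reported in the Results.
reported = {"PDGF-BB": 2.6e-4, "CCL5": 6.0e-6}
significant = {name: p < threshold for name, p in reported.items()}

print(f"Bonferroni threshold: {threshold:.2e}")
for name, passed in significant.items():
    print(name, "passes" if passed else "fails")
```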


1998 ◽  
Vol 26 (2) ◽  
pp. 57-65 ◽  
Author(s):  
R Kay

If a trial is to be well designed, and the conclusions drawn from it valid, a thorough understanding of the benefits and pitfalls of basic statistical principles is required. When setting up a trial, appropriate sample-size calculation is vital. If initial calculations are inaccurate, trial results will be unreliable. The principle of intent-to-treat in comparative trials is examined. Randomization as a method of selecting patients to treatment is essential to ensure that the treatment groups are equalized in terms of avoiding biased allocation in the mix of patients within groups. Once trial results are available the correct calculation and interpretation of the P-value is important. Its limitations are examined, and the use of the confidence interval to help draw valid conclusions regarding the clinical value of treatments is explored.


2020 ◽  
Vol 5 (1) ◽  
pp. 36-42
Author(s):  
Yeviza Puspitasari

Hyperbilirubinemia is one of the clinical phenomena most often found in neonates during the first week of life and is one of the factors contributing to infant death. It is influenced by the baby's immature liver function in processing erythrocytes (red blood cells), which results in the accumulation of bilirubin. The purpose of this study was to determine the relationship between infant birth weight and the incidence of hyperbilirubinemia at RSUD dr. Ibnu Soetowo Baturaja, Ogan Komering Ulu Regency, in 2019. This study uses analytic methods with a cross-sectional approach. The study population was all infants aged 0-7 days in the neonatal room at RSUD dr. Ibnu Soetowo Baturaja, Ogan Komering Ulu Regency, in 2019, selected by random sampling. Data analysis comprised univariate and bivariate analysis using distribution tables and Chi-Square tests, with a 95% confidence level. In the univariate analysis of 203 respondents, 26.5% had hyperbilirubinemia and 72.5% did not; 24.6% of infants had low birth weight (LBW) and 75.4% did not. Bivariate analysis showed a relationship between LBW and the incidence of hyperbilirubinemia (p-value 0.000).


2020 ◽  
Vol 2 (1) ◽  
pp. 31-36
Author(s):  
Wa Ode Hajrah ◽  
Niken Purbowati ◽  
Novia Nuraini

Perineal rupture needs attention because it can cause dysfunction of the female reproductive organs and can act as a source of bleeding and as a route of entry for infection, which can lead to death from bleeding or sepsis. About 85% of women who deliver vaginally experience perineal rupture: 24% in the 25-30 year age group and 62% at maternal age 32-39 years. In Asia, perineal rupture is also a problem in society; 50% of the world's occurrences are in Asia. The study aims to determine the relationship of maternal factors and second-stage labor position with the occurrence of perineal rupture. This research applied a descriptive-analytic method using a cross-sectional design. The research sample was 102 respondents, obtained by accidental random sampling, comprising all labors with perineal rupture from July to November 2018. The chi-square test was used. The association of perineal rupture with maternal age had a p-value of 0.042 (p <0.05), and the association with maternal parity had a p-value of 0.01 (p <0.05). It is suggested that various maternal positions be taught in maternity and ANC classes to prevent perineal rupture.


2018 ◽  
Vol 9 (2) ◽  
pp. 1-9
Author(s):  
Neffrety Nilamsari ◽  
Ratih Damayanti ◽  
Erwin Dyah Nawawinetu

Every workplace always has potential hazards. The potential hazards most often found in manufacturing industries are physical hazards that can affect labor productivity. The purpose of this study was to analyze the relationship of working period and age of bead craftsmen with hydration levels. Respondents in this study were 19 workers at PT X, Jombang Regency. This research is an observational study with a cross-sectional design. Data collection was conducted from April to July 2018. A correlation test was used, with urine color as the indicator of dehydration level. The results showed a relationship between working period and hydration level (p-value 0.000), while age had no relationship with hydration level (p-value 0.087); the temperature in the workspace averaged 34.1°C. There is a relationship between the length of work and the level of hydration of bead craftsmen, and there is no relationship between age and level of hydration of bead craftsmen. To improve hydration status, it is recommended that every bead craftsman drink 0.5 liters of water every 2 hours, so that the fluid requirement of approximately 2 liters in 8 hours of work can be met. Companies are advised to increase ventilation in the workspace to reduce exposure to hot temperatures. Keywords: Hydration level, working period, age


2021 ◽  
Author(s):  
Huseyin Duman ◽  
Doğan Uğur Şanlı

In the analysis of GNSS time series, when the sampling frequency and time-series lengths are almost identical, it is possible to highlight a linear relationship between the series repeatabilities (i.e. WRMS) and noise magnitudes. In the literature, linear equations as a function of WRMS have allowed many researchers to estimate the noise magnitudes. However, this was built upon homoskedasticity. We found that the higher the WRMS, the more erroneous the analysis results obtained with noise magnitudes from these linear equations. We hence studied whether or not homoskedasticity adequately describes the modeling errors. To test that, we used the published results of GPS baseline components from previous work in the literature and recognized that each component forms part of a whole. We introduced all baseline component results together into a statistical analysis to check for heteroskedasticity. We established null and alternative hypotheses that the residuals are homoskedastic (H0) or heteroskedastic (HA). We adopted both the Breusch-Pagan test and the Goldfeld-Quandt test to test for heteroskedasticity and obtained p-values for both methods. The p-value is almost zero for both tests; that is, we reject the null hypothesis. Consequently, we can confidently state that the relationship between the WRMS values and the noise magnitudes is heteroskedastic. Keywords: Noise magnitudes, repeatabilities, heteroskedasticity, time-series analysis
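A Breusch-Pagan style check on a single-predictor linear fit can be sketched as below. This is a minimal illustration, not the authors' analysis: it uses the studentized (Koenker) form of the statistic, n times the R² of the squared residuals regressed on the predictor, the data are synthetic with a residual spread that grows with x, and all function names are illustrative.

```python
import math


def ols_line(x, y):
    """Intercept and slope of a least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """Coefficient of determination of the least-squares line of y on x."""
    intercept, slope = ols_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan(x, y):
    """Return (LM statistic, p-value) for H0: homoskedastic residuals."""
    intercept, slope = ols_line(x, y)
    sq_resid = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, sq_resid)
    # With one predictor, LM is compared to chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(lm, 0.0) / 2.0))
    return lm, p_value


# Synthetic example: the error amplitude is proportional to x.
x = [float(i) for i in range(1, 101)]
y = [2.0 + 0.5 * xi + 0.1 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]
lm_stat, p_val = breusch_pagan(x, y)
print(lm_stat, p_val)
```

A p-value near zero, as in this constructed case, leads to rejecting homoskedasticity.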


2018 ◽  
Vol 35 (8) ◽  
pp. 810-817 ◽  
Author(s):  
Tushar Gupta ◽  
Michael A. Puskarich ◽  
Elizabeth DeVos ◽  
Adnan Javed ◽  
Carmen Smotherman ◽  
...  

Objectives: Early organ dysfunction in sepsis confers a high risk of in-hospital mortality, but the relative contribution of specific types of organ failure to overall mortality is unclear. The objective of this study was to assess the ability of individual types of organ failure to predict in-hospital mortality or prolonged intensive care. Methods: Retrospective cohort study of adult emergency department patients with sepsis from October 1, 2013, to November 10, 2015. Multivariable regression was used to assess the odds ratios of individual organ failure types for the outcomes of in-hospital death (primary) and in-hospital death or ICU stay ≥3 days (secondary). Results: Of 2796 patients, 283 (10%) experienced in-hospital mortality, and 748 (27%) experienced in-hospital mortality or an ICU stay ≥3 days. The following components of the Sequential Organ Failure Assessment (SOFA) score were most predictive of in-hospital mortality (descending order): coagulation (odds ratio [OR]: 1.60, 95% confidence interval [CI]: 1.32-1.93), hepatic (OR: 1.58, 95% CI: 1.32-1.90), respiratory (OR: 1.33, 95% CI: 1.21-1.47), neurologic (OR: 1.20, 95% CI: 1.07-1.35), renal (OR: 1.14, 95% CI: 1.02-1.27), and cardiovascular (OR: 1.13, 95% CI: 1.01-1.25). For mortality or ICU stay ≥3 days, the most predictive SOFA components were respiratory (OR: 1.97, 95% CI: 1.79-2.16), neurologic (OR: 1.72, 95% CI: 1.54-1.92), cardiovascular (OR: 1.38, 95% CI: 1.23-1.54), coagulation (OR: 1.31, 95% CI: 1.10-1.55), and renal (OR: 1.19, 95% CI: 1.08-1.30), while hepatic SOFA (OR: 1.16, 95% CI: 0.98-1.37) did not reach statistical significance (P = .092). Conclusion: In this retrospective study, SOFA score components demonstrated varying predictive abilities for mortality in sepsis. Elevated coagulation or hepatic SOFA scores were most predictive of in-hospital death, while an elevated respiratory SOFA was most predictive of death or ICU stay ≥3 days.
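The link between a reported odds ratio, its confidence interval, and the p-value can be approximated from the published numbers alone. The sketch below assumes a Wald interval that is symmetric on the log-odds scale, and the function name is illustrative. Because the published estimates are rounded to two decimals, the back-calculated p-value is only approximate.

```python
import math
from statistics import NormalDist


def p_from_or_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Approximate two-sided p-value from an OR and its Wald confidence interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # On the log scale the interval spans 2 * z_crit standard errors.
    std_err = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Values as reported in the abstract.
print(p_from_or_ci(1.16, 0.98, 1.37))  # hepatic SOFA, secondary outcome
print(p_from_or_ci(1.60, 1.32, 1.93))  # coagulation SOFA, primary outcome
```

For hepatic SOFA the result falls in the same non-significant range as the reported P = .092 without matching it exactly, which is what rounding of the published interval would produce.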

