Testing for baseline differences in clinical trials
<p class="abstract">Reporting statistical tests for baseline measures in clinical trials is not meaningful, because statistical significance depends on sample size: a large trial can find a difference statistically significant when a small trial finds the same difference non-significant. We use three published trials that report the same baseline measures to illustrate the relationship between trial sample size and p value. In trial 1, the sequential organ failure assessment (SOFA) score was 10.4±3.4 vs. 9.6±3.2 (difference=0.8, p=0.01) and vasopressor use was 83.0% vs. 72.6% (p=0.007). In trial 2, the SOFA score was 11±3 vs. 12±3 (difference=1, p=0.42). In trial 3, vasopressor use was 73% vs. 83% (p=0.21). Taking the trial 2 values (SOFA score with a mean of 12 and an SD of 3 in the supine group, and a mean of 11 and an SD of 3 in the prone group) and varying only the sample size, the p values are 0.29850, 0.09877, 0.01940, 0.00094, 0.00005, and &lt;0.00001 when n (per arm) is 20, 50, 100, 200, 300, and 400, respectively. Taking the trial 3 values (vasopressor use of 73.0% in the supine group vs. 83.0% in the prone group), the p values are 0.4452, 0.2274, 0.0878, 0.0158, 0.0031, and 0.0006 when n (per arm) is 20, 50, 100, 200, 300, and 400, respectively. For the same baseline difference, small trials yield larger p values than large trials. Baseline imbalance therefore cannot be defined from these p values alone, and there is no statistical basis for advocating tests of baseline differences.</p>
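<p>The sample-size dependence described in the abstract can be sketched in a few lines of Python. The abstract does not name the tests used, so this sketch assumes an equal-variance two-sample t-test for the SOFA score and a pooled two-proportion z-test (equivalent to a chi-square test without continuity correction) for vasopressor use. Under those assumptions the results appear to agree with the reported p values. The function names are illustrative, and the t-distribution tail is obtained by simple numerical integration so that only the standard library is needed.</p>

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution (log-gamma avoids overflow for large df)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_t_p(t, df, steps=2000):
    """Two-sided p value: 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def t_test_p(mean1, mean2, sd, n):
    """Assumed test: equal-variance two-sample t-test, common SD, n per arm."""
    se = sd * math.sqrt(2 / n)
    return two_sided_t_p((mean1 - mean2) / se, df=2 * n - 2)


def z_test_p(p1, p2, n):
    """Assumed test: pooled two-proportion z-test, n per arm."""
    pooled = (p1 + p2) / 2  # equal arm sizes, so the pooled rate is the average
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


SAMPLE_SIZES = [20, 50, 100, 200, 300, 400]

# Same baseline differences as in the abstract; only n per arm changes.
for n in SAMPLE_SIZES:
    p_sofa = t_test_p(12, 11, 3, n)        # trial 2: SOFA 12 vs. 11, SD 3
    p_vaso = z_test_p(0.73, 0.83, n)       # trial 3: vasopressors 73% vs. 83%
    print(f"n={n:3d}  SOFA p={p_sofa:.5f}  vasopressors p={p_vaso:.4f}")
```

<p>The means, SD, and proportions are held fixed in the loop, so any change in the printed p values comes from n alone.</p>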