An improved one-sample log-rank test

2020 ◽  
Vol 29 (10) ◽  
pp. 2814-2829
Author(s):  
Laura Kerschke ◽  
Andreas Faldum ◽  
Rene Schmidt

The one-sample log-rank test allows one to compare the survival of a single sample with a prefixed reference survival curve. It applies naturally in single-arm phase IIa trials with a time-to-event endpoint. Several authors have shown that the original one-sample log-rank test is conservative when the sample size is small and have proposed strategies to correct this conservativeness. Here, we propose an alternative approach to improve the one-sample log-rank test. Our new one-sample log-rank statistic is based on the unique transformation of the underlying counting process martingale such that the moments of the limiting normal distribution have no shared parameters. Simulation results show that the new one-sample log-rank test gives type I error rates and power close to the nominal levels even when the sample size is small, while substantially reducing the sample size required to achieve the desired power compared with current approaches for designing studies that compare the survival outcome of a sample with a reference.
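For orientation, the classical one-sample log-rank statistic that this abstract sets out to improve compares the observed number of events O with the number E expected under the reference curve, where E sums the reference cumulative hazard Lambda_0 evaluated at each subject's follow-up time. The sketch below uses one common form, Z = (E - O) / sqrt(E), with a one-sided normal p-value. It is not the authors' new transformed statistic; the exponential reference curve, the function name `one_sample_logrank`, and the example data are illustrative assumptions.

```python
import math

def one_sample_logrank(times, events, cum_hazard):
    """Classical one-sample log-rank test against a fixed reference curve.

    times      -- follow-up time of each subject (event or censoring time)
    events     -- 1 if the event was observed, 0 if censored
    cum_hazard -- reference cumulative hazard Lambda_0(t), with S_0(t) = exp(-Lambda_0(t))
    """
    observed = sum(events)                            # O: observed events
    expected = sum(cum_hazard(t) for t in times)      # E: events expected under H0
    z = (expected - observed) / math.sqrt(expected)   # z > 0: fewer events than expected
    # One-sided p-value for H1 "survival better than the reference": P(N(0,1) >= z)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return observed, expected, z, p_one_sided

# Hypothetical example: exponential reference curve with hazard 0.5 per time unit
reference = lambda t: 0.5 * t
O, E, z, p = one_sample_logrank([1.0, 2.0, 3.0], [1, 0, 1], reference)
```

In this toy example O = 2 and E = 0.5 + 1.0 + 1.5 = 3.0, so z = 1/sqrt(3), about 0.577. The normal approximation used for the p-value is the part the abstract describes as conservative in small samples.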

2020 ◽  
Vol 53 (2) ◽  
pp. 93-109
Author(s):  
ZHENG WANG ◽  
ALICIA ZHANG ◽  
YUQI CHEN ◽  
QUI TRAN ◽  
CHRIS HOLLAND

The log-rank test is a well-accepted nonparametric test for comparing survival times between experimental and control groups in regulatory settings. However, in our simulation settings we have observed type I error inflation as high as 28% when using the test, even with moderate sample sizes. In this paper, we use simulation to explore several factors that potentially contribute to the inflation. Sample size, randomization ratio, and significance level are found to be influential factors. We propose an alternative log-rank test that uses an approximate permutation distribution instead of the standard normal distribution. We show that type I error is controlled when the approximate permutation test is applied to both simple clinical trial designs and complicated group sequential designs.
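A minimal sketch of the general idea of referring the log-rank statistic to a permutation distribution rather than to N(0, 1): group labels are reshuffled at random, the statistic is recomputed for each relabeling, and the p-value is the add-one corrected share of relabeled statistics at least as extreme as the observed one. This is a generic Monte Carlo permutation test that assumes labels are exchangeable under the null; the authors' specific approximation and its extension to group sequential designs are not reproduced, and the function names and defaults are hypothetical.

```python
import math
import random

def logrank_z(times, events, groups):
    """Standardized two-sample log-rank statistic (O1 - E1) / sqrt(V) for group 1.

    events: 1 = event, 0 = censored; groups: 1 = experimental, 0 = control.
    """
    data = sorted(zip(times, events, groups))
    n, n1 = len(data), sum(groups)          # numbers at risk overall and in group 1
    o_minus_e, var, i = 0.0, 0.0, 0
    while i < len(data):
        t = data[i][0]
        d = d1 = leaving = leaving1 = 0
        while i < len(data) and data[i][0] == t:   # pool all subjects tied at time t
            _, e, g = data[i]
            d += e
            d1 += e * g
            leaving += 1
            leaving1 += g
            i += 1
        if d > 0:
            o_minus_e += d1 - d * n1 / n
            if n > 1:  # hypergeometric variance of the group-1 event count
                var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        n -= leaving
        n1 -= leaving1
    return o_minus_e / math.sqrt(var) if var > 0 else 0.0

def permutation_logrank(times, events, groups, n_perm=2000, seed=1):
    """Two-sided Monte Carlo permutation p-value for the log-rank statistic."""
    rng = random.Random(seed)
    z_obs = logrank_z(times, events, groups)
    labels = list(groups)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random relabeling keeps the group sizes fixed
        if abs(logrank_z(times, events, labels)) >= abs(z_obs):
            as_extreme += 1
    return z_obs, (as_extreme + 1) / (n_perm + 1)
```

As a check, `logrank_z([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])` gives O1 - E1 = 7/6 and V = 17/36, hence Z = 7/sqrt(17), about 1.70; a positive Z means more events than expected in group 1.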


Author(s):  
Patrick Royston

Most randomized controlled trials with a time-to-event outcome are designed and analyzed assuming proportional hazards of the treatment effect. The sample-size calculation is based on a log-rank test or the equivalent Cox test. Nonproportional hazards are increasingly seen in trials and are recognized as a potential threat to the power of the log-rank test. To address the issue, Royston and Parmar (2016, BMC Medical Research Methodology 16: 16) devised a new “combined test” of the global null hypothesis of identical survival curves in each trial arm. The test combines the conventional Cox test with a new test based on the maximal standardized difference in restricted mean survival time (RMST) between the arms, evaluated over several preselected time points. The combined test uses the minimum p-value across the Cox and RMST-based tests, appropriately standardized to have the correct null distribution. In this article, I outline the combined test and introduce a command, stctest, that implements it. I also point to additional tools, currently under development, for power and sample-size calculation for the combined test.
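The RMST component can be made concrete with a minimal sketch: RMST up to a horizon tau is the area under the Kaplan-Meier curve on [0, tau], and the between-arm difference is evaluated at several preselected horizons. Only the RMST difference itself is shown; its standardization, the Cox component, and the minimum-p combination of the combined test are not implemented, and the function names and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier steps as (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:  # pool tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= leaving
    return steps

def rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # S is constant between event times
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

def rmst_differences(arm1, arm0, taus):
    """Between-arm RMST difference (arm1 - arm0) at each preselected horizon tau.

    arm1, arm0 -- (times, events) tuples for the two trial arms
    """
    return [rmst(*arm1, tau) - rmst(*arm0, tau) for tau in taus]
```

With four uncensored times 1, 2, 3, 4, the Kaplan-Meier curve steps down by 0.25 at each time, so `rmst(..., 4.0)` is 1 + 0.75 + 0.5 + 0.25 = 2.5. Each tau should lie within the follow-up of both arms, because the Kaplan-Meier curve is undefined beyond the largest time when that time is censored.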


2016 ◽  
Vol 27 (7) ◽  
pp. 2132-2141 ◽  
Author(s):  
Guogen Shan

In an agreement test between two raters with binary endpoints, existing methods for sample size calculation are all based on asymptotic approaches that use the limiting distributions of a test statistic under the null and alternative hypotheses. The resulting sample sizes may not be reliable because asymptotic approaches can control the type I error rate poorly. We propose a new sample size calculation based on exact approaches that control the type I error rate. Two exact approaches are considered: one based on maximization and the other based on estimation and maximization. We found that the latter approach is generally more powerful than the one based on maximization. Therefore, we present the sample size calculation based on estimation and maximization. A real example from a clinical trial on the diagnosis of low back pain is used to illustrate the two exact testing procedures and sample size determination.


2017 ◽  
Vol 35 (15_suppl) ◽  
pp. 4004-4004 ◽  
Author(s):  
Salah-Eddin Al-Batran ◽  
Nils Homann ◽  
Harald Schmalenberg ◽  
Hans-Georg Kopp ◽  
Georg Martin Haag ◽  
...  

4004 Background: The MAGIC trial established perioperative (periop) epirubicin, cisplatin, and 5-FU (ECF) as a standard treatment for patients (pts) with operable esophagogastric cancer, but survival remains poor. FLOT4 (NCT01216644) is a multicenter, randomized, investigator-initiated, phase 3 trial. It compares the docetaxel-based triplet FLOT with the anthracycline-based triplet ECF/ECX as periop treatment for pts with resectable gastric or GEJ adenocarcinoma. Methods: Eligible pts with stage ≥cT2 and/or cN+ disease were randomized to either 3 pre-operative and 3 post-operative 3-week cycles of ECF/ECX (epirubicin 50 mg/m², cisplatin 60 mg/m², both d1, and 5-FU 200 mg/m² as continuous infusion or capecitabine 1250 mg/m² orally d1-21) or 4 pre-operative and 4 post-operative 2-week cycles of FLOT (docetaxel 50 mg/m², oxaliplatin 85 mg/m², leucovorin 200 mg/m², and 5-FU 2600 mg/m² as 24-hour infusion, all d1). The primary endpoint was overall survival (OS; 80% power; HR of 0.76; 2-sided log-rank test at 5% type I error). Results: Between Aug 2010 and Feb 2015, 716 pts (360 ECF/ECX; 356 FLOT) were randomly allocated. Baseline characteristics were similar between arms (overall, male 74%; median age 62; cT3/T4 81%; cN+ 80%; GEJ 56%). 91% and 37% of pts with ECF/ECX and 90% and 50% with FLOT completed the planned pre-operative and post-operative cycles, respectively. Median follow-up was 43 months. 369 pts died (203 ECF/ECX; 166 FLOT). FLOT improved OS (mOS, 35 months with ECF/ECX vs. 50 months with FLOT; HR 0.77 [0.63-0.94]; p = 0.012). The 3-year OS rate was 48% with ECF/ECX and 57% with FLOT. FLOT also improved PFS (mPFS, 18 months with ECF/ECX vs. 30 months with FLOT; HR 0.75 [0.62-0.91]; p = 0.004). Periop complication rates were 50% with ECF/ECX and 51% with FLOT. 30- and 90-day mortality was 3% and 8% with ECF/ECX and 2% and 5% with FLOT. There was more G3/4 nausea and vomiting with ECF/ECX and more G3/4 neutropenia with FLOT.
Conclusion: Periop FLOT improved outcome in patients with resectable gastric and GEJ cancer compared to periop ECF/ECX. Clinical trial information: NCT01216644.
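As a worked illustration of the design line above (80% power, HR 0.76, two-sided log-rank test at 5%), Schoenfeld's approximation gives the number of deaths d needed for a log-rank test with allocation fraction p: d = (z_{1-alpha/2} + z_{1-beta})^2 / (p (1 - p) (ln HR)^2). The abstract does not report the planned event count, so this sketch, which assumes 1:1 allocation and a single final analysis, is not the trial's own calculation; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.80, alloc=0.5):
    """Deaths required by Schoenfeld's approximation for a two-sided log-rank test.

    alloc is the fraction of patients randomized to the experimental arm.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    d = (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * math.log(hr) ** 2)
    return math.ceil(d)

events_needed = schoenfeld_events(0.76)  # 417 under these assumptions
```

For HR 0.76 the unrounded value is about 416.9 deaths; because (ln HR)^2 sits in the denominator, the required number of events falls quickly as the assumed effect grows.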


2014 ◽  
Vol 34 (6) ◽  
pp. 1031-1040 ◽  
Author(s):  
René Schmidt ◽  
Robert Kwiecien ◽  
Andreas Faldum ◽  
Frank Berthold ◽  
Barbara Hero ◽  
...  

2018 ◽  
Vol 31 (Supplement_1) ◽  
pp. 150-150
Author(s):  
Stefan Mönig ◽  
Salah Al-Batran

Background: The MAGIC trial established perioperative (periop) epirubicin, cisplatin, and 5-FU (ECF) as a standard treatment for patients (pts) with operable esophagogastric cancer, but survival remains poor. FLOT4 (NCT01216644) is a multicenter, randomized, investigator-initiated, phase 3 trial (AIO-trial). It compares the docetaxel-based triplet FLOT with the anthracycline-based triplet ECF/ECX as periop treatment for pts with resectable gastric or GEJ adenocarcinoma. Methods: Eligible pts with stage ≥cT2 and/or cN+ disease were randomized to either 3 pre-operative and 3 post-operative 3-week cycles of ECF/ECX (epirubicin 50 mg/m², cisplatin 60 mg/m², both d1, and 5-FU 200 mg/m² as continuous infusion or capecitabine 1250 mg/m² orally d1–21) or 4 pre-operative and 4 post-operative 2-week cycles of FLOT (docetaxel 50 mg/m², oxaliplatin 85 mg/m², leucovorin 200 mg/m², and 5-FU 2600 mg/m² as 24-hour infusion, all d1). The primary endpoint was overall survival (OS; 80% power; HR of 0.76; 2-sided log-rank test at 5% type I error). Results: Between Aug 2010 and Feb 2015, 716 pts (360 ECF/ECX; 356 FLOT) were randomly allocated. Baseline characteristics were similar between arms (overall, male 74%; median age 62; cT3/T4 81%; cN+ 80%; GEJ 56%). 91% and 37% of pts with ECF/ECX and 90% and 50% with FLOT completed the planned pre-operative and post-operative cycles, respectively. Median follow-up was 43 months. 369 pts died (203 ECF/ECX; 166 FLOT). FLOT improved OS (mOS, 35 months with ECF/ECX vs. 50 months with FLOT; HR 0.77 [0.63-0.94]; P = 0.012). The 3-year OS rate was 48% with ECF/ECX and 57% with FLOT. FLOT also improved PFS (mPFS, 18 months with ECF/ECX vs. 30 months with FLOT; HR 0.75 [0.62-0.91]; P = 0.004). Periop complication rates were 50% with ECF/ECX and 51% with FLOT. 30- and 90-day mortality was 3% and 8% with ECF/ECX and 2% and 5% with FLOT. There was more G3/4 nausea and vomiting with ECF/ECX and more G3/4 neutropenia with FLOT.
Conclusion: Periop FLOT improved outcome in patients with resectable gastric and GEJ cancer compared to periop ECF/ECX. Disclosure: All authors have declared no conflicts of interest.


Author(s):  
Patrick Royston

Randomized controlled trials with a time-to-event outcome are usually designed and analyzed assuming proportional hazards (PH) of the treatment effect. The sample-size calculation is based on a log-rank test or the nearly identical Cox test, henceforth called the Cox/log-rank test. Nonproportional hazards (non-PH) have become more common in trials and are recognized as a potential threat to interpreting the trial treatment effect and to the power of the log-rank test, and hence to the success of the trial. To address the issue, Royston and Parmar (2016, BMC Medical Research Methodology 16: 16) proposed a “combined test” of the global null hypothesis of identical survival curves in each trial arm. The Cox/log-rank test is combined with a new test derived from the maximal standardized difference in restricted mean survival time (RMST) between the trial arms. The test statistic is based on evaluations of the between-arm difference in RMST over several preselected time points. The combined test involves the minimum p-value across the Cox/log-rank and RMST-based tests, appropriately standardized to have the correct distribution under the global null hypothesis. In this article, I introduce a new command, power_ct, that uses simulation to implement power and sample-size calculations for the combined test. power_ct supports designs with PH or non-PH of the treatment effect. I provide examples in which the power of the combined test is compared with that of the Cox/log-rank test under PH and non-PH scenarios. I conclude by offering guidance for sample-size calculations in time-to-event trials that allows for possible non-PH.
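power_ct estimates power by simulation. The same idea can be sketched for the Cox/log-rank component alone: simulate many two-arm trials, apply a two-sided log-rank test to each, and report the rejection fraction. This is not power_ct (a Stata command) and does not implement the combined test. The piecewise-exponential data model, in which the treatment hazard ratio applies only after an assumed delay (one simple non-PH scenario), the administrative censoring at a fixed follow-up time, and all names and defaults are hypothetical.

```python
import math
import random
from statistics import NormalDist

def logrank_z(times, events, groups):
    """Standardized two-sample log-rank statistic (O1 - E1) / sqrt(V) for group 1."""
    data = sorted(zip(times, events, groups))
    n, n1 = len(data), sum(groups)
    num, var, i = 0.0, 0.0, 0
    while i < len(data):
        t = data[i][0]
        d = d1 = leaving = leaving1 = 0
        while i < len(data) and data[i][0] == t:  # pool ties at time t
            _, e, g = data[i]
            d, d1 = d + e, d1 + e * g
            leaving, leaving1 = leaving + 1, leaving1 + g
            i += 1
        if d > 0:
            num += d1 - d * n1 / n
            if n > 1:
                var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        n, n1 = n - leaving, n1 - leaving1
    return num / math.sqrt(var) if var > 0 else 0.0

def draw_time(rng, base_hazard, hr, delay):
    """Piecewise-exponential time: hazard base_hazard before delay, base_hazard * hr after."""
    e = rng.expovariate(1.0)  # unit-exponential draw of the cumulative hazard
    if e <= base_hazard * delay:
        return e / base_hazard
    return delay + (e - base_hazard * delay) / (base_hazard * hr)

def simulate_power(n_per_arm, hr, delay=0.0, base_hazard=1.0,
                   follow_up=2.0, alpha=0.05, n_sims=500, seed=2024):
    """Fraction of simulated trials in which the two-sided log-rank test rejects H0."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        times, events, groups = [], [], []
        for g in (0, 1):
            for _ in range(n_per_arm):
                # Control arm uses hr = 1, so its hazard is constant regardless of delay
                t = draw_time(rng, base_hazard, hr if g == 1 else 1.0, delay)
                times.append(min(t, follow_up))          # administrative censoring
                events.append(1 if t <= follow_up else 0)
                groups.append(g)
        if abs(logrank_z(times, events, groups)) > crit:
            rejections += 1
    return rejections / n_sims
```

Running, for example, `simulate_power(150, hr=0.7)` and `simulate_power(150, hr=0.7, delay=0.5)` lets one compare log-rank power under PH and under an assumed delayed effect; a combined-test power calculation would be organized the same way, with a different test applied inside the loop.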


2021 ◽  
pp. 174077452110101
Author(s):  
Jennifer Proper ◽  
John Connett ◽  
Thomas Murray

Background: Bayesian response-adaptive designs, which use accumulating data to alter the allocation ratio in favor of the better-performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating the type I error rate, and increasing sample size requirements. In the literature, implementations of these designs using Thompson sampling have generally assumed a simple beta-binomial probability model; however, the effect of these choices on the resulting design operating characteristics relative to other reasonable alternatives has not been fully examined. Motivated by the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations of the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial using Thompson sampling based on a logistic regression probability model coupled with either an urn or a permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and for power, among other performance metrics.
Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design in regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.
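For the conventional implementation that this abstract takes as its starting point, Thompson sampling with independent beta-binomial models, the allocation probability for arm B can be sketched as follows: estimate P(p_B > p_A | data) by Monte Carlo draws from the two beta posteriors, then temper it with an exponent c. The uniform Beta(1, 1) prior, the tempering exponent c = 0.5, and the function names are assumptions; the authors' logistic regression model and urn or permuted block randomization are not implemented.

```python
import random

def prob_b_better(succ_a, n_a, succ_b, n_b, draws=20000, seed=7, prior=(1.0, 1.0)):
    """Monte Carlo estimate of P(p_B > p_A | data) under independent beta posteriors."""
    rng = random.Random(seed)
    a0, b0 = prior  # Beta(a0, b0) prior on each arm's response rate
    wins = 0
    for _ in range(draws):
        pa = rng.betavariate(a0 + succ_a, b0 + n_a - succ_a)
        pb = rng.betavariate(a0 + succ_b, b0 + n_b - succ_b)
        if pb > pa:
            wins += 1
    return wins / draws

def allocation_prob_b(p_better, c=0.5):
    """Tempered Thompson-sampling probability of assigning the next subject to arm B.

    c = 1 gives plain Thompson sampling; c < 1 pulls allocation toward 1:1.
    """
    num = p_better ** c
    return num / (num + (1 - p_better) ** c)
```

With c = 0.5, a posterior probability of 0.9 that B is better translates into an allocation probability of sqrt(0.9) / (sqrt(0.9) + sqrt(0.1)) = 0.75, which shows how tempering damps the swings in the allocation ratio that can produce imbalance in the wrong direction.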

