Double-smoothing in Kernel Hazard Rate Estimation
Summary Objectives: In oncological studies, the hazard rate can be used to differentiate subgroups of the study population according to their patterns of survival risk over time. Nonparametric curve estimation has been suggested as an exploratory means of revealing such patterns. The decision about the type of smoothing parameter is critical for performance in practice. In this paper, we study data-adaptive smoothing. Methods: A decade ago, the nearest-neighbor bandwidth was introduced for censored data in survival analysis. It is specified by one parameter, namely the number of nearest neighbors. Bandwidth selection in this setting has rarely been investigated, although the heuristical advantages over the frequently-studied fixed bandwidth are quite obvious. The asymptotical relationship between the fixed and the nearest-neighbor bandwidth can be used to generate novel approaches. Results: We develop a new selection algorithm termed double-smoothing for the nearest-neighbor bandwidth in hazard rate estimation. Our approach uses a finite sample approximation of the asymptotical relationship between the fixed and nearest-neighbor bandwidth. By so doing, we identify the nearest-neighbor bandwidth as an additional smoothing step and achieve further data-adaption after fixed bandwidth smoothing. We illustrate the application of the new algorithm in a clinical study and compare the outcome to the traditional fixed bandwidth result, thus demonstrating the practical performance of the technique. Conclusion: The double-smoothing approach enlarges the methodological repertoire for selecting smoothing parameters in nonparametric hazard rate estimation. The slight increase in computational effort is rewarded with a substantial amount of estimation stability, thus demonstrating the benefit of the technique for biostatistical applications.