Regionalization of hydrological model parameters using gradient boosting machine
Abstract. Regionalization of hydrological model parameters is key to hydrological predictions in ungauged basins. The commonly used multiple linear regression (MLR) method may fail to capture complex, nonlinear relationships between model parameters and watershed properties. Moreover, most regionalization methods assume lumped parameters for each catchment and do not consider within-catchment heterogeneity. Here we incorporated the Penman-Monteith-Leuning (PML) equation into the Distributed Time-Variant Gain Model (DTVGM) to improve the mechanistic representation of the evapotranspiration process. We calibrated six key model parameters grid by grid across China using a multivariable calibration strategy that uses spatiotemporal runoff and evapotranspiration (ET) datasets (0.25°, monthly) as reference. In addition, we used the gradient boosting machine (GBM), a machine learning technique, to characterize the dependence of model parameters on soil and terrain attributes in four distinct climatic zones across China. We show that the modified DTVGM with the calibrated parameters could reasonably estimate runoff and ET over China, but performed better in humid than in arid regions during the validation period. The parameters regionalized by the GBM method exhibited better spatial coherence than the parameters calibrated grid by grid. In addition, GBM outperformed the stepwise MLR method in both parameter regionalization and gridded runoff simulations at the national scale, although the improvement was not significant in the watershed streamflow validation because most of the watersheds are located in humid regions. We also found, based on the GBM approach, that slope, saturated soil moisture content, and elevation are the most important explanatory variables for the model parameters.
The machine-learning-based regionalization approach provides an effective alternative for deriving hydrological model parameters from watershed properties in ungauged regions.
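To illustrate the general idea of regressing a calibrated parameter on watershed attributes with gradient boosting, the following is a minimal sketch only, not the implementation used in this study. It assumes squared-error loss, single-split regression trees (stumps) as base learners, and a fixed learning rate. The attribute ranges, the synthetic parameter values, and all function names (`fit_stump`, `fit_gbm`, `predict`) are hypothetical and chosen purely for illustration.

```python
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) for the single
    split that minimizes the squared error of the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between distinct sorted values.
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each round fits a stump to the
    current residuals and adds a shrunken copy of it to the ensemble."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for row, p in zip(X, pred)]
    return base, stumps, learning_rate


def predict(model, row):
    """Evaluate the additive ensemble at one attribute vector."""
    base, stumps, lr = model
    return base + sum(lr * (lm if row[j] <= thr else rm)
                      for j, thr, lm, rm in stumps)


# Synthetic, hypothetical data: one row per grid cell with
# [slope (degrees), saturated soil moisture content (-), elevation (m)].
random.seed(0)
X = [[random.uniform(0, 30), random.uniform(0.3, 0.6), random.uniform(0, 4000)]
     for _ in range(80)]
# Hypothetical "calibrated" parameter with a nonlinear dependence on slope.
y = [(0.5 if s > 15 else 0.2) + 0.3 * theta + random.gauss(0, 0.02)
     for s, theta, _ in X]

model = fit_gbm(X, y)

mean_y = sum(y) / len(y)
baseline_mse = sum((yi - mean_y) ** 2 for yi in y) / len(y)
gbm_mse = sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y)) / len(y)
print(f"baseline MSE: {baseline_mse:.5f}, GBM training MSE: {gbm_mse:.5f}")
```

In a regionalization setting, a model of this kind would be trained on cells with calibrated parameters and then evaluated at cells where only the attributes are known. The comparison printed here is a training-set fit only and says nothing about predictive skill in ungauged regions.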