Machine Learning-Based Model Selection and Parameter Estimation from Kinetic Data of Complex First-Order Reaction Systems
<a>Systems of first-order reactions are a recurrent problem in chemometrics, especially in the analysis of data obtained by spectroscopic methods. Here we argue that global multiexponential fitting, still the common way to solve this kind of problem, has serious weaknesses compared with contemporary methods of sparse modeling. By combining the advantages of group lasso and elastic net, statistical methods proven to be very powerful in other areas, we obtained an optimization problem that can be tuned to yield anything from a very sparse to a very dense distribution over a large predefined grid of time constants, and that fits both simulated and experimental multiwavelength spectroscopic data with very high performance. Moreover, we found that the optimal values of the tuning hyperparameters can be selected by a machine-learning algorithm based on Bayesian optimization, utilizing both a widely used and a novel version of cross-validation. The algorithm accurately recovered the true sparse kinetic parameters of an extremely complex simulated model of the bacteriorhodopsin photocycle, as well as the wide peak of a hypothetical distributed kinetics, in the presence of different levels of noise. It also performed well in the analysis of experimental ultrafast fluorescence kinetics data measured on the coenzyme FAD over a very wide logarithmic time window.</a>
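
To make the idea of a group-lasso plus elastic-net fit over a grid of time constants concrete, the following is a minimal sketch and not the authors' formulation. It assumes an objective of the form 0.5·||Y − D·X||² + λ·(α·Σ_j ||X_j||₂ + (1 − α)/2·||X||²), where D holds exponential decays on a predefined grid of time constants, each row X_j collects the amplitudes of one time constant across all wavelengths (one group), and the problem is solved by plain proximal gradient descent. The function names `exp_design` and `group_elastic_net`, the parameters `lam` and `alpha`, and the choice of solver are all illustrative assumptions.

```python
import numpy as np

def exp_design(t, taus):
    """Design matrix of decays: D[i, j] = exp(-t[i] / taus[j])."""
    return np.exp(-t[:, None] / taus[None, :])

def group_elastic_net(D, Y, lam, alpha, n_iter=500):
    """Proximal-gradient sketch (assumed objective, see lead-in).

    D: (n_times, n_taus) design matrix, Y: (n_times, n_wavelengths) data.
    alpha = 1 gives a pure group lasso (sparse rows),
    alpha = 0 gives a pure ridge penalty (dense solution).
    """
    X = np.zeros((D.shape[1], Y.shape[1]))
    # Lipschitz constant of the gradient of the smooth part
    # (least squares plus ridge term) sets a safe step size.
    L = np.linalg.norm(D, 2) ** 2 + lam * (1.0 - alpha)
    step = 1.0 / L
    thr = step * lam * alpha  # row-wise shrinkage threshold
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y) + lam * (1.0 - alpha) * X
        Z = X - step * grad
        # Group soft-thresholding: shrink each row (one time constant,
        # all wavelengths) toward zero by its Euclidean norm.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - thr / np.maximum(norms, 1e-12))
        X = scale * Z
    return X

# Usage on synthetic data: one decay (tau = 1) seen at three wavelengths.
t = np.logspace(-2, 2, 60)
taus = np.logspace(-2, 2, 25)
D = exp_design(t, taus)
Y = np.outer(np.exp(-t / 1.0), [1.0, -0.5, 0.25])
X_fit = group_elastic_net(D, Y, lam=1e-3, alpha=0.5, n_iter=200)
```

In this sketch, `alpha` moves the solution between sparse and dense distributions over the grid, while `lam` sets the overall penalty strength. Selecting both by cross-validation and Bayesian optimization, as the abstract describes, is outside its scope.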