Error bounds for learning the kernel
The problem of learning the kernel function has received considerable attention in machine learning. Much of the work has focused on kernel selection criteria, particularly on minimizing a regularized error functional over a prescribed set of kernels. Empirical studies indicate that this approach can enhance statistical performance and is computationally feasible. In this paper, we present a theoretical analysis of its generalization error. We establish for a wide variety of classes of kernels, such as the set of all multivariate Gaussian kernels, that this learning method generalizes well and, when the regularization parameter is appropriately chosen, it is consistent. A central role in our analysis is played by the interaction between the sample error and the approximation error.