Error bounds for learning the kernel

2016 ◽  
Vol 14 (06) ◽  
pp. 849-868 ◽  
Author(s):  
Charles A. Micchelli ◽  
Massimiliano Pontil ◽  
Qiang Wu ◽  
Ding-Xuan Zhou

The problem of learning the kernel function has received considerable attention in machine learning. Much of the work has focused on kernel selection criteria, particularly on minimizing a regularized error functional over a prescribed set of kernels. Empirical studies indicate that this approach can enhance statistical performance and is computationally feasible. In this paper, we present a theoretical analysis of its generalization error. We establish, for a wide variety of kernel classes such as the set of all multivariate Gaussian kernels, that this learning method generalizes well and, when the regularization parameter is appropriately chosen, is consistent. A central role in our analysis is played by the interaction between the sample error and the approximation error.
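
For concreteness, the kernel-selection criterion discussed above can be sketched as follows; the notation is ours and the Gaussian family is just one admissible kernel class, so this is a schematic rather than the paper's exact statement.

```latex
\min_{K \in \mathcal{K}} \; \min_{f \in \mathcal{H}_K} \;
  \frac{1}{m}\sum_{i=1}^{m}\bigl(f(x_i)-y_i\bigr)^{2} + \lambda\,\|f\|_{\mathcal{H}_K}^{2},
\qquad
\mathcal{K} = \bigl\{\, K_\sigma(x,t) = e^{-\sigma\|x-t\|^{2}} : \sigma > 0 \,\bigr\}.
```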

2016 ◽  
Vol 28 (1) ◽  
pp. 71-88 ◽  
Author(s):  
Hongzhi Tong

We present an improved theoretical foundation for support vector machines with polynomial kernels. The sample error is estimated under Tsybakov's noise assumption. In bounding the approximation error, we take advantage of a geometric noise assumption that was originally introduced to analyze Gaussian kernels. Compared with the previous literature, the error analysis in this note requires neither regularity of the marginal distribution nor smoothness of the Bayes rule. We thus establish learning rates for polynomial kernels for a wide class of distributions.
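
As a purely illustrative companion to the abstract (not the paper's analysis), fitting a support vector machine with a polynomial kernel is straightforward in scikit-learn; the dataset and hyperparameter values below are hypothetical.

```python
# Minimal sketch: SVM classification with a polynomial kernel (scikit-learn).
# The dataset and hyperparameters below are illustrative, not from the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# degree controls the polynomial kernel (x . x' + coef0)^degree
clf = SVC(kernel="poly", degree=3, coef0=1.0, C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```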


2011 ◽  
Vol 65 (1) ◽  
pp. 125-144 ◽  
Author(s):  
Ching-Sheng Chiu ◽  
Chris Rizos

In a car navigation system, the conventional guidance used to help drivers select a driving route typically considers only one criterion, usually the Shortest Distance Path (SDP). Drivers, however, may apply multiple criteria when choosing a route. In this paper, possible route selection criteria are proposed, together with a Multi-Objective Path Optimisation (MOPO) model and algorithms for solving the MOPO problem. Three types of decision criteria are used to present the characteristics of the proposed model: cumulative distance (SDP), the number of passed intersections (Least Node Path, LNP) and the number of turns (Minimum Turn Path, MTP). A two-step technique that incorporates shortest path algorithms for solving the MOPO problem was tested. To demonstrate the advantage the MOPO model offers drivers in route selection, several empirical studies were conducted on two real road networks with different roadway types. With the aid of a Geographic Information System (GIS), drivers can easily and quickly obtain optimal MOPO paths, even though these paths are highly complex and difficult to determine manually.
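
A minimal sketch of two of the three criteria on a toy network, assuming edge lengths are stored in a "length" attribute; the graph is hypothetical, and the minimum-turn criterion, which requires turn penalties between consecutive edges, is omitted.

```python
# Minimal sketch of the SDP and LNP criteria on a toy road network.
# The MTP (minimum-turn) criterion needs turn penalties between consecutive
# edges and is omitted here. Graph and edge lengths are hypothetical.
import networkx as nx

G = nx.Graph()
edges = [  # (node, node, length in metres)
    ("A", "B", 500), ("B", "C", 400), ("A", "D", 300),
    ("D", "E", 350), ("E", "C", 300), ("B", "E", 200),
]
for u, v, length in edges:
    G.add_edge(u, v, length=length)

# Shortest Distance Path: minimise cumulative edge length.
sdp = nx.shortest_path(G, "A", "C", weight="length")
# Least Node Path: minimise the number of traversed intersections
# (unweighted shortest path, i.e. fewest edges).
lnp = nx.shortest_path(G, "A", "C")

print("SDP:", sdp)
print("LNP:", lnp)
```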


2019 ◽  
Vol 19 (01) ◽  
pp. 107-124
Author(s):  
Fusheng Lv ◽  
Jun Fan

Correntropy-based learning has achieved great success in practice over the last decades. It originated in information-theoretic learning and provides an alternative to the classical least squares method in the presence of non-Gaussian noise. In this paper, we investigate the theoretical properties of learning algorithms generated by Tikhonov regularization schemes associated with Gaussian kernels and the correntropy loss. By choosing an appropriate scale parameter for the Gaussian kernel, we show polynomial decay of the approximation error under a Sobolev smoothness condition. In addition, we employ a tight upper bound for the uniform covering number of the Gaussian RKHS in order to improve the estimate of the sample error. Based on these two results, we show that the proposed algorithm with a varying Gaussian kernel achieves the minimax rate of convergence (up to a logarithmic factor) without knowing the smoothness level of the regression function.
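
A minimal sketch of the kind of estimator studied here: Tikhonov-regularized regression with a Gaussian kernel and the correntropy loss, solved by half-quadratic (iteratively reweighted) optimization. The kernel width, correntropy scale, regularization parameter and data are hypothetical choices, not the paper's.

```python
# Minimal sketch: kernel regression with the correntropy loss via half-quadratic
# (iteratively reweighted) optimisation. All hyperparameters and the data are
# hypothetical; this is not the paper's exact scheme.
import numpy as np

def gaussian_kernel(X, Z, width):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
y[::20] += 3.0                      # a few large, non-Gaussian outliers

width, scale, lam = 0.3, 1.0, 1e-3  # kernel width, correntropy scale, reg. parameter
K = gaussian_kernel(X, X, width)
m = len(y)

c = np.zeros(m)
for _ in range(20):
    r = y - K @ c                                   # residuals
    w = np.exp(-r ** 2 / (2 * scale ** 2))          # correntropy weights
    # weighted regularised least squares step: (diag(w) K + lam*m*I) c = diag(w) y
    c = np.linalg.solve(np.diag(w) @ K + lam * m * np.eye(m), w * y)

# RMSE of the fit against the noise-free target
print("RMSE:", np.sqrt(np.mean((K @ c - np.sin(np.pi * X[:, 0])) ** 2)))
```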


2012 ◽  
Vol 4 (2) ◽  
Author(s):  
Joanna Karpińska ◽  
Krzysztof Tchoń

For redundant robotic manipulators, we study the problem of designing Jacobian inverse kinematics algorithms with desired performance. A specific instance of the problem is addressed, namely the optimal approximation of the Jacobian pseudo-inverse algorithm by the extended Jacobian algorithm. The approximation error functional is derived for a coordinate-free representation of the manipulator's kinematics. A variational formulation of the problem is employed, and the approximation error is minimized by means of the Ritz method. The optimal extended Jacobian algorithm is designed for the 7-degrees-of-freedom (dof) POLYCRANK manipulator. It is concluded that the coordinate-free kinematics representation yields a more accurate approximation than the coordinate expression of the kinematics.
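
A minimal sketch of the baseline being approximated, the Jacobian pseudo-inverse iteration, for a hypothetical redundant 3R planar arm; the extended Jacobian algorithm and the Ritz-based optimization of the paper are not reproduced here.

```python
# Minimal sketch: Jacobian pseudo-inverse inverse-kinematics iteration for a
# redundant 3R planar arm (3 joints, 2D end-effector position). Link lengths,
# target and step gain are hypothetical.
import numpy as np

L = np.array([1.0, 0.8, 0.5])            # link lengths

def fk(q):
    """End-effector position of the planar 3R arm."""
    angles = np.cumsum(q)
    return np.array([np.sum(L * np.cos(angles)), np.sum(L * np.sin(angles))])

def jacobian(q):
    angles = np.cumsum(q)
    J = np.zeros((2, 3))
    for j in range(3):
        J[0, j] = -np.sum(L[j:] * np.sin(angles[j:]))
        J[1, j] = np.sum(L[j:] * np.cos(angles[j:]))
    return J

q = np.array([0.1, 0.2, 0.3])             # initial configuration
target = np.array([1.2, 1.0])

for _ in range(200):
    e = target - fk(q)
    if np.linalg.norm(e) < 1e-6:
        break
    q = q + 0.5 * np.linalg.pinv(jacobian(q)) @ e   # damped pseudo-inverse update

print("final position error:", np.linalg.norm(target - fk(q)))
```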


2014 ◽  
Vol 26 (1) ◽  
pp. 158-184 ◽  
Author(s):  
Hongzhi Tong ◽  
Di-Rong Chen ◽  
Fenghong Yang

We consider a kind of kernel-based regression with general convex loss functions in a regularization scheme. The kernels used in the scheme are not necessarily symmetric and hence not necessarily positive semidefinite; the ℓ1-norm of the coefficients in the kernel ensembles is taken as the regularizer. Our setting in this letter is thus quite different from classical regularized regression algorithms such as regularization networks and support vector machine regression. Under an established error decomposition that consists of the approximation error, the hypothesis error, and the sample error, we present a detailed mathematical analysis of this scheme and, in particular, its learning rate. A reweighted empirical process theory is applied to the analysis of the produced learning algorithms and plays a key role in deriving the explicit learning rate under some assumptions.
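
A minimal sketch of the general flavour of such a scheme: ℓ1-regularized regression over a kernel ensemble whose kernel is deliberately non-symmetric. The asymmetric kernel, data and penalty level are hypothetical, and the scikit-learn Lasso solver merely stands in for the regularization scheme analyzed in the letter.

```python
# Minimal sketch: l1-regularised regression over kernel expansions
# f(x) = sum_i alpha_i k(x, x_i), where k need not be symmetric or PSD.
# The asymmetric kernel, data and penalty below are hypothetical choices.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(150, 1))
y = np.cos(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(150)

def asym_kernel(x, t, h=0.15):
    # a deliberately non-symmetric kernel: one-sided exponential window
    u = (x - t) / h
    return np.exp(-u) * (u >= 0)

# design matrix whose columns are k(., x_i); no PSD property is used
Phi = asym_kernel(X, X.T)              # shape (150, 150)

model = Lasso(alpha=1e-3, fit_intercept=True, max_iter=50000)
model.fit(Phi, y)
print("non-zero coefficients:", np.count_nonzero(model.coef_))
```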


2021 ◽  
Vol 4 (6) ◽  
pp. 1-36
Author(s):  
Zeljko Kereta ◽  
Valeriya Naumova

Despite recent advances in regularization theory, the issue of parameter selection still remains a challenge for most applications. In a recent work, the framework of statistical learning was used to approximate the optimal Tikhonov regularization parameter from noisy data. In this work, we improve those results and extend the analysis to elastic net regularization. Furthermore, we design a data-driven, automated algorithm for the computation of an approximate regularization parameter. Our analysis combines statistical learning theory with insights from regularization theory. We compare our approach with state-of-the-art parameter selection criteria and show that it has superior accuracy.
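
To illustrate only the general idea of choosing the regularization level from data (not the learning-theoretic algorithm developed in the paper), a cross-validated elastic net on a hypothetical sparse inverse problem might look like this.

```python
# Minimal sketch: data-driven selection of elastic-net parameters by
# cross-validation. This illustrates choosing the regularisation level from
# data in general; it is not the paper's algorithm. Data and grids are
# hypothetical.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
n, p = 100, 50
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]        # sparse ground truth
y = A @ x_true + 0.1 * rng.standard_normal(n)   # noisy observations

model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], alphas=np.logspace(-4, 0, 30), cv=5)
model.fit(A, y)
print("selected alpha:", model.alpha_, "selected l1_ratio:", model.l1_ratio_)
print("recovery error:", np.linalg.norm(model.coef_ - x_true))
```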


Author(s):  
Youfa Li ◽  
Jing Shang ◽  
Gengrong Zhang ◽  
Pei Dang

By applying the multiscale method to the Möbius transformation function, we construct the multiscale analytic sampling approximation (MASA) to any function in the Hardy space [Formula: see text]. The approximation error is estimated, and the MASA is proved to be robust to sample error. We prove that the MASA can be expressed by a Hankel matrix; making use of this structure, a fast algorithm for computing the MASA is established. Since what is acquired in practice may well be samples in the time domain rather than analytic samples on the unit disc of the complex plane, we also establish a fast algorithm for acquiring analytic samples. Numerical experiments demonstrate the efficiency of the MASA.
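
The fast algorithm rests on Hankel structure; as a generic, hedged illustration of why that helps, a Hankel matrix-vector product can be evaluated in O(n log n) by FFT-based Toeplitz multiplication. The entries below are random placeholders, not the MASA coefficients.

```python
# Minimal sketch: fast Hankel matrix-vector product. H @ x equals T @ reverse(x),
# where T is the Toeplitz matrix with first column h[n-1:] and first row
# h[n-1::-1]; matmul_toeplitz evaluates that product with FFTs.
import numpy as np
from scipy.linalg import hankel, matmul_toeplitz

rng = np.random.default_rng(0)
n = 512
h = rng.standard_normal(2 * n - 1)      # defining sequence: H[i, j] = h[i + j]
x = rng.standard_normal(n)

H = hankel(h[:n], h[n - 1:])            # dense Hankel matrix (reference)
y_dense = H @ x

y_fast = matmul_toeplitz((h[n - 1:], h[n - 1::-1]), x[::-1])

print("max deviation:", np.max(np.abs(y_dense - y_fast)))
```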


2021 ◽  
Author(s):  
Shuo Yang ◽  
Songhua Wu ◽  
Tongliang Liu ◽  
Min Xu

A major gap between few-shot and many-shot learning is the data distribution empirically observed by the model during training. In few-shot learning, the learned model can easily become over-fitted to the biased distribution formed by only a few training examples, whereas in many-shot learning the ground-truth data distribution is uncovered more accurately and a well-generalized model can be learned. In this paper, we propose to calibrate the distribution of these few-sample classes to reduce its bias and thereby alleviate the over-fitting problem. The distribution calibration is achieved by transferring statistics from classes with sufficient examples to the few-sample classes. After calibration, an adequate number of examples can be sampled from the calibrated distribution to expand the input to the classifier. Extensive experiments on three datasets, miniImageNet, tieredImageNet, and CUB, show that a simple linear classifier trained on features sampled from our calibrated distribution can outperform the state-of-the-art accuracy by a large margin. We also establish a generalization error bound for the proposed distribution-calibration-based few-shot learning, which consists of the distribution assumption error, the distribution approximation error, and the estimation error. This generalization error bound theoretically justifies the effectiveness of the proposed method.
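
A minimal sketch of the calibration idea under simplifying assumptions: base-class means and covariances are given, the two nearest base classes are borrowed per support example, and synthetic features are drawn from the calibrated Gaussian before training a linear classifier. All dimensions, statistics and hyperparameters here are hypothetical.

```python
# Minimal sketch of distribution calibration for few-shot classification.
# Base-class statistics, feature dimensions and hyperparameters are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n_base = 64, 20

# pretend statistics of base classes (would come from a pretrained backbone)
base_means = rng.standard_normal((n_base, d))
base_covs = np.stack([np.eye(d) * rng.uniform(0.5, 1.5) for _ in range(n_base)])

def calibrate(support_feat, k=2, alpha=0.2):
    """Calibrated (mean, cov) for one support feature."""
    idx = np.argsort(np.linalg.norm(base_means - support_feat, axis=1))[:k]
    mean = (base_means[idx].sum(0) + support_feat) / (k + 1)
    cov = base_covs[idx].mean(0) + alpha * np.eye(d)
    return mean, cov

# a 2-way 1-shot task with hypothetical support features
support = {0: rng.standard_normal(d), 1: rng.standard_normal(d) + 0.5}

X_train, y_train = [], []
for label, feat in support.items():
    mean, cov = calibrate(feat)
    samples = rng.multivariate_normal(mean, cov, size=100)  # synthetic features
    X_train.append(np.vstack([feat, samples]))
    y_train += [label] * (1 + 100)

clf = LogisticRegression(max_iter=1000).fit(np.vstack(X_train), y_train)
print("classifier trained on", len(y_train), "calibrated samples")
```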


1996 ◽  
Vol 8 (4) ◽  
pp. 819-842 ◽  
Author(s):  
Partha Niyogi ◽  
Federico Girosi

Feedforward networks together with their training algorithms are a class of regression techniques that can be used to learn to perform some task from a set of examples. The question of how network performance generalizes from a finite training set to unseen data is clearly of crucial importance. In this article, we first show that the generalization error can be decomposed into two terms: the approximation error, due to the insufficient representational capacity of a finite-sized network, and the estimation error, due to insufficient information about the target function because of the finite number of samples. We then consider the problem of learning functions belonging to certain Sobolev spaces with Gaussian radial basis functions. Using the above-mentioned decomposition, we bound the generalization error in terms of the number of basis functions and the number of examples. While the bound we derive is specific to radial basis functions, a number of observations deriving from it apply to any approximation technique. Our result also sheds light on how to choose an appropriate network architecture for a particular problem and on the kinds of problems that can be effectively solved with finite resources, i.e., with a finite number of parameters and finite amounts of data.
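
Schematically, and in our own notation rather than the article's exact statement, the decomposition for an estimator \hat f_{n,m} built from n basis functions and m examples reads:

```latex
\underbrace{\;\|\hat f_{n,m} - f_0\|^2\;}_{\text{generalization error}}
\;\lesssim\;
\underbrace{\;\|f_n - f_0\|^2\;}_{\text{approximation error (network capacity)}}
\;+\;
\underbrace{\;\|\hat f_{n,m} - f_n\|^2\;}_{\text{estimation error (finite sample)}},
```

where f_0 is the target function and f_n is the best approximation to f_0 achievable by an n-unit network.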


2017 ◽  
Vol 55 (1) ◽  
pp. 32-56 ◽  
Author(s):  
Luitzen de Boer

Purpose
The purpose of this paper is to present three heuristics for choosing supplier selection criteria. By considering the balance between the expected relative effort and benefit of using different selection criteria, the heuristics suggest which criteria should be prioritized. The heuristics serve to develop our understanding of the search and evaluation heuristics used in supplier selection and to facilitate further research.

Design/methodology/approach
The research is primarily theoretical, yet draws on empirical studies of supplier selection. The theoretical basis is Simon's notion of procedural rationality (Simon, 1976). The author makes the general notion of procedural rationality more concrete for supplier selection by formally describing three heuristics for choosing selection criteria. The heuristics share the same logic but differ in the precision of the input information required from the purchaser. The paper provides illustrations of the heuristics.

Findings
It appears that procedural rationality can be specified for the design of the supplier selection process by explicitly recognizing the cost and value of selection criteria. There is no single way of doing this, but at the most basic level it requires an ordinal ranking of criteria. Even such a rudimentary, qualitative assessment can help identify suitable criteria. The heuristics developed appear compatible with established approaches for the subsequent selection of suppliers.

Originality/value
The paper addresses the early stage of supplier selection, which has been largely ignored in the literature.

