Unlabeled Patterns to Tighten Rademacher Complexity Error Bounds for Kernel Classifiers
L. Oneto
2014-01-01
Abstract
We derive in this work new upper bounds for estimating the generalization error of kernel classifiers, that is, the misclassification rate that a model will attain on new, previously unseen data. Although this paper is mainly targeted at the error estimation problem, the generalization error estimate can obviously also be exploited, in practice, for model selection purposes. The derived bounds are based on the Rademacher complexity and prove particularly useful when a set of unlabeled samples is available in addition to the (labeled) training examples: we will show that, by exploiting the additional unlabeled patterns, the confidence term of the conventional Rademacher complexity bound can be reduced by a factor of three. Moreover, the availability of unlabeled examples also allows further improvements to be obtained by building localized versions of the hypothesis class containing the optimal classifier.
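For reference, the conventional fully supervised bound that the abstract refers to has, in its standard textbook form for a loss bounded in [0,1] (e.g., Bartlett and Mendelson, 2002), the shape sketched below; the symbols are the usual ones and the constants may differ from those derived in the paper. With probability at least $1-\delta$ over the draw of the $n$ labeled samples, uniformly over $h \in \mathcal{H}$,

$$ L(h) \;\le\; \hat{L}_n(h) \;+\; 2\,\hat{\mathfrak{R}}_n(\mathcal{H}) \;+\; 3\sqrt{\frac{\ln(2/\delta)}{2n}}, $$

where $L(h)$ is the generalization error, $\hat{L}_n(h)$ the empirical error on the labeled training set, and $\hat{\mathfrak{R}}_n(\mathcal{H})$ the empirical Rademacher complexity of the hypothesis class $\mathcal{H}$. The last square-root term is the confidence term mentioned above, whose leading constant the availability of unlabeled patterns helps to reduce.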