In this paper we propose a new approach to feature selection based on a modified fuzzy C-means algorithm with supervision (MFCMS). MFCMS completes the unsupervised learning of classical fuzzy C-means with labeled patterns. The labeled patterns allow MFCMS to accurately model the shape of each cluster and consequently to highlight the features which result to be particularly effective to characterize a cluster. These features are distinguished by a low variance of their values for the patterns with a high membership degree to the cluster. If, with respect to these features, the distance between the prototype of the cluster and the prototypes of the other clusters is high, then these features have the property of discriminating between the cluster and the other clusters. To take these two aspects into account, for each cluster and each feature, we introduce a purposely defined index: the higher the value of the index, the higher the discrimination capability of the feature for the cluster. We execute MFCMS on the training set considering all patterns as labeled. Then, we retain the features which are associated, at least for one cluster, with an index larger than a threshold τ. We applied MFCMS to several real-world pattern classification benchmarks. We used the well-known k-nearest neighbors as learning algorithm. We show that feature selection performed by MFCMS achieved an improvement in generalization on all data sets.
|Autori interni:||MARCELLONI, FRANCESCO|
|Titolo:||Feature Selection based on a Modified Fuzzy C-means Algorithm with Supervision|
|Anno del prodotto:||2003|
|Digital Object Identifier (DOI):||10.1016/S0020-0255(02)00402-4|
|Appare nelle tipologie:||1.1 Articolo in rivista|