On the influence of feature selection in fuzzy rule-based regression model generation

Ducange, Pietro; Marcelloni, Francesco; Segatori, Armando
2016-01-01

Abstract

Fuzzy rule-based models have been extensively used in regression problems. Besides high accuracy, one of the most appreciated characteristics of these models is their interpretability, which is generally measured in terms of complexity. Complexity is affected by the number of features used to generate the model: the fewer the features, the lower the complexity. Feature selection can therefore considerably contribute not only to speeding up the learning process, but also to improving the interpretability of the final model. Nevertheless, very few methods for selecting features before learning regression models have been proposed in the literature. In this paper, we focus on these methods, which perform feature selection as a pre-processing step. In particular, we have adapted two state-of-the-art feature selection algorithms, namely NMIFS and CFS, originally proposed for classification, to deal with regression. Further, we propose FMIFS, a novel forward sequential feature selection approach based on the minimal-redundancy-maximal-relevance criterion, which can directly manage fuzzy partitions. The relevance and the redundancy of a feature are measured in terms of, respectively, the fuzzy mutual information between the feature and the output variable, and the average fuzzy mutual information between the feature and the features already selected. The stopping criterion for the sequential selection is based on the average values of relevance and redundancy of the features selected so far. We have performed two experiments on twenty regression datasets. In the first experiment, we aimed to show the effectiveness of feature selection in fuzzy rule-based regression model generation by comparing the mean square errors achieved by the fuzzy rule-based models generated using all the features with those achieved using the features selected by FMIFS, NMIFS and CFS. To avoid possible biases related to the specific algorithm, we adopted the well-known Wang and Mendel algorithm for generating the fuzzy rule-based models. We show that the mean square errors obtained by models generated using the features selected by FMIFS are on average similar to the values achieved using all the features, and lower than the ones obtained by employing the subsets of features selected by NMIFS and CFS. In the second experiment, we evaluated how feature selection can reduce the convergence time of evolutionary fuzzy systems, which are probably the most effective fuzzy techniques for tackling regression problems. Using a state-of-the-art multi-objective evolutionary fuzzy system based on rule learning and membership function tuning, we show that the number of evaluations can be considerably reduced when the dataset is pre-processed by feature selection.
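To make the selection scheme concrete, the following is a minimal Python sketch of an mRMR-style forward selection loop in the spirit of FMIFS. The fuzzy_mutual_information function, the dictionary-based feature representation, and the simplified stopping rule are illustrative assumptions, not the authors' implementation; the paper's actual stopping criterion compares the average relevance and redundancy of the features selected so far.

    # Sketch of greedy forward selection driven by a relevance-minus-redundancy
    # score, where both terms are computed with a user-supplied fuzzy mutual
    # information function (assumed, not part of the paper's code).

    def forward_mrmr_selection(features, y, fuzzy_mutual_information,
                               max_features=None):
        """Greedily select features: at each step pick the candidate that
        maximizes relevance (fuzzy MI with the output y) minus redundancy
        (average fuzzy MI with the features already selected)."""
        remaining = list(features.keys())
        selected = []
        while remaining and (max_features is None or len(selected) < max_features):
            best_feature, best_score = None, float("-inf")
            for f in remaining:
                relevance = fuzzy_mutual_information(features[f], y)
                if selected:
                    redundancy = sum(
                        fuzzy_mutual_information(features[f], features[s])
                        for s in selected
                    ) / len(selected)
                else:
                    redundancy = 0.0
                score = relevance - redundancy
                if score > best_score:
                    best_feature, best_score = f, score
            # Illustrative stopping rule: stop once the best candidate's
            # redundancy-penalized score is no longer positive (a simplification
            # of the average-based criterion described in the abstract).
            if selected and best_score <= 0:
                break
            selected.append(best_feature)
            remaining.remove(best_feature)
        return selected

A caller would pass a mapping from feature names to their (fuzzy-partitioned) values, the output variable, and a fuzzy mutual information estimator; the returned list is the ordered subset of selected features.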
2016
Antonelli, Michela; Ducange, Pietro; Marcelloni, Francesco; Segatori, Armando
Files in this record:
  • Marcelloni_799455.pdf (open access)
    Type: Pre-print
    License: Creative Commons
    Size: 517.72 kB
    Format: Adobe PDF
  • J7.pdf (authorized users only)
    Type: Final published version
    License: Non-public, private/restricted access
    Size: 689.02 kB
    Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/799455
Citations
  • PMC: ND
  • Scopus: 37
  • Web of Science (ISI): 32