A Multi-objective evolutionary fuzzy system for big data
MARCELLONI, FRANCESCO;SEGATORI, ARMANDO
2016-01-01
Abstract
One of the most appealing features of fuzzy rule-based classifiers is their capability of explaining how conclusions are inferred. This feature is hard to preserve when fuzzy rules are extracted from a very large amount of data. In this paper, we propose a distributed version of PAES-RCS, a multiobjective evolutionary approach that concurrently learns the rule and data bases of fuzzy rule-based classifiers by maximizing accuracy and minimizing complexity. PAES-RCS has proven to be very efficient in obtaining satisfactory approximations of the Pareto front within a limited number of iterations. We implemented the distributed version of PAES-RCS using Apache Spark as the data processing framework. We discuss the effectiveness of our approach in terms of classification rate and scalability by performing a number of experiments on three real-world big datasets. Further, we compare our approach with other well-known state-of-the-art algorithms in terms of both accuracy and complexity, and evaluate the achievable speedup on a small computer cluster. We show that the distributed version can efficiently extract compact rule bases with high accuracy and can handle big datasets even with modest hardware support.