CINECA IRIS Institutional Research Information System

Fuzzy associative classification has not been widely analyzed in the literature, although associative classifiers (ACs) have proved to be very effective in different real domain applications. The main reason is that learning fuzzy ACs is a very heavy task, especially when dealing with large datasets. To overcome this drawback, in this paper, we propose an efficient distributed fuzzy associative classification approach based on the MapReduce paradigm. The approach exploits a novel distributed discretizer based on fuzzy entropy for efficiently generating fuzzy partitions of the attributes. Then, a set of candidate fuzzy association rules is generated by employing a distributed fuzzy extension of the well-known FP-Growth algorithm. Finally, this set is pruned by using three purposely adapted types of pruning. We implemented our approach on the popular Hadoop framework. Hadoop allows distributing storage and processing of very large data sets on computer clusters built from commodity hardware. We have performed an extensive experimentation and a detailed analysis of the results using six very large datasets with up to 11 000 000 instances. We have also experimented different types of reasoning methods. Focusing on accuracy, model complexity, computation time, and scalability, we compare the results achieved by our approach with those obtained by two distributed nonfuzzy ACs recently proposed in the literature. We highlight that, although the accuracies result to be comparable, the complexity, evaluated in terms of number of rules, of the classifiers generated by the fuzzy distributed approach is lower than the one of the nonfuzzy classifiers.

A Distributed Fuzzy Associative Classifier for Big Data

Segatori, Armando;Bechini, Alessio;Ducange, Pietro;Marcelloni, Francesco

2018-01-01

Abstract

Fuzzy associative classification has not been widely analyzed in the literature, although associative classifiers (ACs) have proved to be very effective in different real domain applications. The main reason is that learning fuzzy ACs is a very heavy task, especially when dealing with large datasets. To overcome this drawback, in this paper, we propose an efficient distributed fuzzy associative classification approach based on the MapReduce paradigm. The approach exploits a novel distributed discretizer based on fuzzy entropy for efficiently generating fuzzy partitions of the attributes. Then, a set of candidate fuzzy association rules is generated by employing a distributed fuzzy extension of the well-known FP-Growth algorithm. Finally, this set is pruned by using three purposely adapted types of pruning. We implemented our approach on the popular Hadoop framework. Hadoop allows distributing storage and processing of very large data sets on computer clusters built from commodity hardware. We have performed an extensive experimentation and a detailed analysis of the results using six very large datasets with up to 11 000 000 instances. We have also experimented different types of reasoning methods. Focusing on accuracy, model complexity, computation time, and scalability, we compare the results achieved by our approach with those obtained by two distributed nonfuzzy ACs recently proposed in the literature. We highlight that, although the accuracies result to be comparable, the complexity, evaluated in terms of number of rules, of the classifiers generated by the fuzzy distributed approach is lower than the one of the nonfuzzy classifiers.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2018
			
	Codice DOI
	
				https://dx.doi.org/10.1109/TCYB.2017.2748225
			
	Tutti gli autori
	
						Segatori, Armando; Bechini, Alessio; Ducange, Pietro; Marcelloni, Francesco
					
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
DFAC_FFP_postprint.pdf accesso aperto Descrizione: post-print version of the manuscript Tipologia: Documento in Post-print Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 1.52 MB Formato Adobe PDF Visualizza/Apri	1.52 MB	Adobe PDF	Visualizza/Apri
J10.pdf solo utenti autorizzati Tipologia: Versione finale editoriale Licenza: NON PUBBLICO - Accesso privato/ristretto Dimensione 1.51 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	1.51 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/940077

Citazioni

1

46

37

social impact