CINECA IRIS Institutional Research Information System

Associative classifiers have proven to be very effective in classification problems. Unfortunately, the algorithms used for learning these classifiers are not able to adequately manage big data because of time complexity and memory constraints. To overcome such drawbacks, we propose a distributed association rule-based classification scheme shaped according to the MapReduce programming model. The scheme mines classification association rules (CARs) using a properly enhanced, distributed version of the well-known FP-Growth algorithm. Once CARs have been mined, the proposed scheme performs a distributed rule pruning. The set of survived CARs is used to classify unlabeled patterns. The memory usage and time complexity for each phase of the learning process are discussed, and the scheme is evaluated on seven real-world big datasets on the Hadoop framework, characterizing its scalability and achievable speedup on small computer clusters. The proposed solution for associative classifiers turns to be suitable to practically address big datasets even with modest hardware support. Comparisons with two state-of-the-art distributed learning algorithms are also discussed in terms of accuracy, model complexity, and computation time.

A MapReduce solution for associative classification of big data

BECHINI, ALESSIO;MARCELLONI, FRANCESCO;SEGATORI, ARMANDO

2016-01-01

Abstract

Associative classifiers have proven to be very effective in classification problems. Unfortunately, the algorithms used for learning these classifiers are not able to adequately manage big data because of time complexity and memory constraints. To overcome such drawbacks, we propose a distributed association rule-based classification scheme shaped according to the MapReduce programming model. The scheme mines classification association rules (CARs) using a properly enhanced, distributed version of the well-known FP-Growth algorithm. Once CARs have been mined, the proposed scheme performs a distributed rule pruning. The set of survived CARs is used to classify unlabeled patterns. The memory usage and time complexity for each phase of the learning process are discussed, and the scheme is evaluated on seven real-world big datasets on the Hadoop framework, characterizing its scalability and achievable speedup on small computer clusters. The proposed solution for associative classifiers turns to be suitable to practically address big datasets even with modest hardware support. Comparisons with two state-of-the-art distributed learning algorithms are also discussed in terms of accuracy, model complexity, and computation time.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2016
			
	Codice DOI
	
				https://dx.doi.org/10.1016/j.ins.2015.10.041
			
	Tutti gli autori
	
						Bechini, Alessio; Marcelloni, Francesco; Segatori, Armando
					
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
Marcelloni_799448.pdf accesso aperto Tipologia: Documento in Pre-print Licenza: Creative commons Dimensione 1.4 MB Formato Adobe PDF Visualizza/Apri	1.4 MB	Adobe PDF	Visualizza/Apri
R75.pdf solo utenti autorizzati Descrizione: Articolo principale - versione editoriale Tipologia: Versione finale editoriale Licenza: NON PUBBLICO - Accesso privato/ristretto Dimensione 1.58 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	1.58 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/799448

Citazioni

ND

107

81

social impact