CINECA IRIS Institutional Research Information System

Implicitly Distributed Fuzzy Random Forests

Marco Barsacchi; Alessio Bechini; Francesco Marcelloni

2021-01-01

Abstract

In Big Data Mining, the availability of effective and efficient classifiers is a prime concern. Accurate classification results can be obtained with sophisticated models, e.g. by using ensemble approaches and exploiting concepts from fuzzy set theory, but at a high computational cost. The quest for efficiency leads to the adoption of distributed versions of classification algorithms, and in this effort the support of suitable cluster computing frameworks can be fundamental. This paper proposes DFRF, a novel distributed fuzzy random forest induction algorithm based on a fuzzy discretizer for continuous attributes. The described approach, although shaped on the MapReduce programming model, takes advantage of the implicit distribution of the computation provided by the Apache Spark framework. An extensive experimental characterization of the algorithm over big datasets, along with a comparison with other state-of-the-art fuzzy classification algorithms, shows that DFRF provides very competitive results; moreover, a scalability study carried out on a small computer cluster shows that the approach behaves well as the number of available computing units increases.
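The abstract does not detail the discretizer or the induction steps, so the following is only a minimal sketch of the general idea of fuzzy discretization combined with a MapReduce-style aggregation. It assumes a strong triangular fuzzy partition over a single continuous attribute, and it uses plain Python in place of Apache Spark. All names (`triangular_memberships`, `map_partition`, `merge_counts`, `cores`) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from functools import reduce


def triangular_memberships(x, cores):
    """Membership degrees of x in an assumed strong triangular partition.

    `cores` is a sorted list of fuzzy set peaks. Returns {set_index: degree}
    with only non-zero degrees; the degrees always sum to 1.
    """
    if x <= cores[0]:
        return {0: 1.0}
    if x >= cores[-1]:
        return {len(cores) - 1: 1.0}
    for i in range(len(cores) - 1):
        left, right = cores[i], cores[i + 1]
        if left <= x <= right:
            mu_right = (x - left) / (right - left)
            degrees = {i: 1.0 - mu_right, i + 1: mu_right}
            return {k: v for k, v in degrees.items() if v > 0.0}


def map_partition(rows, cores):
    """Map step: fuzzy class counts for one chunk of (value, label) rows."""
    counts = defaultdict(float)
    for value, label in rows:
        for fuzzy_set, degree in triangular_memberships(value, cores).items():
            counts[(fuzzy_set, label)] += degree
    return dict(counts)


def merge_counts(a, b):
    """Reduce step: sum two partial count dictionaries key by key."""
    merged = defaultdict(float, a)
    for key, value in b.items():
        merged[key] += value
    return dict(merged)


# Toy data split into two chunks, standing in for distributed partitions.
cores = [0.0, 5.0, 10.0]
chunks = [[(2.5, "a"), (5.0, "b")], [(7.5, "b"), (10.0, "a")]]
totals = reduce(merge_counts, (map_partition(c, cores) for c in chunks), {})
```

In a Spark setting, the map step would typically run once per data partition and the merge would be handled by the framework's aggregation primitives. Fuzzy class counts of this kind are the sort of statistic from which a fuzzy decision tree could evaluate candidate splits, but how DFRF actually does this is not stated in the abstract.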


Year: 2021
ISBN: 9781450381048
Appears in types: 4.1 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1066477

Warning: the data displayed have not been validated by the university.
