Data stream mining has recently grown in popularity, thanks to an increasing number of applications which need continuous and fast analysis of streaming data. Such data are generally produced in application domains that require immediate reactions with strict temporal constraints. These particular characteristics make problematic the use of classical machine learning algorithms for mining knowledge from these fast data streams and call for appropriate techniques. In this paper, based on the well-known Hoeffding Decision Tree (HDT) for streaming data classification, we introduce FHDT, a fuzzy HDT that extends HDT with fuzziness, thus making HDT more robust to noisy and vague data. We tested FHDT on three synthetic datasets, usually adopted for analyzing concept drifts in data stream classification, and two real-world datasets, already exploited in some recent researches on fuzzy systems for streaming data. We show that FHDT outperforms HDT, especially in presence of concept drift. Furthermore, FHDT is characterized by a high level of interpretability, thanks to the linguistic rules that can be extracted from it.
Fuzzy hoeffding decision tree for data stream classification
Ducange P.
;Marcelloni F.;
2021-01-01
Abstract
Data stream mining has recently grown in popularity, thanks to an increasing number of applications which need continuous and fast analysis of streaming data. Such data are generally produced in application domains that require immediate reactions with strict temporal constraints. These particular characteristics make problematic the use of classical machine learning algorithms for mining knowledge from these fast data streams and call for appropriate techniques. In this paper, based on the well-known Hoeffding Decision Tree (HDT) for streaming data classification, we introduce FHDT, a fuzzy HDT that extends HDT with fuzziness, thus making HDT more robust to noisy and vague data. We tested FHDT on three synthetic datasets, usually adopted for analyzing concept drifts in data stream classification, and two real-world datasets, already exploited in some recent researches on fuzzy systems for streaming data. We show that FHDT outperforms HDT, especially in presence of concept drift. Furthermore, FHDT is characterized by a high level of interpretability, thanks to the linguistic rules that can be extracted from it.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.