TSF-DBSCAN: a Novel Fuzzy Density-based Approach for Clustering Unbounded Data Streams

Bechini, Alessio; Marcelloni, Francesco; Renda, Alessandro
2022-01-01

Abstract

In recent years, several clustering algorithms have been proposed with the aim of mining knowledge from streams of data generated at high speed by a variety of hardware platforms and software applications. Among these algorithms, density-based approaches have proved to be particularly attractive, thanks to their capability of handling outliers and capturing clusters with arbitrary shapes. The streaming setting poses additional challenges that need to be addressed as well: data streams are potentially unbounded and affected by concept drift, i.e., a change over time in the underlying data generation process. In this paper, we propose Temporal Streaming Fuzzy DBSCAN (TSF-DBSCAN), a novel fuzzy clustering algorithm for streaming data. TSF-DBSCAN is an extension of the well-known DBSCAN algorithm, one of the most popular density-based clustering approaches. Fuzziness is introduced in TSF-DBSCAN to model the uncertainty about the distance threshold that defines the neighborhood of an object. As a consequence, TSF-DBSCAN identifies clusters with fuzzy overlapping borders. A fading model, which makes objects less relevant as they become more remote in time, endows TSF-DBSCAN with the capability of adapting to evolving data streams. The integration of the model in a two-stage approach ensures computational and memory efficiency: during the online stage, continuously arriving objects are organized in suitable data structures that are later exploited in the offline stage to determine a fine-grained partition. An extensive experimental analysis on synthetic and real-world datasets shows that TSF-DBSCAN yields competitive performance when compared to other clustering algorithms recently proposed for streaming data.
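
The abstract does not give formulas, so the following is only a minimal sketch of the two ideas it names: a fuzzy neighborhood and a fading model. It assumes a linear membership between two hypothetical distance thresholds (`eps_min`, `eps_max`) and an exponential decay with a hypothetical rate `decay_rate`; the function names and the way the two factors are multiplied together are illustrative assumptions, not the definitions used in the paper.

```python
import math


def fuzzy_membership(distance, eps_min, eps_max):
    """Assumed fuzzy neighborhood: full membership within eps_min,
    none beyond eps_max, and a linear ramp in between."""
    if distance <= eps_min:
        return 1.0
    if distance >= eps_max:
        return 0.0
    return (eps_max - distance) / (eps_max - eps_min)


def fading_weight(age, decay_rate):
    """Assumed exponential fading: the weight halves every 1/decay_rate
    time units, so older objects count less."""
    return 2.0 ** (-decay_rate * age)


def weighted_density(point, neighbors, now, eps_min, eps_max, decay_rate):
    """Sum of (fuzzy membership x fading weight) over candidate neighbors.

    `neighbors` is a list of (coordinates, timestamp) pairs. Combining the
    two factors by multiplication is an assumption made for illustration.
    """
    total = 0.0
    for coords, timestamp in neighbors:
        d = math.dist(point, coords)
        total += fuzzy_membership(d, eps_min, eps_max) * fading_weight(
            now - timestamp, decay_rate
        )
    return total


if __name__ == "__main__":
    stream = [((0.5, 0.0), 0.0), ((1.5, 0.0), 10.0), ((3.0, 0.0), 10.0)]
    print(weighted_density((0.0, 0.0), stream, now=10.0,
                           eps_min=1.0, eps_max=2.0, decay_rate=0.1))
```

In this sketch, a close but old object and a recent object in the fuzzy border can contribute equally to the density around a point, which is the kind of trade-off the fading model and the fuzzy threshold are meant to express.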
Files in this item:

TFS_2020_0031_TSFDBSCAN_preprint.pdf
Access: authorized users only
Description: pre-print version
Type: Pre-print document
License: NOT PUBLIC - private/restricted access
Size: 2.51 MB
Format: Adobe PDF

TSF-DBSCAN_A_Novel_Fuzzy_Density-Based_Approach_for_Clustering_Unbounded_Data_Streams.pdf
Access: not available
Type: Final published version
License: NOT PUBLIC - private/restricted access
Size: 2.94 MB
Format: Adobe PDF

TSF_PostPrint_v2.pdf
Access: open access
Description: post-print version
Type: Post-print document
License: All rights reserved
Size: 3.28 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1064649
Citations
  • PMC: ND
  • Scopus: 47
  • ISI: 35