A Novel Approach to Fuzzy Clustering based on a Dissimilarity Relation extracted from Data using a TS System

Cimino, Mario Giovanni Cosimo Antonio; Lazzerini, Beatrice; Marcelloni, Francesco
2006-01-01

Abstract

Clustering is the unsupervised partitioning of a data set based on a dissimilarity measure, which determines the shape of the clusters. Because cluster shapes may differ from one cluster to another, it is highly desirable to extract the dissimilarity measure directly from the data by means of a data model. Building a model, however, requires some supervision, that is, some knowledge of the data structure, and this structure is exactly what clustering seeks to discover. Hence, the less supervision needed to build the data model, the more sense it makes to use a data model for clustering. With this in mind, we propose to exploit a very small number of pattern pairs with known dissimilarity to build a Takagi-Sugeno (TS) system that models the dissimilarity relation. Among other things, the rules of the TS system provide an intuitive description of the dissimilarity relation itself. We then use the TS system to build a dissimilarity matrix, which is fed as input to an unsupervised fuzzy relational clustering algorithm, called the any relation clustering algorithm (ARCA). ARCA partitions the data set based on the proximity of the vectors that contain the dissimilarity values between each pattern and all the other patterns in the data set. We show that combining the TS system and ARCA achieves high classification performance on one synthetic data set and on two real data sets. Further, we discuss how the rules of the TS system represent a sort of linguistic description of the dissimilarity relation.
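
The idea of clustering patterns by the proximity of their dissimilarity vectors can be illustrated with a minimal Python sketch. It is not the authors' TS system or ARCA implementation: Euclidean distance stands in for the dissimilarity that the paper learns from labelled pairs, plain fuzzy c-means applied to the rows of the dissimilarity matrix stands in for ARCA, and every function name, the fuzzifier `m = 2`, and the farthest-point initialisation are assumptions made for illustration only.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(points):
    """Pairwise dissimilarities (assumption: Euclidean, not a learned TS model)."""
    return [[euclid(p, q) for q in points] for p in points]


def fuzzy_cluster_rows(D, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Fuzzy c-means on the rows of D; returns an n x n_clusters membership matrix."""
    n = len(D)
    # Farthest-point initialisation of the prototypes (illustrative choice).
    protos = [list(D[0])]
    while len(protos) < n_clusters:
        far = max(range(n), key=lambda i: min(euclid(D[i], p) for p in protos))
        protos.append(list(D[far]))
    U = None
    for _ in range(n_iter):
        # Membership update: closer prototypes receive higher membership.
        new_U = []
        for i in range(n):
            d = [max(euclid(D[i], p), 1e-12) for p in protos]
            inv = [dc ** (-2.0 / (m - 1.0)) for dc in d]
            s = sum(inv)
            new_U.append([v / s for v in inv])
        converged = U is not None and max(
            abs(new_U[i][c] - U[i][c]) for i in range(n) for c in range(n_clusters)
        ) < tol
        U = new_U
        if converged:
            break
        # Prototype update: membership-weighted mean of the rows of D.
        protos = []
        for c in range(n_clusters):
            w = [U[i][c] ** m for i in range(n)]
            total = sum(w)
            protos.append(
                [sum(w[i] * D[i][j] for i in range(n)) / total for j in range(n)]
            )
    return U


# Toy data: two well-separated groups of three 2-D patterns each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
D = dissimilarity_matrix(points)
U = fuzzy_cluster_rows(D, n_clusters=2)
# Harden the fuzzy partition by taking the cluster with the largest membership.
labels = [max(range(2), key=lambda c: U[i][c]) for i in range(len(points))]
print(labels)
```

Each pattern is represented here by its row of the dissimilarity matrix, so two patterns end up in the same cluster when they relate to the rest of the data set in a similar way. In the paper's setting, the matrix `D` would instead be produced by the TS system trained on the few pattern pairs with known dissimilarity.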
Files in this item:
A novel approach_postprint.pdf (Adobe PDF, 1.24 MB)
Access: open access
Type: post-print document
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/179403