A Novel Approach to Fuzzy Clustering based on a Dissimilarity Relation extracted from Data using a TS System
Cimino, Mario Giovanni Cosimo Antonio; Lazzerini, Beatrice; Marcelloni, Francesco
2006-01-01
Abstract
Clustering is the unsupervised partitioning of a data set based on a dissimilarity measure, which determines the shape of the clusters. Because cluster shapes may differ from one cluster to another, it is highly desirable to extract the dissimilarity measure directly from the data by means of a data model. Building a model, however, requires some supervision, that is, some knowledge of the data structure, which is exactly what clustering seeks to discover. Hence, the less supervision is needed to build the data model, the more sense it makes to use a data model for clustering. With this in mind, we propose to use a very small number of pattern pairs with known dissimilarity to build a Takagi-Sugeno (TS) system that models the dissimilarity relation. The rules of the TS system also provide an intuitive description of the dissimilarity relation itself. We then use the TS system to build a dissimilarity matrix, which is given as input to an unsupervised fuzzy relational clustering algorithm called the Any Relation Clustering Algorithm (ARCA). ARCA partitions the data set based on the proximity of the vectors containing the dissimilarity values between each pattern and all the other patterns in the data set. We show that combining the TS system with ARCA achieves high classification performance on one synthetic data set and two real data sets. We also discuss how the rules of the TS system act as a kind of linguistic description of the dissimilarity relation.

| File | Size | Format |
|---|---|---|
| A novel approach_postprint.pdf (open access; type: post-print document; license: Creative Commons) | 1.24 MB | Adobe PDF |
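
The last two steps described in the abstract (build a dissimilarity matrix, then cluster the patterns by the proximity of their rows in that matrix) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: `placeholder_dissimilarity` is a hypothetical hand-written function standing in for the trained TS system, the clustering step is plain fuzzy c-means applied to the matrix rows rather than the authors' ARCA, and the toy data, function names, and parameter values are chosen only for illustration.

```python
import numpy as np


def dissimilarity_matrix(points, dissimilarity):
    """Evaluate a pairwise dissimilarity function on every pair of patterns."""
    n = len(points)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = dissimilarity(points[i], points[j])
    return matrix


def fuzzy_c_means(rows, n_clusters, fuzzifier=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; each row of `rows` is one object to cluster."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each object's degrees sum to 1.
    u = rng.random((len(rows), n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** fuzzifier
        # Prototypes are membership-weighted means of the rows.
        prototypes = (w.T @ rows) / w.sum(axis=0)[:, None]
        # Euclidean distance from every row to every prototype.
        dist = np.linalg.norm(rows[:, None, :] - prototypes[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # floor avoids division by zero
        # Standard fuzzy c-means membership update.
        inv = dist ** (-2.0 / (fuzzifier - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return u


def placeholder_dissimilarity(a, b):
    # ASSUMPTION: hypothetical stand-in for the TS system's output in [0, 1).
    return 1.0 - np.exp(-np.linalg.norm(a - b))


# Toy data: two well-separated groups of 20 two-dimensional patterns each.
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(20, 2)),
])

D = dissimilarity_matrix(points, placeholder_dissimilarity)
# Each pattern is represented by its row of dissimilarities to all patterns.
memberships = fuzzy_c_means(D, n_clusters=2)
labels = memberships.argmax(axis=1)
```

Representing each pattern by its row of the matrix means that two patterns end up in the same cluster when they relate to the rest of the data set in a similar way, which is the idea the abstract attributes to ARCA. Swapping in a learned dissimilarity model only requires replacing `placeholder_dissimilarity`.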