Federated clustering lets multiple data owners collaborate in discovering patterns from distributed data without violating privacy requirements. The federated versions of traditional clustering algorithms proposed so far are, however, “lossy” since they fail to identify exactly the same clusters as the original versions executed on the merged data stored in a centralized server, as would happen if no privacy constraint occurred. In this paper, we propose federated procedures for losslessly executing the C-Means (CM) and the Fuzzy C-Means (FCM) algorithms in both horizontally and vertically partitioned data scenarios, while preserving data privacy. We formally prove that the proposed federated procedures identify the same clusters determined by applying the algorithms to the union of all local data. Further, we present an extensive experimental analysis for characterizing the behavior of the proposed approach in a typical federated learning scenario, that is, as the fraction of participants in the federation changes. We focus on the federated FCM and the horizontally partitioned data, which is the most interesting scenario. We show that the proposed procedure is effective and is able to achieve competitive performance with respect to two recently proposed versions of federated FCM for horizontally partitioned data.
Federated c-means and Fuzzy c-means Clustering Algorithms for Horizontally and Vertically Partitioned Data
Corcuera Barcena, J. L.;Marcelloni, F.;Renda, A.;Bechini, A.;Ducange, P.
2024-01-01
Abstract
Federated clustering lets multiple data owners collaborate in discovering patterns from distributed data without violating privacy requirements. The federated versions of traditional clustering algorithms proposed so far are, however, “lossy” since they fail to identify exactly the same clusters as the original versions executed on the merged data stored in a centralized server, as would happen if no privacy constraint occurred. In this paper, we propose federated procedures for losslessly executing the C-Means (CM) and the Fuzzy C-Means (FCM) algorithms in both horizontally and vertically partitioned data scenarios, while preserving data privacy. We formally prove that the proposed federated procedures identify the same clusters determined by applying the algorithms to the union of all local data. Further, we present an extensive experimental analysis for characterizing the behavior of the proposed approach in a typical federated learning scenario, that is, as the fraction of participants in the federation changes. We focus on the federated FCM and the horizontally partitioned data, which is the most interesting scenario. We show that the proposed procedure is effective and is able to achieve competitive performance with respect to two recently proposed versions of federated FCM for horizontally partitioned data.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.