A new approach for cross-silo federated learning and its privacy risks
Michele Fontana, Francesca Naretto, Anna Monreale
2021-01-01
Abstract
Federated Learning has gained increasing popularity in recent years for its ability to train Machine Learning models in critical contexts, using private data without moving it. Most approaches in the literature focus on mobile environments, where mobile devices hold the data of single users, and typically deal with image or text data. In this paper, we define hcsfedavg, a novel federated learning approach tailored for training machine learning models on data distributed over hierarchically organized federated organizations. Our method focuses on the generalization capabilities of neural network models, providing a new mechanism for selecting their best weights. In addition, it is tailored for tabular data. We empirically test our approach on two different tabular datasets, showing excellent results in terms of performance and generalization. We then also tackle the problem of assessing the privacy risk of the users represented in the training data. In particular, by attacking the hcsfedavg models with the Membership Inference Attack, we empirically show that users represented in the training data may be exposed to high privacy risk.
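The abstract describes aggregating models across hierarchically organized federated organizations. As a rough illustration only — the paper's actual hcsfedavg algorithm and weight-selection mechanism are not detailed here — the sketch below shows generic FedAvg-style weighted averaging applied at two levels of a hierarchy (clients within an organization, then organizations globally), with every name and structure being an assumption for illustration:

```python
# Hedged sketch: two-level FedAvg-style aggregation. All function names and the
# (weights, n_samples) client representation are illustrative assumptions; they
# do not reproduce the hcsfedavg algorithm from the paper.

def weighted_average(weight_sets, sample_counts):
    """Average flat model-weight vectors, weighting each by its sample count."""
    total = sum(sample_counts)
    dim = len(weight_sets[0])
    return [
        sum(w[i] * n for w, n in zip(weight_sets, sample_counts)) / total
        for i in range(dim)
    ]

def hierarchical_aggregate(organizations):
    """organizations: list of organizations, each a list of (weights, n_samples)
    pairs for that organization's clients. Aggregates clients into one model
    per organization, then organizations into one global model."""
    org_models, org_counts = [], []
    for clients in organizations:
        weights = [w for w, _ in clients]
        counts = [n for _, n in clients]
        org_models.append(weighted_average(weights, counts))
        org_counts.append(sum(counts))
    return weighted_average(org_models, org_counts)
```

For example, an organization holding models `[1.0]` and `[3.0]` (one sample each) contributes the average `[2.0]`; the global model then averages organization models weighted by their total sample counts.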