CINECA IRIS Institutional Research Information System

Artificial intelligence of things at the edge: Scalable and efficient distributed learning for massive scenarios

Bano S.;Tonellotto N.;Cassara P.;Gotta A.

2023-01-01

Abstract

Federated Learning (FL) is a distributed optimization method in which multiple client nodes collaboratively train a machine learning model without sharing their data with a central server. However, exchanging model parameters between many clients and the central aggregation server can cause problems such as latency and network congestion. To address these issues, we propose a scalable communication infrastructure based on Information-Centric Networking, built and tested on Apache Kafka®. The proposed architecture uses a two-tier communication model. In the first tier, client updates are cached at the edge, between the clients and the server; in the second tier, the server computes global model updates by aggregating the cached models. Storing data in intermediate nodes at the edge enables reliable and effective data transmission and addresses the intermittent connectivity of mobile nodes. Although a larger number of local model updates can yield a more accurate global model, it also generates massive data traffic that worsens congestion at the edge. For this reason, we couple the architecture with a client selection procedure based on a congestion control mechanism at the edge. The proposed algorithm selects a subset of clients according to their resources, using a time-based backoff system, so as to account for the time-averaged accuracy of FL while limiting the traffic load. Experiments show that the proposed architecture achieves an improvement of over 40% compared with Flower, a network-centric FL architecture. The architecture also provides scalability and reliability for mobile nodes, improves client resource utilization, avoids overflow, and ensures fairness in client selection. The experiments further show that the proposed algorithm produces the desired client selection patterns and adapts to changing network environments.
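The abstract does not give the selection algorithm or the aggregation rule, so the following is only an illustrative sketch of the two ideas it names: an edge cache that the server aggregates from, and resource-aware client selection with a time-based backoff. Everything in it is an assumption made for illustration: the `Client` fields, the `capacity` limit standing in for the congestion constraint, the backoff rule `round(1 / resources)`, and the unweighted mean used for aggregation. It does not use Apache Kafka and should not be read as the authors' method.

```python
import random
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    resources: float  # hypothetical capability score in (0, 1]; higher is better
    backoff: int = 0  # rounds the client must still wait before sending again


def select_clients(clients, capacity, rng):
    """Pick at most `capacity` clients whose backoff timer has expired."""
    ready = [c for c in clients if c.backoff == 0]
    # Prefer better-resourced clients; break ties at random.
    ready.sort(key=lambda c: (c.resources, rng.random()), reverse=True)
    chosen = ready[:capacity]
    chosen_names = {c.name for c in chosen}
    for c in clients:
        if c.name in chosen_names:
            # Assumed rule: a selected client backs off, weaker ones for longer.
            c.backoff = max(1, round(1 / c.resources))
        elif c.backoff > 0:
            c.backoff -= 1  # waiting clients count down toward eligibility
    return chosen


def aggregate(edge_cache):
    """Server side: element-wise unweighted mean of the cached updates."""
    updates = list(edge_cache.values())
    return [sum(values) / len(updates) for values in zip(*updates)]


def run_rounds(clients, capacity, rounds, seed=0):
    """Simulate several rounds; return the selected client names per round."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        chosen = select_clients(clients, capacity, rng)
        history.append([c.name for c in chosen])
    return history


if __name__ == "__main__":
    pool = [Client("a", 1.0), Client("b", 0.5), Client("c", 0.5), Client("d", 0.25)]
    for round_no, names in enumerate(run_rounds(pool, capacity=2, rounds=6), start=1):
        print(round_no, names)
```

In this sketch the cap on selected clients bounds the traffic reaching the edge in each round, while the backoff keeps a well-resourced client from being chosen in consecutive rounds, so weaker clients still get a turn.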

Year: 2023
DOI: https://dx.doi.org/10.1016/j.comcom.2023.04.010
All authors: Bano, S.; Tonellotto, N.; Cassara, P.; Gotta, A.

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1205590

