FedCMD: A Federated Cross-modal Knowledge Distillation for Drivers’ Emotion Recognition
Saira Bano; Nicola Tonellotto
2024-01-01
Abstract
Emotion recognition has attracted much interest in recent years in various application areas such as healthcare and autonomous driving. Existing approaches to emotion recognition are based on visual, speech, or psychophysiological signals. However, recent studies are exploring multimodal techniques that combine different modalities for emotion recognition. In this work, we address the problem of recognizing a driver's emotions from unlabeled videos using multimodal techniques. We propose a collaborative training method based on cross-modal distillation, "FedCMD" (Federated Cross-Modal Distillation). Federated Learning (FL) is an emerging collaborative decentralized learning technique that allows each participant to train its model locally, contributing to a better-generalized global model without sharing its data. The main advantage of FL is that only local data are used for training, thus preserving privacy and providing a secure and efficient emotion recognition system. The local model in FL is trained for each vehicle device on unlabeled video data, using sensor data as a proxy. Specifically, for each local model, we show how driver emotion annotations can be transferred from the sensor domain to the visual domain via cross-modal distillation. The key idea is based on the observation that a driver's emotional state, as indicated by sensors, correlates with the facial expressions shown in videos. The proposed "FedCMD" approach is evaluated on the multimodal dataset "BioVid Emo DB" and achieves state-of-the-art performance. Experimental results show that our approach is robust to non-identically distributed data, achieving 96.67% and 90.83% accuracy in classifying five different emotions with IID (independently and identically distributed) and non-IID data, respectively. Moreover, our model is much more robust to overfitting, resulting in better generalization than existing methods.
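The abstract describes two mechanisms: cross-modal distillation, where a sensor-domain teacher supplies soft emotion labels for a video-domain student trained on unlabeled clips, and federated aggregation of the locally trained students. The sketch below is not the authors' code; it is a minimal illustration of those two generic steps, and the model objects, tensor shapes, temperature value, and uniform averaging are illustrative assumptions.

```python
# Minimal sketch of cross-modal distillation plus FedAvg-style aggregation,
# assuming PyTorch models for the sensor teacher and video student.
import torch
import torch.nn.functional as F


def distillation_step(video_student, sensor_teacher, video_batch, sensor_batch,
                      optimizer, temperature=2.0):
    """One local training step: transfer emotion annotations from the sensor
    modality (teacher) to the visual modality (student) via soft labels."""
    with torch.no_grad():
        # Teacher logits over the emotion classes, computed from sensor signals.
        teacher_logits = sensor_teacher(sensor_batch)
    student_logits = video_student(video_batch)
    # KL divergence between the softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def federated_average(client_state_dicts):
    """Aggregate locally trained student weights by uniform averaging,
    so raw video and sensor data never leave the vehicle."""
    global_state = {}
    for key in client_state_dicts[0]:
        global_state[key] = torch.stack(
            [sd[key].float() for sd in client_state_dicts], dim=0
        ).mean(dim=0)
    return global_state
```

In an actual deployment the server would broadcast the averaged weights back to the vehicles each round, and the aggregation would typically weight clients by their local data sizes rather than uniformly.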