Analysis of running speech for the characterization of mood state in bipolar patients
Guidi, Andrea; Scilingo, Enzo Pasquale; Landini, Luigi; Vanello, Nicola
2016-01-01
Abstract
Speech analysis has been proposed for the characterization of subjects' mood state. Specifically, prosodic features have been found to carry information about subjects' depression severity as well as their status in bipolar disorder. In such applications, subjects have to be monitored continuously, in naturalistic scenarios and not only in the clinical setting. For this reason, it is important to test the robustness of feature extraction approaches against noise, as well as to assess their performance when applied to running speech. In this work, the performance of an algorithm designed to estimate speech features from running speech is evaluated on a speech database containing an associated electroglottographic signal. The algorithm consists of an automatic segmentation step, which detects voiced segments at the syllable level, and a speech feature estimation step based on a spectral matching approach. Relevant parameters pertaining to the identification of voiced segments are optimized. The performance of the algorithm in estimating speech features is tested against different noise sources. The chosen speech features are those related to the fundamental frequency and its variability, such as jitter and the standard deviation estimated at the syllable level. The results show the good performance of the algorithm in estimating fundamental-frequency-related features even in noisy environments. Preliminary results on bipolar patients, recorded in different mood states, are presented. Pairwise statistical comparisons between mood states revealed significant differences in fundamental frequency and jitter. A significant effect of the speech task performed by the subjects is also observed.
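
As an illustration of the syllable-level measures mentioned in the abstract, the following minimal Python sketch computes the mean and standard deviation of the fundamental frequency together with local jitter from a sequence of pitch periods. It assumes the periods come from any pitch tracker applied to a voiced segment; it is not the spectral matching estimator used in the paper, and the function and example data are hypothetical.

import numpy as np

def f0_features(periods):
    # Illustrative sketch (not the authors' method): 'periods' is an
    # array of pitch periods, in seconds, estimated within one voiced
    # segment detected at the syllable level.
    periods = np.asarray(periods, dtype=float)
    f0 = 1.0 / periods                       # instantaneous F0 per glottal cycle
    mean_f0 = f0.mean()                      # segment-level F0
    std_f0 = f0.std(ddof=1)                  # F0 variability within the segment
    # Local jitter: mean absolute difference between consecutive periods,
    # normalized by the mean period and reported as a percentage.
    jitter_local = 100.0 * np.mean(np.abs(np.diff(periods))) / periods.mean()
    return mean_f0, std_f0, jitter_local

# Usage example on a slightly irregular 120 Hz voiced stretch (synthetic data).
example_periods = 1.0 / (120.0 + np.random.default_rng(0).normal(0.0, 1.5, 40))
print(f0_features(example_periods))

In a full pipeline these per-segment values would be aggregated over all voiced segments of a recording before any statistical comparison across mood states.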