Automatic extraction of paralinguistic information using prosodic features related to F0, duration and voice quality

2008 ◽  
Vol 50 (6) ◽  
pp. 531-543 ◽  
Author(s):  
Carlos Toshinori Ishi ◽  
Hiroshi Ishiguro ◽  
Norihiro Hagita
2014 ◽  
Vol 281 (1787) ◽  
pp. 20140480 ◽  
Author(s):  
Michelle J. Spierings ◽  
Carel ten Cate

Variation in pitch, amplitude and rhythm adds crucial paralinguistic information to human speech. Such prosodic cues can reveal information about the meaning or emphasis of a sentence or the emotional state of the speaker. To examine the hypothesis that sensitivity to prosodic cues is language independent and not human specific, we tested prosody perception in a controlled experiment with zebra finches. Using a go/no-go procedure, subjects were trained to discriminate between speech syllables arranged in XYXY patterns with prosodic stress on the first syllable and XXYY patterns with prosodic stress on the final syllable. To systematically determine the salience of the various prosodic cues (pitch, duration and amplitude) to the zebra finches, they were subjected to five tests with different combinations of these cues. The zebra finches generalized the prosodic pattern to sequences that consisted of new syllables and used prosodic features over structural ones to discriminate between stimuli. This strong sensitivity to the prosodic pattern was maintained when only a single prosodic cue was available. The change in pitch was treated as more salient than changes in the other prosodic features. These results show that zebra finches are sensitive to the same prosodic cues known to affect human speech perception.


2016 ◽  
Vol 59 (2) ◽  
pp. 216-229 ◽  
Author(s):  
Theresa Schölderle ◽  
Anja Staiger ◽  
Renée Lampe ◽  
Katrin Strecker ◽  
Wolfram Ziegler

Purpose: Although dysarthria affects the large majority of individuals with cerebral palsy (CP) and can substantially complicate everyday communication, previous research has provided an incomplete picture of its clinical features. We aimed to comprehensively describe characteristics of dysarthria in adults with CP and to elucidate the impact of dysarthric symptoms on parameters relevant for communication. Method: Forty-two adults with CP underwent speech assessment by means of standardized auditory rating scales. Listening experiments were conducted to obtain communication-related parameters (that is, intelligibility and naturalness) as well as age and gender estimates. Results: The majority of adults with CP showed moderate to severe dysarthria with symptoms on all dimensions of speech, most prominently voice quality, respiration, and prosody. Regression analyses revealed that articulatory, respiratory, and prosodic features were the strongest predictors of intelligibility and naturalness of speech. Listeners' estimates of the speakers' age and gender were predominantly determined by voice parameters. Conclusion: This study provides an overview on the clinical presentation of dysarthria in a convenience sample of adults with CP. The complexity of the functional impairment described and the consequences on the individuals' communication call for a stronger consideration of dysarthria in CP both in clinical care and in research.


2017 ◽  
Vol 7 (2) ◽  
pp. 137-162
Author(s):  
Jenny Yau-ni Wan

The call centre conversation is a telephonic exchange of voices between the customer and the customer service representative (CSR). Both lexicogrammatical and prosodic features are used to construe emotional and attitudinal recognition. Studying these features reveals how the call centre discourse is construed and how interpersonal meaning takes shape through the text. The spoken data are constructed by Filipino CSRs and American English-speaking customers. The findings show that participants tend to make specific paralinguistic voice quality choices to express their emotions in dialogue. This article first discusses the voice quality framework for its semiotic features in relation to interpersonal meaning, then reviews previous voice quality studies, and finally delineates how voice quality relates to interpersonal meaning in the calls.


2020 ◽  
Vol 63 (8) ◽  
pp. 2578-2588
Author(s):  
Shanpeng Li ◽  
Wentao Gu ◽  
Lei Liu ◽  
Ping Tang

Purpose: Sarcasm is a specialized speech act in daily vocal communication usually characterized by unique prosodic features, but the role of voice quality in expressing sarcasm has not been explored much. The goal of this study is to explore the voice quality features of Mandarin sarcastic speech in comparison to sincere speech. Method: Fifteen male and 15 female native speakers of Mandarin uttered 31 target sentences with both sincere and sarcastic attitudes. Nine voice quality parameters extracted from the acoustic and electroglottographic signals were analyzed using a linear mixed model, and a classification analysis using a random forest algorithm was conducted to identify the relative contribution of these parameters to the differentiation between sincere and sarcastic utterances. Results: In comparison to sincere speech, sarcastic speech had a creakier voice, which was characterized by a lower fundamental frequency, a greater degree of vocal fold adduction (i.e., higher contact quotient), lesser noise (i.e., higher harmonics-to-noise ratio), and more multiple pulsing (i.e., higher subharmonic-to-harmonic ratio). The interaction effect revealed a gender difference in the use of creakier voice to express sarcasm in Mandarin. The classification analysis using the random forest algorithm showed that the nine voice quality parameters resulted in 84.0% and 83.7% identification rates for sarcastic and sincere utterances, respectively. Conclusions: The results of this preliminary study support the role of voice quality in expressing sarcasm in Mandarin speech. Using a set of voice quality parameters, sarcastic and sincere utterances can be effectively identified. Furthermore, there is a gender difference in the use of creakier voice in expressing Mandarin sarcastic speech. Supplemental Material: https://doi.org/10.23641/asha.12743780
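
The per-class identification rates reported above can be read as the share of utterances of each true attitude that the classifier labelled correctly. The following is a minimal sketch of that calculation only, not the authors' random forest pipeline; the function name `identification_rates` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def identification_rates(true_labels, predicted_labels):
    """Return, for each true class, the fraction of its items predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    # Number of utterances per true class.
    totals = Counter(true_labels)
    # Number of utterances per class whose prediction matches the true label.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Made-up labels for illustration only (not data from the study).
true_labels = ["sarcastic"] * 4 + ["sincere"] * 4
predicted_labels = [
    "sarcastic", "sarcastic", "sarcastic", "sincere",
    "sincere", "sincere", "sarcastic", "sincere",
]
rates = identification_rates(true_labels, predicted_labels)
print(rates)
```

Reporting one rate per class, rather than a single overall accuracy, makes it visible whether a classifier favours one attitude over the other.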


2019 ◽  
Vol 4 (1) ◽  
pp. 42
Author(s):  
Robert Xu

This study examines how prosodic features evoke the spatial aspects of interactional meanings of well-known social types in Mainland China. Prosodic features (duration, pitch, voice quality) of the scripted performances of 18 prominent social types in China were measured acoustically and grouped by cluster analysis. Commonalities among types within each group were identified through a detailed analysis of meta-linguistic commentary collected from the internet. This paper focuses on three meaningful clusters: powerful bureaucratic types, disembodied voices, and “in-your-face” types. Members of each cluster share prosodic combinations and social profiles. More importantly, character types within each cluster index a specific interactional locale. Appropriation of their associated features could reproduce the social dynamics that are typical in that locale. The results highlight the situated use of sociolinguistic variables, and show that the prosodic features pattern structurally in the performances while indexing the historical-spatial settings of social interactions. This paper also considers place as an interactional and relational product of meaning making by these prosodic features.


2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Tino Haderlein ◽  
Cornelia Schwemmle ◽  
Michael Döllinger ◽  
Václav Matoušek ◽  
Martin Ptok ◽  
...  

Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text “The North Wind and the Sun” were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners’ ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis.
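
The two agreement figures quoted above, r and ρ, are conventionally Pearson's correlation and Spearman's rank correlation. As a minimal sketch of how such human-machine agreement might be computed, assuming one averaged listener rating and one model output per speech sample, the code below implements both measures with the standard library. It does not reproduce the Support Vector Regression model, and the function names and ratings are hypothetical, not values from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up ratings for illustration only (not data from the study).
human_ratings = [0, 1, 1, 2, 3]
machine_scores = [0.2, 0.8, 1.4, 1.9, 2.5]
print(pearson(human_ratings, machine_scores))
print(spearman(human_ratings, machine_scores))
```

Because perceptual scales such as RBH are ordinal and produce many tied ratings, the rank-based ρ is a useful complement to r, which is why the sketch averages ranks over ties.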

