A Study of Acoustic Features for Emotional Speaker Recognition in I-Vector Representation

2015 ◽  
Vol 15 (2) ◽  
pp. 15-20 ◽  
Author(s):  
Lenka Macková ◽  
Anton Čižmár ◽  
Jozef Juhár
2007 ◽  
Vol 14 (3) ◽  
pp. 181-184 ◽  
Author(s):  
Nengheng Zheng ◽  
Tan Lee ◽  
P. C. Ching

2013 ◽  
Vol 38 (1) ◽  
pp. 63-73
Author(s):  
Maciej Karpiński

Abstract: Filled pauses (FPs) have proved to be more than valuable cues to speech production processes and important units in discourse analysis. Some aspects of their form and occurrence patterns have been shown to be speaker- and language-specific. In the present study, basic acoustic properties of FPs in Polish task-oriented dialogues are explored. A set of FPs was extracted from a corpus of twenty task-oriented dialogues on the basis of available annotations. After initial scrutiny and selection, a subset of the signals underwent a series of pitch, formant frequency and voice quality analyses. The significant amount of variation found in the realisations of FPs justifies their potential application in speaker recognition systems. Regular monosegmental FPs were confirmed to show relatively stable basic acoustic parameters, which makes them easy to identify and measure but may result in less significant differences among speakers.


Author(s):  
Halim Sayoud ◽  
Siham Ouamour

Most existing speaker recognition systems use “state of the art” acoustic features. However, a speaker can often be recognized only by his or her prosodic features, especially the accent. For this reason, the authors investigate some pertinent prosodic features that can be combined with other classic acoustic features in order to improve recognition accuracy. The authors have developed a new prosodic model using a modified LVQ (Learning Vector Quantization) algorithm, called MLVQ (Modified LVQ). This model is composed of three reduced prosodic features: the mean of the pitch, original duration, and low-frequency energy. Since these features are heterogeneous, a new optimized metric has been proposed, called Optimized Distance for Heterogeneous Features (ODHEF). Speaker identification tests are conducted on an Arabic corpus because the NIST evaluations showed that speaker verification scores depend on the spoken language and that some of the worst scores were obtained for the Arabic language. Experimental results show good performance of the new prosodic approach.
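
To make the general idea concrete, the following is a minimal sketch of plain LVQ1 (the textbook algorithm, not the authors' MLVQ) applied to three-dimensional prosodic vectors. The abstract does not define ODHEF, so a per-feature standard-deviation scaling is used here purely as a stand-in for a distance that copes with heterogeneous units. All function names, the toy feature values, and the speaker labels are hypothetical.

```python
import random


def feature_scales(samples):
    """Per-feature standard deviation (stand-in for a heterogeneous-feature metric)."""
    n = len(samples)
    scales = []
    for d in range(len(samples[0])):
        column = [s[d] for s in samples]
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n
        scales.append(variance ** 0.5 or 1.0)  # avoid division by zero
    return scales


def scaled_distance(x, y, scales):
    """Squared Euclidean distance after dividing each feature by its scale."""
    return sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, scales))


def identify(x, prototypes, scales):
    """Return the label of the prototype nearest to x."""
    return min(prototypes, key=lambda k: scaled_distance(x, prototypes[k], scales))


def train_lvq1(samples, labels, epochs=30, lr=0.3, seed=0):
    """Standard LVQ1 with one prototype per speaker and a linearly decaying rate."""
    rng = random.Random(seed)
    scales = feature_scales(samples)
    prototypes = {}
    for x, y in zip(samples, labels):
        prototypes.setdefault(y, list(x))  # initialise at first sample of each speaker
    order = list(range(len(samples)))
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            winner = identify(x, prototypes, scales)
            # Pull the winning prototype toward x if its label is correct,
            # push it away otherwise.
            sign = 1.0 if winner == y else -1.0
            p = prototypes[winner]
            for d in range(len(p)):
                p[d] += sign * rate * (x[d] - p[d])
    return prototypes, scales


# Made-up vectors: (mean pitch in Hz, duration in s, low-frequency energy in dB).
samples = [
    (110.0, 0.30, 60.0), (115.0, 0.32, 62.0), (108.0, 0.29, 61.0),
    (210.0, 0.22, 55.0), (205.0, 0.20, 54.0), (215.0, 0.23, 56.0),
]
labels = ["spk_a", "spk_a", "spk_a", "spk_b", "spk_b", "spk_b"]

prototypes, scales = train_lvq1(samples, labels)
print(identify((112.0, 0.31, 61.0), prototypes, scales))
```

Scaling each feature before computing the distance keeps the pitch term (tens of Hz) from swamping the duration term (fractions of a second); the metric actually proposed by the authors may weight the features quite differently.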

