Computation of the acoustic characteristics of vocal-tract models with geometrical perturbation

Author(s):  
Kunitoshi Motoki ◽  
Hiroki Matsuzaki

2020 ◽  
Vol 5 (5) ◽  
pp. 1339-1346
Author(s):  
Christina Akbari ◽  
Katsura Aoyama

Purpose: This study was designed to further investigate epenthetic vowels produced by Persian second language speakers of English. Specifically, the purpose was to compare epenthetic and phonemic vowels to determine whether acoustic differences existed or whether the epenthetic vowels were quantitative “copies” of their phonemic counterparts. Method: Twenty Persian speakers each produced 120 target words. The target words contained two different double cluster compositions (obstruent + glide and obstruent + liquid) as well as obstruent + liquid and obstruent + glide triple clusters. The target words were either preceded by the consonant /t/ or produced in isolation. This resulted in 2400 tokens. The tokens were analyzed with Linear Predictive Coding to determine the F1 and F2 formant measurements as well as the durations of the epenthetic and phonemic vowels. Formants are the resonances of the vocal tract. F1 is the lowest-frequency formant, while F2 is the next highest (Kent & Read, 2002). Linear Predictive Coding allows the acoustic signal to be represented spectrally for analysis. Results: A total of 236 epenthetic vowels and their phonemic counterparts were acoustically analyzed. The phonemic vowels were found to be significantly longer than the epenthetic vowels. The epenthetic vowels were also found to have significantly lower F1 values. As a group, the mean F2 values of the epenthetic vowels were not significantly different from the F2 values of the phonemic vowels. However, significant differences in F2 values were found when specific vowel comparisons were made. Conclusions: The data indicate that prothetic epenthetic vowels are not copies of the phonemic vowels that they precede. They differ quantitatively in terms of duration, F1, and F2 values. The findings of this study coincide with the findings of other researchers concerning the acoustic characteristics of anaptyctic epenthetic vowels. These results indicate similarities between prothetic and anaptyctic epenthetic vowels.
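
The formant measurement step described in this abstract can be illustrated with a minimal sketch. The abstract does not state the analysis software, LPC order, window, or sampling rate, so everything below is an assumption: the function names (`lpc_envelope_peaks`, `demo_signal`, and the helpers) are hypothetical, the autocorrelation method with a Hamming window is one common choice among several, and formant candidates are taken as peaks of the LPC spectral envelope rather than by solving for polynomial roots.

```python
import cmath
import math
import random


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of a list of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns a[0..order] with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def lpc_envelope_peaks(frame, fs, order=8, step_hz=5.0):
    """Frequencies (Hz) of local maxima of the LPC envelope 1/|A(f)|."""
    n = len(frame)
    # Hamming window (an assumed, conventional choice).
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    a = levinson_durbin(autocorrelation(windowed, order), order)
    freqs = [i * step_hz for i in range(int((fs / 2) / step_hz) + 1)]
    mags = []
    for f in freqs:
        w = 2 * math.pi * f / fs
        A = sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))
        mags.append(1.0 / abs(A))
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]


def resonator(x, freq_hz, bw_hz, fs):
    """Two-pole resonator, used only to build a synthetic test signal."""
    r = math.exp(-math.pi * bw_hz / fs)
    c1 = 2 * r * math.cos(2 * math.pi * freq_hz / fs)
    c2 = -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + c1 * y1 + c2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def demo_signal(fs=8000, n=2400, seed=0):
    """Synthetic noise-excited signal with resonances at 700 and 1200 Hz."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return resonator(resonator(noise, 700.0, 80.0, fs), 1200.0, 80.0, fs)
```

On the synthetic signal, `lpc_envelope_peaks(demo_signal()[400:], 8000, order=4)` should return peaks close to the two resonances built into it; the two lowest peaks would then play the role of F1 and F2. Real vowel tokens generally need a higher order and some screening of spurious peaks.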


2011 ◽  
pp. 1008-1016
Author(s):  
Jesús Bernardino Alonso Hernández ◽  
Patricia Henríquez Rodríguez

Diagnostic support systems that evaluate the phonatory system from the speech signal can be implemented with techniques based on expert systems. These techniques allow, for example, the early detection of alterations in the phonatory system or the evaluation over time of patients undergoing a given treatment. Measuring the voice quality of a speaker from a digital recording consists of quantifying different acoustic characteristics of speech so that they can be compared with reference patterns identified previously by a “clinical expert”. A measurement of acoustic speech quality based on auditory assessment is very hard to use as a comparative reference across different voices and across the different human experts carrying out the assessment. In the literature, some attempts have been made to obtain objective measures of speech quality by means of multidimensional clinical measurements based on auditory methods. Well-known examples are the GRBAS scale from Japan (Hirano, M., 1981) and its extension developed and applied in Europe (Dejonckere, P. H., Remacle, M., Fresnel-Elbaz, E., Woisard, V., Crevier-Buchman, L. & Millet, B., 1996), a set of perceptual and acoustic characteristics in Sweden (Hammarberg, B. & Gauffin, J., 1995), and a set of phonetic characteristics with added information about the excitation of the vocal tract. The aim of these speech quality measurement procedures is to obtain an objective measurement from a subjective evaluation. Several works propose objective measurements of speech quality obtained from a recording (Alonso, J. B., 2006), (Boyanov, B. & Hadjitodorov, S., 1997), (Hansen, J. H. L., Gavidia-Ceballos, L. & Kaiser, J. F., 1998), (Hadjitodorov, S. & Mitev, P., 2002), (Michaelis, D., Frohlich, M. & Strube, H. W., 1998), (Boyanov, B., Doskov, D., Mitev, P., Hadjitodorov, S. & Teston, B., 2000), (Godino-Llorente, J. I., Aguilera-Navarro, S. & Gomez-Vilda, P., 2000). In these works a sustained voiced sound (usually a vowel) is recorded and then used to compute speech quality measurements. A sustained voiced sound is used because, while producing it, the speech system engages almost all of its mechanisms (constant glottal airflow, continuous vocal fold vibration, …), which makes it possible to detect an anomaly in any of them. These works suggest different sets of measurements for quantifying speech quality objectively. All of them reveal one important fact: several different measurements of the speech signal are needed to capture the different aspects of its acoustic characteristics.
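
As a rough illustration of what quantifying acoustic characteristics of a sustained vowel can look like, the sketch below computes two perturbation measures. The abstract does not name the measurements it has in mind, so the choice of local jitter and local shimmer is an assumption, as are the function names; the cycle periods and peak amplitudes are assumed to have been extracted beforehand by some pitch-marking step that is not shown.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_s):
    """Cycle-to-cycle variation of the glottal period (unitless ratio)."""
    return relative_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (unitless ratio)."""
    return relative_perturbation(peak_amplitudes)


# Hypothetical values for a few consecutive cycles of a sustained vowel.
periods = [0.010, 0.011, 0.010, 0.011]      # seconds
amplitudes = [0.80, 0.82, 0.79, 0.81]       # arbitrary linear units
jitter = local_jitter(periods)
shimmer = local_shimmer(amplitudes)
```

Each measure captures only one aspect of the signal (period stability or amplitude stability), which is in line with the abstract's point that several different measurements are needed.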


2016 ◽  
Vol 366 ◽  
pp. 556-570 ◽  
Author(s):  
Vojtěch Radolf ◽  
Jaromír Horáček ◽  
Pavel Dlask ◽  
Zdeněk Otčenášek ◽  
Ahmed Geneid ◽  
...  

2001 ◽  
Vol 44 (1) ◽  
pp. 118-127 ◽  
Author(s):  
Michael P. Robb ◽  
Yang Chen ◽  
Harvey R. Gilbert ◽  
Jay W. Lerman

Acoustic characteristics of the vowels /i,u,α/ produced by adult females and males during normal phonation were compared with the same vowels produced on deliberate ingressive airflow (i.e., "reverse" phonation). The analysis showed the average fundamental frequency (F0) of reverse phonation to be significantly higher than that of the corresponding normal phonations. There were no significant differences in the vocal tract resonance (F1 and F2 frequency) values for /i/ between normal and reverse phonation. However, the F1 values for /α/ were significantly lower, and the F2 values for /u/ significantly higher, during reverse phonation. The results are discussed with regard to differences in the articulatory control of the speech mechanism during reverse phonation as compared to normal expiratory phonation. Also discussed are the implications of using reverse phonation as a voice management technique.


Author(s):  
Brad Story

Precise control of the vocal tract configuration is of critical importance for producing the desired acoustic characteristics of singing. The pattern of acoustic resonances generated by a given vocal tract shape influences vowel identity, voice quality (timbre), and, to some degree, the spectral characteristics of the voice excitation source itself. This chapter is broadly focused on how the vocal tract shape can be tuned (i.e., modified) in subtle ways to enhance the signal radiated from a singer to an audience. In particular, the vocal tract shape contributions to the “singing formant,” the enhancement of vibrato, and harmonic/formant alignment are discussed.
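
How a vocal tract shape determines a pattern of resonances can be illustrated with the simplest textbook idealization, which is not taken from the chapter: a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c / (4L). The tube length, the speed of sound, and the function name below are all assumptions made for this sketch.

```python
def uniform_tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonances (Hz) of an idealized uniform tube, closed at one end and open at the other.

    The k-th resonance is (2k - 1) * c / (4 * L). Real vocal tracts are
    not uniform, so this gives only a neutral-vowel-like reference pattern.
    """
    if length_m <= 0:
        raise ValueError("length must be positive")
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, n_resonances + 1)]


# Assumed 17.5 cm tract and 350 m/s speed of sound: about 500, 1500, 2500 Hz.
resonances = uniform_tube_resonances(0.175)
```

Changing the assumed length shifts every resonance together, whereas local changes in shape, which this idealization cannot represent, are what move individual resonances relative to one another.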

