Speaker‐independent vowel classification based on fundamental frequency and formant frequencies

1987 ◽  
Vol 81 (S1) ◽  
pp. S93-S93
Author(s):  
James Hillenbrand ◽  
Robert T. Gayvert


ALQALAM ◽  
2015 ◽  
Vol 32 (2) ◽  
pp. 284
Author(s):  
Muhammad Subali ◽  
Miftah Andriansyah ◽  
Christanto Sinambela

This article examines the similarities and differences in fundamental frequency and formant frequencies, estimated with the autocorrelation function and the LPC function in a MATLAB 2012b GUI, for the sounds of hijaiyah letters produced by adult male beginner and expert speakers of makhraj pronunciation. The two speakers' productions are then compared by computing a matching distance over the cepstrum using the DTW method. The beginner-level makhraj speech was recorded with a MATLAB algorithm in the GUI from 22-year-old students of Universitas Gunadarma and SITC. The expert-level makhraj speech was taken from previous research; those speakers were 20-30 years old at the time the data were collected. Each recording was analyzed to extract the fundamental frequency and formant frequencies, the values were compared between beginner and expert speech, and the matching distance between the two was computed. The results show that, for every letter, beginner and expert productions differ in both fundamental frequency and formant frequencies. The DTW matching distances between beginner and expert speech ranged from 28.9746 to 136.4.
Keywords: fundamental frequency, formant frequency, hijaiyah letters, makhraj
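The F0-estimation step named above, picking the strongest autocorrelation peak within a plausible pitch-period range, can be sketched as follows (a minimal illustration, not the authors' MATLAB GUI; the sample rate and test signal are invented):

```python
import numpy as np

def estimate_f0_autocorr(signal, sr, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency from the peak of the autocorrelation
    function within an admissible pitch-period range."""
    sig = signal - np.mean(signal)
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]  # lags >= 0
    lo = int(sr / fmax)  # shortest period to consider
    hi = int(sr / fmin)  # longest period to consider
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Synthetic vowel-like frame: 150 Hz fundamental plus two harmonics
sr = 16000
t = np.arange(0, 0.05, 1 / sr)
x = (np.sin(2 * np.pi * 150 * t)
     + 0.5 * np.sin(2 * np.pi * 300 * t)
     + 0.25 * np.sin(2 * np.pi * 450 * t))
print(estimate_f0_autocorr(x, sr))  # close to 150 Hz
```

The search is restricted to lags between sr/fmax and sr/fmin so that harmonics and very long lags cannot masquerade as the pitch period.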


Animals ◽  
2018 ◽  
Vol 8 (10) ◽  
pp. 167 ◽  
Author(s):  
Anton Baotic ◽  
Maxime Garcia ◽  
Markus Boeckle ◽  
Angela Stoeger

African savanna elephants live in dynamic fission–fusion societies and exhibit a sophisticated vocal communication system. Their most frequent call-type is the ‘rumble’, with a fundamental frequency (which refers to the lowest vocal fold vibration rate when producing a vocalization) near or in the infrasonic range. Rumbles are used in a wide variety of behavioral contexts, for short- and long-distance communication, and convey contextual and physical information. For example, maturity (age and size) is encoded in male rumbles by formant frequencies (the resonance frequencies of the vocal tract), having the most informative power. As sound propagates, however, its spectral and temporal structures degrade progressively. Our study used manipulated and resynthesized male social rumbles to simulate large and small individuals (based on different formant values) to quantify whether this phenotypic information efficiently transmits over long distances. To examine transmission efficiency and the potential influences of ecological factors, we broadcasted and re-recorded rumbles at distances of up to 1.5 km in two different habitats at the Addo Elephant National Park, South Africa. Our results show that rumbles were affected by spectral–temporal degradation over distance. Interestingly and unlike previous findings, the transmission of formants was better than that of the fundamental frequency. Our findings demonstrate the importance of formant frequencies for the efficiency of rumble propagation and the transmission of information content in a savanna elephant’s natural habitat.


Author(s):  
Yeptain Leung ◽  
Jennifer Oates ◽  
Siew-Pang Chan ◽  
Viktória Papp

Purpose: The aim of the study was to examine associations between speaking fundamental frequency (fos), vowel formant frequencies (F), listener perceptions of speaker gender, and vocal femininity–masculinity. Method: An exploratory study was undertaken to examine associations between fos, F1–F3, listener perceptions of speaker gender (nominal scale), and vocal femininity–masculinity (visual analog scale). For 379 speakers of Australian English aged 18–60 years, fos mode and F1–F3 (12 monophthongs; 36 Fs in total) were analyzed on a standard reading passage. Seventeen listeners rated speaker gender and vocal femininity–masculinity on randomized audio recordings of these speakers. Results: Model building using principal component analysis suggested the 36 Fs could be succinctly reduced to seven principal components (PCs). Generalized structural equation modeling (with the seven PCs of F and fos as predictors) suggested that only F2 and fos predicted listener perceptions of speaker gender (male, female, unable to decide). However, listener perceptions of vocal femininity–masculinity behaved differently and were predicted by F1, F3, and the contrast between monophthongs at the extremities of the F1 acoustic vowel space, in addition to F2 and fos. Furthermore, listeners' perceptions of speaker gender also substantially influenced ratings of vocal femininity–masculinity. Conclusion: Adjusted odds ratios highlighted the substantially larger contribution of F, relative to fos, to listener perceptions of speaker gender and vocal femininity–masculinity than has previously been reported.
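The dimensionality-reduction step, collapsing 36 formant measures per speaker into seven principal components, can be sketched with a plain SVD-based PCA. The data below are a synthetic stand-in (379 speakers, 36 measures, driven by 7 latent factors), not the study's measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: 379 speakers x 36 formant measures
# (F1-F3 for 12 monophthongs), generated from 7 latent factors
latent = rng.normal(size=(379, 7))
loadings = rng.normal(size=(7, 36))
formants = latent @ loadings + 0.1 * rng.normal(size=(379, 36))

# PCA via SVD of the mean-centered data matrix
X = formants - formants.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
var_explained = s**2 / np.sum(s**2)

# Each speaker summarized by scores on the first seven PCs
scores = X @ Vt[:7].T
print(scores.shape, var_explained[:7].sum())
```

Because the synthetic data have seven underlying factors, the first seven components recover nearly all of the variance; on real formant data the cutoff would be chosen from the scree of `var_explained`.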


1992 ◽  
Vol 35 (1) ◽  
pp. 88-95 ◽  
Author(s):  
John Ryalls ◽  
Annie Larouche

Ten normally hearing and 10 age-matched subjects with moderate-to-severe hearing impairment were recorded producing a protocol of 18 basic syllables [/pi/, /pa/, /pu/; /bi/, /ba/, /bu/; /ti/, /ta/, /tu/; /di/, /da/, /du/; /ki/, /ka/, /ku/; /gi/, /ga/, /gu/], each repeated five times. The resulting 90 syllables were digitized and measured for (a) total duration; (b) voice-onset time (VOT) of the initial consonant; (c) fundamental frequency (F0) at the midpoint of the vowel; and (d) formant frequencies (F1, F2, F3), also measured at the midpoint of the vowel. Statistical comparisons were conducted on (a) average values for each syllable and (b) standard deviations. Although there were numerical differences between the normally hearing and hearing-impaired groups, few differences were statistically significant.


1998 ◽  
Vol 87 (2) ◽  
pp. 595-600 ◽  
Author(s):  
S. P. Whiteside

This experiment assessed whether fundamental frequency or formant frequencies have more perceptual salience in the identification of the sex of the speaker from synthesized vowels. Four sets of ten vowels were synthesized by combining fundamental frequencies and formant frequencies in different permutations, and 50 listeners took part in a listening test. Analysis of the listening test scores suggested that for 36 of the vowels, the fundamental frequency (F0) was probably the most salient perceptual cue. For the remaining four vowels, however, this was not the case: either the formant frequencies or the onset–offset patterns of F0 appeared to carry some perceptual salience.
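Vowels of this kind can be built by combining an F0-driven source with formant resonators. Below is a minimal source-filter sketch (not the synthesizer used in the study; the sample rate, formant values, and bandwidths are illustrative):

```python
import numpy as np

def resonator_coeffs(freq, bw, sr):
    """Second-order IIR resonator (digital formant filter):
    y[n] = x[n] + b1*y[n-1] + b2*y[n-2]."""
    r = np.exp(-np.pi * bw / sr)
    theta = 2 * np.pi * freq / sr
    return 2 * r * np.cos(theta), -r * r

def synthesize_vowel(f0, formants, bandwidths, sr=16000, dur=0.3):
    """Impulse-train source at f0 passed through cascaded formant resonators."""
    n = int(sr * dur)
    period = int(round(sr / f0))
    source = np.zeros(n)
    source[::period] = 1.0  # crude glottal-pulse stand-in
    y = source
    for freq, bw in zip(formants, bandwidths):
        b1, b2 = resonator_coeffs(freq, bw, sr)
        out = np.zeros(n)
        for i in range(n):
            out[i] = y[i]
            if i >= 1:
                out[i] += b1 * out[i - 1]
            if i >= 2:
                out[i] += b2 * out[i - 2]
        y = out
    return y / np.max(np.abs(y))

# /a/-like vowel: male-range f0 with textbook-style formant values
vowel = synthesize_vowel(f0=120, formants=[730, 1090, 2440],
                         bandwidths=[80, 90, 120])
```

Holding the formant frequencies fixed while changing `f0` (or vice versa) yields exactly the kind of mismatched permutations this experiment presented to listeners.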


2020 ◽  
Vol 63 (6) ◽  
pp. 1658-1674
Author(s):  
Lucie Ménard ◽  
Amélie Prémont ◽  
Pamela Trudeau-Fisette ◽  
Christine Turgeon ◽  
Mark Tiede

Objective: We aimed to investigate the production of contrastive emphasis in French-speaking 4-year-olds and adults. Based on previous work, we predicted that, due to their immature motor control abilities, preschool-aged children would produce smaller articulatory differences between emphasized and neutral syllables than adults. Method: Ten 4-year-old children and 10 adult French speakers were recorded while repeating /bib/, /bub/, and /bab/ sequences in neutral and contrastive emphasis conditions. Synchronous recordings of tongue movements, lip and jaw positions, and speech signals were made. Lip positions and tongue shapes were analyzed; formant frequencies, amplitude, fundamental frequency, and duration were extracted from the acoustic signals; and between-vowel contrasts were calculated. Results: Emphasized vowels were higher in pitch, intensity, and duration than their neutral counterparts in all participants. However, the effect of contrastive emphasis on lip position was smaller in children, and prosody did not affect tongue position in children, whereas it did in adults. As a result, children's productions were perceived less accurately than those of adults. Conclusion: These findings suggest that 4-year-old children have not yet learned to produce hyperarticulated forms of phonemic goals that would allow them to successfully contrast syllables and enhance prosodic saliency.


2017 ◽  
Vol 39 (1) ◽  
pp. 67-87 ◽  
Author(s):  
KJELLRUN T. ENGLUND

An established finding in research on infant-directed speech (IDS) is that vowels are hyperarticulated compared to adult-directed speech (ADS). Studies showing this investigate the point vowels only, leaving a rather weak foundation for concluding whether IDS vowels are hyperarticulated within a particular language. The aim of this study was to investigate a large sample of vowels in IDS and to elicit speech in a natural situation for mother and infant. Acoustic and statistical analyses of /æ:, æ, ø:, ɵ, o:, ɔ, y:, y, ʉ:, ʉ, e:, ɛ/ show a selective increase in formant frequencies for some vowel qualities. In addition, vowels had a higher fundamental frequency and were generally longer in IDS, but the difference between long and short vowels was comparable between IDS and ADS. Given the additionally fronted articulation and reduced lip protrusion in IDS compared to ADS, it is argued that IDS is hypoarticulated.
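A common acoustic summary in this hyper- versus hypoarticulation debate is the area of the F1–F2 vowel space. A sketch using the shoelace formula follows; the (F1, F2) means are invented for illustration, not taken from this study:

```python
import numpy as np

def polygon_area(points):
    """Shoelace formula for the area of a polygon with ordered vertices."""
    x, y = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Hypothetical (F1, F2) means in Hz for the point vowels /i a u/,
# ordered around the vowel triangle
ads = [(300, 2300), (750, 1300), (350, 800)]   # adult-directed speech
ids_ = [(320, 2200), (700, 1350), (380, 850)]  # infant-directed speech

print(polygon_area(ads), polygon_area(ids_))  # areas in Hz^2
```

A smaller vowel space area in IDS than in ADS, as in these made-up values, would point toward hypoarticulation; the study's broader argument is precisely that point-vowel triangles like this one can mislead, which is why it analyzed twelve vowel qualities.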


1993 ◽  
Vol 36 (4) ◽  
pp. 694-700 ◽  
Author(s):  
James Hillenbrand ◽  
Robert T. Gayvert

A quadratic discriminant classification technique was used to classify spectral measurements from vowels spoken by men, women, and children. The parameters used to train the discriminant classifier consisted of various combinations of fundamental frequency and the three lowest formant frequencies. Several nonlinear auditory transforms were evaluated. Unlike previous studies using a linear discriminant classifier, there was no advantage in category separability for any of the nonlinear auditory transforms over a linear frequency scale, and no advantage for spectral distances over absolute frequencies. However, it was found that parameter sets using nonlinear transforms and spectral differences reduced the differences between phonetically equivalent tokens produced by different groups of talkers.
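A quadratic discriminant classifier of the kind described, one full-covariance Gaussian per talker group over F0 and the lowest formant frequencies, can be sketched from scratch. The group means below are invented round numbers, not the study's measurements:

```python
import numpy as np

class QuadraticDiscriminant:
    """Gaussian quadratic discriminant: one full-covariance Gaussian per class,
    classification by maximum posterior log-likelihood."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.params = {}
        for c in self.classes:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False)
            self.params[c] = (mu, np.linalg.inv(cov),
                              np.log(np.linalg.det(cov)),
                              np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            mu, icov, logdet, logprior = self.params[c]
            d = X - mu
            # Gaussian log-likelihood up to an additive constant
            mahal = np.einsum("ij,jk,ik->i", d, icov, d)
            scores.append(-0.5 * (mahal + logdet) + logprior)
        return self.classes[np.argmax(scores, axis=0)]

rng = np.random.default_rng(1)
# Hypothetical (F0, F1, F2, F3) group means in Hz: men, women, children
means = [[120, 660, 1700, 2400], [210, 860, 2050, 2850], [260, 1030, 2320, 3200]]
X = np.vstack([rng.normal(m, [20, 60, 120, 150], size=(50, 4)) for m in means])
y = np.repeat([0, 1, 2], 50)

model = QuadraticDiscriminant().fit(X, y)
accuracy = (model.predict(X) == y).mean()
```

Because each class gets its own covariance matrix, the decision boundaries are quadratic rather than linear, which is the distinction the abstract draws against earlier linear discriminant studies. Auditory transforms (e.g. bark or log frequency) would simply be applied to the columns of `X` before fitting.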


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Santiago Barreda

Abstract Fast Track is a formant tracker implemented in Praat that attempts to automatically select the best analysis from a set of candidates. The best track is selected by modeling smooth formant contours across the entirety of the sound, providing the researcher with rich information about static and dynamic formant properties. Fast Track returns text files containing acoustic information (formant frequencies, formant bandwidths, fundamental frequency, etc.) sampled every 2 ms, generates images showing the winning analysis and comparing alternate analyses, and creates log files detailing analysis information for each file. Fast Track features a modular workflow that allows for analysis steps to be run (and re-run) independently as necessary, and is designed to allow for easy correction of tracking errors by allowing the user to override the automatic analysis, or manually edit tracks where necessary. In addition, Fast Track includes tools to aggregate data across tokens, and to easily create vowel plots of mean values or time-varying formant contours. The design and use of Fast Track are outlined using a re-analysis of the Hillenbrand et al. (1995) dataset, which suggests that Fast Track can be very accurate in cases where signal properties allow for reliable formant estimates.
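The candidate analyses that Fast Track compares are LPC-based. The core idea, solving for LPC coefficients and reading formant candidates off the angles of the complex poles, can be sketched in numpy (an illustration of the underlying technique, not Fast Track's Praat implementation; the test signal is a synthetic one-resonance process):

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients by the autocorrelation method (Levinson-Durbin)."""
    n = len(x)
    r = np.array([x[: n - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err
        prev = a.copy()
        for j in range(1, i + 1):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1 - k * k
    return a

def formants_from_lpc(a, sr):
    """Formant candidates: angles of the complex LPC poles above the real axis."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0.01]
    return np.sort(np.angle(roots) * sr / (2 * np.pi))

# AR(2) test signal: white noise through a single 700 Hz, 80 Hz-bandwidth pole
sr = 8000
r_, th = np.exp(-np.pi * 80 / sr), 2 * np.pi * 700 / sr
a_true = [1.0, -2 * r_ * np.cos(th), r_ * r_]
rng = np.random.default_rng(0)
e = rng.normal(size=8000)
x = np.zeros(8000)
for i in range(8000):
    x[i] = e[i]
    if i >= 1:
        x[i] -= a_true[1] * x[i - 1]
    if i >= 2:
        x[i] -= a_true[2] * x[i - 2]

est = formants_from_lpc(lpc(x, 2), sr)
print(est)  # one candidate near 700 Hz
```

Fast Track's contribution sits on top of this kind of analysis: it runs many such analyses at different ceilings, then scores the smoothness of the resulting formant contours to pick a winner.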

