Voice activity detection based on sequential Gaussian mixture model with maximum likelihood criterion

Author(s):  
Zhan Shen ◽  
Jianguo Wei ◽  
Wenhuan Lu ◽  
Jianwu Dang
2011 ◽  
Vol 58-60 ◽  
pp. 1847-1853 ◽  
Author(s):  
Yan Zhang ◽  
Cun Bao Chen ◽  
Li Zhao

In this paper, Gaussian Mixture model (GMM) as specific method is applied to noise classification. On this basis, a modified Gaussian Mixture Model with an embedded Auto-Associate Neural Network (AANN) is proposed. It integrates the merits of GMM and AANN. We train GMM and AANN as a whole and they are trained by means of Maximum Likelihood (ML). In the process of training, the parameter of GMM and AANN are updated alternately. AANN reshapes the distribution of the data and improves the similarity of the feature data in the same distribution type of noise. Experiments show that the GMM with embedded AANN improves accuracy rate of noise classification against baseline GMM.


2014 ◽  
Vol 23 (4) ◽  
pp. 359-378
Author(s):  
M. S. Rudramurthy ◽  
V. Kamakshi Prasad ◽  
R. Kumaraswamy

AbstractThe performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noises to the original speech utterances. The different types of noises were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, widely used 39-dimension Mel frequency cepstral coefficient (MFCC; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients) features were used, and Gaussian mixture model–universal background model was used for speaker modeling. The proposed system’s performance was studied against the energy-based VAD used as the front end of the SV system. The proposed SV system showed some encouraging results when EMD-based VAD was used at its front end.


2017 ◽  
Vol 16 (1) ◽  
pp. 7567-7572
Author(s):  
Sukhvinder Kaur ◽  
J. S. Sohal

In speaker diarization, the speech/voice activity detection is performed to separate speech, non-speech and silent frames. Zero crossing rate and root mean square value of frames of audio clips has been used to select training data for silent, speech and nonspeech models. The trained models are used by two classifiers, Gaussian mixture model (GMM) and Artificial neural network (ANN), to classify the speech and non-speech frames of audio clip. The results of ANN and GMM classifier are compared by Receiver operating characteristics (ROC) curve and Detection ErrorTradeoff (DET) graph. It is concluded that neural network based SADcomparatively better than Gaussian mixture model based SAD.


Sign in / Sign up

Export Citation Format

Share Document