noisy speech
Recently Published Documents


TOTAL DOCUMENTS

615
(FIVE YEARS 64)

H-INDEX

29
(FIVE YEARS 4)

Author(s):  
Youming Wang ◽  
Jiali Han ◽  
Tianqi Zhang ◽  
Didi Qing

AbstractSpeech is easily interfered by external environment in reality, which results in the loss of important features. Deep learning has become a popular speech enhancement method because of its superior potential in solving nonlinear mapping problems for complex features. However, the deficiency of traditional deep learning methods is the weak learning capability of important information from previous time steps and long-term event dependencies between the time-series data. To overcome this problem, we propose a novel speech enhancement method based on the fused features of deep neural networks (DNNs) and gated recurrent unit (GRU). The proposed method uses GRU to reduce the number of parameters of DNNs and acquire the context information of the speech, which improves the enhanced speech quality and intelligibility. Firstly, DNN with multiple hidden layers is used to learn the mapping relationship between the logarithmic power spectrum (LPS) features of noisy speech and clean speech. Secondly, the LPS feature of the deep neural network is fused with the noisy speech as the input of GRU network to compensate the missing context information. Finally, GRU network is performed to learn the mapping relationship between LPS features and log power spectrum features of clean speech spectrum. The proposed model is experimentally compared with traditional speech enhancement models, including DNN, CNN, LSTM and GRU. Experimental results demonstrate that the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53%, respectively, compared with the noise signal under the condition of matched noise. Under the condition of unmatched noise, the PESQ and STOI of the algorithm are improved by 23.8% and 37.36%, respectively. The advantage of the proposed method is that it uses the key information of features to suppress noise in both matched and unmatched noise cases and the proposed method outperforms other common methods in speech enhancement.


2021 ◽  
Author(s):  
Ana Vrankovic ◽  
Ivo Ipsic ◽  
Jonatan Lerga
Keyword(s):  

Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

AbstractThe minimum mean-square error (MMSE)-based noise PSD estimators have been used widely for speech enhancement. However, the MMSE noise PSD estimators assume that the noise signal changes at a slower rate than the speech signal— which lacks the ability to track the highly non-stationary noise sources. Moreover, the performance of the MMSE-based noise PSD estimator largely depends upon the accuracy of the a priori SNR estimation in practice. In this paper, we introduce a noise PSD estimation algorithm using a derivative-based high-pass filter in non-stationary noise conditions. The proposed method processes the silent and speech frames of the noisy speech differently to estimate the noise PSD. It is due to the non-stationary noise that can be mixed with silent and speech-dominated frames non-uniformly. We first introduce a spectral-flatness-based adaptive thresholding technique to detect the speech activity of the noisy speech frames. Since the silent frame of the noisy speech is completely filled with noise, the noise periodogram is directly computed from it without applying any filtering. Conversely, a 4th order derivative-based high-pass filter is applied during speech activity of the noisy speech frame to filter out the clean speech components while leaving behind mostly the noise. The noise periodogram is computed from the filtered signal—which counteracts the leaking of clean speech power. The noise PSD estimate is obtained by recursively averaging the previously estimated noise PSD and the current estimate of the noise periodogram. The proposed method is found to be effective in tracking the rapidly changing as well as the slowly varying noise PSD than the competing methods in non-stationary noise conditions for a wide range of signal-to-noise ratio (SNR) levels. Extensive objective and subjective scores on the NOIZEUS corpus demonstrate that the application of the proposed noise PSD with MMSE-based speech enhancement methods produce higher quality and intelligible enhanced speech than the competing methods.


Signals ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 434-455
Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

Inaccurate estimates of the linear prediction coefficient (LPC) and noise variance introduce bias in Kalman filter (KF) gain and degrade speech enhancement performance. The existing methods propose a tuning of the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where the robustness metric offsets the bias in KF gain during speech absence of noisy speech to that of the sensitivity metric during speech presence to achieve better noise reduction. The noise variance and the speech model parameters are adopted as a speech activity detector. The reduced-biased Kalman gain enables the KF to minimize the noise effect significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than some benchmark methods.


2021 ◽  
Author(s):  
Youming Wang ◽  
Jiali Han ◽  
Tianqi Zhang ◽  
Didi Qing

Abstract Speech is easily interfered by the external environment in reality, which will lose the important features. Deep learning method has become the mainstream method of speech enhancement because of its superior potential in complex nonlinear mapping problems. However, there are some problems are exist such as the deficiency for the learning the important information from previous time steps and long-term event dependencies. Due to the lack of the correlation in the same layer of Deep Neural Networks (DNNs), which is an existing typical intelligent deep model of speech signal, it is difficult to capture the long-term dependence between the time-series data. To overcome this problem, we propose a novel speech enhancement method from fused features based on deep neural network and gated recurrent unit network. The method takes advantage of both deep neural network and recurrent neural network to reduce the number of parameters and simultaneously improve speech quality and intelligibility. Firstly, DNN with multiple hidden layers is used to learn the mapping relationship between the logarithmic power spectrum (LPS) features of noisy speech and clean speech. Secondly, the LPS feature of the deep neural network is fused with the noisy speech as the input of gated recurrent unit (GRU) network to compensate the missing context information. Finally, GRU network is performed to learn the mapping relationship between LPS features and log power spectrum features of clean speech spectrum. Experimental results demonstrate that the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53% respectively compared with the noise signal under the condition of matched noise. Under the condition of unmatched noise, the PESQ and STOI of the algorithm are improved by 23.8% and 37.36% respectively. The advantage of the proposed method is that it uses of the key information of features to suppress noise in both matched and unmatched noise cases and the proposed method outperforms other common methods in speech enhancement.


Sign in / Sign up

Export Citation Format

Share Document