Wiener Gain and Deep Neural Networks: A Well-Balanced Pair For Speech Enhancement
Abstract This paper proposes a Deep Neural Network (DNN)-based Wiener gain estimator for speech enhancement. The proposal is in the framework of the classical spectral-domain speech enhancement algorithms. In this case, we used the Optimal Modified Log-Spectral Amplitude (OMLSA), but consider that this proposal could fit many alternative speech estimation algorithms. We determined the best usage of the DNN approach at learning a robust instance of the Wiener gain estimator according to the characteristics of the SNR estimation and the gain function. To design a DNN architecture adjusted for the speech enhancement task, we study various configuration issues frequently used in DNN-based solutions, including speech representations, residual connections, and causal vs. non-causal designs. Thus, we provide conclusions for the use of DNN architectures with the enhancement purpose. Experiments show that the proposal provides results on the state-of-the-art. But beyond the objective quality measures, there are examples of noisy vs. enhanced speech available for listening to demonstrate in practice the skills of the method in real audio.