DeepLPC: A Deep Learning Approach to Augmented Kalman Filter-Based Single-Channel Speech Enhancement

Author(s):  
Sujan Kumar Roy ◽  
Aaron Nicolson ◽  
Kuldip K. Paliwal

Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce biased estimates, due to the use of a whitening filter. This severely degrades the perceived quality and intelligibility of the enhanced speech produced by the AKF. In this paper, we propose a deep learning framework that produces clean speech and noise LPC estimates with significantly less bias than previous methods, by avoiding the use of a whitening filter. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The estimated power spectra are passed through the inverse Fourier transform to form autocorrelation matrices, which are then solved by the Levinson-Durbin recursion to form the LPCs and prediction error variances of the speech and noise for the AKF. The performance of DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests, as well as seven objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR). DeepLPC is compared to six existing deep learning-based methods. DeepLPC produces a lower spectral distortion (SD) level than existing deep learning approaches to clean speech LPC estimation, confirming that it exhibits less bias. DeepLPC also produced higher objective scores than any of the competing methods (with an improvement of 0.11 for CSIG, 0.15 for CBAK, 0.14 for COVL, 0.13 for PESQ, 2.66% for STOI, 1.11 dB for SegSNR, and 1.05 dB for SI-SDR over the next best method). The enhanced speech produced by DeepLPC was also the most preferred by listeners. By producing less biased clean speech and noise LPC estimates, DeepLPC enables the AKF to produce enhanced speech of higher quality and intelligibility.
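The estimation pipeline described above (LPC power spectrum → inverse Fourier transform → autocorrelation → Levinson-Durbin recursion → LPCs and prediction error variance) can be sketched in a few lines of NumPy. This is a minimal illustration of the classical signal-processing steps only, not the authors' implementation; the function names, FFT length, and LPC order are assumptions:

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Toeplitz system of LPC normal equations.

    r: autocorrelation sequence r[0..order].
    Returns the LPC coefficients a[1..order] (predictor
    x[n] ~ -sum_k a[k] x[n-k]) and the prediction error variance.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the current prediction residual
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err

def lpc_from_power_spectrum(power_spectrum, order, n_fft):
    """Wiener-Khinchin: the inverse Fourier transform of the power
    spectrum is the autocorrelation sequence, which Levinson-Durbin
    then solves for the LPCs and prediction error variance."""
    r = np.fft.irfft(power_spectrum, n_fft)
    return levinson_durbin(r, order)
```

A flat (white) power spectrum, for example, yields an impulse autocorrelation, zero LPCs, and a prediction error variance equal to the signal power.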


2021 ◽  
Author(s):  
Sujan Kumar Roy ◽  
Aaron Nicolson ◽  
Kuldip K. Paliwal

Current augmented Kalman filter (AKF)-based speech enhancement algorithms utilise a temporal convolutional network (TCN) to estimate the clean speech and noise linear prediction coefficients (LPCs). However, the multi-head attention network (MHANet) has demonstrated the ability to model the long-term dependencies of noisy speech more efficiently than TCNs. Motivated by this, we investigate the MHANet for LPC estimation. We aim to produce clean speech and noise LPC parameters with the least bias to date, and with this, to produce higher quality and more intelligible enhanced speech than any current Kalman filter (KF) or AKF-based speech enhancement algorithm (SEA). Here, we investigate the MHANet within the DeepLPC framework, a deep learning framework for jointly estimating the clean speech and noise LPC power spectra. DeepLPC is selected as it exhibits significantly less bias than other frameworks, by avoiding the use of whitening filters and post-processing. DeepLPC-MHANet is evaluated on the NOIZEUS corpus using subjective AB listening tests, as well as seven objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR), and is compared to five existing deep learning-based methods. Compared to other deep learning approaches, DeepLPC-MHANet produced clean speech LPC estimates with the least amount of bias. DeepLPC-MHANet-AKF also produced higher objective scores than any of the competing methods (with an improvement of 0.17 for CSIG, 0.15 for CBAK, 0.19 for COVL, 0.24 for PESQ, 3.70% for STOI, 1.03 dB for SegSNR, and 1.04 dB for SI-SDR over the next best method). The enhanced speech produced by DeepLPC-MHANet-AKF was also the most preferred amongst ten listeners. By producing LPC estimates with the least amount of bias to date, DeepLPC-MHANet enables the AKF to produce enhanced speech of higher quality and intelligibility than any previous method.
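The MHANet architecture itself is not detailed in this abstract. As a rough NumPy sketch of the multi-head scaled dot-product attention operation at its core (a generic illustration under assumed weight shapes, not the authors' network):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product attention with n_heads heads.

    X: (T, d_model) sequence of frame features;
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices.
    Every output frame attends over all T input frames, which is how
    attention captures long-term dependencies in a single step.
    """
    T, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # split the model dimension into heads: (n_heads, T, dh)
    split = lambda M: M.reshape(T, n_heads, dh).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh)  # (h, T, T)
    out = softmax(scores, axis=-1) @ Vh                # (h, T, dh)
    # concatenate heads and project back to d_model
    return out.transpose(1, 0, 2).reshape(T, d) @ Wo
```

In contrast, a TCN must stack dilated convolutions to reach a comparable receptive field, which is the efficiency argument made above.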



Signals ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 434-455
Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

Inaccurate estimates of the linear prediction coefficients (LPCs) and noise variance introduce bias into the Kalman filter (KF) gain and degrade speech enhancement performance. Existing methods propose a tuning of the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate the noise in each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where a robustness metric offsets the bias in the KF gain during speech absence and a sensitivity metric does so during speech presence, to achieve better noise reduction. The noise variance and the speech model parameters serve as a speech activity detector. The reduced-bias Kalman gain enables the KF to suppress the noise significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than several benchmark methods.
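The pre-whitening step (a whitening filter built from the estimated noise, applied to each noisy frame before computing the speech LPCs) can be illustrated as follows. This is a sketch under simplifying assumptions (autocorrelation-method LPC, FIR whitening filter); the function names are hypothetical and the SPP noise estimator is omitted:

```python
import numpy as np

def lpc_from_frame(x, order):
    """Autocorrelation-method LPC: solve R a = -r for the predictor
    x[n] ~ -sum_k a[k] x[n-k]."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, -r[1:])

def prewhiten(noisy_frame, noise_estimate, order=4):
    """Build the whitening FIR filter A(z) = 1 + a1 z^-1 + ... from the
    estimated noise's LPCs and apply it to the noisy frame, so that the
    residual noise is approximately white."""
    a = lpc_from_frame(noise_estimate, order)
    b = np.concatenate(([1.0], a))  # whitening filter coefficients
    return np.convolve(noisy_frame, b)[:len(noisy_frame)]
```

For a strongly correlated (coloured) noise, the filter's prediction residual has much lower variance than the input, which is exactly the whitening effect relied on above.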


SLEEP ◽  
2021 ◽  
Author(s):  
Samaneh Nasiri ◽  
Gari D Clifford

Current approaches to automated sleep staging from the electroencephalogram (EEG) rely on constructing large labeled training and test corpora by aggregating data from different individuals. However, many of the subjects in the training set may exhibit changes in the EEG that are very different from those of the subjects in the test set. Training an algorithm on such data without accounting for this diversity can cause underperformance. Moreover, test data may have unexpected sensor misplacement or different instrument noise and spectral responses. This work proposes a novel method to effectively identify relevant individuals based on their similarities. The proposed method embeds all training patients into a shared, robust feature space. Individuals that share strong statistical relationships and are similar based on their EEG signals are clustered in this feature space before being passed to a deep learning framework for classification. Using 994 patient EEGs from the 2018 PhysioNet Challenge (≈6,561 hours of recording), we demonstrate that the clustering approach significantly boosts performance compared to state-of-the-art deep learning approaches. Under 10-fold cross-validation, the proposed method improves, on average, precision from 0.72 to 0.81, sensitivity from 0.74 to 0.82, and Cohen's kappa from 0.64 to 0.75.


2021 ◽  
Vol 12 ◽  
Author(s):  
Alexander C. Constantino ◽  
Nathaniel D. Sisterson ◽  
Naoir Zaher ◽  
Alexandra Urban ◽  
R. Mark Richardson ◽  
...  

Background: Decision-making in epilepsy surgery is strongly connected to the interpretation of the intracranial EEG (iEEG). Although deep learning approaches have demonstrated efficiency in processing extracranial EEG, few studies have addressed iEEG seizure detection, in part due to the small number of seizures per patient typically available from intracranial investigations. This study aims to evaluate the efficiency of deep learning methodology in detecting iEEG seizures using a large dataset of ictal patterns collected from epilepsy patients implanted with a responsive neurostimulation system (RNS).

Methods: Five thousand two hundred and twenty-six ictal events were collected from 22 patients implanted with RNS. A convolutional neural network (CNN) architecture was created to provide personalized seizure annotations for each patient. Accuracy of seizure identification was tested in two scenarios: patients with seizures occurring following a period of chronic recording (scenario 1) and patients with seizures occurring immediately following implantation (scenario 2). The accuracy of the CNN in identifying RNS-recorded iEEG ictal patterns was evaluated against human neurophysiology expertise. Statistical performance was assessed via the area under the precision-recall curve (AUPRC).

Results: In scenario 1, the CNN achieved a maximum mean binary classification AUPRC of 0.84 ± 0.19 (95% CI, 0.72–0.93) and a mean regression accuracy of 6.3 ± 1.0 s (95% CI, 4.3–8.5 s) at 30 seed samples. In scenario 2, the maximum mean AUPRC was 0.80 ± 0.19 (95% CI, 0.68–0.91) and the mean regression accuracy was 6.3 ± 0.9 s (95% CI, 4.8–8.3 s) at 20 seed samples. We obtained near-maximum accuracies at a seed size of 10 in both scenarios. CNN classification failures can be explained by ictal electro-decrements, brief seizures, single-channel ictal patterns, highly concentrated interictal activity, changes in the sleep-wake cycle, and progressive modulation of electrographic ictal features.

Conclusions: We developed a deep learning neural network that performs personalized detection of RNS-derived ictal patterns with expert-level accuracy. These results suggest the potential for automated techniques to significantly improve the management of closed-loop brain stimulation, including during the initial period of recording when the device is otherwise naïve to a given patient's seizures.
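The headline metric here, the area under the precision-recall curve (AUPRC), can be computed from binary labels and classifier scores as below, in its average-precision form. This is a generic sketch, not the study's evaluation code:

```python
import numpy as np

def auprc(y_true, scores):
    """Area under the precision-recall curve (average-precision form):
    mean of the precision values at the ranks where true positives sit,
    when examples are sorted by descending score."""
    order = np.argsort(-np.asarray(scores))
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)
    precision = tp / np.arange(1, len(y) + 1)
    return float(np.sum(precision[y == 1]) / y.sum())
```

A perfect ranking (all positives scored above all negatives) gives 1.0; AUPRC is preferred over ROC-AUC for seizure detection because ictal samples are rare relative to interictal background.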


2020 ◽  
Vol 12 (1) ◽  
pp. 90-108
Author(s):  
Mahmoud Kalash ◽  
Mrigank Rochan ◽  
Noman Mohammed ◽  
Neil Bruce ◽  
Yang Wang ◽  
...  

In this article, the authors propose a deep learning framework for malware classification. There has been a huge increase in the volume of malware in recent years, which poses serious security threats to financial institutions, businesses, and individuals. In order to combat the proliferation of malware, new strategies are essential to quickly identify and classify malware samples. Nowadays, machine learning approaches are becoming popular for malware classification. However, most of these approaches are based on shallow learning algorithms (e.g., SVM). Recently, convolutional neural networks (CNNs), a deep learning approach, have shown superior performance compared to traditional learning algorithms, especially in tasks such as image classification. Inspired by this, the authors propose a CNN-based architecture to classify malware samples. They convert malware binaries to grayscale images and subsequently train a CNN for classification. Experiments on two challenging malware classification datasets, namely Malimg and Microsoft, demonstrate that their method outperforms competing state-of-the-art algorithms.
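The binary-to-image conversion described above (a malware binary's bytes interpreted as grayscale pixel intensities) can be sketched as follows. The fixed image width and zero-padding of the last row are assumptions, since the abstract does not specify them:

```python
import numpy as np

def binary_to_grayscale(raw: bytes, width: int = 256) -> np.ndarray:
    """Interpret a binary's bytes as pixel intensities (0-255) and
    reshape them into a 2-D grayscale image of fixed width,
    zero-padding the final row."""
    buf = np.frombuffer(raw, dtype=np.uint8)
    height = int(np.ceil(len(buf) / width))
    img = np.zeros(height * width, dtype=np.uint8)
    img[:len(buf)] = buf
    return img.reshape(height, width)
```

Structurally similar binaries then yield visually similar textures, which is what lets an image-classification CNN separate malware families.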


Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

Inaccurate estimates of the linear prediction coefficients (LPCs) and noise variance introduce bias into the Kalman filter (KF) gain and degrade speech enhancement performance. Existing methods propose a tuning of the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate the noise in each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. A whitening filter (with its coefficients computed from the estimated noise) is then applied to the noisy speech, yielding pre-whitened speech from which the speech LPC parameters are computed. The KF is then constructed with the estimated parameters, where a robustness metric offsets the bias in the Kalman gain during speech absence and a sensitivity metric does so during speech presence, to achieve better noise reduction. The noise variance and the speech model parameters serve as a speech activity detector. The reduced-bias Kalman gain enables the KF to suppress the noise significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than several benchmark methods.


Android OS, the most prevalent mobile operating system (OS), has enjoyed immense popularity on smartphones over the past few years. This popularity has also made it a target for cybercrime in the form of piracy and malware. Traditional detection does not suffice to combat newly created advanced malware, so there is a need for smart malware detection systems to reduce the risk of malicious activity. Machine learning approaches have shown promising results in classifying malware in recent years, though most methods are shallow learners such as Random Forest (RF). In this paper, we propose Deep-Droid, a deep learning framework for detecting Android malware. Our Deep-Droid model is a deep learner that outperforms existing cutting-edge machine learning approaches. All experiments were performed on two datasets (Drebin-215 and Malgenome-215) to assess the Deep-Droid model. The experimental results show the effectiveness and robustness of Deep-Droid, which achieved an accuracy of over 98.5%.


Despite the recent achievements of deep learning on several speech processing tasks, single-microphone, speaker-independent speech separation remains difficult for two main reasons. The first is the arbitrary assignment of the target and masker speakers in the mixture (the permutation problem), and the second is the unknown number of speakers in the mixture (the output-dimension problem). We propose a deep learning framework for speech separation that handles both issues. We use a neural network to project the time-frequency representation of the mixed signal into a high-dimensional embedding space. The time-frequency embeddings of each speaker are then pulled toward a corresponding attractor point, which is used to compute the time-frequency assignment of that speaker. The objective function for the system is the standard signal reconstruction error, which allows end-to-end operation during both training and evaluation. We evaluated our system on two- and three-speaker mixtures and report similar or better performance compared with other state-of-the-art deep learning approaches to speech separation.
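The attractor mechanism described above can be sketched for the training case, where ideal time-frequency assignments are available: each attractor is the centroid of one speaker's embeddings, and masks follow from the similarity of every bin's embedding to each attractor. Names and shapes here are illustrative, not the authors' implementation:

```python
import numpy as np

def attractor_masks(V, Y):
    """V: (TF, D) embeddings, one per time-frequency bin;
    Y: (TF, C) one-hot ideal speaker assignments (training only).

    Attractors are per-speaker centroids of the embeddings; the mask
    for each bin is a softmax over its similarity to every attractor.
    """
    # (C, D) attractors: weighted mean of embeddings per speaker
    A = (Y.T @ V) / np.maximum(Y.sum(0)[:, None], 1e-8)
    logits = V @ A.T                      # (TF, C) similarities
    e = np.exp(logits - logits.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)    # softmax masks, rows sum to 1
```

Because the loss is reconstruction error of the masked mixture, the number of output speakers is set by the number of attractors rather than being baked into the network, which is how the output-dimension problem is sidestepped.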

