Neural Comb Filtering using Sliding Window Attention Network for Speech Enhancement

Author(s):  
Venkatesh Parvathala ◽  
Sri Rama Murty Kodukula ◽  
Siva Ganesh Andhavarapu

<div>In this paper, we demonstrate the significance of restoring harmonics of the fundamental frequency (pitch) in deep neural network (DNN) based speech enhancement. We propose a sliding-window attention network to regress the spectral magnitude mask (SMM) from the noisy speech signal. Even though the network parameters can be estimated by minimizing the mask loss, it does not restore the pitch harmonics, especially at higher frequencies. In this paper, we propose to restore the pitch harmonics in the spectral domain by minimizing cepstral loss around the pitch peak. The network parameters are estimated using a combination of the mask loss and cepstral loss. The proposed network architecture functions like an adaptive comb filter on voiced segments, and emphasizes the pitch harmonics in the speech spectrum. The proposed approach achieves comparable performance with the state-of-the-art methods with much lesser computational complexity.</div>

2021 ◽  
Author(s):  
Venkatesh Parvathala ◽  
Sri Rama Murty Kodukula ◽  
Siva Ganesh Andhavarapu

<div>In this paper, we demonstrate the significance of restoring harmonics of the fundamental frequency (pitch) in deep neural network (DNN) based speech enhancement. We propose a sliding-window attention network to regress the spectral magnitude mask (SMM) from the noisy speech signal. Even though the network parameters can be estimated by minimizing the mask loss, it does not restore the pitch harmonics, especially at higher frequencies. In this paper, we propose to restore the pitch harmonics in the spectral domain by minimizing cepstral loss around the pitch peak. The network parameters are estimated using a combination of the mask loss and cepstral loss. The proposed network architecture functions like an adaptive comb filter on voiced segments, and emphasizes the pitch harmonics in the speech spectrum. The proposed approach achieves comparable performance with the state-of-the-art methods with much lesser computational complexity.</div>


2020 ◽  
Author(s):  
Rocío Mercado ◽  
Tobias Rastemo ◽  
Edvard Lindelöf ◽  
Günter Klambauer ◽  
Ola Engkvist ◽  
...  

Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules. This work introduces GraphINVENT, a platform developed for graph-based molecular design using graph neural networks (GNNs). GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling the training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES distribution-based metrics, showing how GraphINVENT models compare well with state-of-the-art generative models. This work is one of the first thorough graph-based molecular design studies, and illustrates how GNN-based models are promising tools for molecular discovery.<br>


2020 ◽  
Author(s):  
Rocío Mercado ◽  
Tobias Rastemo ◽  
Edvard Lindelöf ◽  
Günter Klambauer ◽  
Ola Engkvist ◽  
...  

Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules. This work introduces GraphINVENT, a platform developed for graph-based molecular design using graph neural networks (GNNs). GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling the training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES distribution-based metrics, showing how GraphINVENT models compare well with state-of-the-art generative models. This work is one of the first thorough graph-based molecular design studies, and illustrates how GNN-based models are promising tools for molecular discovery.<br>


Author(s):  
Daniel Ray ◽  
Tim Collins ◽  
Prasad Ponnapalli

Extracting accurate heart rate estimations from wrist-worn photoplethysmography (PPG) devices is challenging due to the signal containing artifacts from several sources. Deep Learning approaches have shown very promising results outperforming classical methods with improvements of 21% and 31% on two state-of-the-art datasets. This paper provides an analysis of several data-driven methods for creating deep neural network architectures with hopes of further improvements.


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5113
Author(s):  
Runxuan Si ◽  
Jing Zhao ◽  
Yuhua Tang ◽  
Shaowu Yang

One-shot person Re-identification, which owns one labeled sample among numerous unlabeled data for each identity, is proposed to tackle the problem of the shortage of labeled data. Considering the scenarios without sufficient labeled data, it is very challenging to keep abreast of the performance of the supervised task in which sufficient labeled samples are available. In this paper, we propose a relation-based attention network with hybrid memory, which can make full use of the global information to pay attention to the identity features for model training with the relation-based attention network. Importantly, our specially designed network architecture effectively reduces the interference of environmental noise. Moreover, we propose a hybrid memory to train the one-shot data and unlabeled data in a unified framework, which notably contributes to the performance of person Re-identification. In particular, our designed one-shot feature update mode effectively alleviates the problem of overfitting, which is caused by the lack of supervised information during the training process. Compared with state-of-the-art unsupervised and one-shot algorithms for person Re-identification, our method achieves considerable improvements of 6.7%, 4.6%, and 11.5% on Market-1501, DukeMTMC-reID, and MSMT17 datasets, respectively, and becomes the new state-of-the-art method for one-shot person Re-identification.


2019 ◽  
Vol 11 (24) ◽  
pp. 2921 ◽  
Author(s):  
Jingyu Li ◽  
Ying Li ◽  
Yayuan Xiao ◽  
Yunpeng Bai

In order to remove speckle noise from original synthetic aperture radar (SAR) images effectively and efficiently, this paper proposes a hybrid dilated residual attention network (HDRANet) with residual learning for SAR despeckling. Firstly, HDRANet employs the hybrid dilated convolution (HDC) in lightweight network architecture to enlarge the receptive field and aggregate global information. Then, a simple yet effective attention module, convolutional block attention module (CBAM), is integrated into the proposed model to constitute a residual HDC attention block through skip connection, which further enhances representation power and performance of the model. Extensive experimental results on the synthetic and real SAR images demonstrate the superior performance of HDRANet over the state-of-the-art methods in terms of quantitative metrics and visual quality.


2020 ◽  
Vol 34 (07) ◽  
pp. 10672-10679
Author(s):  
Qi Chu ◽  
Wanli Ouyang ◽  
Bin Liu ◽  
Feng Zhu ◽  
Nenghai Yu

In this paper, we propose an online multi-object tracking (MOT) approach that integrates data association and single object tracking (SOT) with a unified convolutional network (ConvNet), named DASOTNet. The intuition behind integrating data association and SOT is that they can complement each other. Following Siamese network architecture, DASOTNet consists of the shared feature ConvNet, the data association branch and the SOT branch. Data association is treated as a special re-identification task and solved by learning discriminative features for different targets in the data association branch. To handle the problem that the computational cost of SOT grows intolerably as the number of tracked objects increases, we propose an efficient two-stage tracking method in the SOT branch, which utilizes the merits of correlation features and can simultaneously track all the existing targets within one forward propagation. With feature sharing and the interaction between them, data association branch and the SOT branch learn to better complement each other. Using a multi-task objective, the whole network can be trained end-to-end. Compared with state-of-the-art online MOT methods, our method is much faster while maintaining a comparable performance.


2021 ◽  
Author(s):  
Dayana Ribas ◽  
Antonio Miguel ◽  
Alfonso Ortega ◽  
Eduardo Lleida

Abstract This paper proposes a Deep Neural Network (DNN)-based Wiener gain estimator for speech enhancement. The proposal is in the framework of the classical spectral-domain speech enhancement algorithms. In this case, we used the Optimal Modified Log-Spectral Amplitude (OMLSA), but consider that this proposal could fit many alternative speech estimation algorithms. We determined the best usage of the DNN approach at learning a robust instance of the Wiener gain estimator according to the characteristics of the SNR estimation and the gain function. To design a DNN architecture adjusted for the speech enhancement task, we study various configuration issues frequently used in DNN-based solutions, including speech representations, residual connections, and causal vs. non-causal designs. Thus, we provide conclusions for the use of DNN architectures with the enhancement purpose. Experiments show that the proposal provides results on the state-of-the-art. But beyond the objective quality measures, there are examples of noisy vs. enhanced speech available for listening to demonstrate in practice the skills of the method in real audio.


Sign in / Sign up

Export Citation Format

Share Document