Adversarial Deep Structural Networks for Mammographic Mass Segmentation

2016 ◽  
Author(s):  
Wentao Zhu ◽  
Xiaohui Xie

Abstract
Mass segmentation is an important task in mammogram analysis, providing effective morphological features and regions of interest (ROI) for mass detection and classification. Inspired by the success of deep convolutional features for natural image analysis and conditional random fields (CRFs) for structural learning, we propose an end-to-end network for mammographic mass segmentation. The network employs a fully convolutional network (FCN) to model the potential function, followed by a CRF that performs structural learning. Because the mass distribution varies greatly with pixel position, the FCN is combined with a positional prior for this task. Due to the small size of mammogram datasets, we use adversarial training to control over-fitting. Four models with different convolutional kernels are further fused to improve the segmentation results. Experimental results on two public datasets, INbreast and DDSM-BCRP, show that our end-to-end network combined with adversarial training achieves state-of-the-art results.
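The adversarial-training idea above, perturbing inputs during training to control over-fitting, can be sketched in a minimal form. The snippet below is not the paper's network: it applies an FGSM-style sign-of-gradient perturbation to the input of a plain logistic model and takes the parameter update on the perturbed input. All names, hyperparameters, and the toy model are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_example(x, y, w, b, eps=0.1):
    """FGSM-style perturbation: step in the sign of the input gradient
    of the logistic loss, which locally increases the loss the most."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w          # dL/dx for binary cross-entropy
    return x + eps * np.sign(grad_x)

def train_step(x, y, w, b, lr=0.5, eps=0.1):
    """One adversarial-training update: compute the loss gradient on
    the perturbed input and descend on the model parameters."""
    x_adv = adversarial_example(x, y, w, b, eps)
    p = sigmoid(np.dot(w, x_adv) + b)
    w = w - lr * (p - y) * x_adv
    b = b - lr * (p - y)
    return w, b

rng = np.random.default_rng(0)
w, b = rng.normal(size=3), 0.0
x, y = np.array([1.0, -2.0, 0.5]), 1.0
for _ in range(50):
    w, b = train_step(x, y, w, b)
p_clean = sigmoid(np.dot(w, x) + b)   # confidence on the clean input
```

Because every update is taken on the hardest nearby input, the model that results is correct on the clean input as well; this regularizing effect is what the abstract exploits on small mammogram datasets.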

Author(s):  
Peng Liu ◽  
Huiyuan Fu ◽  
Huadong Ma

Abstract
Deep convolutional neural networks (DCNNs) have been widely deployed in real-world scenarios. However, DCNNs are easily tricked by adversarial examples, which poses challenges for critical applications such as vehicle classification. To address this problem, we propose a novel end-to-end convolutional network for joint detection and removal of adversarial perturbations by denoising (DDAP). It removes adversarial perturbations with the DDAP denoiser, operating on the adversarial examples flagged by the DDAP detector. The proposed method can be regarded as a pre-processing step: it does not require modifying the structure of the vehicle classification model and hardly affects the classification results on clean images. We consider four kinds of adversarial attacks (FGSM, BIM, DeepFool, and PGD) to verify DDAP's capabilities when trained on BIT-Vehicle and other public datasets. It provides a better defense than other state-of-the-art defensive methods.
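DDAP's detector and denoiser are learned networks; as a rough illustration of the detect-then-denoise control flow only, the sketch below substitutes hand-crafted stand-ins: a detector that thresholds high-frequency (Laplacian) energy and a 3×3 box-filter denoiser. The threshold, filters, and synthetic images are all assumptions, not the paper's components.

```python
import numpy as np

def high_freq_energy(img):
    """Mean absolute response of a discrete Laplacian -- a crude proxy
    for the high-frequency noise that adversarial perturbations add."""
    lap = (-4.0 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return np.mean(np.abs(lap))

def mean_filter(img):
    """3x3 box filter over the interior, used as a stand-in denoiser."""
    h, w = img.shape
    out = img.copy()
    acc = np.zeros((h - 2, w - 2))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
    out[1:-1, 1:-1] = acc / 9.0
    return out

def defend(img, threshold=0.05):
    """Detect-then-denoise: only images flagged by the detector are
    denoised, so clean inputs pass through untouched."""
    return mean_filter(img) if high_freq_energy(img) > threshold else img

rng = np.random.default_rng(1)
clean = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))             # smooth image
adv = clean + 0.1 * rng.choice([-1.0, 1.0], size=clean.shape)   # FGSM-like noise
```

The gating step is the point of the design: because the denoiser only fires on flagged inputs, accuracy on clean images is essentially unaffected, which matches the abstract's pre-processing claim.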


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It partly eliminates the gradient-vanishing and performance shortcomings of RNNs (LSTM, GRU), which yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, with accuracy improved by 2.4% on the GRID dataset.
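The TCN building block referenced above is a causal dilated 1-D convolution; a minimal sketch shows how a short stack of such layers covers a long temporal context in parallel, without the step-by-step recurrence (and vanishing gradients) of an RNN. The code is illustrative only, not the paper's model.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: the output at time t sees only
    x[t], x[t-d], x[t-2d], ... so no future frames leak in.
    Doubling the dilation per layer grows the receptive field
    exponentially with depth."""
    k = len(w)
    y = np.zeros_like(x)
    for t in range(len(x)):
        for i in range(k):
            j = t - i * dilation
            if j >= 0:
                y[t] += w[i] * x[j]
    return y

x = np.zeros(16)
x[0] = 1.0                            # unit impulse probes the receptive field
w = np.ones(2)                        # kernel size 2
h = causal_dilated_conv(x, w, 1)
h = causal_dilated_conv(h, w, 2)
h = causal_dilated_conv(h, w, 4)      # receptive field = 1 + 1 + 2 + 4 = 8 steps
```

Feeding an impulse through the stack shows the response spread over exactly the first 8 time steps: three layers with dilations 1, 2, 4 already cover 8 frames of context, and every layer's gradient path is one convolution deep rather than 8 recurrent steps deep.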


Author(s):  
Fangrui Wu ◽  
Menglong Yang

Recent end-to-end CNN-based stereo matching algorithms obtain disparities through regression from a cost volume, which is formed by concatenating the features of stereo pairs. Downsampling steps are often embedded in constructing the cost volume for global information aggregation and computational efficiency. However, many edge details are hard to recover due to the coarse upsampling process and ambiguous boundary predictions. To tackle this problem without training an additional edge prediction sub-network, we developed a novel tightly-coupled edge refinement pipeline composed of two modules. The first module implements a gentle upsampling process through a cascaded cost volume filtering method, aggregating global information without losing much detail. On this basis, the second module generates a disparity residual map for boundary pixels via a sub-pixel disparity consistency check, to further recover edge details. Experimental results on public datasets demonstrate the effectiveness of the proposed method.
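The two ingredients named above, a cost volume and a disparity consistency check, can be illustrated on a toy stereo pair. The sketch uses absolute-difference costs, winner-take-all selection, and ground-truth right-image disparities; the paper's learned feature costs, cascaded filtering, and sub-pixel refinement are replaced by these simplest possible stand-ins.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Per-pixel matching cost for each candidate disparity d:
    absolute difference between a left pixel and the right pixel
    shifted by d. Columns with no valid match stay at +inf."""
    h, w = left.shape
    vol = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        vol[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    return vol

def consistency_mask(disp_left, disp_right, tol=1):
    """Left-right check: a disparity is trusted only if the matched
    right pixel maps back to (almost) the same left pixel. Failing
    pixels are typically occluded/boundary pixels needing refinement."""
    h, w = disp_left.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xr = x - disp_left[y, x]
            if 0 <= xr < w and abs(disp_right[y, xr] - disp_left[y, x]) <= tol:
                mask[y, x] = True
    return mask

rng = np.random.default_rng(2)
right = rng.random((4, 20))
true_d = 3
left = np.roll(right, true_d, axis=1)            # left view shifted by 3 px
disp = cost_volume(left, right, 8).argmin(axis=0)  # winner-take-all disparity
```

In the toy pair every non-occluded pixel recovers the true shift of 3 and passes the consistency check; in a real pipeline the pixels that fail the check are exactly the ones routed to the residual-prediction module.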


Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 852 ◽  
Author(s):  
Eric Psota ◽  
Mateusz Mittek ◽  
Lance Pérez ◽  
Ty Schmidt ◽  
Benny Mote

Computer vision systems have the potential to provide automated, non-invasive monitoring of livestock animals; however, the lack of public datasets with well-defined targets and evaluation metrics presents a significant challenge for researchers. Consequently, existing solutions often focus on achieving task-specific objectives using relatively small, private datasets. This work introduces a new dataset and method for instance-level detection of multiple pigs in group-housed environments. The method uses a single fully-convolutional neural network to detect the location and orientation of each animal, where both body part locations and pairwise associations are represented in the image space. Accompanying this method is a new dataset containing 2000 annotated images with 24,842 individually annotated pigs from 17 different locations. The proposed method achieves over 99% precision and over 96% recall when detecting pigs in environments previously seen by the network during training. To evaluate the robustness of the trained network, it is also tested on environments and lighting conditions unseen in the training set, where it achieves 91% precision and 67% recall. The dataset is publicly available for download.
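The network above represents body-part locations as image-space confidence maps; a common way to decode such maps into discrete detections is local-maximum extraction, sketched below on a synthetic heatmap. This is a generic decoder for illustration, not the authors' exact post-processing, and the threshold and part labels are assumptions.

```python
import numpy as np

def find_peaks(heatmap, threshold=0.5):
    """Return (row, col) of strict local maxima above a confidence
    threshold -- each surviving peak is read as one body-part
    detection in image space."""
    peaks = []
    h, w = heatmap.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = heatmap[y, x]
            window = heatmap[y - 1:y + 2, x - 1:x + 2]
            if v > threshold and v == window.max() and (window == v).sum() == 1:
                peaks.append((y, x))
    return peaks

hm = np.zeros((10, 10))
hm[2, 3] = 0.9      # e.g. a confident shoulder response
hm[7, 6] = 0.8      # e.g. a confident tail response
hm[5, 5] = 0.3      # low-confidence blob: suppressed by the threshold
peaks = find_peaks(hm)
```

Pairwise-association maps (which part belongs to which pig) would then be sampled between peak pairs to group parts into instances; that grouping step is omitted here.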


2019 ◽  
Vol 11 (2) ◽  
pp. 42 ◽  
Author(s):  
Sheeraz Arif ◽  
Jing Wang ◽  
Tehseen Ul Hassan ◽  
Zesong Fei

Human activity recognition is an active field of research in computer vision with numerous applications. Recently, deep convolutional networks and recurrent neural networks (RNNs) have received increasing attention in multimedia studies and have yielded state-of-the-art results. In this work, we propose a new framework that intelligently combines 3D-CNN and LSTM networks. First, we integrate discriminative information from a video into a map called a 'motion map' using a deep 3-dimensional convolutional network (C3D). A motion map and the next video frame can be integrated into a new motion map, and this technique can be trained by iteratively increasing the training video length; the final network can then generate the motion map of a whole video. Next, a linear weighted fusion scheme is used to fuse the network feature maps into spatio-temporal features. Finally, we use a Long Short-Term Memory (LSTM) encoder-decoder for the final predictions. The method is simple to implement and retains discriminative and dynamic information. Improved results on public benchmark datasets demonstrate the effectiveness and practicability of the proposed method.
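The iterative motion-map update, fold the next frame into the running map with a linear weight, can be sketched as follows. A plain absolute frame difference stands in for the learned C3D integration, and the weight, frame content, and map shape are all illustrative assumptions.

```python
import numpy as np

def update_motion_map(motion_map, prev_frame, frame, alpha=0.5):
    """Fold the next frame's motion evidence (here: an absolute frame
    difference, standing in for the learned C3D features) into the
    running map with a linear weight, so the map summarises the whole
    clip seen so far."""
    motion = np.abs(frame - prev_frame)
    return alpha * motion_map + (1.0 - alpha) * motion

# Synthetic clip: a bright pixel "moves" down the diagonal over 5 frames.
frames = [np.zeros((8, 8)) for _ in range(5)]
for t, f in enumerate(frames[1:], start=1):
    f[t, t] = 1.0

mmap = np.zeros((8, 8))
for prev, cur in zip(frames, frames[1:]):
    mmap = update_motion_map(mmap, prev, cur)
```

After the loop the map is bright exactly along the pixel's diagonal trajectory and zero elsewhere; a downstream LSTM then only has to read one compact map per clip instead of every frame.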


2018 ◽  
Vol 10 (7) ◽  
pp. 1135 ◽  
Author(s):  
Sanjeevan Shrestha ◽  
Leonardo Vanneschi

Building extraction from remotely sensed imagery plays an important role in urban planning, disaster management, navigation, updating geographic databases, and several other geospatial applications. Several published contributions dedicated to applying deep convolutional neural networks (DCNNs) to building extraction from aerial/satellite imagery exist. However, in all these contributions, high accuracy is obtained at the price of extremely complex and large network architectures. In this paper, we present an enhanced fully convolutional network (FCN) framework designed for building extraction from remotely sensed images by applying conditional random fields (CRFs). The main objective is to propose a methodology that balances high accuracy with low network complexity. A modern activation function, the exponential linear unit (ELU), is applied to improve the performance of the FCN, resulting in more accurate building prediction. To further reduce noise (falsely classified buildings) and sharpen building boundaries, CRF post-processing is added at the end of the adopted framework. The experiments were conducted on the Massachusetts buildings aerial imagery dataset. The results show that our proposed framework outperformed the FCN, the existing baseline framework for semantic segmentation, in terms of performance measures such as the F1-score and IoU. Additionally, the proposed method outperformed a pre-existing classifier for building extraction on the same dataset in terms of both performance measures and network complexity.
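The ELU activation mentioned above has a simple closed form, elu(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise; a minimal sketch:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit: identity for positive inputs, and a
    smooth exponential saturation toward -alpha for negative ones,
    which keeps mean activations closer to zero than ReLU and avoids
    ReLU's hard zero gradient for negative inputs."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

vals = elu(np.array([-2.0, -0.5, 0.0, 1.5]))
```

Unlike ReLU, the negative branch stays differentiable and bounded below by -alpha, which is the property the abstract credits for the more accurate building predictions.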

