Adversarial Deep Structural Networks for Mammographic Mass Segmentation

2016 ◽  
Author(s):  
Wentao Zhu ◽  
Xiaohui Xie

Abstract
Mass segmentation is an important task in mammogram analysis, providing effective morphological features and regions of interest (ROI) for mass detection and classification. Inspired by the success of deep convolutional features for natural image analysis and conditional random fields (CRFs) for structural learning, we propose an end-to-end network for mammographic mass segmentation. The network employs a fully convolutional network (FCN) to model the potential function, followed by a CRF that performs structural learning. Because the mass distribution varies greatly with pixel position, the FCN is combined with a positional prior for this task. Due to the small size of mammogram datasets, we use adversarial training to control over-fitting. Four models with different convolutional kernels are further fused to improve the segmentation results. Experimental results on two public datasets, INbreast and DDSM-BCRP, show that our end-to-end network combined with adversarial training achieves state-of-the-art results.
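The adversarial-training idea above, perturbing inputs during training to control over-fitting, can be sketched in a minimal form. The snippet below is not the paper's network: it applies an FGSM-style sign-of-gradient perturbation to the input of a plain logistic model and takes the parameter update on the perturbed input. All names, hyperparameters, and the toy model are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_example(x, y, w, b, eps=0.1):
    """FGSM-style perturbation: step in the sign of the input gradient
    of the logistic loss, which locally increases the loss the most."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w          # dL/dx for binary cross-entropy
    return x + eps * np.sign(grad_x)

def train_step(x, y, w, b, lr=0.5, eps=0.1):
    """One adversarial-training update: compute the loss gradient on
    the perturbed input and descend on the model parameters."""
    x_adv = adversarial_example(x, y, w, b, eps)
    p = sigmoid(np.dot(w, x_adv) + b)
    w = w - lr * (p - y) * x_adv
    b = b - lr * (p - y)
    return w, b

rng = np.random.default_rng(0)
w, b = rng.normal(size=3), 0.0
x, y = np.array([1.0, -2.0, 0.5]), 1.0
for _ in range(50):
    w, b = train_step(x, y, w, b)
p_clean = sigmoid(np.dot(w, x) + b)   # confidence on the clean input
```

Because every update is taken on the hardest nearby input, the model that results is correct on the clean input as well; this regularizing effect is what the abstract exploits on small mammogram datasets.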

Author(s):  
Peng Liu ◽  
Huiyuan Fu ◽  
Huadong Ma

Abstract
Deep convolutional neural networks (DCNNs) have been widely deployed in real-world scenarios. However, DCNNs are easily tricked by adversarial examples, which poses challenges for critical applications such as vehicle classification. To address this problem, we propose a novel end-to-end convolutional network for joint detection and removal of adversarial perturbations by denoising (DDAP). It removes adversarial perturbations with the DDAP denoiser, operating on the adversarial examples flagged by the DDAP detector. The proposed method can be regarded as a pre-processing step: it does not require modifying the structure of the vehicle classification model and hardly affects the classification results on clean images. We consider four kinds of adversarial attacks (FGSM, BIM, DeepFool, and PGD) to verify DDAP's capabilities when trained on BIT-Vehicle and other public datasets. It provides a better defense than other state-of-the-art defensive methods.
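DDAP's detector and denoiser are learned networks; as a rough illustration of the detect-then-denoise control flow only, the sketch below substitutes hand-crafted stand-ins: a detector that thresholds high-frequency (Laplacian) energy and a 3×3 box-filter denoiser. The threshold, filters, and synthetic images are all assumptions, not the paper's components.

```python
import numpy as np

def high_freq_energy(img):
    """Mean absolute response of a discrete Laplacian -- a crude proxy
    for the high-frequency noise that adversarial perturbations add."""
    lap = (-4.0 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return np.mean(np.abs(lap))

def mean_filter(img):
    """3x3 box filter over the interior, used as a stand-in denoiser."""
    h, w = img.shape
    out = img.copy()
    acc = np.zeros((h - 2, w - 2))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
    out[1:-1, 1:-1] = acc / 9.0
    return out

def defend(img, threshold=0.05):
    """Detect-then-denoise: only images flagged by the detector are
    denoised, so clean inputs pass through untouched."""
    return mean_filter(img) if high_freq_energy(img) > threshold else img

rng = np.random.default_rng(1)
clean = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))             # smooth image
adv = clean + 0.1 * rng.choice([-1.0, 1.0], size=clean.shape)   # FGSM-like noise
```

The gating step is the point of the design: because the denoiser only fires on flagged inputs, accuracy on clean images is essentially unaffected, which matches the abstract's pre-processing claim.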


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It partly eliminates the gradient-vanishing and performance shortcomings of RNNs (LSTM, GRU), which yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, with accuracy improved by 2.4% on the GRID dataset.
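The TCN building block referenced above is a causal dilated 1-D convolution; a minimal sketch shows how a short stack of such layers covers a long temporal context in parallel, without the step-by-step recurrence (and vanishing gradients) of an RNN. The code is illustrative only, not the paper's model.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: the output at time t sees only
    x[t], x[t-d], x[t-2d], ... so no future frames leak in.
    Doubling the dilation per layer grows the receptive field
    exponentially with depth."""
    k = len(w)
    y = np.zeros_like(x)
    for t in range(len(x)):
        for i in range(k):
            j = t - i * dilation
            if j >= 0:
                y[t] += w[i] * x[j]
    return y

x = np.zeros(16)
x[0] = 1.0                            # unit impulse probes the receptive field
w = np.ones(2)                        # kernel size 2
h = causal_dilated_conv(x, w, 1)
h = causal_dilated_conv(h, w, 2)
h = causal_dilated_conv(h, w, 4)      # receptive field = 1 + 1 + 2 + 4 = 8 steps
```

Feeding an impulse through the stack shows the response spread over exactly the first 8 time steps: three layers with dilations 1, 2, 4 already cover 8 frames of context, and every layer's gradient path is one convolution deep rather than 8 recurrent steps deep.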


Author(s):  
Fangrui Wu ◽  
Menglong Yang

Recent end-to-end CNN-based stereo matching algorithms obtain disparities through regression from a cost volume, which is formed by concatenating the features of stereo pairs. Downsampling steps are often embedded in constructing the cost volume for global information aggregation and computational efficiency. However, many edge details are hard to recover due to the coarse upsampling process and ambiguous boundary predictions. To tackle this problem without training an additional edge prediction sub-network, we developed a novel tightly-coupled edge refinement pipeline composed of two modules. The first module implements a gentle upsampling process through a cascaded cost volume filtering method, aggregating global information without losing much detail. On this basis, the second module generates a disparity residual map for boundary pixels via a sub-pixel disparity consistency check, to further recover edge details. Experimental results on public datasets demonstrate the effectiveness of the proposed method.
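The two ingredients named above, a cost volume and a disparity consistency check, can be illustrated on a toy stereo pair. The sketch uses absolute-difference costs, winner-take-all selection, and ground-truth right-image disparities; the paper's learned feature costs, cascaded filtering, and sub-pixel refinement are replaced by these simplest possible stand-ins.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Per-pixel matching cost for each candidate disparity d:
    absolute difference between a left pixel and the right pixel
    shifted by d. Columns with no valid match stay at +inf."""
    h, w = left.shape
    vol = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        vol[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    return vol

def consistency_mask(disp_left, disp_right, tol=1):
    """Left-right check: a disparity is trusted only if the matched
    right pixel maps back to (almost) the same left pixel. Failing
    pixels are typically occluded/boundary pixels needing refinement."""
    h, w = disp_left.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xr = x - disp_left[y, x]
            if 0 <= xr < w and abs(disp_right[y, xr] - disp_left[y, x]) <= tol:
                mask[y, x] = True
    return mask

rng = np.random.default_rng(2)
right = rng.random((4, 20))
true_d = 3
left = np.roll(right, true_d, axis=1)            # left view shifted by 3 px
disp = cost_volume(left, right, 8).argmin(axis=0)  # winner-take-all disparity
```

In the toy pair every non-occluded pixel recovers the true shift of 3 and passes the consistency check; in a real pipeline the pixels that fail the check are exactly the ones routed to the residual-prediction module.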


Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 852 ◽  
Author(s):  
Eric Psota ◽  
Mateusz Mittek ◽  
Lance Pérez ◽  
Ty Schmidt ◽  
Benny Mote

Computer vision systems have the potential to provide automated, non-invasive monitoring of livestock animals; however, the lack of public datasets with well-defined targets and evaluation metrics presents a significant challenge for researchers. Consequently, existing solutions often focus on achieving task-specific objectives using relatively small, private datasets. This work introduces a new dataset and method for instance-level detection of multiple pigs in group-housed environments. The method uses a single fully-convolutional neural network to detect the location and orientation of each animal, where both body part locations and pairwise associations are represented in the image space. Accompanying this method is a new dataset containing 2000 annotated images with 24,842 individually annotated pigs from 17 different locations. The proposed method achieves over 99% precision and over 96% recall when detecting pigs in environments previously seen by the network during training. To evaluate the robustness of the trained network, it is also tested on environments and lighting conditions unseen in the training set, where it achieves 91% precision and 67% recall. The dataset is publicly available for download.
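The network above represents body-part locations as image-space confidence maps; a common way to decode such maps into discrete detections is local-maximum extraction, sketched below on a synthetic heatmap. This is a generic decoder for illustration, not the authors' exact post-processing, and the threshold and part labels are assumptions.

```python
import numpy as np

def find_peaks(heatmap, threshold=0.5):
    """Return (row, col) of strict local maxima above a confidence
    threshold -- each surviving peak is read as one body-part
    detection in image space."""
    peaks = []
    h, w = heatmap.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = heatmap[y, x]
            window = heatmap[y - 1:y + 2, x - 1:x + 2]
            if v > threshold and v == window.max() and (window == v).sum() == 1:
                peaks.append((y, x))
    return peaks

hm = np.zeros((10, 10))
hm[2, 3] = 0.9      # e.g. a confident shoulder response
hm[7, 6] = 0.8      # e.g. a confident tail response
hm[5, 5] = 0.3      # low-confidence blob: suppressed by the threshold
peaks = find_peaks(hm)
```

Pairwise-association maps (which part belongs to which pig) would then be sampled between peak pairs to group parts into instances; that grouping step is omitted here.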


2019 ◽  
Vol 11 (2) ◽  
pp. 42 ◽  
Author(s):  
Sheeraz Arif ◽  
Jing Wang ◽  
Tehseen Ul Hassan ◽  
Zesong Fei

Human activity recognition is an active field of research in computer vision with numerous applications. Recently, deep convolutional networks and recurrent neural networks (RNNs) have received increasing attention in multimedia studies and have yielded state-of-the-art results. In this work, we propose a new framework that intelligently combines 3D-CNN and LSTM networks. First, we integrate discriminative information from a video into a map called a 'motion map' using a deep 3-dimensional convolutional network (C3D). A motion map and the next video frame can be integrated into a new motion map, and this technique can be trained by iteratively increasing the training video length; the final network can then generate the motion map of a whole video. Next, a linear weighted fusion scheme is used to fuse the network feature maps into spatio-temporal features. Finally, we use a Long Short-Term Memory (LSTM) encoder-decoder for the final predictions. The method is simple to implement and retains discriminative and dynamic information. Improved results on public benchmark datasets demonstrate the effectiveness and practicability of the proposed method.
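The iterative motion-map update, fold the next frame into the running map with a linear weight, can be sketched as follows. A plain absolute frame difference stands in for the learned C3D integration, and the weight, frame content, and map shape are all illustrative assumptions.

```python
import numpy as np

def update_motion_map(motion_map, prev_frame, frame, alpha=0.5):
    """Fold the next frame's motion evidence (here: an absolute frame
    difference, standing in for the learned C3D features) into the
    running map with a linear weight, so the map summarises the whole
    clip seen so far."""
    motion = np.abs(frame - prev_frame)
    return alpha * motion_map + (1.0 - alpha) * motion

# Synthetic clip: a bright pixel "moves" down the diagonal over 5 frames.
frames = [np.zeros((8, 8)) for _ in range(5)]
for t, f in enumerate(frames[1:], start=1):
    f[t, t] = 1.0

mmap = np.zeros((8, 8))
for prev, cur in zip(frames, frames[1:]):
    mmap = update_motion_map(mmap, prev, cur)
```

After the loop the map is bright exactly along the pixel's diagonal trajectory and zero elsewhere; a downstream LSTM then only has to read one compact map per clip instead of every frame.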


2018 ◽  
Vol 10 (7) ◽  
pp. 1135 ◽  
Author(s):  
Sanjeevan Shrestha ◽  
Leonardo Vanneschi

Building extraction from remotely sensed imagery plays an important role in urban planning, disaster management, navigation, updating geographic databases, and several other geospatial applications. Several published contributions dedicated to applying deep convolutional neural networks (DCNNs) to building extraction from aerial/satellite imagery exist. However, in all these contributions, high accuracy is obtained at the price of extremely complex and large network architectures. In this paper, we present an enhanced fully convolutional network (FCN) framework designed for building extraction from remotely sensed images by applying conditional random fields (CRFs). The main objective is to propose a methodology that balances high accuracy with low network complexity. A modern activation function, the exponential linear unit (ELU), is applied to improve the performance of the FCN, resulting in more accurate building prediction. To further reduce noise (falsely classified buildings) and sharpen building boundaries, CRF post-processing is added at the end of the adopted framework. The experiments were conducted on the Massachusetts buildings aerial imagery dataset. The results show that our proposed framework outperformed the FCN, the existing baseline framework for semantic segmentation, in terms of performance measures such as the F1-score and IoU. Additionally, the proposed method outperformed a pre-existing classifier for building extraction on the same dataset in terms of both performance measures and network complexity.
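The ELU activation mentioned above has a simple closed form, elu(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise; a minimal sketch:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit: identity for positive inputs, and a
    smooth exponential saturation toward -alpha for negative ones,
    which keeps mean activations closer to zero than ReLU and avoids
    ReLU's hard zero gradient for negative inputs."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

vals = elu(np.array([-2.0, -0.5, 0.0, 1.5]))
```

Unlike ReLU, the negative branch stays differentiable and bounded below by -alpha, which is the property the abstract credits for the more accurate building predictions.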

