MSegnet, a Practical Network for Building Detection from High Spatial Resolution Images

Bo Yu; Fang Chen; Ying Dong; Lei Wang; Ning Wang; Aqiang Yang

doi:10.14358/pers.21-00016r2

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.21-00016r2 ◽

2021 ◽

Vol 87 (12) ◽

pp. 901-906

Author(s):

Bo Yu ◽

Fang Chen ◽

Ying Dong ◽

Lei Wang ◽

Ning Wang ◽

...

Keyword(s):

Multiple Scales ◽

Feature Learning ◽

Semantic Segmentation ◽

Single Shot ◽

Building Detection ◽

Data Set ◽

Multi Scale ◽

Aspect Ratios ◽

The Matrix ◽

Multiple Aspect

Building detection in big earth data by remote sensing is crucial for urban development. However, improving its accuracy remains challenging due to complicated background objects and the different viewing angles of various remotely sensed images. Previously proposed methods predominantly focus on multi-scale feature learning, which omits features in multiple aspect ratios. Moreover, postprocessing is required to refine the segmentation performance. We propose modified semantic segmentation (MSegnet), a single-shot semantic segmentation model based on a matrix of convolution layers that extracts features in multiple scales and aspect ratios. MSegnet consists of two modules: backbone feature learning, and matrix convolution to conduct vertical and horizontal learning. The matrix convolution comprises a set of convolution operations with different aspect ratios. MSegnet is applied to a widely used public building data set and is shown to achieve satisfactory accuracy compared with published single-shot methods.
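To make the idea of a "matrix" of convolutions with different scales and aspect ratios concrete, the following is a minimal sketch and not the authors' implementation. The kernel sizes (1, 3, 5) are hypothetical, since the abstract does not state them, and a mean (box) filter stands in for learned convolution weights. The names `kernel_matrix` and `box_filter_valid` are illustrative only.

```python
from itertools import product


def kernel_matrix(sizes):
    """Return a grid of (height, width) kernel shapes.

    Row i fixes the kernel height and column j fixes the width, so the
    diagonal holds square kernels (scale changes) and the off-diagonal
    cells hold elongated kernels (aspect-ratio changes).
    """
    return [[(h, w) for w in sizes] for h in sizes]


def box_filter_valid(image, shape):
    """Mean filter with a rectangular kernel, no padding, stride 1.

    A stand-in for a learned convolution of the same kernel shape.
    """
    kh, kw = shape
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - kh + 1):
        out_row = []
        for c in range(cols - kw + 1):
            window = [image[r + dr][c + dc]
                      for dr, dc in product(range(kh), range(kw))]
            out_row.append(sum(window) / (kh * kw))
        out.append(out_row)
    return out


# Hypothetical sizes (assumption): the abstract does not give the real ones.
grid = kernel_matrix([1, 3, 5])
image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]

# One response map per cell of the kernel matrix.
responses = {shape: box_filter_valid(image, shape)
             for row in grid for shape in row}
```

In this sketch a (1, 5) kernel responds to wide, flat structures and a (5, 1) kernel to tall, narrow ones, which is one plausible reading of "vertical and horizontal learning".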

MDRNet: a lightweight network for real-time semantic segmentation in street scenes

Assembly Automation ◽

10.1108/aa-06-2021-0078 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Yingpeng Dai ◽

Junzheng Wang ◽

Jiehao Li ◽

Jing Li

Keyword(s):

Multiple Scales ◽

Semantic Segmentation ◽

Environmental Information ◽

Environmental Perception ◽

Computational Time ◽

Small Scale ◽

Feature Maps ◽

Data Set ◽

Content Type ◽

Multi Scale

Purpose: This paper focuses on the environmental perception of unmanned platforms in complex street scenes. Unmanned platforms have strict requirements on both accuracy and inference speed, so making a trade-off between accuracy and inference speed when extracting environmental information is a challenge.

Design/methodology/approach: A novel multi-scale depth-wise residual (MDR) module is proposed. This module makes full use of depth-wise separable convolution, dilated convolution and 1-dimensional (1-D) convolution, and is able to extract local and contextual information jointly while remaining small and shallow. Based on the MDR module, a novel network named multi-scale depth-wise residual network (MDRNet) is designed for fast semantic segmentation. This network extracts multi-scale information and maintains feature maps with high spatial resolution to cope with objects at multiple scales.

Findings: Experiments on the Camvid and Cityscapes data sets reveal that the proposed MDRNet produces competitive results in terms of both computational time and accuracy during inference. Specifically, the authors obtained 67.47% and 68.7% mean intersection over union (MIoU) on the Camvid and Cityscapes data sets, respectively, with only 0.84 million parameters and faster inference on a single GTX 1070Ti card.

Originality/value: This research can provide the theoretical and engineering basis for environmental perception on unmanned platforms. In addition, it provides environmental information to support subsequent work.
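The small parameter budget reported for MDRNet is consistent with the standard arithmetic of depth-wise separable and dilated convolutions. The sketch below only illustrates that general arithmetic; the channel counts and kernel size are hypothetical and are not taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise separable convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def dilated_kernel_extent(k, dilation):
    """Spatial extent covered by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
print(standard_conv_params(64, 128, 3))        # 73728
print(depthwise_separable_params(64, 128, 3))  # 8768
print(dilated_kernel_extent(3, 2))             # 5
```

For this hypothetical layer the separable form needs roughly one eighth of the weights, while dilation widens the receptive field without adding any.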

Multi-scale Adaptive Feature Fusion Network for Semantic Segmentation in Remote Sensing Images

Remote Sensing ◽

10.3390/rs12050872 ◽

2020 ◽

Vol 12 (5) ◽

pp. 872 ◽

Cited By ~ 3

Author(s):

Ronghua Shang ◽

Jiyu Zhang ◽

Licheng Jiao ◽

Yangyang Li ◽

Naresh Marturi ◽

...

Keyword(s):

Remote Sensing ◽

Multiple Scales ◽

Feature Fusion ◽

Semantic Segmentation ◽

Semantic Features ◽

Remote Sensing Images ◽

Global Features ◽

Global Average ◽

Multi Scale ◽

Context Extraction

Semantic segmentation of high-resolution remote sensing images is highly challenging due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories. Most existing segmentation methods rely only on a simple fusion of the extracted multi-scale features and often fail to provide satisfactory results when there is a large difference in target sizes. To handle this problem through multi-scale context extraction and efficient fusion of multi-scale features, we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation in remote sensing images. It is an encoding and decoding structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds the channel attention mechanism to fuse semantic features. The high- and low-level semantic information is concatenated to generate global features via global average pooling. These global features are used as channel weights, and a fully connected layer acquires adaptive weight information for each channel. To accomplish an efficient fusion, these tuned weights are applied to the fused features. The performance of the proposed method has been evaluated by comparing it with six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments performed using the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other existing networks, with overall accuracy reaching 89.4% and 88.2%, respectively, and average F1 reaching 90.4% and 86.7%, respectively.
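The channel-weighting step described above (global average pooling, a fully connected layer, then reweighting of the fused features) can be sketched roughly as follows. This is an illustration under assumptions, not MANet itself: the sigmoid gate, the single fully connected layer, and the example weights are all assumed, because the abstract does not specify them.

```python
import math


def global_average_pool(feature_maps):
    """Reduce each channel (a 2D list) to its mean value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]


def fully_connected(vector, weights, biases):
    """Plain dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_maps, weights, biases):
    """Scale every channel by a gate computed from pooled global features.

    The sigmoid gate is an assumption made for this sketch.
    """
    pooled = global_average_pool(feature_maps)
    gates = [sigmoid(v) for v in fully_connected(pooled, weights, biases)]
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates


# Two hypothetical 2 x 2 channels and identity weights, for illustration only.
features = [[[1.0, 1.0], [1.0, 1.0]],
            [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = channel_attention(features,
                                  weights=[[1.0, 0.0], [0.0, 1.0]],
                                  biases=[0.0, 0.0])
```

With these example weights the channel with the larger mean activation receives the larger gate, so it contributes more to the fused result.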

GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Semantic Segmentation

2020 25th International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr48806.2021.9412965 ◽

2021 ◽

Author(s):

Zhuoying Wang ◽

Yongtao Wang ◽

Zhi Tang ◽

Yangyan Li ◽

Ying Chen ◽

...

Keyword(s):

Feature Learning ◽

Semantic Segmentation ◽

Scale Feature ◽

Multi Scale ◽

Transfer Operation

SPMF-Net: Weakly Supervised Building Segmentation by Combining Superpixel Pooling and Multi-Scale Feature Fusion

Remote Sensing ◽

10.3390/rs12061049 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1049 ◽

Cited By ~ 2

Author(s):

Jie Chen ◽

Fen He ◽

Yi Zhang ◽

Geng Sun ◽

Min Deng

Keyword(s):

Feature Fusion ◽

Semantic Segmentation ◽

Building Detection ◽

Segmentation Method ◽

Scale Feature ◽

Multi Scale ◽

Semantic Labeling ◽

Supervised Methods ◽

Boundary Information ◽

Weakly Supervised

The lack of pixel-level labeling limits the practicality of deep learning-based building semantic segmentation. Weakly supervised semantic segmentation based on image-level labeling results in incomplete object regions and missing boundary information. This paper proposes a weakly supervised semantic segmentation method for building detection. The proposed method takes the image-level label as supervision information in a classification network that combines superpixel pooling and multi-scale feature fusion structures. The main advantage of the proposed strategy is its ability to improve the intactness and boundary accuracy of a detected building. Our method achieves impressive results on two 2D semantic labeling datasets: it outperforms some competing weakly supervised methods and comes close to the result of the fully supervised method.
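Superpixel pooling, in its simplest form, aggregates a feature map over the pixels of each superpixel. The sketch below assumes mean pooling over a single-channel feature map and a precomputed superpixel label map; the abstract does not say which pooling operator or superpixel algorithm the authors use, so both are assumptions, and `superpixel_mean_pool` is a hypothetical name.

```python
from collections import defaultdict


def superpixel_mean_pool(features, labels):
    """Average a 2D feature map over each superpixel.

    features: 2D list of floats.
    labels:   2D list of superpixel ids with the same shape.
    Returns a dict mapping superpixel id to its mean feature value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for feat_row, label_row in zip(features, labels):
        for value, label in zip(feat_row, label_row):
            sums[label] += value
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Toy example: the top row is superpixel 0, the bottom row is superpixel 1.
pooled = superpixel_mean_pool([[1.0, 2.0], [3.0, 4.0]],
                              [[0, 0], [1, 1]])
print(pooled)  # {0: 1.5, 1: 3.5}
```

Because superpixels tend to follow image edges, pooling over them rather than over a fixed grid is one way such a method could keep building boundaries sharp.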

Matrix SegNet: A Practical Deep Learning Framework for Landslide Mapping from Images of Different Areas with Different Spatial Resolutions

Remote Sensing ◽

10.3390/rs13163158 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3158

Author(s):

Bo Yu ◽

Fang Chen ◽

Chong Xu ◽

Lei Wang ◽

Ning Wang

Keyword(s):

Deep Learning ◽

Change Detection ◽

Large Scale ◽

Multiple Scales ◽

Google Earth ◽

Radiometric Correction ◽

Learning Framework ◽

Aspect Ratios ◽

Starting Point ◽

The Matrix

Practical landslide inventory maps covering large-scale areas are essential in emergency response and geohazard analysis. Recently proposed landslide detection techniques have generally focused on landslides in pure vegetation backgrounds and on image radiometric correction. Robust methods that automatically detect landslides from images acquired by multiple platforms and without radiometric correction remain a challenge, which is a significant issue in practical application. To detect landslides from images over different large-scale areas with different spatial resolutions, this paper proposes a two-branch Matrix SegNet to semantically segment input images by change detection. The Matrix SegNet learns landslide features in multiple scales and aspect ratios. The pre- and post-event images are captured directly from Google Earth, without radiometric correction. To evaluate the proposed framework, we conducted landslide detection in four study areas with two different spatial resolutions. Moreover, two other widely used frameworks, U-Net and SegNet, were adapted to detect landslides from the same data by change detection. The experiments show that our model substantially improves performance in terms of recall, precision, F1-score, and IOU. It is a good starting point for developing a practical deep learning landslide detection framework for large-scale application, using images from different areas with different spatial resolutions.
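The four reported metrics follow from pixel-wise counts of true positives, false positives, and false negatives. The following sketch computes them for a pair of binary masks using the standard definitions; the masks are toy data, not results from the paper.

```python
def binary_segmentation_metrics(pred, truth):
    """Precision, recall, F1 and IoU for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
metrics = binary_segmentation_metrics(pred, truth)
print(metrics["iou"])  # 0.5
```

IoU is never larger than F1 for the same counts, which is worth keeping in mind when comparing numbers across papers that report only one of the two.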

SA-Net: A scale-attention network for medical image segmentation

PLoS ONE ◽

10.1371/journal.pone.0247388 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0247388

Author(s):

Jingfei Hu ◽

Hua Wang ◽

Jie Wang ◽

Yunqi Wang ◽

Fang He ◽

...

Keyword(s):

Deep Learning ◽

Medical Image ◽

Multiple Scales ◽

Medical Images ◽

Semantic Segmentation ◽

Medical Image Segmentation ◽

Retinal Images ◽

Multi Scale ◽

Multiple Datasets ◽

Deep Learning Network

Semantic segmentation of medical images provides an important cornerstone for subsequent tasks of image analysis and understanding. With rapid advancements in deep learning methods, conventional U-Net segmentation networks have been applied in many fields. Based on exploratory experiments, features at multiple scales have been found to be of great importance for the segmentation of medical images. In this paper, we propose a scale-attention deep learning network (SA-Net), which extracts features of different scales in a residual module and uses an attention module to enforce the scale-attention capability. SA-Net can better learn multi-scale features and achieve more accurate segmentation for different medical images. In addition, this work validates the proposed method across multiple datasets. The experimental results show that SA-Net achieves excellent performance in vessel detection in retinal images, lung segmentation, artery/vein (A/V) classification in retinal images, and blastocyst segmentation. To facilitate SA-Net utilization by the scientific community, the code implementation will be made publicly available.

Asphalt pavement crack detection based on multi-scale full convolutional network

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-191105 ◽

2021 ◽

Vol 40 (1) ◽

pp. 1495-1508

Author(s):

Yangxu Wu ◽

Wanting Yang ◽

Jinxiao Pan ◽

Ping Chen

Keyword(s):

Crack Detection ◽

Asphalt Pavement ◽

Semantic Segmentation ◽

Svm Classifier ◽

Convolutional Network ◽

Data Set ◽

Important Indicator ◽

Standard Data ◽

Multi Scale ◽

Pavement Crack Detection

Pavement crack assessment is an important indicator for evaluating road health. However, due to the dark color of the asphalt pavement and the texture characteristics of the pavement, current asphalt pavement crack detection technology cannot meet the requirements of accuracy and efficiency. In this paper, we propose an end-to-end multi-scale full convolutional neural network to achieve the semantic segmentation of cracks in road images by learning the crack characteristics in the complex fine-grained background of asphalt pavement. The method uses DenseNet and a deconvolution network framework to achieve pixel-level detection and fuses features learned from convolutional kernels of different scales through a full convolutional network to obtain richer multi-scale feature information, allowing a more detailed representation of crack features in high-resolution images. An SVM classifier is added at the back end to classify cracks after segmentation. We then create a standard road test data set containing 12 types of cracks and evaluate the method on it. The experimental results show that the method segments all 12 types of cracks well, and that its crack segmentation for asphalt pavement is better than that of the most advanced methods.

BIDIRECTIONAL MULTI-SCALE ATTENTION NETWORKS FOR SEMANTIC SEGMENTATION OF OBLIQUE UAV IMAGERY

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2021-75-2021 ◽

2021 ◽

Vol V-2-2021 ◽

pp. 75-82

Author(s):

Y. Lyu ◽

G. Vosselman ◽

G.-S. Xia ◽

M. Y. Yang

Keyword(s):

Multiple Scales ◽

Semantic Segmentation ◽

The Novel ◽

Oblique View ◽

Attention Networks ◽

Multi Scale ◽

Single Scale ◽

Aerial Platforms ◽

Oblique Images ◽

Scale Variation

Abstract. Semantic segmentation for aerial platforms has been one of the fundamental scene understanding tasks for earth observation. Most semantic segmentation research has focused on scenes captured in nadir view, in which objects have relatively smaller scale variation compared with scenes captured in oblique view. The huge scale variation of objects in oblique images limits the performance of deep neural networks (DNN) that process images in a single-scale fashion. To tackle the scale variation issue, we propose novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction. The experiments are conducted on the UAVid2020 dataset and show the effectiveness of our method. Our model achieved the state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score of 70.80%.
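For readers unfamiliar with the mIoU score quoted above, the sketch below shows the usual multi-class computation on flattened label lists: per-class intersection over union, averaged over the classes that occur. Whether a given benchmark averages over absent classes or accumulates counts over the whole test set is a convention that varies, so this per-sample form is an assumption, and the labels are toy data.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection over union for flattened integer label lists.

    Classes that appear in neither prediction nor ground truth are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, truth):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatched pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy labels for three classes: per-class IoU is 1/3, 2/3 and 1/2.
pred = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, truth, num_classes=3)
```

Because every class counts equally in the average, small or rare classes influence mIoU as much as dominant ones, unlike overall pixel accuracy.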

Intelligent crack detection based on attention mechanism in convolution neural network

Advances in Structural Engineering ◽

10.1177/1369433220986638 ◽

2021 ◽

pp. 136943322098663

Author(s):

Xiaoning Cui ◽

Qicai Wang ◽

Jinpeng Dai ◽

Yanjin Xue ◽

Yun Duan

Keyword(s):

Neural Network ◽

Crack Detection ◽

Semantic Segmentation ◽

Attention Mechanism ◽

Data Set ◽

Multi Scale ◽

Critical Areas ◽

Intelligent Detection ◽

Segmentation Task ◽

Segmentation Models

The intelligent detection of distress in concrete is a research hotspot in structural health monitoring. In this study, Att-Unet, an improved fully convolutional neural network model with an attention mechanism, is proposed to realize end-to-end pixel-level crack segmentation. Att-Unet consists of three parts: an encoding module, a decoding module, and an AG (Attention Gate) module. The AG module can effectively extract multi-scale features of cracks, focus on critical areas, and reconstruct semantics, significantly improving the crack segmentation capability of the Att-Unet model. The mainstream semantic segmentation models (FCN and Unet) were trained on the same data set. Comparing the results of the Att-Unet model with those of FCN and Unet shows that, for crack images under different conditions, Att-Unet achieved better accuracy, precision and F1-scores. In addition, Att-Unet showed higher feature extraction accuracy and better generalization ability in the crack segmentation task.

Building Damage Detection from Post-Event Aerial Imagery Using Single Shot Multibox Detector

Applied Sciences ◽

10.3390/app9061128 ◽

2019 ◽

Vol 9 (6) ◽

pp. 1128 ◽

Cited By ~ 12

Author(s):

Yundong Li ◽

Wei Hu ◽

Han Dong ◽

Xueyan Zhang

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Hurricane Sandy ◽

Training Data ◽

Aerial Images ◽

Detection Methods ◽

Single Shot ◽

Data Set ◽

Augmentation Strategies ◽

Post Disaster

Aerial cameras, satellite remote sensing, or unmanned aerial vehicles (UAV) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers have found that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on numerous labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, the current study proposes a damaged building assessment method using SSD with pretraining and data augmentation, and highlights the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolution auto-encoder (CAE) that consists of VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method's effectiveness. Experiments show that the pretraining strategy can improve overall accuracy by 10% compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the experiment is further verified on another dataset from Hurricane Irma, and it is concluded that the proposed method is feasible.
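Three of the augmentation strategies listed in point (3) are simple enough to sketch on a grayscale image stored as a 2D list. This is a generic illustration, not the authors' pipeline: the 90-degree rotation step, the noise level, and the clipping range of 0 to 255 are assumptions, and Gaussian blur is left out for brevity.

```python
import random


def mirror(image):
    """Flip the image left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip to the range [0, 255]."""
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in image]


def augment(image, sigma=5.0, seed=0):
    """Return the original image plus three augmented variants."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [image,
            mirror(image),
            rotate90(image),
            add_gaussian_noise(image, sigma, rng)]


tile = [[10, 20],
        [30, 40]]
variants = augment(tile)
print(variants[1])  # [[20, 10], [40, 30]]
print(variants[2])  # [[30, 10], [40, 20]]
```

For a detector such as SSD, geometric augmentations like mirroring and rotation must also be applied to the bounding-box coordinates, which this sketch does not cover.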
