BIDIRECTIONAL MULTI-SCALE ATTENTION NETWORKS FOR SEMANTIC SEGMENTATION OF OBLIQUE UAV IMAGERY

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2021-75-2021 ◽

2021 ◽

Vol V-2-2021 ◽

pp. 75-82

Author(s):

Y. Lyu ◽

G. Vosselman ◽

G.-S. Xia ◽

M. Y. Yang

Keyword(s):

Multiple Scales ◽

Semantic Segmentation ◽

The Novel ◽

Oblique View ◽

Attention Networks ◽

Multi Scale ◽

Single Scale ◽

Aerial Platforms ◽

Oblique Images ◽

Scale Variation

Abstract. Semantic segmentation for aerial platforms is one of the fundamental scene understanding tasks in earth observation. Most semantic segmentation research has focused on scenes captured in nadir view, in which objects show relatively small scale variation compared with scenes captured in oblique view. The huge scale variation of objects in oblique images limits the performance of deep neural networks (DNNs) that process images at a single scale. To tackle the scale variation issue, in this paper we propose novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction. Experiments conducted on the UAVid2020 dataset show the effectiveness of our method. Our model achieves a state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score of 70.80%.
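
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean intersection over union score can be computed from flattened label arrays. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper, and benchmark evaluations typically accumulate counts over a whole test set rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target (hypothetical helper)."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth equal class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth equals class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both label sets
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))  # per-class IoU: 1/3, 2/3, 1/2
```

Averaging per-class IoU gives every class equal weight, so small classes count as much as large ones.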

Multi-scale Adaptive Feature Fusion Network for Semantic Segmentation in Remote Sensing Images

Remote Sensing ◽

10.3390/rs12050872 ◽

2020 ◽

Vol 12 (5) ◽

pp. 872 ◽

Cited By ~ 3

Author(s):

Ronghua Shang ◽

Jiyu Zhang ◽

Licheng Jiao ◽

Yangyang Li ◽

Naresh Marturi ◽

...

Keyword(s):

Remote Sensing ◽

Multiple Scales ◽

Feature Fusion ◽

Semantic Segmentation ◽

Semantic Features ◽

Remote Sensing Images ◽

Global Features ◽

Global Average ◽

Multi Scale ◽

Context Extraction

Semantic segmentation of high-resolution remote sensing images is highly challenging due to complicated backgrounds, irregular target shapes, and similarities in the appearance of multiple target categories. Most existing segmentation methods rely only on simple fusion of the extracted multi-scale features and often fail to provide satisfactory results when target sizes differ greatly. To handle this problem through multi-scale context extraction and efficient fusion of multi-scale features, in this paper we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation in remote sensing images. It is an encoder-decoder structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds the channel attention mechanism to fuse semantic features. The high- and low-level semantic information is concatenated to generate global features via global average pooling. These global features are passed through a fully connected layer to obtain an adaptive weight for each channel. To accomplish an efficient fusion, these tuned weights are applied to the fused features. The performance of the proposed method has been evaluated by comparing it with six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments performed using the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other existing networks, with overall accuracy reaching 89.4% and 88.2%, respectively, and average F1 scores reaching 90.4% and 86.7%, respectively.
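
The channel-attention fusion described in this abstract (concatenate, global average pool, fully connected layer, reweight) can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the sigmoid gate, the single fully connected layer, and all names (`channel_attention`, `weights`, `bias`) are hypothetical choices, and feature maps are plain nested lists of shape channels × height × width.

```python
import math


def global_average_pool(channels):
    """One scalar per channel: the mean over all spatial positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels]


def fully_connected(vec, weights, bias):
    """Dense layer: weights has one row per output channel."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def channel_attention(high, low, weights, bias):
    """Fuse two feature maps and rescale each channel by a learned gate."""
    fused = high + low                      # concatenate along the channel axis
    pooled = global_average_pool(fused)     # global feature vector
    logits = fully_connected(pooled, weights, bias)
    # Assumption: a sigmoid squashes each channel weight into (0, 1).
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [[[g * x for x in row] for row in ch] for g, ch in zip(gates, fused)]


high = [[[1.0, 3.0], [5.0, 7.0]]]           # one 2x2 high-level channel (mean 4)
low = [[[0.0, 0.0], [0.0, 0.0]]]            # one 2x2 low-level channel (mean 0)
identity = [[1.0, 0.0], [0.0, 1.0]]         # placeholder for learned weights
out = channel_attention(high, low, identity, [0.0, 0.0])
```

In a trained network the weights would be learned, so channels that carry useful context receive gates close to 1 and uninformative channels are suppressed.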

Multi-Scale Classification Based on Remote Sensing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.580-583.2853 ◽

2014 ◽

Vol 580-583 ◽

pp. 2853-2859

Author(s):

Peng Li Li ◽

Wei Ping Ti ◽

Jia Chun Li

Keyword(s):

Remote Sensing ◽

Multiple Scales ◽

Know How ◽

Multi Scale ◽

Single Scale ◽

The Difference ◽

Feature Values ◽

Nearest Neighbour Classifier ◽

Scale Classification

Due to the broad application of remote sensing imagery, there is a pressing need for the classification of objects in the images. Multi-scale classification based on object-oriented analysis is not a common approach to image classification because its users do not know how to use the information from multiple scales. Many users rely on easily accessible tools, such as the nearest neighbour classifier, to do multi-scale classification. Multi-scale classification classifies the images at different scales. The feature values of an object vary across scales and may show trends against scale; this is the scale dependency of features. These trends may help us to understand multi-scale classification better. The difference between multi-scale classification and single-scale classification lies not only in the number of scales, but also in the use of information from different scales. In order to explore the connection between different scales, research into new features is necessary.

A Taxonomic Analysis of Perspectives in Generating Space-Time Research Questions in Environmental Sciences

International Journal of Applied Geospatial Research ◽

10.4018/ijagr.2016040104 ◽

2016 ◽

Vol 7 (2) ◽

pp. 50-60 ◽

Cited By ~ 2

Author(s):

Xinyue Ye ◽

Bing She ◽

Huanyang Zhao ◽

Xiaoyan Zhou

Keyword(s):

Environmental Science ◽

Multiple Scales ◽

Global Scale ◽

Space Time ◽

Three Dimensions ◽

Workflow Systems ◽

Multi Scale ◽

Taxonomic Analysis ◽

Single Scale ◽

Research Questions

Research questions in environmental science can be decomposed into three basic dimensions: space, time and statistics. The combinations of these three dimensions reflect the diverse perspectives of observations across multiple scales. One can classify these scales into four types: individual, local, meso, and global. Following this multi-dimensional and multi-scale framework, this paper conducts a taxonomic analysis that systematically classifies research questions in environmental science. This taxonomic analysis includes papers from a leading environmental science journal. The results show that the majority of research questions are directed at local and global scale analyses. Studies that incorporate many scales of analysis are not necessarily more sophisticated than studies that investigate a single scale. Nonetheless, it is beneficial to explore more possibilities by investigating data from different perspectives. This taxonomy could help generate research questions and provide guidance for building analytic workflow systems to fill the gaps in future scientific endeavors.

MSegnet, a Practical Network for Building Detection from High Spatial Resolution Images

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.21-00016r2 ◽

2021 ◽

Vol 87 (12) ◽

pp. 901-906

Author(s):

Bo Yu ◽

Fang Chen ◽

Ying Dong ◽

Lei Wang ◽

Ning Wang ◽

...

Keyword(s):

Multiple Scales ◽

Feature Learning ◽

Semantic Segmentation ◽

Single Shot ◽

Building Detection ◽

Data Set ◽

Multi Scale ◽

Aspect Ratios ◽

The Matrix ◽

Multiple Aspect

Building detection in big earth data by remote sensing is crucial for urban development. However, improving its accuracy remains challenging due to complicated background objects and the different viewing angles of various remotely sensed images. The methods proposed to date predominantly focus on multi-scale feature learning and omit features at multiple aspect ratios. Moreover, postprocessing is required to refine the segmentation performance. We propose modified semantic segmentation (MSegnet), a single-shot semantic segmentation model based on a matrix of convolution layers that extracts features at multiple scales and aspect ratios. MSegnet consists of two modules: backbone feature learning, and matrix convolution to conduct vertical and horizontal learning. The matrix convolution comprises a set of convolution operations with different aspect ratios. MSegnet is applied to a public building data set that is widely used for evaluation and is shown to achieve satisfactory accuracy compared with the published single-shot methods.

MDRNet: a lightweight network for real-time semantic segmentation in street scenes

Assembly Automation ◽

10.1108/aa-06-2021-0078 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Yingpeng Dai ◽

Junzheng Wang ◽

Jiehao Li ◽

Jing Li

Keyword(s):

Multiple Scales ◽

Semantic Segmentation ◽

Environmental Information ◽

Environmental Perception ◽

Computational Time ◽

Small Scale ◽

Feature Maps ◽

Data Set ◽

Content Type ◽

Multi Scale

Purpose: This paper focuses on the environmental perception of unmanned platforms in complex street scenes. An unmanned platform has strict requirements on both accuracy and inference speed, so making a trade-off between accuracy and inference speed during the extraction of environmental information becomes a challenge. Design/methodology/approach: In this paper, a novel multi-scale depth-wise residual (MDR) module is proposed. This module makes full use of depth-wise separable convolution, dilated convolution and 1-dimensional (1-D) convolution, which allows it to extract local information and contextual information jointly while keeping the module small-scale and shallow. Then, based on the MDR module, a novel network named multi-scale depth-wise residual network (MDRNet) is designed for fast semantic segmentation. This network can extract multi-scale information and maintain feature maps with high spatial resolution to cope with objects at multiple scales. Findings: Experiments on the CamVid data set and the Cityscapes data set reveal that the proposed MDRNet produces competitive results in terms of both computational time and accuracy during inference. Specifically, the authors obtained 67.47% and 68.7% mean intersection over union (mIoU) on the CamVid data set and the Cityscapes data set, respectively, with only 0.84 million parameters and faster inference on a single GTX 1070Ti card. Originality/value: This research can provide the theoretical and engineering basis for environmental perception on the unmanned platform. In addition, it provides environmental information to support subsequent work.
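
To make concrete why depth-wise separable and dilated convolutions keep a module small, the following sketch compares weight counts and receptive-field sizes using the standard formulas. The layer sizes (3×3 kernel, 64 input and 128 output channels) are arbitrary illustrative values, not the MDR module's actual configuration, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """k x k depth-wise convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


std = standard_conv_params(3, 64, 128)            # 73728 weights
sep = depthwise_separable_params(3, 64, 128)      # 8768 weights
print(std, sep, round(std / sep, 2))              # roughly an 8.4x reduction
print(effective_kernel_size(3, 2))                # a 3-tap kernel spans 5 pixels
```

Dilation enlarges the receptive field without adding weights, which is one way a shallow module can still capture contextual information.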

Dynamic Item Block and Prediction Enhancing Block for Sequential Recommendation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/190 ◽

2019 ◽

Cited By ~ 4

Author(s):

Guibing Guo ◽

Shichang Ouyang ◽

Xiaodong He ◽

Fajie Yuan ◽

Xiaohua Liu

Keyword(s):

Multiple Scales ◽

Building Blocks ◽

User Preferences ◽

Inner Product ◽

Time Step ◽

Multi Scale ◽

Basic Model ◽

Single Scale ◽

Series Of Experiments ◽

Item Representation

Sequential recommendation systems, which suggest to users the next item of interest (to interact with), have recently become a research hotspot. However, existing approaches suffer from two limitations: (1) The representation of an item is relatively static and fixed for all users. We argue that even the same item should be represented distinctively with respect to different users and time steps. (2) The prediction for a user over an item is computed at a single scale (e.g., by their inner product), ignoring the multi-scale nature of user preferences. To resolve these issues, in this paper we propose two enhancing building blocks for sequential recommendation. Specifically, we devise a Dynamic Item Block (DIB) to learn a dynamic item representation by aggregating the embeddings of those who rated the same item before that time step. Then, we devise a Prediction Enhancing Block (PEB) to project the user representation into multiple scales, based on which many predictions can be made and attentively aggregated for enhanced learning. Each prediction is generated by a softmax over a sampled itemset rather than the whole item space for efficiency. We conduct a series of experiments on four real datasets, and show that even a basic model can be greatly enhanced in terms of ranking accuracy with the involvement of DIB and PEB. The code and datasets can be obtained from https://github.com/ouououououou/DIB-PEB-Sequential-RS
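
The idea of making one prediction per scale over a sampled itemset and then aggregating the predictions attentively can be illustrated with the rough sketch below. It is not the PEB as published: the attention logits are given as plain numbers here, whereas the paper's block would learn them, and the names `softmax` and `aggregate_predictions` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_predictions(per_scale_scores, attention_logits):
    """Attention-weighted mixture of per-scale softmax predictions.

    per_scale_scores: one score list per scale, each over the same sampled items.
    attention_logits: one unnormalised weight per scale (assumed to be learned).
    """
    probs = [softmax(scores) for scores in per_scale_scores]
    alphas = softmax(attention_logits)
    n_items = len(probs[0])
    return [sum(a * p[i] for a, p in zip(alphas, probs)) for i in range(n_items)]


# Two scales scoring the same three sampled items (illustrative numbers only).
scores = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
mixed = aggregate_predictions(scores, attention_logits=[0.0, 0.0])
```

Because the attention weights sum to one and each per-scale prediction is a probability distribution, the aggregated output is again a distribution over the sampled items.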

Multi-Scale Shapelets Discovery for Time-Series Classification

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622020500133 ◽

2020 ◽

Vol 19 (03) ◽

pp. 721-739

Author(s):

Borui Cai ◽

Guangyan Huang ◽

Yong Xiang ◽

Maia Angelova ◽

Limin Guo ◽

...

Keyword(s):

Time Series ◽

Classification Accuracy ◽

Multiple Scales ◽

Quality Measurement ◽

Time Series Classification ◽

High Quality ◽

Major Task ◽

Multi Scale ◽

Single Scale ◽

Local Patterns

Shapelets are subsequences of time-series that represent local patterns and can improve the accuracy and the interpretability of time-series classification. The major task of time-series classification using shapelets is to discover high-quality shapelets. However, this is challenging since local patterns may have various scales/lengths rather than a unified scale. In this paper, we resolve this problem by discovering shapelets with multiple scales. We propose a novel Multi-Scale Shapelet Discovery (MSSD) algorithm to discover expressive multi-scale shapelets by extending initial single-scale shapelets (i.e., shapelets with a unified scale). MSSD adopts a bi-directional extension process and is robust when extending single-scale shapelets obtained by different methods. A supervised shapelet quality measurement is further developed to assess the extension of shapelets. Comprehensive experiments conducted on 25 UCR time-series datasets show that multi-scale shapelets discovered by MSSD improve classification accuracy by around 10% on average, compared with single-scale shapelets discovered by counterpart methods.
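
As background for how a shapelet is matched against a series, the sketch below computes the commonly used shapelet distance: the smallest Euclidean distance between the shapelet and any window of the same length. This is the generic definition and not the MSSD extension procedure; normalisation, which many shapelet methods apply, is omitted, and the function name is an assumption.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any aligned window."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        dist = math.sqrt(
            sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        )
        best = min(best, dist)
    return best


series = [0, 1, 2, 3, 2, 1, 0]
print(shapelet_distance(series, [2, 3, 2]))   # exact match inside the series
print(shapelet_distance(series, [5, 5]))      # a shorter shapelet at another scale
```

The same function accepts shapelets of any length, which is what allows shapelets at several scales to be compared against one series.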

Pyramid Attention Aggregation Network for Semantic Segmentation of Surgical Instruments

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6850 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11782-11790

Author(s):

Zhen-Liang Ni ◽

Gui-Bin Bian ◽

Guan-An Wang ◽

Xiao-Hu Zhou ◽

Zeng-Guang Hou ◽

...

Keyword(s):

Specular Reflection ◽

Critical Role ◽

Receptive Fields ◽

Computer Assisted Surgery ◽

Semantic Segmentation ◽

Surgical Instruments ◽

Computer Assisted ◽

Multi Scale ◽

Semantic Dependencies ◽

Scale Variation

Semantic segmentation of surgical instruments plays a critical role in computer-assisted surgery. However, specular reflection and scale variation of instruments are likely to occur in the surgical environment, undesirably altering visual features of instruments, such as color and shape. These issues make semantic segmentation of surgical instruments more challenging. In this paper, a novel network, Pyramid Attention Aggregation Network, is proposed to aggregate multi-scale attentive features for surgical instruments. It contains two critical modules: Double Attention Module and Pyramid Upsampling Module. Specifically, the Double Attention Module includes two attention blocks (i.e., position attention block and channel attention block), which model semantic dependencies between positions and channels by capturing joint semantic information and global contexts, respectively. The attentive features generated by the Double Attention Module can distinguish target regions, contributing to solving the specular reflection issue. Moreover, the Pyramid Upsampling Module extracts local details and global contexts by aggregating multi-scale attentive features. It learns the shape and size features of surgical instruments in different receptive fields and thus addresses the scale variation issue. The proposed network achieves state-of-the-art performance on various datasets. It achieves a new record of 97.10% mean IoU on Cata7. In addition, it comes first in the MICCAI EndoVis Challenge 2017 with a 9.90% increase in mean IoU.

SA-Net: A scale-attention network for medical image segmentation

PLoS ONE ◽

10.1371/journal.pone.0247388 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0247388

Author(s):

Jingfei Hu ◽

Hua Wang ◽

Jie Wang ◽

Yunqi Wang ◽

Fang He ◽

...

Keyword(s):

Deep Learning ◽

Medical Image ◽

Multiple Scales ◽

Medical Images ◽

Semantic Segmentation ◽

Medical Image Segmentation ◽

Retinal Images ◽

Multi Scale ◽

Multiple Datasets ◽

Deep Learning Network

Semantic segmentation of medical images provides an important cornerstone for subsequent tasks of image analysis and understanding. With rapid advancements in deep learning methods, conventional U-Net segmentation networks have been applied in many fields. Based on exploratory experiments, features at multiple scales have been found to be of great importance for the segmentation of medical images. In this paper, we propose a scale-attention deep learning network (SA-Net), which extracts features of different scales in a residual module and uses an attention module to enforce the scale-attention capability. SA-Net can better learn the multi-scale features and achieve more accurate segmentation for different medical images. In addition, this work validates the proposed method across multiple datasets. The experimental results show that SA-Net achieves excellent performance in vessel detection in retinal images, lung segmentation, artery/vein (A/V) classification in retinal images and blastocyst segmentation. To facilitate SA-Net utilization by the scientific community, the code implementation will be made publicly available.

A New Water Environmental Load and Allocation Modeling Framework at the Medium–Large Basin Scale

Water ◽

10.3390/w11112398 ◽

2019 ◽

Vol 11 (11) ◽

pp. 2398

Author(s):

Qiankun Liu ◽

Jingang Jiang ◽

Changwei Jing ◽

Zhong Liu ◽

Jiaguo Qi

Keyword(s):

Local Governments ◽

Multiple Scales ◽

Economic Benefits ◽

Modeling Framework ◽

Analytic Hierarchy ◽

Gini Coefficients ◽

Multi Scale ◽

Single Scale ◽

Basin Scale ◽

Waste Load

Waste load allocation (WLA), a well-known total pollutant control strategy, is designed to distribute pollution responsibilities among polluters to alleviate environmental problems, but the current policy is unfair and limited to a single scale or a single pollution type. In this paper, a new, alternative, multi-scale, and multi-pollution WLA modeling framework was developed, with the goal of producing optimal and fair allocation quotas at multiple scales. The new WLA modeling framework integrates multi-constrained environmental Gini coefficients (EGCs) and Delphi-analytic hierarchy process (Delphi-AHP) optimization models to achieve the stated goal. The new WLA modeling framework was applied in a case study in the Xian-jiang watershed in Zhejiang Province, China, in order to test its validity and usefulness. The results, in comparison with existing practices by the local governments, suggest that the simulated pollutant load quota at the watershed scale is much fairer than the existing policies and even has some environmental economic benefits at the pollutant source scale. As the new WLA is a process-based modeling framework, it should be possible to adopt this approach in other similar geographic areas.
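
As a rough illustration of the fairness measure underlying an environmental Gini coefficient, the sketch below computes a Gini coefficient of pollutant load relative to a socio-economic indicator using the trapezoidal Lorenz-curve formula. It does not reproduce the paper's multi-constrained optimization or the Delphi-AHP weighting; the function name, the choice of indicator, and the sample numbers are assumptions.

```python
def environmental_gini(indicator, load):
    """Gini coefficient of load shares relative to indicator shares.

    indicator: positive values per unit (e.g. population or area, an assumption).
    load: pollutant load allocated to each unit.
    Returns 0 when load is exactly proportional to the indicator.
    """
    # Sort units by load per unit of indicator to build the Lorenz curve.
    units = sorted(zip(indicator, load), key=lambda u: u[1] / u[0])
    total_x, total_y = sum(indicator), sum(load)
    cum_x = cum_y = prev_x = prev_y = 0.0
    area = 0.0
    for x, y in units:
        cum_x += x / total_x
        cum_y += y / total_y
        area += (cum_x - prev_x) * (cum_y + prev_y)  # twice the trapezoid area
        prev_x, prev_y = cum_x, cum_y
    return 1.0 - area


proportional = environmental_gini([1, 2, 3], [2, 4, 6])   # perfectly fair split
concentrated = environmental_gini([1, 1], [0, 1])         # all load on one unit
print(proportional, concentrated)
```

A value near zero means each unit's load matches its share of the indicator, which is the sense in which an allocation is called fair in this kind of analysis.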
