BIDIRECTIONAL MULTI-SCALE ATTENTION NETWORKS FOR SEMANTIC SEGMENTATION OF OBLIQUE UAV IMAGERY

Author(s):  
Y. Lyu ◽  
G. Vosselman ◽  
G.-S. Xia ◽  
M. Y. Yang

Abstract. Semantic segmentation for aerial platforms is one of the fundamental scene understanding tasks in earth observation. Most semantic segmentation research has focused on scenes captured in nadir view, in which objects show relatively small scale variation compared with scenes captured in oblique view. The large scale variation of objects in oblique images limits the performance of deep neural networks (DNNs) that process images at a single scale. To tackle the scale variation issue, we propose novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction. Experiments conducted on the UAVid2020 dataset show the effectiveness of our method. Our model achieves a state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score of 70.80%.
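
For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how it can be computed for a single pair of label maps. The function name `mean_iou` and the toy labels are assumptions made for illustration; benchmark evaluations such as the one reported here typically accumulate intersections and unions over the whole test set rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union, averaged over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both maps agree on class c
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where at least one map says class c
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy label maps (hypothetical values, not taken from UAVid2020)
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.2f}")
```

In this toy case the per-class IoU values are 1/3, 2/3 and 1/2, so the mean is 0.5.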

2020 ◽  
Vol 12 (5) ◽  
pp. 872 ◽  
Author(s):  
Ronghua Shang ◽  
Jiyu Zhang ◽  
Licheng Jiao ◽  
Yangyang Li ◽  
Naresh Marturi ◽  
...  

Semantic segmentation of high-resolution remote sensing images is highly challenging due to complicated backgrounds, irregular target shapes, and similarities in the appearance of multiple target categories. Most existing segmentation methods rely only on simple fusion of the extracted multi-scale features and often fail to provide satisfactory results when target sizes differ greatly. To handle this problem through multi-scale context extraction and efficient fusion of multi-scale features, we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation of remote sensing images. It is an encoder-decoder structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates, together with global average pooling, to extract context information at multiple scales in parallel. MANet embeds a channel attention mechanism to fuse semantic features. The high- and low-level semantic information is concatenated, and global features are generated from it via global average pooling. A fully connected layer then turns these global features into adaptive per-channel weights. To accomplish an efficient fusion, these tuned weights are applied to the fused features. The proposed method has been evaluated against six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments on the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other networks, with overall accuracy reaching 89.4% and 88.2%, respectively, and average F1 scores reaching 90.4% and 86.7%, respectively.
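
The channel-attention fusion described above (concatenate, pool globally, pass through a fully connected layer, reweight channels) can be sketched in plain Python as follows. This is an illustrative sketch only, not the MANet implementation: the function names, the sigmoid gate, and the toy weights are all assumptions, since the abstract does not specify the activation or layer sizes.

```python
import math


def global_avg_pool(fmap):
    """fmap: list of channels, each an H x W nested list. Returns one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fuse(high, low, fc_weights, fc_bias):
    """Concatenate two feature maps and rescale each channel by a learned gate."""
    fused = high + low                   # concatenation along the channel axis
    pooled = global_avg_pool(fused)      # one global descriptor per channel
    # Fully connected layer followed by a sigmoid gate (the gate is an assumption)
    gates = [
        sigmoid(sum(w * g for w, g in zip(row, pooled)) + b)
        for row, b in zip(fc_weights, fc_bias)
    ]
    # Apply each channel's gate to every value in that channel
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fused)]


# Hypothetical 1-channel, 2 x 2 feature maps and an all-zero FC layer
high = [[[2.0, 2.0], [2.0, 2.0]]]
low = [[[4.0, 0.0], [0.0, 4.0]]]
out = channel_attention_fuse(high, low, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
```

With all-zero weights every gate equals sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead emphasise some channels over others.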


2014 ◽  
Vol 580-583 ◽  
pp. 2853-2859
Author(s):  
Peng Li Li ◽  
Wei Ping Ti ◽  
Jia Chun Li

Owing to the broad application of remote sensing imagery, there is a pressing need to classify the objects in the images. Multi-scale classification based on object-oriented analysis is not a common approach to image classification, because users do not know how to exploit information from multiple scales. Many users rely on easily accessible tools, such as the nearest neighbour classifier, to perform multi-scale classification. Multi-scale classification classifies images at different scales. The feature values of an object vary across scales and may show trends with scale; this is the scale dependency of features, and these trends may help us to understand multi-scale classification better. Multi-scale classification differs from single-scale classification not only in having multiple scales, but also in using information from different scales. To explore the connection between different scales, research into new features is necessary.


2016 ◽  
Vol 7 (2) ◽  
pp. 50-60 ◽  
Author(s):  
Xinyue Ye ◽  
Bing She ◽  
Huanyang Zhao ◽  
Xiaoyan Zhou

Research questions in environmental science can be decomposed into three basic dimensions: space, time and statistics. The combinations of these three dimensions reflect the diverse perspectives of observations across multiple scales. One can classify these scales into four types: individual, local, meso, and global. Following this multi-dimensional and multi-scale framework, this paper conducts a taxonomic analysis that systematically classifies research questions in environmental science. This taxonomic analysis includes papers from a leading environmental science journal. The results show that the majority of research questions are directed at local and global scale analyses. Studies that incorporate many scales of analysis are not necessarily more sophisticated than studies that investigate a single scale. Nonetheless, it is beneficial to explore more possibilities by investigating data from different perspectives. This taxonomy could help generate research questions and provide guidance for building analytic workflow systems to fill the gaps in future scientific endeavors.


2021 ◽  
Vol 87 (12) ◽  
pp. 901-906
Author(s):  
Bo Yu ◽  
Fang Chen ◽  
Ying Dong ◽  
Lei Wang ◽  
Ning Wang ◽  
...  

Building detection in big earth data by remote sensing is crucial for urban development. However, improving its accuracy remains challenging due to complicated background objects and the different viewing angles of various remotely sensed images. Methods proposed to date predominantly focus on multi-scale feature learning, which omits features at multiple aspect ratios. Moreover, postprocessing is required to refine the segmentation performance. We propose modified semantic segmentation (MSegnet), a single-shot semantic segmentation model based on a matrix of convolution layers that extracts features at multiple scales and aspect ratios. MSegnet consists of two modules: backbone feature learning, and matrix convolution to conduct vertical and horizontal learning. The matrix convolution comprises a set of convolution operations with different aspect ratios. MSegnet is applied to a public building data set that is widely used for evaluation and is shown to achieve satisfactory accuracy compared with published single-shot methods.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yingpeng Dai ◽  
Junzheng Wang ◽  
Jiehao Li ◽  
Jing Li

Purpose: This paper focuses on the environmental perception of unmanned platforms in complex street scenes. Such platforms have strict requirements on both accuracy and inference speed, so striking a trade-off between the two when extracting environmental information is a challenge.
Design/methodology/approach: A novel multi-scale depth-wise residual (MDR) module is proposed. The module makes full use of depth-wise separable convolution, dilated convolution and 1-dimensional (1-D) convolution, which lets it extract local and contextual information jointly while remaining small and shallow. Based on the MDR module, a novel network named multi-scale depth-wise residual network (MDRNet) is designed for fast semantic segmentation. The network extracts multi-scale information and maintains feature maps with high spatial resolution to cope with objects at multiple scales.
Findings: Experiments on the Camvid and Cityscapes data sets show that the proposed MDRNet produces competitive results in terms of both computational time and accuracy during inference. Specifically, the authors obtained 67.47% and 68.7% Mean Intersection over Union (MIoU) on the Camvid and Cityscapes data sets, respectively, with only 0.84 million parameters and faster inference on a single GTX 1070Ti card.
Originality/value: This research can provide a theoretical and engineering basis for environmental perception on unmanned platforms. In addition, it provides environmental information to support subsequent work.
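
To illustrate why depth-wise separable and dilated convolutions help keep a module small, the sketch below compares weight counts and receptive-field sizes using the standard textbook formulas. It is not taken from the MDRNet paper: the function names and the 64-channel, 3 x 3 example are assumptions, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depth-wise conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 kernel
full = standard_conv_params(64, 64, 3)
separable = depthwise_separable_params(64, 64, 3)
print(full, separable, round(full / separable, 2))
print(effective_kernel_size(3, 2))
```

For this example the separable layer needs 4,672 weights instead of 36,864, and a 3-tap kernel with dilation 2 spans 5 positions without adding any weights.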


Author(s):  
Guibing Guo ◽  
Shichang Ouyang ◽  
Xiaodong He ◽  
Fajie Yuan ◽  
Xiaohua Liu

Sequential recommendation systems, which suggest the next item of interest for a user to interact with, have recently become a research hotspot. However, existing approaches suffer from two limitations: (1) The representation of an item is relatively static and fixed for all users. We argue that even the same item should be represented differently for different users and time steps. (2) The prediction for a user over an item is computed at a single scale (e.g., by their inner product), ignoring the multi-scale nature of user preferences. To resolve these issues, we propose two enhancing building blocks for sequential recommendation. Specifically, we devise a Dynamic Item Block (DIB) to learn dynamic item representations by aggregating the embeddings of those who rated the same item before that time step. We then introduce a Prediction Enhancing Block (PEB) that projects the user representation into multiple scales, from which multiple predictions can be made and attentively aggregated for enhanced learning. For efficiency, each prediction is generated by a softmax over a sampled itemset rather than the whole item space. We conduct a series of experiments on four real datasets and show that even a basic model can be greatly enhanced in ranking accuracy by adding DIB and PEB. The code and datasets can be obtained from https://github.com/ouououououou/DIB-PEB-Sequential-RS
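
The idea of making one prediction per scale over a sampled itemset and then aggregating the predictions with attention weights can be sketched as below. This is a simplified illustration, not the authors' PEB: the function `multi_scale_predict`, the inner-product scoring, the fixed attention logits and all vectors are assumptions; the repository linked above holds the actual implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def multi_scale_predict(user_views, items_per_scale, attn_logits):
    """Aggregate per-scale softmax predictions over the same sampled itemset.

    user_views[s]      : user vector projected to scale s (hypothetical input)
    items_per_scale[s] : embeddings of the sampled items at scale s
    attn_logits[s]     : unnormalised attention score of scale s (assumed given)
    """
    per_scale = [
        softmax([dot(user, item) for item in items])
        for user, items in zip(user_views, items_per_scale)
    ]
    attn = softmax(attn_logits)
    n_items = len(per_scale[0])
    return [sum(a * p[i] for a, p in zip(attn, per_scale)) for i in range(n_items)]


# Two scales (dimensions 2 and 3) and two sampled items; all values are toy data
user_views = [[1.0, 0.0], [0.5, 0.5, 0.0]]
items_per_scale = [
    [[2.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0, 0.0], [0.0, 0.0, 3.0]],
]
probs = multi_scale_predict(user_views, items_per_scale, attn_logits=[0.0, 0.0])
```

Because each per-scale prediction is a probability distribution and the attention weights sum to one, the aggregated output is again a distribution over the sampled items.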


2020 ◽  
Vol 19 (03) ◽  
pp. 721-739
Author(s):  
Borui Cai ◽  
Guangyan Huang ◽  
Yong Xiang ◽  
Maia Angelova ◽  
Limin Guo ◽  
...  

Shapelets are subsequences of time-series that represent local patterns and can improve the accuracy and the interpretability of time-series classification. The major task of time-series classification using shapelets is to discover high-quality shapelets. However, this is challenging since local patterns may have various scales/lengths rather than a unified scale. In this paper, we resolve this problem by discovering shapelets with multiple scales. We propose a novel Multi-Scale Shapelet Discovery (MSSD) algorithm to discover expressive multi-scale shapelets by extending initial single-scale shapelets (i.e., shapelets with a unified scale). MSSD adopts a bi-directional extension process and can robustly extend single-scale shapelets obtained by different methods. A supervised shapelet quality measurement is further developed to qualify the extension of shapelets. Comprehensive experiments conducted on 25 UCR time-series datasets show that multi-scale shapelets discovered by MSSD improve classification accuracy by around 10% on average, compared with single-scale shapelets discovered by counterpart methods.
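
As background for the abstract above, the sketch below shows the usual way a shapelet is matched against a time-series: slide it along the series and keep the smallest Euclidean distance. This is the standard shapelet distance rather than the MSSD extension procedure itself, and the function name and toy series are assumptions.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any same-length window."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(
            sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        )
        best = min(best, d)
    return best


# Toy series with one bump; shapelets of different lengths (scales)
series = [0, 0, 1, 2, 1, 0, 0]
short_match = shapelet_distance(series, [1, 2, 1])        # length-3 pattern
long_match = shapelet_distance(series, [0, 1, 2, 1, 0])   # same bump, extended both ways
mismatch = shapelet_distance(series, [2, 2])
```

Both the short and the extended shapelet match the bump exactly, which hints at why a pattern may be describable at more than one length. Raw distances of shapelets with different lengths are not directly comparable without some form of length normalisation.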


2020 ◽  
Vol 34 (07) ◽  
pp. 11782-11790
Author(s):  
Zhen-Liang Ni ◽  
Gui-Bin Bian ◽  
Guan-An Wang ◽  
Xiao-Hu Zhou ◽  
Zeng-Guang Hou ◽  
...  

Semantic segmentation of surgical instruments plays a critical role in computer-assisted surgery. However, specular reflection and scale variation of instruments are likely to occur in the surgical environment, undesirably altering visual features of instruments such as color and shape. These issues make semantic segmentation of surgical instruments more challenging. In this paper, a novel network, Pyramid Attention Aggregation Network, is proposed to aggregate multi-scale attentive features for surgical instruments. It contains two critical modules: the Double Attention Module and the Pyramid Upsampling Module. Specifically, the Double Attention Module includes two attention blocks (i.e., a position attention block and a channel attention block), which model semantic dependencies between positions and channels by capturing joint semantic information and global contexts, respectively. The attentive features generated by the Double Attention Module can distinguish target regions, which helps to solve the specular reflection issue. Moreover, the Pyramid Upsampling Module extracts local details and global contexts by aggregating multi-scale attentive features. It learns the shape and size features of surgical instruments in different receptive fields and thus addresses the scale variation issue. The proposed network achieves state-of-the-art performance on various datasets. It achieves a new record of 97.10% mean IOU on Cata7. It also ranks first in the MICCAI EndoVis Challenge 2017 with a 9.90% increase in mean IOU.


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0247388
Author(s):  
Jingfei Hu ◽  
Hua Wang ◽  
Jie Wang ◽  
Yunqi Wang ◽  
Fang He ◽  
...  

Semantic segmentation of medical images provides an important cornerstone for subsequent tasks of image analysis and understanding. With rapid advancements in deep learning methods, conventional U-Net segmentation networks have been applied in many fields. Based on exploratory experiments, features at multiple scales have been found to be of great importance for the segmentation of medical images. In this paper, we propose a scale-attention deep learning network (SA-Net), which extracts features of different scales in a residual module and uses an attention module to enforce the scale-attention capability. SA-Net can better learn multi-scale features and achieve more accurate segmentation for different medical images. In addition, this work validates the proposed method across multiple datasets. The experimental results show that SA-Net achieves excellent performance in vessel detection in retinal images, lung segmentation, artery/vein (A/V) classification in retinal images, and blastocyst segmentation. To facilitate use of SA-Net by the scientific community, the code implementation will be made publicly available.


Water ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 2398
Author(s):  
Qiankun Liu ◽  
Jingang Jiang ◽  
Changwei Jing ◽  
Zhong Liu ◽  
Jiaguo Qi

Waste load allocation (WLA), a well-known total pollutant control strategy, is designed to distribute pollution responsibilities among polluters to alleviate environmental problems, but the current policy is unfair and limited to a single scale or a single pollution type. In this paper, a new multi-scale and multi-pollution WLA modeling framework was developed, with the goal of producing optimal and fair allocation quotas at multiple scales. The framework integrates multi-constrained environmental Gini coefficients (EGCs) and Delphi-analytic hierarchy process (Delphi-AHP) optimization models to achieve this goal. It was applied in a case study in the Xian-jiang watershed in Zhejiang Province, China, to test its validity and usefulness. The results, in comparison with existing practices by the local governments, suggest that the simulated pollutant load quota at the watershed scale is much fairer than the existing policies and even has some environmental economic benefits at the pollutant source scale. As the new WLA is a process-based modeling framework, it should be possible to adopt this approach in other similar geographic areas.
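
An environmental Gini coefficient is commonly computed from a Lorenz curve with the trapezoid formula G = 1 - Σ (X_i - X_{i-1})(Y_i + Y_{i-1}), where X and Y are cumulative shares of an indicator (for example population) and of the pollutant load. The sketch below follows that common formulation; it is an assumption that the paper uses the same form, and the function name and toy numbers are hypothetical.

```python
def environmental_gini(indicator, load):
    """Gini coefficient of pollutant load relative to an indicator such as population.

    Units are sorted by load per unit of indicator, then the Lorenz curve is
    integrated with the trapezoid rule. All indicator values must be positive.
    """
    pairs = sorted(zip(indicator, load), key=lambda p: p[1] / p[0])
    total_x = sum(indicator)
    total_y = sum(load)
    gini = 1.0
    cum_x = cum_y = 0.0
    for x, y in pairs:
        next_x = cum_x + x / total_x   # cumulative indicator share
        next_y = cum_y + y / total_y   # cumulative load share
        gini -= (next_x - cum_x) * (next_y + cum_y)
        cum_x, cum_y = next_x, next_y
    return gini


# Two hypothetical sub-regions with equal population
fair = environmental_gini([1.0, 1.0], [1.0, 1.0])     # load proportional to population
unfair = environmental_gini([1.0, 1.0], [0.0, 2.0])   # one region carries all the load
```

A value of 0 means the load is spread in proportion to the indicator, and larger values mean a more uneven allocation. With only two equal-sized units the maximum is 0.5.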

