scholarly journals EAR-Net: Efficient Atrous Residual Network for Semantic Segmentation of Street Scenes Based on Deep Learning

2021 ◽  
Vol 11 (19) ◽  
pp. 9119
Author(s):  
Seokyong Shin ◽  
Sanghun Lee ◽  
Hyunho Han

Segmentation of street scenes is a key technology in the field of autonomous vehicles. However, conventional segmentation methods achieve low accuracy because of the complexity of street landscapes. Therefore, we propose an efficient atrous residual network (EAR-Net) to improve accuracy while maintaining computation costs. First, we performed feature extraction and restoration, utilizing depthwise separable convolution (DSConv) and interpolation. Compared with conventional methods, DSConv and interpolation significantly reduce computation costs while minimizing performance degradation. Second, we utilized residual learning and atrous spatial pyramid pooling (ASPP) to achieve high accuracy. Residual learning increases the ability to extract context information by preventing the problem of feature and gradient losses. In addition, ASPP extracts additional context information while maintaining the resolution of the feature map. Finally, to alleviate the class imbalance between the image background and objects and to improve learning efficiency, we utilized focal loss. We evaluated EAR-Net on the Cityscapes dataset, which is commonly used for street scene segmentation studies. Experimental results showed that the EAR-Net had better segmentation results and similar computation costs as the conventional methods. We also conducted an ablation study to analyze the contributions of the ASPP and DSConv in the EAR-Net.

Author(s):  
Mhafuzul Islam ◽  
Mashrur Chowdhury ◽  
Hongda Li ◽  
Hongxin Hu

Vision-based navigation of autonomous vehicles primarily depends on the deep neural network (DNN) based systems in which the controller obtains input from sensors/detectors, such as cameras, and produces a vehicle control output, such as a steering wheel angle to navigate the vehicle safely in a roadway traffic environment. Typically, these DNN-based systems in the autonomous vehicle are trained through supervised learning; however, recent studies show that a trained DNN-based system can be compromised by perturbation or adverse inputs. Similarly, this perturbation can be introduced into the DNN-based systems of autonomous vehicles by unexpected roadway hazards, such as debris or roadblocks. In this study, we first introduce a hazardous roadway environment that can compromise the DNN-based navigational system of an autonomous vehicle, and produce an incorrect steering wheel angle, which could cause crashes resulting in fatality or injury. Then, we develop a DNN-based autonomous vehicle driving system using object detection and semantic segmentation to mitigate the adverse effect of this type of hazard, which helps the autonomous vehicle to navigate safely around such hazards. We find that our developed DNN-based autonomous vehicle driving system, including hazardous object detection and semantic segmentation, improves the navigational ability of an autonomous vehicle to avoid a potential hazard by 21% compared with the traditional DNN-based autonomous vehicle driving system.


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1402
Author(s):  
Taehee Lee ◽  
Yeohwan Yoon ◽  
Chanjun Chun ◽  
Seungki Ryu

Poor road-surface conditions pose a significant safety risk to vehicle operation, especially in the case of autonomous vehicles. Hence, maintenance of road surfaces will become even more important in the future. With the development of deep learning-based computer image processing technology, artificial intelligence models that evaluate road conditions are being actively researched. However, as the lighting conditions of the road surface vary depending on the weather, the model performance may degrade for an image whose brightness falls outside the range of the learned image, even for the same road. In this study, a semantic segmentation model with an autoencoder structure was developed for detecting road surface along with a CNN-based image preprocessing model. This setup ensures better road-surface crack detection by adjusting the image brightness before it is input into the road-crack detection model. When the preprocessing model was applied, the road-crack segmentation model exhibited consistent performance even under varying brightness values.


2020 ◽  
Vol 13 (1) ◽  
pp. 71
Author(s):  
Zhiyong Xu ◽  
Weicun Zhang ◽  
Tianxiang Zhang ◽  
Jiangyun Li

Semantic segmentation is a significant method in remote sensing image (RSIs) processing and has been widely used in various applications. Conventional convolutional neural network (CNN)-based semantic segmentation methods are likely to lose the spatial information in the feature extraction stage and usually pay little attention to global context information. Moreover, the imbalance of category scale and uncertain boundary information meanwhile exists in RSIs, which also brings a challenging problem to the semantic segmentation task. To overcome these problems, a high-resolution context extraction network (HRCNet) based on a high-resolution network (HRNet) is proposed in this paper. In this approach, the HRNet structure is adopted to keep the spatial information. Moreover, the light-weight dual attention (LDA) module is designed to obtain global context information in the feature extraction stage and the feature enhancement feature pyramid (FEFP) structure is promoted and employed to fuse the contextual information of different scales. In addition, to achieve the boundary information, we design the boundary aware (BA) module combined with the boundary aware loss (BAloss) function. The experimental results evaluated on Potsdam and Vaihingen datasets show that the proposed approach can significantly improve the boundary and segmentation performance up to 92.0% and 92.3% on overall accuracy scores, respectively. As a consequence, it is envisaged that the proposed HRCNet model will be an advantage in remote sensing images segmentation.


2020 ◽  
Vol 1682 ◽  
pp. 012077
Author(s):  
Tingting Li ◽  
Chunshan Jiang ◽  
Zhenqi Bian ◽  
Mingchang Wang ◽  
Xuefeng Niu

2021 ◽  
Vol 10 (3) ◽  
pp. 125
Author(s):  
Junqing Huang ◽  
Liguo Weng ◽  
Bingyu Chen ◽  
Min Xia

Analyzing land cover using remote sensing images has broad prospects, the precise segmentation of land cover is the key to the application of this technology. Nowadays, the Convolution Neural Network (CNN) is widely used in many image semantic segmentation tasks. However, existing CNN models often exhibit poor generalization ability and low segmentation accuracy when dealing with land cover segmentation tasks. To solve this problem, this paper proposes Dual Function Feature Aggregation Network (DFFAN). This method combines image context information, gathers image spatial information, and extracts and fuses features. DFFAN uses residual neural networks as backbone to obtain different dimensional feature information of remote sensing images through multiple downsamplings. This work designs Affinity Matrix Module (AMM) to obtain the context of each feature map and proposes Boundary Feature Fusion Module (BFF) to fuse the context information and spatial information of an image to determine the location distribution of each image’s category. Compared with existing methods, the proposed method is significantly improved in accuracy. Its mean intersection over union (MIoU) on the LandCover dataset reaches 84.81%.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of modified dilated residual network, atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images and maintains spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. Backpropagation compares the semantic segmentation obtained through convolution from the final feature map with the ground truth provided by a database. Losses can be reduced by applying backpropagation to the modified dilated residual network to change the weighting. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. The proposed method achieved accuracies of 82.8 and 89.8 mean intersection over union (mIOU) and frame rates of 61 and 64.3 frames per second (fps) for the Cityscapes and PASCAL VOC 2012 databases, respectively. These results prove the applicability of the proposed method for implementing natural AR applications at actual speeds because the frame rate is greater than 60 fps.


Sign in / Sign up

Export Citation Format

Share Document