Multi scale switchable atrous convolution for target detection based on feature pyramid

2022 ◽  
Vol 355 ◽  
pp. 03011
Author(s):  
Cheng Fang ◽  
Ziqiang Hao ◽  
Jiaxin Chen

A repeated observation mechanism can effectively address inefficient feature extraction. By extracting features multiple times to strengthen target features, this paper proposes SPC, a multi-scale switchable atrous convolution based on the feature pyramid. The detector head adopts a pyramid convolution mode: it constructs a 3-D convolution in the feature pyramid and detects the same target at different pyramid levels using shared convolutions with different strides, which realizes repeated observation of target features at multiple scales. The module optimizes the convolution layer, extracts features of the same image with convolution kernels of different sizes, and then selects and integrates the extracted results using a switch function, which effectively expands the field of view of the convolution kernel. We choose RetinaNet as the baseline network and improve the focal loss proposed with RetinaNet to further address the imbalance in sample numbers and sample distribution in the network model. The proposed method performs well on the MS COCO data set, improving average accuracy by 9.8% over RetinaNet to 48.9%, and achieves 5.1 FPS on 1333 × 800 images.
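The idea of blending convolutions with different fields of view through a switch can be illustrated in one dimension. The following is a minimal sketch, not the SPC module itself: the function names, the fixed scalar switch weights, and the 1-D setting are all assumptions made for illustration, whereas the paper works on 2-D feature maps and does not specify its switch function in this abstract.

```python
def atrous_conv1d(signal, kernel, rate):
    """Same-size 1-D atrous convolution (correlation form) with zero padding."""
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # kernel taps are spaced `rate` samples apart
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def switchable_atrous_conv1d(signal, kernel, rates, switch):
    """Blend the outputs of several dilation rates (hypothetical scalar switch)."""
    total = sum(switch)
    weights = [s / total for s in switch]  # normalise so the weights sum to 1
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [sum(w * b[i] for w, b in zip(weights, branches))
            for i in range(len(signal))]


impulse = [0, 0, 0, 1, 0, 0, 0]
print(atrous_conv1d(impulse, [1, 1, 1], 1))  # narrow response around the impulse
print(atrous_conv1d(impulse, [1, 1, 1], 2))  # same kernel, wider footprint
print(switchable_atrous_conv1d(impulse, [1, 1, 1], [1, 2], [1.0, 1.0]))
```

With the same three-tap kernel, the rate-2 branch responds two samples away from the impulse, which is the sense in which atrous convolution widens the field of view without adding parameters.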

Information ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 278
Author(s):  
Sanlong Jiang ◽  
Shaobo Li ◽  
Qiang Bai ◽  
Jing Yang ◽  
Yanming Miao ◽  
...  

A reasonable grasping strategy is a prerequisite for the successful grasping of a target, and it is also a basic condition for the wide application of robots. Presently, mainstream grippers on the market are divided into two-finger grippers and three-finger grippers. According to human grasping experience, the stability of three-finger grippers is much better than that of two-finger grippers. Therefore, this paper’s focus is on the three-finger grasping strategy generation method based on the DeepLab V3+ algorithm. DeepLab V3+ uses the atrous convolution kernel and the atrous spatial pyramid pooling (ASPP) architecture based on atrous convolution. The atrous convolution kernel can adjust the field-of-view of the filter layer by changing the convolution rate. In addition, ASPP can effectively capture multi-scale information, based on the parallel connection of multiple convolution rates of atrous convolutional layers, so that the model performs better on multi-scale objects. The article innovatively uses the DeepLab V3+ algorithm to generate the grasp strategy of a target and optimizes the atrous convolution parameter values of ASPP. This study used the Cornell Grasp dataset to train and verify the model. At the same time, a smaller and more complex dataset of 60 was produced according to the actual situation. Upon testing, good experimental results were obtained.
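How the rate changes the field of view follows from simple arithmetic: a kernel of size k with rate r spans k + (k - 1)(r - 1) input positions. The sketch below computes this span for a set of parallel branches; the rates 1, 6, 12, and 18 are illustrative values only, since the abstract says the ASPP rates were optimized but does not list them.

```python
def effective_kernel_size(kernel_size, rate):
    """Span of an atrous kernel: rate - 1 gaps are inserted between adjacent taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def aspp_fields_of_view(kernel_size, rates):
    """Effective span of each parallel atrous branch, keyed by its rate."""
    return {rate: effective_kernel_size(kernel_size, rate) for rate in rates}


# Illustrative rates, not the values tuned in the paper.
print(aspp_fields_of_view(3, [1, 6, 12, 18]))
```

A 3 × 3 kernel therefore covers 3, 13, 25, and 37 positions per axis at these rates, while its parameter count stays the same.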


2022 ◽  
Vol 14 (2) ◽  
pp. 382
Author(s):  
Yafei Jing ◽  
Yuhuan Ren ◽  
Yalan Liu ◽  
Dacheng Wang ◽  
Linjun Yu

Efficiently and automatically acquiring information on earthquake damage through remote sensing has posed great challenges because the classical methods of detecting houses damaged by destructive earthquakes are often both time consuming and low in accuracy. A series of deep-learning-based techniques have been developed and recent studies have demonstrated their high intelligence for automatic target extraction for natural and remote sensing images. For the detection of small artificial targets, current studies show that You Only Look Once (YOLO) has a good performance in aerial and Unmanned Aerial Vehicle (UAV) images. However, less work has been conducted on the extraction of damaged houses. In this study, we propose a YOLOv5s-ViT-BiFPN-based neural network for the detection of rural houses. Specifically, to enhance the feature information of damaged houses from the global information of the feature map, we introduce the Vision Transformer into the feature extraction network. Furthermore, regarding the scale differences for damaged houses in UAV images due to the changes in flying height, we apply the Bi-Directional Feature Pyramid Network (BiFPN) for multi-scale feature fusion to aggregate features with different resolutions and test the model. We took the 2021 Yangbi earthquake with a surface wave magnitude (Ms) of 6.4 in Yunnan, China, as an example; the results show that the proposed model presents a better performance, with the average precision (AP) being increased by 9.31% and 1.23% compared to YOLOv3 and YOLOv5s, respectively, and a detection speed of 80 FPS, which is 2.96 times faster than YOLOv3. In addition, the transferability test for five other areas showed that the average accuracy was 91.23% and the total processing time was 4 min, while 100 min were needed for professional visual interpreters. The experimental results demonstrate that the YOLOv5s-ViT-BiFPN model can automatically detect damaged rural houses due to destructive earthquakes in UAV images with a good performance in terms of accuracy and timeliness, as well as being robust and transferable.
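The multi-scale fusion step can be sketched as a learned weighted average of feature maps. The example below follows the fast normalized fusion rule commonly associated with BiFPN; whether this model uses exactly that rule is an assumption, and the flat lists stand in for feature maps that have already been resized to a common resolution.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-sized feature vectors using non-negative, normalised weights."""
    clipped = [max(w, 0.0) for w in weights]  # ReLU keeps each weight >= 0
    norm = eps + sum(clipped)                 # eps guards against division by zero
    fused = [0.0] * len(features[0])
    for w, feature in zip(clipped, features):
        for i, value in enumerate(feature):
            fused[i] += (w / norm) * value
    return fused


# Two hypothetical pyramid levels, already resampled to the same length.
print(fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
```

Because the weights are clipped and normalised rather than passed through a softmax, each input's contribution stays in the range 0 to 1 at low computational cost.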


Author(s):  
Xing Chen ◽  
Wenhai Zhang ◽  
Yu Hou ◽  
Lin Yang

To address the low matching accuracy of local stereo matching algorithms in weakly textured areas and areas with discontinuous disparity, a stereo matching algorithm is proposed that combines the multi-scale fusion of a convolutional neural network (CNN) with a feature pyramid network (FPN) structure. The feature pyramid is applied on top of the convolutional neural network to realize multi-scale feature extraction and fusion of the image, which improves the matching similarity of image blocks. A guided image filter is used to complete cost aggregation quickly and effectively. The disparity selection stage adopts an improved dynamic programming algorithm to obtain the initial disparity map, which is then refined to obtain the final disparity map. The algorithm is trained and tested on images provided by the Middlebury data set, and the results show that the disparity maps obtained by the algorithm are of good quality.
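The stages named above (matching cost, cost aggregation, disparity selection) can be made concrete on a single scanline. This sketch deliberately swaps in simpler techniques: an absolute-difference cost instead of the learned CNN similarity, a box window instead of the guided filter, and winner-take-all selection instead of dynamic programming. All names are hypothetical.

```python
def scanline_disparity(left, right, max_disp, window=1):
    """Winner-take-all disparity for one rectified scanline."""
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0.0
            # Box-window aggregation of absolute differences, clamped at borders.
            for offset in range(-window, window + 1):
                xl = min(max(x + offset, 0), n - 1)
                xr = min(max(x + offset - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


right = [1, 5, 9, 2, 7, 3, 8, 4]
left = [0, 0, 1, 5, 9, 2, 7, 3]  # the right scanline shifted by two pixels
print(scanline_disparity(left, right, max_disp=2)[3:7])
```

Winner-take-all picks each pixel's disparity independently, which is why smoother selection schemes such as dynamic programming are preferred in weakly textured regions.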


Water ◽  
2021 ◽  
Vol 13 (17) ◽  
pp. 2420
Author(s):  
Pengfei Shi ◽  
Xiwang Xu ◽  
Jianjun Ni ◽  
Yuanxue Xin ◽  
Weisheng Huang ◽  
...  

Underwater organisms are an important part of the underwater ecological environment. More and more attention has been paid to the perception of underwater ecological environment by intelligent means, such as machine vision. However, many objective reasons affect the accuracy of underwater biological detection, such as the low-quality image, different sizes or shapes, and overlapping or occlusion of underwater organisms. Therefore, this paper proposes an underwater biological detection algorithm based on improved Faster-RCNN. Firstly, the ResNet is used as the backbone feature extraction network of Faster-RCNN. Then, BiFPN (Bidirectional Feature Pyramid Network) is used to build a ResNet–BiFPN structure which can improve the capability of feature extraction and multi-scale feature fusion. Additionally, EIoU (Effective IoU) is used to replace IoU to reduce the proportion of redundant bounding boxes in the training data. Moreover, K-means++ clustering is used to generate more suitable anchor boxes to improve detection accuracy. Finally, the experimental results show that the detection accuracy of underwater biological detection algorithm based on improved Faster-RCNN on URPC2018 dataset is improved to 88.94%, which is 8.26% higher than Faster-RCNN. The results fully prove the effectiveness of the proposed algorithm.
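The difference between IoU and EIoU can be shown with a short calculation. The sketch below uses the commonly cited EIoU formulation, which adds centre-distance, width, and height penalties to 1 - IoU; how the paper applies EIoU inside Faster-RCNN is not stated in the abstract, so this is an assumption. Boxes are (x1, y1, x2, y2) tuples and are assumed to be non-degenerate.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def eiou_loss(pred, target):
    """1 - IoU plus centre, width, and height penalties (assumes non-degenerate boxes)."""
    # Width and height of the smallest box enclosing both inputs.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    # Offsets between box centres, and differences in box width and height.
    dx = ((pred[0] + pred[2]) - (target[0] + target[2])) / 2
    dy = ((pred[1] + pred[3]) - (target[1] + target[3])) / 2
    dw = (pred[2] - pred[0]) - (target[2] - target[0])
    dh = (pred[3] - pred[1]) - (target[3] - target[1])
    centre_term = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    return 1 - iou(pred, target) + centre_term + (dw * dw) / (cw * cw) + (dh * dh) / (ch * ch)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike plain IoU, the extra terms still vary when two boxes do not overlap, so the measure distinguishes near misses from distant ones.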


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Guangjun Liu ◽  
Xiaoping Xu ◽  
Xiangjia Yu ◽  
Feng Wang

The unique physical properties of graphite enable it to be applied in various fields of the national economy and people’s livelihood, which gives it very important industrial value. Many countries have listed graphite as a key mineral. To promote the transformation of the mining industry toward informatization and intelligence, the intelligent recognition of graphite is particularly critical. To address the long time and low efficiency of manually identifying graphite, an improved AlexNet convolutional neural network is proposed for graphite image recognition. First, we preprocess the data set by random cropping, horizontal flipping with a given probability, and normalization to achieve data augmentation. Then we use the ReLU6 activation function to compress the dynamic range and make the algorithm more robust, use batch normalization to speed up convergence, modify the size of the convolution kernel to enhance generalization ability, and add dropout regularization to the fully connected layer to further prevent overfitting. Finally, in the simulation experiment, the proposed method reduces the loss value and improves the average accuracy of identifying graphite compared with the existing method.
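Two of the ingredients named here are simple enough to write out directly. ReLU6 clips activations to the range 0 to 6, which is what compresses the dynamic range, and horizontal flipping mirrors each image row with a chosen probability. The sketch below uses nested lists as stand-in images; the function names and the mean and standard deviation values are illustrative assumptions, not the paper's settings.

```python
import random


def relu6(x):
    """ReLU6 activation: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror every row with probability p; otherwise return an unchanged copy."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


def normalize(image, mean, std):
    """Shift and scale pixel values; mean and std are supplied by the caller."""
    return [[(value - mean) / std for value in row] for row in image]


print([relu6(v) for v in (-1.0, 3.0, 10.0)])
print(random_horizontal_flip([[1, 2, 3]], p=1.0))
print(normalize([[2.0, 4.0]], mean=2.0, std=2.0))
```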


2020 ◽  
Vol 27 (4) ◽  
pp. 313-320 ◽  
Author(s):  
Xuan Xiao ◽  
Wei-Jie Chen ◽  
Wang-Ren Qiu

Background: Information on the quaternary structure attributes of proteins is very important because it is closely related to the biological functions of proteins. With the rapid development of new-generation sequencing technology, we are facing a challenge: how to automatically identify the quaternary attributes of new polypeptide chains according to their sequence information (i.e., whether they are formed as a monomer, a hetero-oligomer, or a homo-oligomer). Objective: In this article, our goal is to find a new way to represent protein sequences, thereby improving the prediction rate of protein quaternary structure. Methods: We developed a prediction system for protein quaternary structural type in which a protein sequence is expressed by combining the Pfam functional domain and gene ontology. Protein features are turned into digital sequences, and the prediction of quaternary structure is completed through specific machine learning algorithms and a verification algorithm. Results: Our data set contains 5495 protein samples. Using the method provided in this paper, we classify proteins as monomers, hetero-oligomers, or homo-oligomers, and the prediction rate is 74.38%, which is 3.24% higher than that of previous studies. Through this new feature extraction method, we can further classify the quaternary structure of proteins, and the results are also correspondingly improved. Conclusion: After applying the new prediction system, we have successfully improved the prediction rate compared with previous results. We have reason to believe that the feature extraction method in this paper has better practicability and can be used as a reference for other protein classification problems.


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1906
Author(s):  
Jia-Zheng Jian ◽  
Tzong-Rong Ger ◽  
Han-Hua Lai ◽  
Chi-Ming Ku ◽  
Chiung-An Chen ◽  
...  

Diverse computer-aided diagnosis systems based on convolutional neural networks were applied to automate the detection of myocardial infarction (MI) found in electrocardiogram (ECG) for early diagnosis and prevention. However, issues, particularly overfitting and underfitting, were not being taken into account. In other words, it is unclear whether the network structure is too simple or complex. Toward this end, the proposed models were developed by starting with the simplest structure: a multi-lead features-concatenate narrow network (N-Net) in which only two convolutional layers were included in each lead branch. Additionally, multi-scale features-concatenate networks (MSN-Net) were also implemented where larger features were being extracted through pooling the signals. The best structure was obtained via tuning both the number of filters in the convolutional layers and the number of inputting signal scales. As a result, the N-Net reached a 95.76% accuracy in the MI detection task, whereas the MSN-Net reached an accuracy of 61.82% in the MI locating task. Both networks give a higher average accuracy and a significant difference of p < 0.001 evaluated by the U test compared with the state-of-the-art. The models are also smaller in size thus are suitable to fit in wearable devices for offline monitoring. In conclusion, testing throughout the simple and complex network structure is indispensable. However, the way of dealing with the class imbalance problem and the quality of the extracted features are yet to be discussed.
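Producing several input scales by pooling a signal can be sketched as follows. Average pooling with non-overlapping windows is an assumption here, since the abstract does not say which pooling operator or which scale factors MSN-Net uses; the function names are hypothetical.

```python
def average_pool(signal, factor):
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def multi_scale_inputs(signal, factors):
    """Pooled copies of one lead's signal, keyed by pooling factor."""
    return {factor: average_pool(signal, factor) for factor in factors}


# Toy samples standing in for one ECG lead.
print(multi_scale_inputs([1, 3, 5, 7], [1, 2, 4]))
```

Each pooled copy lets a convolutional layer with a fixed kernel size cover a longer stretch of the original signal, which is how pooling exposes larger features.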


Energies ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 924
Author(s):  
Zhenzhen Huang ◽  
Qiang Niu ◽  
Ilsun You ◽  
Giovanni Pau

Wearable devices used for human body monitoring have broad applications in smart homes, sports, security, and other fields. Wearable devices provide an extremely convenient way to collect a large amount of human motion data. In this paper, a human body acceleration feature extraction method based on wearable devices is studied. First, a Butterworth filter is used to filter the data. Then, to ensure that the extracted feature values are more accurate, abnormal data are removed from the source. This paper combines the Kalman filter algorithm with a genetic algorithm and uses the genetic algorithm to encode the parameters of the Kalman filter algorithm. We use Standard Deviation (SD), Interval of Peaks (IoP), and Difference between Adjacent Peaks and Troughs (DAPT) to analyze seven kinds of acceleration. Finally, the SisFall data set, a globally available data set for study and experiments, is used to verify the effectiveness of our method. Based on the simulation results, we conclude that our method can clearly distinguish different activities.
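The three features can be computed from a filtered acceleration trace in a few lines. The definitions below are one plausible reading of the names: SD as the population standard deviation, IoP as the sample distance between consecutive peaks, and DAPT as the absolute difference between each neighbouring peak and trough. The paper's exact definitions, and the strict local-extremum rule used here, are assumptions.

```python
import statistics


def local_extrema(signal):
    """Return (index, kind) pairs for strict local peaks and troughs."""
    extrema = []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            extrema.append((i, "peak"))
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            extrema.append((i, "trough"))
    return extrema


def acceleration_features(signal):
    """Compute SD, IoP, and DAPT for one acceleration trace."""
    extrema = local_extrema(signal)
    peaks = [i for i, kind in extrema if kind == "peak"]
    iop = [b - a for a, b in zip(peaks, peaks[1:])]
    # Pair each extremum with the next one only when their kinds differ.
    dapt = [abs(signal[j] - signal[i])
            for (i, kind_i), (j, kind_j) in zip(extrema, extrema[1:])
            if kind_i != kind_j]
    return {"sd": statistics.pstdev(signal), "iop": iop, "dapt": dapt}


print(acceleration_features([0, 2, 0, 3, 0, 1, 0]))
```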


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 319
Author(s):  
Yi Wang ◽  
Xiao Song ◽  
Guanghong Gong ◽  
Ni Li

With the rapid development of deep learning and artificial intelligence techniques, denoising via neural networks has drawn great attention because of its flexibility and excellent performance. However, for most convolutional network denoising methods, the convolution kernel is only one layer deep, and features of distinct scales are neglected. Moreover, in the convolution operation, all channels are treated equally; the relationships between channels are not considered. In this paper, we propose a multi-scale feature extraction-based normalized attention neural network (MFENANN) for image denoising. In MFENANN, we define a multi-scale feature extraction block to extract and combine features at distinct scales of the noisy image. In addition, we propose a normalized attention network (NAN) to learn the relationships between channels, which smooths the optimization landscape and speeds up the convergence process for training an attention model. Moreover, we introduce the NAN to convolutional network denoising, in which each channel gets its own gain, so channels can play different roles in the subsequent convolution. To verify the effectiveness of the proposed MFENANN, we ran experiments on both grayscale and color image sets with noise levels ranging from 0 to 75. The experimental results show that, compared with some state-of-the-art denoising methods, the images restored by MFENANN have larger peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) values and a better overall appearance.
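PSNR, one of the two reported metrics, is defined as 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the reference and restored images. The sketch below is a minimal illustration on nested lists; the function names and the default of 255 for 8-bit images are assumptions, not details from the paper.

```python
import math


def mean_squared_error(reference, restored):
    """Mean squared pixel difference between two equally sized images."""
    total, count = 0.0, 0
    for ref_row, res_row in zip(reference, restored):
        for ref, res in zip(ref_row, res_row):
            total += (ref - res) ** 2
            count += 1
    return total / count


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    error = mean_squared_error(reference, restored)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


print(psnr([[1.0, 0.0]], [[0.9, 0.1]], max_value=1.0))
```

A larger PSNR means a smaller pixel-wise error, which is why it is usually reported together with SSIM, a measure of structural rather than pixel-wise similarity.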

