Efficient attention-based deep encoder and decoder for automatic crack segmentation

Structural Health Monitoring ◽

10.1177/14759217211053776 ◽

2021 ◽

pp. 147592172110537

Author(s):

Dong H Kang ◽

Young-Jin Cha

Keyword(s):

Neural Networks ◽

Loss Function ◽

Processing Speed ◽

Evaluation Method ◽

Ground Truth ◽

Activation Function ◽

Deep Convolutional Neural Networks ◽

Ground Truth Data ◽

Complex Scenes ◽

Fast Processing

Crack segmentation has recently been studied using deep convolutional neural networks. However, significant deficiencies remain in the preparation of ground truth data, consideration of complex scenes, development of an object-specific network for crack segmentation, and use of an evaluation method, among other issues. In this paper, a novel semantic transformer representation network (STRNet) is developed for real-time, pixel-level crack segmentation in complex scenes. STRNet is composed of a squeeze-and-excitation attention-based encoder, a multi-head attention-based decoder, coarse upsampling, a focal-Tversky loss function, and a learnable swish activation function, which together keep the network concise while preserving its fast processing speed. A method for evaluating the level of complexity of image scenes is also proposed. The proposed network is trained with 1203 images with further extensive synthesis-based augmentation and evaluated on 545 testing images (1280 × 720, 1024 × 512); it achieves 91.7%, 92.7%, 92.2%, and 92.6% in terms of precision, recall, F1 score, and mIoU (mean intersection over union), respectively. Its performance is compared with those of recently developed advanced networks (Attention U-net, CrackSegNet, Deeplab V3+, FPHBN, and Unet++), with STRNet showing the best performance in the evaluation metrics; it also achieves the fastest processing at 49.2 frames per second.
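
The focal-Tversky loss named in the abstract can be illustrated with a small sketch. This is not the STRNet implementation: the function names, the flat-list input format, and the values of `alpha`, `beta`, and `gamma` are assumptions chosen for illustration, and the exponent convention (here `(1 - TI) ** gamma`) varies between papers.

```python
def tversky_index(probs, targets, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky index for flat lists of predicted probabilities and 0/1 labels.

    alpha weights false negatives and beta weights false positives
    (hypothetical defaults, not taken from the paper).
    """
    tp = sum(p * t for p, t in zip(probs, targets))
    fn = sum((1.0 - p) * t for p, t in zip(probs, targets))
    fp = sum(p * (1.0 - t) for p, t in zip(probs, targets))
    return (tp + eps) / (tp + alpha * fn + beta * fp + eps)


def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, gamma=0.75):
    """Focal Tversky loss under the assumed convention (1 - TI) ** gamma."""
    ti = tversky_index(probs, targets, alpha, beta)
    return max(0.0, 1.0 - ti) ** gamma


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # two crack pixels, two background pixels
    print(focal_tversky_loss([1.0, 0.0, 0.0, 0.0], labels))  # one missed crack pixel
    print(focal_tversky_loss([1.0, 1.0, 1.0, 0.0], labels))  # one false alarm
```

With `alpha > beta`, a missed crack pixel costs more than a false alarm, which is the usual motivation for using a Tversky-style loss on thin, heavily imbalanced targets such as cracks.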

A Face Tracking Method in Videos Based on Convolutional Neural Networks

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001418560177 ◽

2018 ◽

Vol 32 (12) ◽

pp. 1856017 ◽

Cited By ~ 1

Author(s):

Zihan Ren ◽

Jianwei Li ◽

Xiaoying Zhang ◽

Shuangyuan Yang ◽

Fuhao Zou

Keyword(s):

Neural Networks ◽

Kalman Filter ◽

Convolutional Neural Networks ◽

Processing Speed ◽

Face Tracking ◽

Light Change ◽

Complex Scenes ◽

Fast Processing ◽

Kalman Filter Algorithm ◽

Realistic Significance

Face tracking in surveillance videos is an important issue in the field of computer vision and has practical significance. In this paper, a new framework for face tracking in videos, based on convolutional neural networks (CNNs) and the Kalman filter algorithm, is proposed. The framework uses a rough-to-fine CNN to detect faces in each frame of the video. The rough-to-fine CNN method has higher accuracy in complex scenes involving face rotation, light change, and occlusion. When face tracking fails due to severe occlusion or significant rotation, the framework uses a Kalman filter to predict the face position. The experimental results show that the proposed method has high precision and fast processing speed.
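
The fallback described above, where a Kalman filter predicts the face position when detection fails, can be sketched as follows. This is a minimal illustration, not the authors' framework: it assumes a constant-velocity motion model applied independently to the x and y coordinates of the face centre, and the class name `AxisKalman`, the function `track_centers`, and all noise parameters are hypothetical.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (state: position, velocity)."""

    def __init__(self, position, pos_var=1.0, vel_var=100.0,
                 process_var=0.01, meas_var=0.01):
        self.p = float(position)
        self.v = 0.0
        # Covariance matrix entries [[c00, c01], [c10, c11]].
        self.c00, self.c01, self.c10, self.c11 = pos_var, 0.0, 0.0, vel_var
        self.q = process_var
        self.r = meas_var

    def predict(self, dt=1.0):
        """Propagate state and covariance one frame ahead: P = F P F^T + Q."""
        self.p += self.v * dt
        c00 = self.c00 + dt * (self.c01 + self.c10) + dt * dt * self.c11 + self.q
        c01 = self.c01 + dt * self.c11
        c10 = self.c10 + dt * self.c11
        c11 = self.c11 + self.q
        self.c00, self.c01, self.c10, self.c11 = c00, c01, c10, c11
        return self.p

    def update(self, z):
        """Correct the state with a measured position z (H = [1, 0])."""
        s = self.c00 + self.r
        k0, k1 = self.c00 / s, self.c10 / s
        y = z - self.p
        self.p += k0 * y
        self.v += k1 * y
        c00 = (1.0 - k0) * self.c00
        c01 = (1.0 - k0) * self.c01
        c10 = self.c10 - k1 * self.c00
        c11 = self.c11 - k1 * self.c01
        self.c00, self.c01, self.c10, self.c11 = c00, c01, c10, c11
        return self.p


def track_centers(detections):
    """detections: list of (x, y) face centres, or None where the detector failed."""
    fx = fy = None
    track = []
    for det in detections:
        if fx is None:
            if det is None:
                track.append(None)  # nothing to track yet
                continue
            fx, fy = AxisKalman(det[0]), AxisKalman(det[1])
            track.append((fx.p, fy.p))
            continue
        x, y = fx.predict(), fy.predict()
        if det is not None:
            x, y = fx.update(det[0]), fy.update(det[1])
        track.append((x, y))  # pure prediction when det is None
    return track
```

When the detector returns `None` for a frame, the tracker simply reports the predicted position, so a face moving at a steady rate keeps a plausible trajectory through a short occlusion.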

Social Video Advertisement Replacement and its Evaluation in Convolutional Neural Networks

ELCVIA Electronic Letters on Computer Vision and Image Analysis ◽

10.5565/rev/elcvia.1347 ◽

2021 ◽

Vol 20 (1) ◽

pp. 117-136

Author(s):

Cheng Yang ◽

Xiang Yu ◽

Arun Kumar ◽

G.G. Md. Nawaz Ali ◽

Peter Han Joo Chong ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Evaluation Method ◽

Learning Algorithm ◽

Ground Truth ◽

Recurrent Network ◽

Human Being ◽

Deep Convolutional Neural Networks ◽

Deep Learning Algorithm ◽

Curve Fitting Method

This paper introduces a method that uses deep convolutional neural networks (CNNs) to automatically replace advertisement (AD) photos in social (or self-media) videos and provides a suitable evaluation method to compare different CNNs. An AD photo can replace a picture inside a video. However, if a human being occludes the replaced picture in the original video, the newly pasted AD photo will cover the part occluded by the human. A deep learning algorithm is implemented to segment the human being from the video. The segmented human pixels are then pasted back onto the occluded area, so that the AD photo replacement appears natural in the video. This process requires the predicted occlusion edge to be close to the ground truth occlusion edge, so that the AD photo can be occluded naturally. Therefore, this research introduces a curve fitting method to measure the error of the predicted occlusion edge. Using this method, three CNN methods are applied and compared for AD replacement: mask of regions convolutional neural network (Mask RCNN), a recurrent network for video object segmentation (ROVS), and DeeplabV3. The experimental results compare the segmentation accuracy of the different models, and DeeplabV3 shows the best performance.

Automatic onset detection using convolutional neural networks

10.5753/sbcm.2019.10446 ◽

2019 ◽

Author(s):

Willy Cornelissen ◽

Maurício Loureiro

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Ground Truth ◽

Onset Detection ◽

Ground Truth Data ◽

Music Transcription ◽

Automatic Music Transcription ◽

Information Research ◽

Music Research ◽

Music Information

A very significant task for music research is to estimate the instants when meaningful events begin (onset) and when they end (offset). Onset detection is widely applied in many fields: electrocardiograms, seismographic data, stock market results and many Music Information Research (MIR) tasks, such as Automatic Music Transcription, Rhythm Detection, Speech Recognition, etc. Automatic Onset Detection (AOD) has recently benefited greatly from Artificial Intelligence (AI) methods, mainly Machine Learning and Deep Learning. In this work, the use of Convolutional Neural Networks (CNNs) is explored by adapting their original architecture to automatic onset detection on musical audio signals. We used a CNN for onset detection on a very general dataset, well acknowledged by the MIR community, and examined the accuracy of the method by comparison to the ground truth data published with the dataset. The results are promising and outperform other methods of musical onset detection.
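
Comparing detected onsets with ground truth data is commonly done by matching each detection to a reference onset within a small time window and then computing precision, recall, and F-measure. The sketch below illustrates that idea only; the abstract does not state the evaluation protocol, so the 50 ms tolerance, the greedy matching, and the function names are assumptions.

```python
def match_onsets(detected, reference, tolerance=0.05):
    """Count detections that fall within `tolerance` seconds of a reference onset.

    Each reference onset can be matched at most once (greedy, in time order).
    """
    detected = sorted(detected)
    reference = sorted(reference)
    i = j = hits = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # detection too early for this reference: false positive
        else:
            j += 1  # reference passed without a detection: false negative
    return hits


def onset_f_measure(detected, reference, tolerance=0.05):
    """Return (precision, recall, F-measure) for onset times given in seconds."""
    hits = match_onsets(detected, reference, tolerance)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For example, with detections at 0.51, 1.02, 1.80, and 2.49 s and reference onsets at 0.50, 1.00, 2.00, and 2.50 s, three of the four detections are matched, so precision and recall are both 0.75.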

Water Level Estimation in Sewer Pipes Using Deep Convolutional Neural Networks

Water ◽

10.3390/w12123412 ◽

2020 ◽

Vol 12 (12) ◽

pp. 3412

Author(s):

Joakim Bruslund Haurum ◽

Chris H. Bahnsen ◽

Malte Pedersen ◽

Thomas B. Moeslund

Keyword(s):

Neural Networks ◽

Water Level ◽

Convolutional Neural Networks ◽

Contextual Information ◽

Ground Truth ◽

Water Levels ◽

Deep Convolutional Neural Networks ◽

Regression Problem ◽

Still Images ◽

Visual Appearance

Sewer pipe inspections are currently conducted by professionals who remotely control a robot from above ground. This expensive and slow approach is prone to human mistakes. Therefore, there is both an economic and a scientific interest in automating the inspection process by creating systems able to recognize sewer defects. However, research into automatic water level estimation in sewers has been limited, even though it is a prerequisite for further analysis of the pipe, as only sections above the water level can be visually inspected. In this work, we utilize a dataset of still images obtained from over 5000 inspections carried out for three different Danish water utility companies. This dataset is used for training and testing decision tree methods and convolutional neural networks (CNNs) for automatic water level estimation. We pose the estimation problem both as a classification problem and as a regression problem, and compare the results of the two approaches. Furthermore, we compare the effect of using different inspection standards for labeling the ground truth water level. By treating the problem as a classification task and using the 2015 Danish sewer inspection standard, where water levels are clustered based on visual appearance, we achieve an averaged F1 score of 79.29% using a fine-tuned ResNet-50 CNN. This shows the potential of using CNNs for water level estimation. We believe including temporal and contextual information will improve the results further.
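
An averaged F1 score over several water level classes can be computed as sketched below. The abstract does not say which averaging is used, so this example assumes macro averaging (the unweighted mean of per-class F1 scores); the function names and the integer class labels are illustrative only.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical water level classes 0, 1, 2 for six images.
    truth = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]
    print(macro_f1(truth, preds))
```

Macro averaging gives every class the same weight regardless of how many images it contains, which matters when some water levels are rare in the inspection data.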

Adversarial Erasing method based on graph neural network

Journal of Physics Conference Series ◽

10.1088/1742-6596/2083/4/042083 ◽

2021 ◽

Vol 2083 (4) ◽

pp. 042083

Author(s):

Shuhan Liu

Keyword(s):

Neural Network ◽

Neural Networks ◽

Loss Function ◽

Deep Neural Networks ◽

Ground Truth ◽

Semantic Segmentation ◽

Data Sets ◽

Recent Developments ◽

Weakly Supervised ◽

Network Weight

Semantic segmentation is a traditional task that requires large pixel-level ground truth label data sets, which are time-consuming and expensive to produce. Recent developments in weakly-supervised settings have shown that reasonable performance can be obtained using only image-level labels. Classification is often used as a proxy task to train deep neural networks and extract attention maps from them. The classification task needs only weak supervision to locate the most discriminative part of the object. For this purpose, we propose a new end-to-end adversarial erasing network. Unlike the baseline network, our method applies a graph neural network to obtain the first class activation map (CAM). A joint loss function is trained to prevent network weight sharing from causing the network to fall into a saddle point. Our experiments on the Pascal VOC2012 dataset show that 64.9% segmentation performance is obtained, which is an improvement of 2.1% compared to our baseline.
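
The general idea behind adversarial erasing is to remove the region a classifier already attends to, so that a subsequent pass has to find other parts of the object. The sketch below shows only that generic erasing step on a toy 2D image; it is not the network proposed in the paper, and the function name, the min-max normalisation, and the threshold value are assumptions.

```python
def erase_discriminative_region(image, cam, threshold=0.6, fill=0.0):
    """Blank out pixels whose normalised CAM response is at or above `threshold`.

    image and cam are 2D lists of the same shape; cam holds raw activation scores.
    """
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    erased = []
    for img_row, cam_row in zip(image, cam):
        new_row = []
        for pixel, score in zip(img_row, cam_row):
            # Min-max normalise the CAM to [0, 1]; a flat CAM erases nothing.
            norm = (score - lo) / span if span > 0 else 0.0
            new_row.append(fill if norm >= threshold else pixel)
        erased.append(new_row)
    return erased


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    cam = [[0.1, 0.9], [0.2, 0.3]]
    print(erase_discriminative_region(image, cam))
```

Repeating the classify-then-erase cycle and merging the erased regions is one common way to grow a localisation map beyond the single most discriminative part.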

Fast Approximations of Activation Functions in Deep Neural Networks when using Posit Arithmetic

Sensors ◽

10.3390/s20051515 ◽

2020 ◽

Vol 20 (5) ◽

pp. 1515 ◽

Cited By ~ 3

Author(s):

Marco Cococcioni ◽

Federico Rossi ◽

Emanuele Ruffaldi ◽

Sergio Saponara

Keyword(s):

Neural Networks ◽

Real Time ◽

High Speed ◽

Deep Neural Networks ◽

Number System ◽

Activation Function ◽

Floating Point ◽

Arithmetic Properties ◽

Fast Processing ◽

Point Representation

With increasing real-time constraints being put on the use of Deep Neural Networks (DNNs) by real-time scenarios, there is a need to review information representation. A very challenging path is to employ an encoding that allows fast processing and a hardware-friendly representation of information. Among the proposed alternatives to the IEEE 754 standard for floating point representation of real numbers, the recently introduced Posit format has been theoretically shown to be very promising in satisfying these requirements. However, in the absence of proper hardware support for this novel type, this evaluation can be conducted only through software emulation. While waiting for the widespread availability of Posit Processing Units (PPUs, the equivalent of the Floating Point Unit (FPU)), we can already exploit the Posit representation and the currently available Arithmetic-Logic Unit (ALU) to speed up DNNs by manipulating the low-level bit string representations of Posits. As a first step, in this paper, we present new arithmetic properties of the Posit number system with a focus on the configuration with 0 exponent bits. In particular, we propose a new class of Posit operators called L1 operators, which consists of fast and approximated versions of existing arithmetic operations or functions (e.g., hyperbolic tangent (TANH) and exponential linear unit (ELU)) using only integer arithmetic. These operators introduce very interesting properties and results: (i) faster evaluation than the exact counterpart with a negligible accuracy degradation; (ii) an efficient ALU emulation of a number of Posit operations; and (iii) the possibility to vectorize operations in Posits, using existing ALU vectorized operations (such as the scalable vector extension of ARM CPUs or advanced vector extensions on Intel CPUs). As a second step, we test the proposed activation function on Posit-based DNNs, showing how 16-bit down to 10-bit Posits represent an exact replacement for 32-bit floats, while 8-bit Posits could be an interesting alternative to 32-bit floats since their performance is a bit lower but their high speed and low storage properties are very appealing (leading to a lower bandwidth demand and more cache-friendly code). Finally, we point out how small Posits (i.e., up to 14 bits long) are very interesting while waiting for PPUs to become widespread, since Posit operations can be tabulated in a very efficient way (see details in the text).
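
To give a feel for what "manipulating the low-level bit string representations" means for Posits with 0 exponent bits, the sketch below uses a previously known trick attributed to Gustafson: for such Posits, flipping the sign bit and shifting right by two approximates the logistic sigmoid. It is shown as background only and is not one of the L1 operators introduced in this paper. The 8-bit width, the decoder, and the function names are assumptions made for the illustration.

```python
def decode_posit8_es0(bits):
    """Decode an 8-bit Posit with 0 exponent bits into a Python float."""
    bits &= 0xFF
    if bits == 0x00:
        return 0.0
    if bits == 0x80:
        return float("nan")  # NaR (not a real)
    sign = 1.0
    if bits & 0x80:
        sign = -1.0
        bits = (-bits) & 0xFF  # two's complement gives the magnitude pattern
    body = bits & 0x7F  # the 7 bits after the sign
    first = (body >> 6) & 1
    run, pos = 0, 6
    while pos >= 0 and ((body >> pos) & 1) == first:
        run += 1
        pos -= 1
    regime = run - 1 if first == 1 else -run
    pos -= 1  # skip the regime terminator bit, if present
    frac_bits = max(pos + 1, 0)
    frac = body & ((1 << frac_bits) - 1)
    return sign * (2.0 ** regime) * (1.0 + frac / (1 << frac_bits))


def fast_sigmoid_bits(bits):
    """Approximate sigmoid using only integer operations on the Posit bit pattern."""
    return ((bits ^ 0x80) & 0xFF) >> 2


if __name__ == "__main__":
    for pattern in (0xC0, 0x00, 0x40, 0x60):  # -1.0, 0.0, 1.0, 2.0
        x = decode_posit8_es0(pattern)
        y = decode_posit8_es0(fast_sigmoid_bits(pattern))
        print(x, y)
```

The approximation costs one XOR and one shift, which is the kind of ALU-only evaluation the abstract refers to; the decoder is needed here only to inspect the results as ordinary floats.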

Research on Boat Identification Based on Improved Loss Function of Deep Convolutional Neural Networks

2019 WRC Symposium on Advanced Robotics and Automation (WRC SARA) ◽

10.1109/wrc-sara.2019.8931939 ◽

2019 ◽

Author(s):

Yanhua Guo ◽

Shuyu Wang ◽

Hong He ◽

Lei Sun ◽

Shichao Ma

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Loss Function ◽

Deep Convolutional Neural Networks

Deep vs. shallow networks: An approximation theory perspective

Analysis and Applications ◽

10.1142/s0219530516400042 ◽

2016 ◽

Vol 14 (06) ◽

pp. 829-848 ◽

Cited By ~ 77

Author(s):

H. N. Mhaskar ◽

T. Poggio

Keyword(s):

Neural Networks ◽

Function Approximation ◽

Function Class ◽

Activation Function ◽

Paper Briefly ◽

Deep Convolutional Neural Networks ◽

Gaussian Networks ◽

Hierarchical Architectures ◽

Hidden Layer ◽

Definition Of

The paper briefly reviews several recent results on hierarchical architectures for learning from examples that may formally explain the conditions under which Deep Convolutional Neural Networks perform much better in function approximation problems than shallow, one-hidden-layer architectures. The paper announces new results for a non-smooth activation function (the ReLU function) used in present-day neural networks, as well as for Gaussian networks. We propose a new definition of relative dimension to encapsulate different notions of sparsity of a function class that can possibly be exploited by deep networks but not by shallow ones to drastically reduce the complexity required for approximation and learning.

Double Additive Margin Softmax Loss for Face Recognition

Applied Sciences ◽

10.3390/app10010060 ◽

2019 ◽

Vol 10 (1) ◽

pp. 60 ◽

Cited By ~ 1

Author(s):

Shengwei Zhou ◽

Caikou Chen ◽

Guojiang Han ◽

Xielian Hou

Keyword(s):

Neural Networks ◽

Face Recognition ◽

Loss Function ◽

State Of The Art ◽

Feature Learning ◽

Loss Functions ◽

Deep Convolutional Neural Networks ◽

Large Margin ◽

Face Features ◽

Geometrical Explanation

Learning large-margin face features whose intra-class variance is small and whose inter-class diversity is large is one of the important challenges in feature learning when applying Deep Convolutional Neural Networks (DCNNs) to face recognition. Recently, an appealing line of research has been to incorporate an angular margin into the original softmax loss function to obtain discriminative deep features during the training of DCNNs. In this paper we propose a novel loss function, termed double additive margin Softmax loss (DAM-Softmax). The presented loss has a clearer geometrical explanation and can obtain highly discriminative features for face recognition. Extensive experimental evaluation of several recent state-of-the-art softmax loss functions is conducted on the relevant face recognition benchmarks: CASIA-Webface, LFW, CALFW, CPLFW, and CFP-FP. We show that the proposed loss function consistently outperforms the state of the art.
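
As background for margin-based softmax losses, the sketch below implements the plain additive margin softmax (AM-Softmax) for a single sample: a margin is subtracted from the target-class cosine before scaling and applying cross-entropy. It is not the DAM-Softmax proposed in this paper, whose exact form the abstract does not give; the function name and the `scale` and `margin` values are assumptions.

```python
import math


def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """Additive margin softmax loss for one sample.

    cosines: cosine similarity between the feature and each class weight vector.
    target:  index of the ground truth class.
    """
    logits = [
        scale * (c - margin) if j == target else scale * c
        for j, c in enumerate(cosines)
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    cosines = [0.8, 0.1, -0.2]
    print(am_softmax_loss(cosines, target=0, margin=0.0))   # ordinary scaled softmax
    print(am_softmax_loss(cosines, target=0, margin=0.35))  # margin makes the task harder
```

Because the margin lowers only the target logit, the same feature incurs a larger loss than under ordinary softmax, which pushes training toward features that clear the decision boundary by at least the margin.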

Mis-Classified Vector Guided Softmax Loss for Face Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6906 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12241-12248 ◽

Cited By ~ 3

Author(s):

Xiaobo Wang ◽

Shifeng Zhang ◽

Shuo Wang ◽

Tianyu Fu ◽

Hailin Shi ◽

...

Keyword(s):

Face Recognition ◽

Loss Function ◽

State Of The Art ◽

Feature Learning ◽

Ground Truth ◽

Significant Progress ◽

Deep Convolutional Neural Networks ◽

Face Features ◽

Discriminative Feature ◽

Feature Mining

Face recognition has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs), the central task of which is how to improve feature discrimination. To this end, several margin-based (e.g., angular, additive and additive angular margins) softmax loss functions have been proposed to increase the feature margin between different classes. However, although great achievements have been made, they mainly suffer from three issues: 1) they ignore the importance of informative feature mining for discriminative learning; 2) they encourage the feature margin only from the ground truth class, without considering the discriminability from other non-ground truth classes; 3) the feature margin between different classes is set to be the same and fixed, which may not adapt well to different situations. To cope with these issues, this paper develops a novel loss function, which adaptively emphasizes the mis-classified feature vectors to guide discriminative feature learning. Thus we can address all the above issues and achieve more discriminative face features. To the best of our knowledge, this is the first attempt to combine the advantages of feature margin and feature mining in a unified loss function. Experimental results on several benchmarks have demonstrated the effectiveness of our method over state-of-the-art alternatives. Our code is available at http://www.cbsr.ia.ac.cn/users/xiaobowang/.
