scholarly journals Multi-attribute self-attention guided vehicle local region detection based on convolutional neural network architecture

2020 ◽  
Vol 17 (4) ◽  
pp. 172988142094434
Author(s):  
Jingbo Chen ◽  
Shengyong Chen ◽  
Linjie Bian

Many pieces of information are included in the front region of a vehicle, especially in windshield and bumper regions. Thus, windshield or bumper region detection is making sense to extract useful information. But the existing windshield and bumper detection methods based on traditional artificial features are not robust enough. Those features may become invalid in many real situations (e.g. occlude, illumination change, viewpoint change.). In this article, we propose a multi-attribute-guided vehicle discriminately region detection method based on convolutional neural network and not rely on bounding box regression. We separate the net into two branches, respectively, for identification (ID) and Model attributes training. Therefore, the feature spaces of different attributes become more independent. Additionally, we embed a self-attention block into our framework to improve the performance of local region detection. We train our model on PKU_VD data set which has a huge number of images inside. Furthermore, we labeled the handcrafted bounding boxes on 5000 randomly picked testing images, and 1020 of them are used for evaluation and 3980 as the training data for YOLOv3. We use Intersection over Union for quantitative evaluation. Experiments were conducted in three different latest convolutional neural network trunks to illustrate the detection performance of the proposed method. Simultaneously, in terms of quantitative evaluation, the performance of our method is close to YOLOv3 even without handcrafted bounding boxes.

2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Yinjie Xie ◽  
Wenxin Dai ◽  
Zhenxin Hu ◽  
Yijing Liu ◽  
Chuan Li ◽  
...  

Among many improved convolutional neural network (CNN) architectures in the optical image classification, only a few were applied in synthetic aperture radar (SAR) automatic target recognition (ATR). One main reason is that direct transfer of these advanced architectures for the optical images to the SAR images easily yields overfitting due to its limited data set and less features relative to the optical images. Thus, based on the characteristics of the SAR image, we proposed a novel deep convolutional neural network architecture named umbrella. Its framework consists of two alternate CNN-layer blocks. One block is a fusion of six 3-layer paths, which is used to extract diverse level features from different convolution layers. The other block is composed of convolution layers and pooling layers are mainly utilized to reduce dimensions and extract hierarchical feature information. The combination of the two blocks could extract rich features from different spatial scale and simultaneously alleviate overfitting. The performance of the umbrella model was validated by the Moving and Stationary Target Acquisition and Recognition (MSTAR) benchmark data set. This architecture could achieve higher than 99% accuracy for the classification of 10-class targets and higher than 96% accuracy for the classification of 8 variants of the T72 tank, even in the case of diverse positions located by targets. The accuracy of our umbrella is superior to the current networks applied in the classification of MSTAR. The result shows that the umbrella architecture possesses a very robust generalization capability and will be potential for SAR-ART.


2020 ◽  
Vol 10 (7) ◽  
pp. 1494-1505
Author(s):  
Hyo-Hun Kim ◽  
Byung-Woo Hong

In this work, we present an image segmentation algorithm based on the convolutional neural network framework where the scale space theory is incorporated in the course of training procedure. The construction of data augmentation is designed to apply the scale space to the training data in order to effectively deal with the variability of regions of interest in geometry and appearance such as shape and contrast. The proposed data augmentation algorithm via scale space is aimed to improve invariant features with respect to both geometry and appearance by taking into consideration of their diffusion process. We develop a segmentation algorithm based on the convolutional neural network framework where the network architecture consists of encoding and decoding substructures in combination with the data augmentation scheme via the scale space induced by the heat equation. The quantitative analysis using the cardiac MRI dataset indicates that the proposed algorithm achieves better accuracy in the delineation of the left ventricles, which demonstrates the potential of the algorithm in the application of the whole heart segmentation as a compute-aided diagnosis system for the cardiac diseases.


2020 ◽  
Vol 9 (4) ◽  
pp. 1430-1437
Author(s):  
Mohammad Arif Rasyidi ◽  
Taufiqotul Bariyah

Batik is one of Indonesia's cultures that is well-known worldwide. Batik is a fabric that is painted using canting and liquid wax so that it forms patterns of high artistic value. In this study, we applied the convolutional neural network (CNN) to identify six batik patterns, namely Banji, Ceplok, Kawung, Mega Mendung, Parang, and Sekar Jagad. 994 images from the 6 categories were collected and then divided into training and test data with a ratio of 8:2. Image augmentation was also done to provide variations in training data as well as to prevent overfitting. Experimental results on the test data showed that CNN produced an excellent performance as indicated by accuracy of 94% and top-2 accuracy of 99% which was obtained using the DenseNet network architecture.


2021 ◽  
Vol 15 ◽  
Author(s):  
Lixing Huang ◽  
Jietao Diao ◽  
Hongshan Nie ◽  
Wei Wang ◽  
Zhiwei Li ◽  
...  

The memristor-based convolutional neural network (CNN) gives full play to the advantages of memristive devices, such as low power consumption, high integration density, and strong network recognition capability. Consequently, it is very suitable for building a wearable embedded application system and has broad application prospects in image classification, speech recognition, and other fields. However, limited by the manufacturing process of memristive devices, high-precision weight devices are currently difficult to be applied in large-scale. In the same time, high-precision neuron activation function also further increases the complexity of network hardware implementation. In response to this, this paper proposes a configurable full-binary convolutional neural network (CFB-CNN) architecture, whose inputs, weights, and neurons are all binary values. The neurons are proportionally configured to two modes for different non-ideal situations. The architecture performance is verified based on the MNIST data set, and the influence of device yield and resistance fluctuations under different neuron configurations on network performance is also analyzed. The results show that the recognition accuracy of the 2-layer network is about 98.2%. When the yield rate is about 64% and the hidden neuron mode is configured as −1 and +1, namely ±1 MD, the CFB-CNN architecture achieves about 91.28% recognition accuracy. Whereas the resistance variation is about 26% and the hidden neuron mode configuration is 0 and 1, namely 01 MD, the CFB-CNN architecture gains about 93.43% recognition accuracy. Furthermore, memristors have been demonstrated as one of the most promising devices in neuromorphic computing for its synaptic plasticity. Therefore, the CFB-CNN architecture based on memristor is SNN-compatible, which is verified using the number of pulses to encode pixel values in this paper.


MRS Advances ◽  
2019 ◽  
Vol 4 (19) ◽  
pp. 1109-1117 ◽  
Author(s):  
Pankaj Rajak ◽  
Rajiv K. Kalia ◽  
Aiichiro Nakano ◽  
Priya Vashishta

AbstractDynamic fracture of a two-dimensional MoWSe2 membrane is studied with molecular dynamics (MD) simulation. The system consists of a random distribution of WSe2 patches in a pre-cracked matrix of MoSe2. Under strain, the system shows toughening due to crack branching, crack closure and strain-induced structural phase transformation from 2H to 1T crystal structures. Different structures generated during MD simulation are analyzed using a three-layer, feed-forward neural network (NN) model. A training data set of 36,000 atoms is created where each atom is represented by a 50-dimension feature vector consisting of radial and angular symmetry functions. Hyper parameters of the symmetry functions and network architecture are tuned to minimize model complexity with high predictive power using feature learning, which shows an increase in model accuracy from 67% to 95%. The NN model classifies each atom in one of the six phases which are either as transition metal or chalcogen atoms in 2H phase, 1T phase and defects. Further t-SNE analyses of learned representation of these phases in the hidden layers of the NN model show that separation of all phases become clearer in the third layer than in layers 1 and 2.


2021 ◽  
Vol 22 (8) ◽  
pp. 4023
Author(s):  
Huimin Shen ◽  
Youzhi Zhang ◽  
Chunhou Zheng ◽  
Bing Wang ◽  
Peng Chen

Accurate prediction of binding affinity between protein and ligand is a very important step in the field of drug discovery. Although there are many methods based on different assumptions and rules do exist, prediction performance of protein–ligand binding affinity is not satisfactory so far. This paper proposes a new cascade graph-based convolutional neural network architecture by dealing with non-Euclidean irregular data. We represent the molecule as a graph, and use a simple linear transformation to deal with the sparsity problem of the one-hot encoding of original data. The first stage adopts ARMA graph convolutional neural network to learn the characteristics of atomic space in the protein–ligand complex. In the second stage, one variant of the MPNN graph convolutional neural network is introduced with chemical bond information and interactive atomic features. Finally, the architecture passes through the global add pool and the fully connected layer, and outputs a constant value as the predicted binding affinity. Experiments on the PDBbind v2016 data set showed that our method is better than most of the current methods. Our method is also comparable to the state-of-the-art method on the data set, and is more intuitive and simple.


The project “Disease Prediction Model” focuses on predicting the type of skin cancer. It deals with constructing a Convolutional Neural Network(CNN) sequential model in order to find the type of a skin cancer which takes a huge troll on mankind well-being. Since development of programmed methods increases the accuracy at high scale for identifying the type of skin cancer, we use Convolutional Neural Network, CNN algorithm in order to build our model . For this we make use of a sequential model. The data set that we have considered for this project is collected from NCBI, which is well known as HAM10000 dataset, it consists of massive amounts of information regarding several dermatoscopic images of most trivial pigmented lesions of skin which are collected from different sufferers. Once the dataset is collected, cleaned, it is split into training and testing data sets. We used CNN to build our model and using the training data we trained the model , later using the testing data we tested the model. Once the model is implemented over the testing data, plots are made in order to analyze the relation between the echos and loss function. It is also used to analyse accuracy and echos for both training and testing data.


2020 ◽  
Vol 13 (6) ◽  
pp. 2631-2644 ◽  
Author(s):  
Georgy Ayzel ◽  
Tobias Scheffer ◽  
Maik Heistermann

Abstract. In this study, we present RainNet, a deep convolutional neural network for radar-based precipitation nowcasting. Its design was inspired by the U-Net and SegNet families of deep learning models, which were originally designed for binary segmentation tasks. RainNet was trained to predict continuous precipitation intensities at a lead time of 5 min, using several years of quality-controlled weather radar composites provided by the German Weather Service (DWD). That data set covers Germany with a spatial domain of 900 km×900 km and has a resolution of 1 km in space and 5 min in time. Independent verification experiments were carried out on 11 summer precipitation events from 2016 to 2017. In order to achieve a lead time of 1 h, a recursive approach was implemented by using RainNet predictions at 5 min lead times as model inputs for longer lead times. In the verification experiments, trivial Eulerian persistence and a conventional model based on optical flow served as benchmarks. The latter is available in the rainymotion library and had previously been shown to outperform DWD's operational nowcasting model for the same set of verification events. RainNet significantly outperforms the benchmark models at all lead times up to 60 min for the routine verification metrics mean absolute error (MAE) and the critical success index (CSI) at intensity thresholds of 0.125, 1, and 5 mm h−1. However, rainymotion turned out to be superior in predicting the exceedance of higher intensity thresholds (here 10 and 15 mm h−1). The limited ability of RainNet to predict heavy rainfall intensities is an undesirable property which we attribute to a high level of spatial smoothing introduced by the model. At a lead time of 5 min, an analysis of power spectral density confirmed a significant loss of spectral power at length scales of 16 km and below. Obviously, RainNet had learned an optimal level of smoothing to produce a nowcast at 5 min lead time. In that sense, the loss of spectral power at small scales is informative, too, as it reflects the limits of predictability as a function of spatial scale. Beyond the lead time of 5 min, however, the increasing level of smoothing is a mere artifact – an analogue to numerical diffusion – that is not a property of RainNet itself but of its recursive application. In the context of early warning, the smoothing is particularly unfavorable since pronounced features of intense precipitation tend to get lost over longer lead times. Hence, we propose several options to address this issue in prospective research, including an adjustment of the loss function for model training, model training for longer lead times, and the prediction of threshold exceedance in terms of a binary segmentation task. Furthermore, we suggest additional input data that could help to better identify situations with imminent precipitation dynamics. The model code, pretrained weights, and training data are provided in open repositories as an input for such future studies.


Author(s):  
Cansu Görürgöz ◽  
Kaan Orhan ◽  
Ibrahim Sevki Bayrakdar ◽  
Özer Çelik ◽  
Elif Bilgir ◽  
...  

Objectives: The present study aimed to evaluate the performance of a Faster Region-based Convolutional Neural Network (R-CNN) algorithm for tooth detection and numbering on periapical images. Methods: The data sets of 1686 randomly selected periapical radiographs of patients were collected retrospectively. A pre-trained model (GoogLeNet Inception v3 CNN) was employed for pre-processing, and transfer learning techniques were applied for data set training. The algorithm consisted of: (1) the Jaw classification model, (2) Region detection models, and (3) the Final algorithm using all models. Finally, an analysis of the latest model has been integrated alongside the others. The sensitivity, precision, true-positive rate, and false-positive/negative rate were computed to analyze the performance of the algorithm using a confusion matrix. Results: An artificial intelligence algorithm (CranioCatch, Eskisehir-Turkey) was designed based on R-CNN inception architecture to automatically detect and number the teeth on periapical images. Of 864 teeth in 156 periapical radiographs, 668 were correctly numbered in the test data set. The F1 score, precision, and sensitivity were 0.8720, 0.7812, and 0.9867, respectively. Conclusion: The study demonstrated the potential accuracy and efficiency of the CNN algorithm for detecting and numbering teeth. The deep learning-based methods can help clinicians reduce workloads, improve dental records, and reduce turnaround time for urgent cases. This architecture might also contribute to forensic science.


Geophysics ◽  
2019 ◽  
Vol 85 (1) ◽  
pp. V33-V43 ◽  
Author(s):  
Min Jun Park ◽  
Mauricio D. Sacchi

Velocity analysis can be a time-consuming task when performed manually. Methods have been proposed to automate the process of velocity analysis, which, however, typically requires significant manual effort. We have developed a convolutional neural network (CNN) to estimate stacking velocities directly from the semblance. Our CNN model uses two images as one input data for training. One is an entire semblance (guide image), and the other is a small patch (target image) extracted from the semblance at a specific time step. Labels for each input data set are the root mean square velocities. We generate the training data set using synthetic data. After training the CNN model with synthetic data, we test the trained model with another synthetic data that were not used in the training step. The results indicate that the model can predict a consistent velocity model. We also noticed that when the input data are extremely different from those used for the training, the CNN model will hardly pick the correct velocities. In this case, we adopt transfer learning to update the trained model (base model) with a small portion of the target data to improve the accuracy of the predicted velocity model. A marine data set from the Gulf of Mexico is used for validating our new model. The updated model performed a reasonable velocity analysis in seconds.


Sign in / Sign up

Export Citation Format

Share Document