An Efficient Convolutional Neural Network Model Combined with Attention Mechanism for Inverse Halftoning

Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1574
Author(s):  
Linhao Shao ◽  
Erhu Zhang ◽  
Mei Li

Inverse halftoning, a special image restoration task, is an ill-posed problem. Although it has been studied for several decades, existing solutions cannot accurately restore fine details and textures from halftone images. Recently, the attention mechanism has shown powerful effects in many fields, such as image processing, pattern recognition, and computer vision, but it has not yet been applied to inverse halftoning. To better restore details in inverse halftoning, this paper proposes a simple yet effective deep learning model combined with an attention mechanism, which guides the network to remove noisy dot patterns and restore image details, and improves the network's adaptability. The whole model is designed in an end-to-end manner and consists of a feature extraction stage and a reconstruction stage. In the feature extraction stage, halftone image features are extracted and halftone noise is removed. The reconstruction stage restores continuous-tone images by fusing the features extracted in the first stage with the output of a residual channel attention block. In this stage, the attention block is introduced to the field of inverse halftoning for the first time; it makes the network focus on informative features and further enhances the network's discriminative ability. In addition, a multi-stage loss function is proposed to accelerate network optimization, which is conducive to better reconstruction of the global image. Experimental results confirm that the network generalizes well, restoring six different types of halftone images accurately. Furthermore, experimental results show that our method outperforms state-of-the-art methods, especially in the restoration of details and textures.
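The residual channel attention block described above can be sketched in a few lines. This is a minimal numpy illustration in the common squeeze-and-excitation style (global pooling, a small bottleneck, a sigmoid gate, then channel-wise rescaling with a residual connection); the weight matrices are random placeholders, not the paper's learned parameters.

```python
import numpy as np

def channel_attention(features, reduction=4):
    """Squeeze-and-excitation style channel attention on a (C, H, W) map.

    Global average pooling squeezes each channel to a scalar, a two-layer
    bottleneck produces per-channel weights in (0, 1), and the input is
    rescaled channel-wise. Weights here are random placeholders; in the
    paper they would be learned.
    """
    c, h, w = features.shape
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # squeeze projection
    w2 = rng.standard_normal((c, c // reduction)) * 0.1  # excite projection

    squeezed = features.mean(axis=(1, 2))            # (C,) global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)          # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate
    return features * weights[:, None, None]         # channel-wise rescaling

def residual_channel_attention_block(features):
    """Residual connection around the attention: output = x + CA(x)."""
    return features + channel_attention(features)

x = np.random.default_rng(1).standard_normal((8, 16, 16))
y = residual_channel_attention_block(x)
assert y.shape == x.shape
```

The residual path keeps the original feature map flowing through unchanged, so the gate only has to learn *corrections*, which is what lets the block emphasize informative channels without destroying low-level detail.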

2020 ◽  
Vol 2020 (15) ◽  
pp. 196-1-196-7 ◽  
Author(s):  
Ziyi Zhao ◽  
Yujian Xu ◽  
Robert Ulichney ◽  
Matthew Gaubatz ◽  
Stephen Pollard ◽  
...  

An alignment approach for data-bearing halftone images, a visually pleasant alternative to barcodes, is proposed in this paper. Specifically, we address the alignment problem of data-bearing halftone images on a 3D surface. Different types of surfaces have been tested using the proposed approach, and high-accuracy results have been achieved. Additionally, we develop a tool that retrieves data from an aligned image in order to decode the data embedded in the original image. A system to assess alignment accuracy is introduced to quantify the effectiveness of the proposed alignment approach.


2021 ◽  
Vol 13 (13) ◽  
pp. 2457
Author(s):  
Xuan Wu ◽  
Zhijie Zhang ◽  
Wanchang Zhang ◽  
Yaning Yi ◽  
Chuanrong Zhang ◽  
...  

Convolutional neural networks (CNNs) are capable of automatically extracting image features and have been widely used in remote sensing image classification. Feature extraction is an important and difficult problem in current research. In this paper, data augmentation for avoiding overfitting was attempted to enrich sample features and improve the performance of a newly proposed convolutional neural network on the UC-Merced and RSI-CB datasets for remote sensing scene classification. A multiple grouped convolutional neural network (MGCNN) for self-learning, capable of improving the efficiency of CNNs, was proposed, and a method of grouping multiple convolutional layers that can be applied elsewhere as a plug-in module was developed. Meanwhile, a hyper-parameter C in MGCNN was introduced to probe the influence of different grouping strategies on feature extraction. Experiments on the two selected datasets, RSI-CB and UC-Merced, were carried out to verify the effectiveness of the newly proposed network; the accuracy obtained by MGCNN was 2% higher than that of ResNet-50. An attention mechanism was then adopted and incorporated into the grouping processes, and a multiple grouped attention convolutional neural network (MGCNN-A) was constructed to enhance the generalization capability of MGCNN. Additional experiments indicate that incorporating the attention mechanism into MGCNN slightly improved scene classification accuracy, while considerably enhancing the robustness of the proposed network in remote sensing image classification.
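The grouping idea behind MGCNN can be illustrated with a 1x1 grouped convolution: the input channels are split into groups and each group gets its own filters, cutting parameters by the group factor. The layer shapes below are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def grouped_conv1x1(x, weights, groups):
    """1x1 grouped convolution on a (C_in, H, W) tensor.

    Input channels are split into `groups` equal groups; each group is
    convolved only with its own weight slice, so parameter count drops by
    a factor of `groups` versus a dense 1x1 convolution. The grouping
    hyper-parameter plays the role of C in MGCNN (shapes are assumptions).
    """
    c_in, h, w = x.shape
    c_out = weights.shape[0]
    gin, gout = c_in // groups, c_out // groups
    out = np.empty((c_out, h, w))
    for g in range(groups):
        xs = x[g * gin:(g + 1) * gin]                # this group's input slice
        ws = weights[g * gout:(g + 1) * gout, :gin]  # this group's filters
        out[g * gout:(g + 1) * gout] = np.tensordot(ws, xs, axes=(1, 0))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w = rng.standard_normal((8, 4))   # (C_out, C_in // groups) per-group filters
y = grouped_conv1x1(x, w, groups=2)
assert y.shape == (8, 4, 4)
```

Because each group is independent, the same function drops into any layer as a plug-in; varying `groups` is exactly the kind of grouping-strategy sweep the hyper-parameter C enables.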


Author(s):  
RAINA RAJU K ◽  
S. Swapna Kumar

Skin cancer is one of the most fatal diseases. It is easily curable when detected in its early stage. Early detection of melanoma through accurate techniques and innovative technologies has the greatest potential for decreasing the mortality associated with this disease. There are four main steps for detecting melanoma: preprocessing, segmentation, feature extraction, and classification. The preprocessing stage removes the artifacts associated with the lesion. The exact boundaries of the lesion are separated from normal skin through the segmentation method. The feature extraction stage calculates different parameters of the lesion region. The final stage classifies the lesion as benign or malignant. In this paper, different types of segmentation methods and classification methods are described. Both of these stages must be accurately implemented to reach the final detection of the lesion.
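The four-stage pipeline can be sketched end to end on a toy grayscale image. Everything below (the smoothing filter, the 0.5 threshold, the area/asymmetry features, and the classification rule) is an illustrative placeholder, not one of the clinically evaluated methods the paper surveys.

```python
import numpy as np

def detect_lesion(image, threshold=0.5):
    """Toy sketch of the four-stage melanoma detection pipeline.

    All thresholds and rules are illustrative placeholders, chosen only
    to show how the stages hand data to one another.
    """
    # 1. Preprocessing: crude smoothing to suppress artifacts (hairs, noise).
    smoothed = (image + np.roll(image, 1, 0) + np.roll(image, 1, 1)) / 3.0
    # 2. Segmentation: threshold darker pixels as the lesion mask.
    mask = smoothed < threshold
    # 3. Feature extraction: area fraction and a simple vertical asymmetry.
    area = mask.mean()
    asymmetry = np.abs(mask.astype(float) - mask[::-1].astype(float)).mean()
    # 4. Classification: toy rule on the extracted features.
    label = "malignant" if (area > 0.3 and asymmetry > 0.1) else "benign"
    return mask, {"area": area, "asymmetry": asymmetry}, label

img = np.ones((16, 16))
img[4:12, 4:12] = 0.2                 # dark square standing in for a lesion
mask, feats, label = detect_lesion(img)
assert mask[8, 8] and not mask[0, 0]  # lesion interior segmented, skin not
```

The point of the skeleton is the data flow: segmentation quality bounds what feature extraction can measure, which is why the paper stresses that both stages must be accurate before classification is meaningful.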


2020 ◽  
Vol 10 (4) ◽  
pp. 1521
Author(s):  
Mei Li ◽  
Erhu Zhang ◽  
Yutong Wang ◽  
Jinghong Duan ◽  
Cuining Jing

Inverse halftoning is an ill-posed problem that refers to restoring continuous-tone images from their halftone versions. Although much progress has been achieved over the last decades, restored images still suffer from detail loss and visual artifacts. Recent studies show that inverse halftoning methods based on deep learning are superior to traditional methods, and thus this paper systematically reviews deep-learning-based inverse halftoning methods, so as to provide a reference for the development of inverse halftoning. We first propose a classification of inverse halftoning methods based on the source of the halftone images. Then, two types of inverse halftoning methods, for digital halftone images and for scanned halftone images, are investigated in terms of network architecture, loss functions, and training strategies. Furthermore, we study existing image quality evaluation, including subjective and objective evaluation, through experiments. The evaluation results demonstrate that methods based on multiple subnetworks and methods based on multi-stage strategies are superior to other methods. In addition, the perceptual loss and the gradient loss are helpful for improving the quality of restored images. Finally, we give future research directions by analyzing the shortcomings of existing inverse halftoning methods.
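The gradient loss mentioned in the conclusions has a simple common form: compare the finite-difference edge maps of the restored and reference images. The exact operator and norm vary between the surveyed papers; the sketch below uses L1 over horizontal and vertical differences.

```python
import numpy as np

def gradient_loss(pred, target):
    """L1 distance between image gradients of restored and reference images.

    Penalizing gradient mismatch pushes the network to reproduce edges
    and textures, which plain pixel losses tend to blur. One common
    formulation; individual papers may differ in operator and norm.
    """
    dx = np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1)).mean()
    dy = np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0)).mean()
    return dx + dy

flat = np.full((8, 8), 0.5)                    # no edges anywhere
ramp = np.tile(np.linspace(0, 1, 8), (8, 1))   # horizontal intensity ramp
assert gradient_loss(flat, flat) == 0.0        # identical gradients: zero loss
assert gradient_loss(flat, ramp) > 0.0         # missed edge structure: penalized
```

Note that `flat` and `ramp` can have similar mean intensity, so a pixel-wise loss alone scores them closer than they look; the gradient term is what separates them, which is exactly why it helps restore detail.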


Author(s):  
Wei Li ◽  
Haiyu Song ◽  
Pengjie Wang

Traffic sign recognition (TSR) is a basic technology of Advanced Driving Assistance Systems (ADAS) and intelligent automobiles, and a high-quality feature vector plays a key role in TSR. Therefore, feature extraction for TSR has become an active research area in the fields of computer vision and intelligent automobiles. Although deep learning features have achieved a breakthrough in image classification, they are difficult to apply to TSR because of the large scale of the required training dataset and the high space-time complexity of model training. Considering the visual characteristics of traffic signs and external factors such as weather, light, and blur in real scenes, an efficient method to extract high-quality image features is proposed. As a result, the lower-dimensional feature can accurately depict the visual characteristics of traffic signs owing to its powerful descriptive and discriminative ability. In addition, benefiting from a simple feature extraction method and low time cost, our method is suitable for recognizing traffic signs online in real-world application scenarios. Extensive quantitative experimental results demonstrate the effectiveness and efficiency of our method.


2019 ◽  
Vol 9 (8) ◽  
pp. 1599 ◽  
Author(s):  
Yuanyao Lu ◽  
Hongbo Li

With the improvement of computer performance, virtual reality (VR), as a new mode of visual operation and interaction, gives automatic lip-reading technology based on visual features broad development prospects. In an immersive VR environment, the user's state can be successfully captured through lip movements, thereby analyzing the user's real-time thinking. Due to complex image processing, hard-to-train classifiers, and long recognition processes, traditional lip-reading recognition systems have difficulty meeting the requirements of practical applications. In this paper, a convolutional neural network (CNN) used for image feature extraction is combined with a recurrent neural network (RNN) based on an attention mechanism for automatic lip-reading recognition. Our proposed method can be divided into three steps. First, we extract keyframes from our own independently established database (English pronunciation of the numbers zero to nine by three males and three females). Then, we use the Visual Geometry Group (VGG) network to extract lip image features; the extracted features prove fault-tolerant and effective. Finally, we compare two lip-reading models: (1) a fusion model with an attention mechanism and (2) a fusion model of two networks. The results show that the accuracy of the proposed model is 88.2% on the test dataset, versus 84.9% for the contrastive model. Therefore, our proposed method is superior to traditional lip-reading recognition methods and general neural networks.
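The attention step that fuses the per-keyframe VGG features can be sketched as dot-product attention: each frame's feature vector is scored against a decoder query, the scores are softmax-normalized, and the frames are combined with those weights. The dot-product scoring function is an assumption; the abstract does not specify the exact form.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attend_over_frames(frame_features, query):
    """Dot-product attention over per-keyframe feature vectors.

    Each keyframe feature (T, D) is scored against a query (D,), scores
    are softmax-normalized, and a weighted average forms one context
    vector. Scoring by plain dot product is an assumption here.
    """
    scores = frame_features @ query        # (T,) one relevance score per frame
    weights = softmax(scores)              # attention distribution over frames
    context = weights @ frame_features     # (D,) weighted sum of frame features
    return context, weights

rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 32))     # 10 keyframes, 32-d VGG-style features
q = rng.standard_normal(32)
ctx, w = attend_over_frames(frames, q)
assert ctx.shape == (32,)
```

Compared with averaging all frames equally, the RNN decoder can steer `query` at each output step, letting the model lean on the keyframes most informative for the current phoneme.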


2021 ◽  
Vol 13 (14) ◽  
pp. 2686
Author(s):  
Di Wei ◽  
Yuang Du ◽  
Lan Du ◽  
Lu Li

Existing Synthetic Aperture Radar (SAR) image target detection methods based on convolutional neural networks (CNNs) have achieved remarkable performance, but they require a large number of target-level labeled training samples to train the network. Moreover, some clutter is very similar to targets in SAR images with complex scenes, making the target detection task very difficult. Therefore, a SAR target detection network based on semi-supervised learning and an attention mechanism is proposed in this paper. Since an image-level label simply marks whether the image contains the target of interest, it is easier to obtain than a target-level label; the proposed method therefore uses a small number of target-level labeled training samples and a large number of image-level labeled training samples to train the network with a semi-supervised learning algorithm. The proposed network consists of a detection branch and a scene recognition branch, with a feature extraction module and an attention module shared between the two branches. The feature extraction module extracts deep features of the input SAR images, and the attention module guides the network to focus on the target of interest while suppressing the clutter. During semi-supervised learning, target-level labeled training samples pass through the detection branch, while image-level labeled training samples pass through the scene recognition branch. For the test process, considering the help of global scene information in SAR images for detection, a novel coarse-to-fine detection procedure is proposed: after coarse scene recognition determines whether the input SAR image contains the target of interest, fine target detection is performed on the images that may contain targets. Experimental results on a measured SAR dataset demonstrate that the proposed method achieves better performance than existing methods.
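The coarse-to-fine test procedure is easy to express as control flow: a cheap scene-recognition pass gates whether the expensive detector runs at all. The callables below are toy stand-ins for the paper's shared-backbone branches, assumed for illustration.

```python
def coarse_to_fine(images, scene_recognizer, detector):
    """Coarse-to-fine test procedure sketch.

    The scene recognizer (coarse branch) runs on every image; the detector
    (fine branch) is invoked only on images flagged as possibly containing
    a target. Both callables are placeholders for the network branches.
    """
    results = {}
    for name, img in images.items():
        if scene_recognizer(img):           # coarse: is a target present at all?
            results[name] = detector(img)   # fine: localize the target
        else:
            results[name] = []              # clutter-only scene: skip detection
    return results

# Toy stand-ins: "images" are lists of pixel intensities.
scenes = {"clutter_only": [0, 1, 0], "with_target": [0, 9, 0]}
recognize = lambda img: max(img) > 5                       # coarse branch stub
detect = lambda img: [i for i, v in enumerate(img) if v > 5]  # fine branch stub
out = coarse_to_fine(scenes, recognize, detect)
assert out == {"clutter_only": [], "with_target": [1]}
```

The gating is what exploits global scene information: clutter-only scenes are rejected wholesale before any per-pixel detection, reducing both false alarms and computation.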


Author(s):  
Huimin Lu ◽  
Rui Yang ◽  
Zhenrong Deng ◽  
Yonglin Zhang ◽  
Guangwei Gao ◽  
...  

Chinese image description generation tasks usually face challenges such as single-feature extraction, lack of global information, and lack of detailed description of the image content. To address these limitations, we propose a fuzzy attention-based DenseNet-BiLSTM Chinese image captioning method in this article. In the proposed method, we first improve the densely connected network to extract image features at different scales and to enhance the model's ability to capture weak features. At the same time, a bidirectional LSTM is used as the decoder to enhance the use of context information. The introduction of an improved fuzzy attention mechanism effectively improves the correspondence between image features and contextual information. We conduct experiments on the AI Challenger dataset to evaluate the performance of the model. The results show that, compared with other models, our proposed model achieves higher scores on objective quantitative evaluation indicators, including BLEU, METEOR, ROUGE-L, and CIDEr. The generated description sentences can accurately express the image content.
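One way to read "fuzzy attention" is to smooth the raw attention scores with fuzzy membership functions before normalization. The sketch below is an illustrative interpretation only: the Gaussian memberships, their centers and widths, and the dot-product scoring are all assumptions, since the abstract does not specify the mechanism's exact design.

```python
import numpy as np

def fuzzy_attention(region_features, query, centers, widths):
    """Attention scores modulated by Gaussian fuzzy memberships (a sketch).

    Dot-product scores for each image region are reweighted by their
    strongest membership in a set of Gaussian fuzzy sets, then softmaxed
    into attention weights. Purely illustrative; the paper's actual
    membership design is not given in the abstract.
    """
    scores = region_features @ query                          # (R,) raw scores
    member = np.exp(-((scores[:, None] - centers) ** 2)
                    / (2 * widths ** 2)).max(axis=1)          # fuzzy degree in (0, 1]
    fuzzy_scores = scores * member                            # soften crisp scores
    e = np.exp(fuzzy_scores - fuzzy_scores.max())
    weights = e / e.sum()                                     # normalized attention
    return weights @ region_features, weights

rng = np.random.default_rng(0)
regions = rng.standard_normal((6, 16))    # 6 image regions, 16-d features
q = rng.standard_normal(16)
ctx, w = fuzzy_attention(regions, q,
                         centers=np.array([0.0, 1.0]),
                         widths=np.array([1.0, 1.0]))
assert ctx.shape == (16,)
```

The intended effect is that borderline regions neither dominate nor vanish abruptly, giving the decoder a softer correspondence between image features and the words being generated.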


2020 ◽  
Vol 13 (1) ◽  
pp. 71
Author(s):  
Zhiyong Xu ◽  
Weicun Zhang ◽  
Tianxiang Zhang ◽  
Jiangyun Li

Semantic segmentation is a significant method in remote sensing image (RSI) processing and has been widely used in various applications. Conventional convolutional neural network (CNN)-based semantic segmentation methods are likely to lose spatial information in the feature extraction stage and usually pay little attention to global context information. Moreover, the imbalance of category scales and uncertain boundary information in RSIs also pose challenges to the semantic segmentation task. To overcome these problems, a high-resolution context extraction network (HRCNet) based on a high-resolution network (HRNet) is proposed in this paper. In this approach, the HRNet structure is adopted to preserve spatial information. Moreover, a light-weight dual attention (LDA) module is designed to obtain global context information in the feature extraction stage, and a feature enhancement feature pyramid (FEFP) structure is proposed and employed to fuse contextual information at different scales. In addition, to capture boundary information, we design a boundary aware (BA) module combined with a boundary aware loss (BAloss) function. Experimental results on the Potsdam and Vaihingen datasets show that the proposed approach significantly improves boundary and segmentation performance, achieving overall accuracy scores of 92.0% and 92.3%, respectively. It is therefore envisaged that the proposed HRCNet model will be advantageous in remote sensing image segmentation.
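A boundary-aware loss of the kind the BA module pairs with can be sketched as cross-entropy that is up-weighted on boundary pixels, found where the ground-truth mask changes value between neighbours. The weighting scheme below is a common formulation assumed for illustration, not necessarily the paper's exact BAloss.

```python
import numpy as np

def boundary_aware_loss(pred, target, boundary_weight=5.0):
    """Binary cross-entropy weighted more heavily on boundary pixels.

    Boundary pixels are located where the ground-truth label differs from
    a shifted copy of itself; scaling their loss pushes the network to
    get edges right. A common formulation, assumed here for illustration.
    """
    eps = 1e-12
    ce = -(target * np.log(pred + eps) + (1 - target) * np.log(1 - pred + eps))
    # Boundary map: 1 where the label changes along either axis.
    edge = (np.abs(np.diff(target, axis=0, prepend=0)) +
            np.abs(np.diff(target, axis=1, prepend=0))) > 0
    weights = np.where(edge, boundary_weight, 1.0)
    return (weights * ce).mean()

target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0                    # square object with a crisp boundary
good = np.clip(target, 0.05, 0.95)        # confident, nearly correct prediction
bad = np.full((8, 8), 0.5)                # uncertain everywhere
assert boundary_aware_loss(good, target) < boundary_aware_loss(bad, target)
```

Because only a thin ring of pixels carries the extra weight, the loss sharpens boundaries without distorting the gradient signal over large homogeneous regions, which also helps with the category-scale imbalance the abstract mentions.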

