Person Reidentification Model Based on Multiattention Modules and Multiscale Residuals

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yongyi Li ◽  
Shiqi Wang ◽  
Shuang Dong ◽  
Xueling Lv ◽  
Changzhi Lv ◽  
...  

At present, person reidentification (Re-ID) based on attention mechanisms has attracted considerable scholarly interest. Although an attention module can improve the representation ability and reidentification accuracy of a Re-ID model to a certain extent, the improvement depends on how the attention module is coupled with the original network. In this paper, a person reidentification model that combines multiple attentions and multiscale residuals is proposed. The model introduces a combined attention fusion module and a multiscale residual fusion module into the ResNet-50 backbone network to enhance the feature flow between residual blocks and better fuse multiscale features. Furthermore, a global branch and a local branch are designed: the dual ensemble attention module enhances the channel aggregation and position perception ability of the network, and fine-grained feature expression is obtained by multiproportion blocking and reorganization. Thus, both the global and local features are enhanced. Experimental results on the Market-1501 and DukeMTMC-reID datasets show that the presented model performs well, with Rank-1 accuracy in particular reaching 96.20% and 89.59%, respectively, which can be considered progress in Re-ID.
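The Rank-1 accuracy quoted above can be illustrated with a minimal sketch. It assumes each image is already reduced to a feature vector and that matching uses Euclidean distance; the function name `rank1_accuracy` and the toy data are hypothetical, and the camera-based filtering that standard Re-ID evaluation protocols apply is omitted for brevity.

```python
import math

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery image shares their identity.

    Simplified illustration: no same-camera filtering, Euclidean distance only.
    """
    hits = 0
    for q_feat, q_id in zip(query_feats, query_ids):
        # Index of the gallery feature closest to this query feature.
        nearest = min(
            range(len(gallery_feats)),
            key=lambda i: math.dist(q_feat, gallery_feats[i]),
        )
        if gallery_ids[nearest] == q_id:
            hits += 1
    return hits / len(query_feats)

# Toy data: the first query matches its identity, the second does not.
query_feats = [[0.0, 0.0], [1.0, 1.0]]
query_ids = [1, 2]
gallery_feats = [[0.1, 0.0], [0.9, 1.2], [5.0, 5.0]]
gallery_ids = [1, 3, 2]
accuracy = rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids)
```

On this toy data one of the two queries is matched correctly, so the sketch yields a Rank-1 accuracy of 0.5.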

2021 ◽  
Vol 11 (5) ◽  
pp. 2174
Author(s):  
Xiaoguang Li ◽  
Feifan Yang ◽  
Jianglu Huang ◽  
Li Zhuo

Images captured in real scenes usually suffer from complex non-uniform degradation, which includes both global and local blur. It is difficult to handle such complex blur variation with a single unified processing model. We propose a global-local blur disentangling network, which can effectively extract global and local blur features via two branches. A phased training scheme is designed to disentangle the global and local blur features; that is, each branch is trained on its own task-specific dataset. A branch attention mechanism is introduced to dynamically fuse the global and local features. Complex blurry images are used to train the attention module and the reconstruction module. The visualized feature maps of the different branches indicate that our dual-branch network can decouple the global and local blur features efficiently. Experimental results show that the proposed dual-branch blur disentangling network improves both the subjective and objective deblurring quality for real captured images.


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5839
Author(s):  
Denghua Fan ◽  
Liejun Wang ◽  
Shuli Cheng ◽  
Yongming Li

As a subfield of image retrieval, person re-identification (Re-ID) is usually used to solve the security problem of cross-camera tracking and monitoring. A growing number of shopping centers have recently attempted to apply Re-ID technology. One development trend in related algorithms is the use of an attention mechanism to capture global and local features. We notice that these algorithms have apparent limitations: they focus only on the most salient features without considering certain detailed features. People's clothes, bags, and even shoes are of great help in distinguishing pedestrians, yet we notice that global features usually obscure these important local features. Therefore, we propose a dual branch network based on a multi-scale attention mechanism. This network can capture both apparent global features and inconspicuous local features of pedestrian images. Specifically, we design a dual branch attention network (DBA-Net) for better performance. Its two branches can optimize the extracted features of different depths at the same time. We also design an effective block, called channel, position and spatial-wise attention (CPSA), which can capture key fine-grained information, such as bags and shoes. Furthermore, in addition to ID loss, we use complementary triplet loss and adaptive weighted rank list loss (WRLL) on each branch during the training process. DBA-Net can not only learn semantic context information in the channel, position, and spatial dimensions but can also integrate detailed semantic information by learning the dependency relationships between features. Extensive experiments on three widely used open-source datasets show that DBA-Net clearly yields overall state-of-the-art performance. In particular, on the CUHK03 dataset, DBA-Net achieves a mean average precision (mAP) of 83.2%.
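As a point of reference for the triplet-based training mentioned above, the following is a minimal sketch of the standard triplet margin loss. It is not the complementary triplet loss or the WRLL used by DBA-Net, whose exact forms the abstract does not give; the function name and the margin value of 0.3 are assumptions chosen for illustration.

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss on Euclidean distances.

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin` (an assumed value here).
    """
    d_ap = math.dist(anchor, positive)  # anchor to same-identity sample
    d_an = math.dist(anchor, negative)  # anchor to different-identity sample
    return max(0.0, d_ap - d_an + margin)

# A well-separated negative gives zero loss; a close negative is penalized.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 3.0])
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.1])
```

The design intent is that only triplets violating the margin contribute to the gradient, which pushes features of the same pedestrian together and those of different pedestrians apart.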


2019 ◽  
Vol 10 (1) ◽  
pp. 101 ◽  
Author(s):  
Yadong Yang ◽  
Chengji Xu ◽  
Feng Dong ◽  
Xiaofeng Wang

Computer vision systems are insensitive to the scale of objects in natural scenes, so it is important to study the multi-scale representation of features. Res2Net implements hierarchical multi-scale convolution in residual blocks, but its random grouping method affects the robustness and intuitive interpretability of the network. We propose a new multi-scale convolution model based on multiple attention. It introduces the attention mechanism into the structure of a Res2-block to better guide feature expression. First, we adopt channel attention to score the channels and sort them in descending order of feature importance (Channels-Sort). The sorted channels of the residual block are grouped and hierarchically convolved within the block to form a single attention and multi-scale block (AMS-block). Then, we apply channel attention to the small residual blocks to constitute a dual attention and multi-scale block (DAMS-block). Finally, we introduce spatial attention before sorting the channels to form a multi-attention multi-scale block (MAMS-block). A MAMS-convolutional neural network (CNN) is a series of multiple MAMS-blocks. It enables significant information to be expressed at more levels and can also be easily grafted onto different convolutional structures. Limited by hardware conditions, we validate the proposed ideas only on convolutional networks of the same magnitude. The experimental results show that the convolution model with an attention mechanism and multi-scale features is superior in image classification.
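The Channels-Sort step can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the channel score is taken to be the mean activation of each channel (global average pooling), whereas a learned channel attention module would typically pass the pooled values through trainable layers. The function names and the assumption that the channel count is divisible by the number of groups are hypothetical.

```python
def channel_scores(feature_map):
    """Score each channel by its mean activation (assumed stand-in for attention).

    feature_map is a list of channels; each channel is a list of rows.
    """
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

def sort_and_group(feature_map, num_groups):
    """Sort channel indices by descending score, then split them into groups.

    Assumes the number of channels is divisible by num_groups.
    """
    scores = channel_scores(feature_map)
    order = sorted(range(len(feature_map)), key=lambda i: scores[i], reverse=True)
    size = len(order) // num_groups
    return [order[g * size:(g + 1) * size] for g in range(num_groups)]

# Four 1x2 channels with mean activations 1, 4, 2, 3.
toy_map = [[[1.0, 1.0]], [[4.0, 4.0]], [[2.0, 2.0]], [[3.0, 3.0]]]
groups = sort_and_group(toy_map, num_groups=2)  # [[1, 3], [2, 0]]
```

Grouping by score rather than by position places the most important channels together, in contrast to the fixed grouping that the abstract attributes to Res2Net.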


Author(s):  
Bowen Xing ◽  
Lejian Liao ◽  
Dandan Song ◽  
Jingang Wang ◽  
Fuzheng Zhang ◽  
...  

Aspect-based sentiment analysis (ABSA) aims to predict the fine-grained sentiment of comments with respect to given aspect terms or categories. Previous ABSA methods have recognized and verified the importance of the aspect. Most existing LSTM-based models take the aspect into account via the attention mechanism, where the attention weights are calculated after the context has been modeled in the form of contextual vectors. However, during context modeling, classic LSTM cells may already have discarded aspect-related information and retained aspect-irrelevant information, which leaves room to generate more effective context representations. This paper proposes a novel variant of LSTM, termed aspect-aware LSTM (AA-LSTM), which incorporates aspect information into the LSTM cells at the context modeling stage, before the attention mechanism. Therefore, our AA-LSTM can dynamically produce aspect-aware contextual representations. We experiment with several representative LSTM-based models by replacing their classic LSTM cells with AA-LSTM cells. Experimental results on the SemEval-2014 datasets demonstrate the effectiveness of AA-LSTM.
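The conventional scheme described above, in which attention weights are computed over contextual vectors that are already fixed, can be sketched as follows. This illustrates the baseline attention step only, not AA-LSTM itself; dot-product scoring and the function names are assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aspect_attention(context_vectors, aspect_vector):
    """Weight each contextual vector by its dot-product match with the aspect.

    Returns the attention weights and the weighted sum of the context.
    """
    scores = [
        sum(c * a for c, a in zip(vec, aspect_vector))
        for vec in context_vectors
    ]
    weights = softmax(scores)
    dim = len(aspect_vector)
    summary = [
        sum(w * vec[d] for w, vec in zip(weights, context_vectors))
        for d in range(dim)
    ]
    return weights, summary

# The first context vector aligns with the aspect and gets the larger weight.
weights, summary = aspect_attention([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

Because the contextual vectors are produced before the aspect is consulted, any aspect-related information lost during context modeling cannot be recovered at this step, which is the limitation AA-LSTM is designed to address.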


2021 ◽  
Vol 7 (4) ◽  
pp. 123
Author(s):  
Yingxue Sun ◽  
Junbo Gao

In recent years, more and more people have expressed their feelings through both images and text, boosting the growth of multimodal data. Multimodal data contains richer semantics and is more conducive to judging people's real emotions. To fully learn the features of every single modality and integrate modal information, this paper proposes FCLAG, a fine-grained multimodal sentiment analysis method based on gating and attention mechanisms. First, the text is processed at both the character level and the word level. A CNN is used to extract more fine-grained emotional information from characters, and the attention mechanism is used to improve the expressiveness of the keywords. For images, a gating mechanism is added to control the flow of image information between networks. The image and text vectors jointly represent the original data. Then a bidirectional LSTM is used for further learning, which enhances the information interaction between the modalities. Finally, the multimodal feature expression is fed into the classifier. The method is verified on a self-built image and text dataset. The experimental results show that, compared with other sentiment classification models, this method achieves greater improvements in accuracy and F1 score and can effectively improve the performance of multimodal sentiment analysis.
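How a gate can control the flow of image information may be illustrated with a minimal sketch. The abstract does not specify the form of the gate in FCLAG, so the element-wise sigmoid gate below, along with the function and parameter names, is an assumption made purely for illustration.

```python
import math

def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_image_features(image_feat, gate_weights, gate_bias):
    """Scale each image feature by a gate value in (0, 1).

    Assumed element-wise form: gate_i = sigmoid(w_i * x_i + b_i).
    A gate near 0 blocks the feature; a gate near 1 passes it through.
    """
    return [
        sigmoid(w * x + b) * x
        for x, w, b in zip(image_feat, gate_weights, gate_bias)
    ]

# With zero weights and biases every gate equals 0.5, halving each feature.
half_open = gated_image_features([2.0, 2.0], [0.0, 0.0], [0.0, 0.0])
# A strongly negative bias closes the gate almost completely.
closed = gated_image_features([2.0], [0.0], [-100.0])
```

In a trained model the weights and biases would be learned, so the network itself decides how much image information reaches the later layers.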


Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1838
Author(s):  
Chih-Wei Lin ◽  
Mengxiang Lin ◽  
Jinfu Liu

Classifying fine-grained categories (e.g., bird species and car and aircraft types) is a crucial problem in image understanding and is difficult due to intra-class and inter-class variance. Most existing fine-grained approaches utilize the various parts and local information of objects individually to improve classification accuracy but neglect feature fusion between the object (global) and the object's parts (local) as a way to reinforce fine-grained features. In this paper, we present a novel framework, namely object–part registration–fusion Net (OR-Net), which considers the mechanism of registration and fusion between the features of an object (global) and its parts (local) for fine-grained classification. Our model learns fine-grained features from the global and local regions of the object and fuses these features using the registration mechanism to reinforce each region's characteristics in the feature maps. Specifically, OR-Net consists of: (1) a multi-stream feature extraction net, which generates features from the global region and various local regions of objects; and (2) a registration–fusion feature module, which calculates the dimension and location relationships between global (object) regions and local (part) regions to generate the registration information, and then fuses the local features into the global features using this registration information to generate the fine-grained feature. Experiments run on symmetric GPU devices with symmetric mini-batches verify that OR-Net surpasses state-of-the-art approaches on the CUB-200-2011 (Birds), Stanford-Cars, and Stanford-Aircraft datasets.


2015 ◽  
Vol 23 (21) ◽  
pp. 27376 ◽  
Author(s):  
Mitradeep Sarkar ◽  
Jean-François Bryche ◽  
Julien Moreau ◽  
Mondher Besbes ◽  
Grégory Barbillon ◽  
...  

2021 ◽  
pp. 1-12
Author(s):  
Lv YE ◽  
Yue Yang ◽  
Jian-Xu Zeng

Existing recommender systems provide personalized recommendation services for users in online shopping, entertainment, and other activities. To increase the probability that users accept a recommendation, an interpretable recommender system, unlike a traditional one, presents the reasons for a recommendation together with the results. In this paper, an interpretable recommendation model based on XGBoost trees is proposed to obtain comprehensible and effective cross features from side information. The results are input into an embedding model based on the attention mechanism to capture the hidden interactions among user IDs, item IDs, and cross features. The captured interactions are used to predict the match score between the user and the recommended item. The cross-feature attention scores are used to generate different recommendation reasons for different user-item pairs. Experimental results show that the proposed algorithm can guarantee the quality of recommendation. Providing reference reasons improves the transparency and readability of the recommendation process. This method can help users better understand the recommendation behavior of the system and offers some insight into making recommender systems more personalized and intelligent.

