Bottom-up visual attention model for still image: a preliminary study

Author(s):  
Adhi Prahara ◽  
Murinto Murinto ◽  
Dewi Pramudi Ismi

Human visual attention is explained scientifically in cognitive psychology and neuroscience and modeled computationally in computer science and engineering. Visual attention models have been applied in computer vision tasks such as object detection, object recognition, image segmentation, image and video compression, action recognition, and visual tracking. This work studies bottom-up visual attention, namely human fixation prediction and salient object detection models. The preliminary study briefly covers the biological perspective of visual attention, including the visual pathway and the theory of visual attention, and then the computational models of bottom-up visual attention that generate a saliency map. The study compares several models at each stage and observes whether the stage is inspired by the biological architecture, concept, or behavior of human visual attention. The study finds that low-level features, the center-surround mechanism, sparse representation, and higher-level guidance with intrinsic cues dominate bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.
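
The center-surround mechanism named in the abstract above can be illustrated with a very small sketch. It assumes a single intensity channel, box-shaped center and surround regions, and a toy 7x7 image; the function names and radii are hypothetical and do not come from the study, which surveys far richer models.

```python
def box_mean(img, r):
    """Mean of each pixel's (2r+1)x(2r+1) neighbourhood, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def center_surround_saliency(img, center_r=0, surround_r=2):
    """Absolute difference between a fine (center) and a coarse (surround)
    average, rescaled so the strongest response equals 1."""
    center = box_mean(img, center_r)
    surround = box_mean(img, surround_r)
    h, w = len(img), len(img[0])
    raw = [[abs(center[y][x] - surround[y][x]) for x in range(w)] for y in range(h)]
    peak = max(max(row) for row in raw)
    if peak == 0:
        return raw  # perfectly flat image: nothing stands out
    return [[v / peak for v in row] for row in raw]


# Toy input (an assumption for illustration): flat background, one bright pixel.
image = [[0.1] * 7 for _ in range(7)]
image[3][4] = 1.0
saliency = center_surround_saliency(image)
```

In this toy case the bright pixel receives the strongest response because it differs most from its local surround. Practical models typically repeat the comparison across several scales and feature channels (for example color and orientation) before fusing the results into one saliency map.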

Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5178
Author(s):  
Sangbong Yoo ◽  
Seongmin Jeong ◽  
Seokyeon Kim ◽  
Yun Jang

Gaze movement and visual stimuli have been utilized to analyze human visual attention intuitively. Gaze behavior studies mainly show statistical analyses of eye movements and human visual attention. During these analyses, eye movement data and the saliency map are presented to the analysts as separate views or merged views. However, the analysts become frustrated when they need to memorize all of the separate views or when the eye movements obscure the saliency map in the merged views. Therefore, it is not easy to analyze how visual stimuli affect gaze movements, since existing techniques focus excessively on the eye movement data. In this paper, we propose a novel visualization technique for analyzing gaze behavior using saliency features as visual clues to express the visual attention of an observer. The visual clues that represent visual attention are analyzed to reveal which saliency features are prominent for the visual stimulus analysis. We visualize the gaze data with the saliency features to interpret the visual attention. We analyze gaze behavior with the proposed visualization to evaluate whether embedding saliency features within the visualization helps us understand the visual attention of an observer.


2021 ◽  
Author(s):  
Ibrahim Mohammad Hussain Rahman

The human visual attention (HVA) system encompasses a set of interconnected neurological modules that are responsible for analyzing visual stimuli by attending to those regions that are salient. Two contrasting biological mechanisms exist in the HVA system: bottom-up, data-driven attention and top-down, task-driven attention. The former is mostly responsible for low-level instinctive behaviors, while the latter is responsible for performing complex visual tasks such as target object detection.
Very few computational models have been proposed to model top-down attention, mainly for three reasons. First, the top-down process involves many influential factors. Second, top-down responses vary from task to task. Finally, many biological aspects of the top-down process are not yet well understood. For these reasons, it is difficult to come up with a generalized top-down model that could be applied to all high-level visual tasks. Instead, this thesis addresses some outstanding issues in modelling top-down attention for one particular task, target object detection. Target object detection is an essential step for analyzing images to further perform complex visual tasks. It has not been investigated thoroughly when modelling top-down saliency and hence constitutes the main application domain for this thesis.
The thesis will investigate methods to model top-down attention through various high-level data acquired from images. Furthermore, the thesis will investigate different strategies to dynamically combine bottom-up and top-down processes to improve the detection accuracy, as well as the computational efficiency, of existing and new visual attention models. The following techniques and approaches are proposed to address the outstanding issues in modelling top-down saliency:
1. A top-down saliency model that weights low-level attentional features through contextual knowledge of a scene. The proposed model assigns weights to features of a novel image by extracting a contextual descriptor of the image. The contextual descriptor tunes the weighting of low-level features to maximize detection accuracy. By incorporating context into the feature weighting mechanism, we improve the quality of the weights assigned to these features.
2. Two modules of target features combined with contextual weighting to improve detection accuracy of the target object. In this proposed model, two sets of attentional feature weights are learned, one through context and the other through target features. When both sources of knowledge are used to model top-down attention, a drastic increase in detection accuracy is achieved in images with complex backgrounds and a variety of target objects.
3. A top-down and bottom-up attention combination model based on feature interaction. This model provides a dynamic way of combining both processes by formulating the problem as feature selection. The feature selection exploits the interaction between these features, yielding a robust set of features that would maximize both the detection accuracy and the overall efficiency of the system.
4. A feature map quality score estimation model that is able to accurately predict the detection accuracy score of any novel feature map without the need for ground-truth data. The model extracts various local, global, geometrical and statistical characteristic features from a feature map. These characteristics guide a regression model to estimate the quality of a novel map.
5. A dynamic feature integration framework for combining bottom-up and top-down saliencies at runtime. If the estimation model is able to predict the quality score of any novel feature map accurately, then it is possible to perform dynamic feature map integration based on the estimated value. We propose two frameworks for feature map integration using the estimation model. The proposed integration framework achieves higher human fixation prediction accuracy with a minimum number of feature maps than that achieved by combining all feature maps.
The proposed works in this thesis provide new directions in modelling top-down saliency for target object detection. In addition, dynamic approaches for top-down and bottom-up combination show considerable improvements over existing approaches in both efficiency and accuracy.
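
The idea behind the dynamic integration framework (item 5) can be sketched as follows. This is only an illustration under stated assumptions: the quality scores are taken as given, standing in for the output of the regression-based estimator; the selection rule (keep the best `keep` maps) and the quality-weighted average are one plausible choice, not the thesis's two frameworks; and `integrate_maps` is a hypothetical name.

```python
def integrate_maps(feature_maps, quality_scores, keep=2):
    """Fuse the `keep` highest-scoring feature maps by a quality-weighted average.

    feature_maps   -- list of equally sized 2-D lists
    quality_scores -- one estimated quality score per map (assumed to come
                      from some external estimator; hypothetical here)
    Returns the fused map and the indices of the maps that were used.
    """
    if len(feature_maps) != len(quality_scores):
        raise ValueError("one quality score is needed per feature map")
    ranked = sorted(range(len(feature_maps)),
                    key=lambda i: quality_scores[i], reverse=True)
    chosen = ranked[:keep]
    total = sum(quality_scores[i] for i in chosen)
    if total <= 0:
        raise ValueError("selected quality scores must sum to a positive value")
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in chosen:
        weight = quality_scores[i] / total  # weights of the chosen maps sum to 1
        for y in range(h):
            for x in range(w):
                fused[y][x] += weight * feature_maps[i][y][x]
    return fused, chosen


# Toy inputs (assumptions for illustration only).
maps = [
    [[1.0, 0.0], [0.0, 0.0]],  # e.g. a colour-contrast map
    [[0.0, 1.0], [0.0, 0.0]],  # e.g. an orientation map
    [[0.0, 0.0], [1.0, 1.0]],  # e.g. a map the estimator scores low
]
scores = [0.6, 0.2, 0.1]
fused, chosen = integrate_maps(maps, scores, keep=2)
```

Dropping low-scoring maps before fusion is what lets such a scheme use fewer maps than combining everything, which is the efficiency argument the abstract makes.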


Author(s):  
Kai Essig ◽  
Oleg Strogan ◽  
Helge Ritter ◽  
Thomas Schack

Various computational models of visual attention rely on the extraction of salient points or proto-objects, i.e., discrete units of attention, computed from bottom-up image features. In recent years, different solutions integrating top-down mechanisms were implemented, as research has shown that although eye movements initially are solely influenced by bottom-up information, after some time goal driven (high-level) processes dominate the guidance of visual attention towards regions of interest (Hwang, Higgins & Pomplun, 2009). However, even these improved modeling approaches are unlikely to generalize to a broader range of application contexts, because basic principles of visual attention, such as cognitive control, learning and expertise, have thus far not sufficiently been taken into account (Tatler, Hayhoe, Land & Ballard, 2011). In some recent work, the authors showed the functional role and representational nature of long-term memory structures for human perceptual skills and motor control. Based on these findings, the chapter extends a widely applied saliency-based model of visual attention (Walther & Koch, 2006) in two ways: first, it computes the saliency map using the cognitive visual attention approach (CVA) that shows a correspondence between regions of high saliency values and regions of visual interest indicated by participants’ eye movements (Oyekoya & Stentiford, 2004). Second, it adds an expertise-based component (Schack, 2012) to represent the influence of the quality of mental representation structures in long-term memory (LTM) and the roles of learning on the visual perception of objects, events, and motor actions.


2013 ◽  
Vol 347-350 ◽  
pp. 3764-3768 ◽  
Author(s):  
Zhuo Zhang ◽  
Xin Nan Fan ◽  
Xue Wu Zhang ◽  
Hai Yan Xu ◽  
Min Li

Inspired by research on the human visual system in neuroanatomy and psychology, the paper proposes a two-way collaborative visual attention model for target detection. In this new method, bottom-up attention information cooperates with top-down attention information to detect a target rapidly and accurately. Firstly, statistical prior knowledge of the target and background is applied to optimize bottom-up attention information in different feature spaces and scale spaces. Secondly, after the SNR of the salience difference between target and interference is computed, the bottom-up gain factor is obtained. Thirdly, the gain factor is applied to adjust bottom-up attention information extraction and thus maximize the salience contrast between target and background. Finally, the target is detected from the adjusted saliency. Experimental results show that the proposed model can improve the real-time capability and reliability of target detection.
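
The abstract does not give the formula for the gain factor, so the following is only a sketch of one plausible form: each feature channel's SNR is taken as the ratio of mean target salience to mean interference salience, and the gain is that SNR divided by the mean SNR over channels. The function names, the `eps` guard, and the sample numbers are all assumptions for illustration.

```python
def snr_gains(target_salience, distractor_salience, eps=1e-9):
    """Per-channel gains from an assumed SNR = target / interference salience.

    Gains are normalised by the mean SNR, so they average to 1 across channels.
    """
    snrs = [t / (d + eps) for t, d in zip(target_salience, distractor_salience)]
    mean_snr = sum(snrs) / len(snrs)
    return [s / mean_snr for s in snrs]


def combine(feature_responses, gains):
    """Gain-weighted sum of per-channel responses at one location."""
    return sum(g * r for g, r in zip(gains, feature_responses))


# Hypothetical mean salience per channel (colour, intensity, orientation).
target = [0.8, 0.4, 0.3]
distractor = [0.2, 0.4, 0.6]
gains = snr_gains(target, distractor)

# Salience contrast between target and interference, with and without gains.
contrast_with_gains = combine(target, gains) / combine(distractor, gains)
contrast_uniform = combine(target, [1.0] * 3) / combine(distractor, [1.0] * 3)
```

Channels in which the target stands out from the interference are amplified and the others are suppressed, which raises the target-to-interference contrast relative to uniform weighting in this toy example.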


2012 ◽  
Vol 220-223 ◽  
pp. 1393-1397
Author(s):  
Li Bo Liu ◽  
Chun Jiang Zhao ◽  
Hua Rui Wu ◽  
Rong Hua Gao

Analyzing crop growth status through leaf disease images is currently one of the hottest issues in agriculture and forestry. However, images captured by digital cameras are too large, so this research focuses on reducing image size while keeping distortion of the main information carried by the image low. Building on further study of the visual attention models proposed by Itti and by Ma YF, this paper establishes visual attention and visual saliency maps for rice blast and brown spot disease images of 4272 × 2878 pixels. Finally, it determines the reduction scale of the corresponding effective target collection, providing a new way to reduce plant leaf images.


2011 ◽  
Vol 225-226 ◽  
pp. 1016-1019
Author(s):  
Sheng He ◽  
Jun Wei Han ◽  
Ming Xu ◽  
Gong Cheng ◽  
Tian Yun Zhao ◽  
...  

The computer vision community has long attempted to automatically detect locations in an image that capture users' attention. In recent years, more and more researchers have proposed to address this problem from the perspective of simulating human visual attention mechanisms. In this paper, we study modeling visual attention in the frequency domain. Our major contributions are twofold: 1. A new method called the band-divided method (BDM) is developed to generate the saliency map by integrating the amplitude spectrum with the phase spectrum. 2. A quantitative measurement based on min-distance dissimilarity (MDD) is presented to evaluate the saliency map, which is more appropriate for non-binary ground-truth data. Experiments on a benchmark dataset and comparisons with traditional approaches demonstrate the promise of the proposed work.
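
To make the frequency-domain idea concrete, here is a minimal sketch of a simpler, well-known relative of the approach: saliency reconstructed from the phase spectrum alone. It is explicitly not the band-divided method, whose details the abstract does not give. The naive transform and the 6x6 toy image are assumptions chosen to keep the example self-contained.

```python
import cmath


def dft2(grid, inverse=False):
    """Naive 2-D discrete Fourier transform (O(N^4)); suitable for tiny grids only."""
    h, w = len(grid), len(grid[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = sign * 2 * cmath.pi * (u * y / h + v * x / w)
                    acc += grid[y][x] * cmath.exp(1j * angle)
            out[u][v] = acc / (h * w) if inverse else acc
    return out


def phase_saliency(img, eps=1e-12):
    """Discard the amplitude spectrum, keep the phase, transform back,
    and use the squared magnitude of the result as saliency."""
    spectrum = dft2(img)
    phase_only = [[c / abs(c) if abs(c) > eps else 0j for c in row]
                  for row in spectrum]
    recon = dft2(phase_only, inverse=True)
    return [[abs(c) ** 2 for c in row] for row in recon]


# Toy input (an assumption for illustration): flat background, one bright pixel.
image = [[0.2] * 6 for _ in range(6)]
image[2][4] = 1.0
sal = phase_saliency(image)
```

For this toy image the response concentrates on the odd pixel, since a flat background contributes only to the zero-frequency term. A practical implementation would use a fast Fourier transform and usually smooths the result before use.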

