Various computational models of visual attention rely on the extraction of salient points or proto-objects, i.e., discrete units of attention, computed from bottom-up image features. In recent years, different solutions integrating top-down mechanisms have been implemented, as research has shown that although eye movements are initially influenced solely by bottom-up information, after some time goal-driven (high-level) processes dominate the guidance of visual attention towards regions of interest (Hwang, Higgins & Pomplun, 2009). However, even these improved modeling approaches are unlikely to generalize to a broader range of application contexts, because basic principles of visual attention, such as cognitive control, learning, and expertise, have thus far not been sufficiently taken into account (Tatler, Hayhoe, Land & Ballard, 2011). In recent work, the authors demonstrated the functional role and representational nature of long-term memory structures for human perceptual skills and motor control. Building on these findings, this chapter extends a widely applied saliency-based model of visual attention (Walther & Koch, 2006) in two ways. First, it computes the saliency map using the cognitive visual attention (CVA) approach, which shows a correspondence between regions of high saliency and regions of visual interest indicated by participants' eye movements (Oyekoya & Stentiford, 2004). Second, it adds an expertise-based component (Schack, 2012) to represent the influence of the quality of mental representation structures in long-term memory (LTM) and the role of learning in the visual perception of objects, events, and motor actions.
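To make the first extension concrete, the following is a minimal sketch of a CVA-style saliency computation in the spirit of Oyekoya and Stentiford (2004), not the authors' implementation: a pixel accumulates a high attention score when its local neighbourhood fails to match the neighbourhoods of randomly chosen comparison pixels, i.e., when it is dissimilar to the rest of the image. All function and parameter names (cva_saliency, n_tests, n_offsets, radius, threshold) are illustrative assumptions.

```python
import numpy as np

def cva_saliency(img, n_tests=100, n_offsets=3, radius=2, threshold=0.1, rng=None):
    """Illustrative CVA-style saliency map for a grayscale image (2D array).

    For each of n_tests trials, a random neighbourhood configuration
    (n_offsets pixel offsets within `radius`) is compared between every
    pixel and a randomly selected comparison pixel; a mismatch raises the
    pixel's attention score. Scores are averaged over trials.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = img.astype(np.float64) / 255.0
    h, w = img.shape
    saliency = np.zeros((h, w))

    # Interior coordinates, so that all offsets stay inside the image.
    ys, xs = np.mgrid[radius:h - radius, radius:w - radius]

    for _ in range(n_tests):
        # One random neighbourhood configuration shared by all pixels in this trial.
        offsets = rng.integers(-radius, radius + 1, size=(n_offsets, 2))
        # One random comparison pixel per location.
        cy = rng.integers(radius, h - radius, size=ys.shape)
        cx = rng.integers(radius, w - radius, size=xs.shape)

        mismatch = np.zeros(ys.shape, dtype=bool)
        for dy, dx in offsets:
            diff = np.abs(img[ys + dy, xs + dx] - img[cy + dy, cx + dx])
            mismatch |= diff > threshold

        # A mismatching comparison increments the attention score.
        saliency[radius:h - radius, radius:w - radius] += mismatch

    return saliency / n_tests
```

Regions whose neighbourhoods rarely match those of other image locations (e.g., an isolated object on a homogeneous background) receive high values in the resulting map, which can then serve as the saliency map that the extended model feeds into the subsequent attention and expertise-based stages described in this chapter.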