Distillation
Image super-resolution is one of the most appealing applications of image processing, capable of retrieving a high resolution image by fusing several registered low resolution images depicting an object of interest. However, employing super-resolution in video data is challenging: a video sequence generally contains a lot of scattered information regarding several objects of interest in cluttered scenes. Especially with hand-held cameras, the overall quality may be poor due to low resolution or unsteadiness. The objective of this chapter is to demonstrate why standard image super-resolution fails in video data, which are the problems that arise, and how we can overcome these problems. In our first contribution, we propose a novel Bayesian framework for super-resolution of persistent objects of interest in video sequences. We call this process Distillation. In the traditional formulation of the image super-resolution problem, the observed target is (1) always the same, (2) acquired using a camera making small movements, and (3) found in a number of low resolution images sufficient to recover high-frequency information. These assumptions are usually unsatisfied in real world video acquisitions and often beyond the control of the video operator. With Distillation, we aim to extend and to generalize the image super-resolution task, embedding it in a structured framework that accurately distills all the informative bits of an object of interest. In practice, the Distillation process: i) individuates, in a semi supervised way, a set of objects of interest, clustering the related video frames and registering them with respect to global rigid transformations; ii) for each one, produces a high resolution image, by weighting each pixel according to the information retrieved about the object of interest. As a second contribution, we extend the Distillation process to deal with objects of interest whose transformations in the appearance are not (only) rigid. Such process, built on top of the Distillation, is hierarchical, in the sense that a process of clustering is applied recursively, beginning with the analysis of whole frames, and selectively focusing on smaller sub-regions whose isolated motion can be reasonably assumed as rigid. The ultimate product of the overall process is a strip of images that describe at high resolution the dynamics of the video, switching between alternative local descriptions in response to visual changes. Our approach is first tested on synthetic data, obtaining encouraging comparative results with respect to known super-resolution techniques, and a good robustness against noise. Second, real data coming from different videos are considered, trying to solve the major details of the objects in motion.