Item – Theses Canada

OCLC number
1032929705
Link(s) to full text
LAC copy
Author
Mathe, Stefan.
Title
Actions in the Eye.
Degree
(Ph. D.)--University of Toronto, 2015.
Publisher
Toronto : University of Toronto, 2015.
Description
1 online resource
Notes
Includes bibliographical references.
Abstract
Systems based on bag-of-words models built from image features collected at the maxima of sparse interest point operators have been used successfully for both visual object recognition and action recognition tasks in computer vision. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in 'saccade and fixate' regimes, the methodology and emphasis in the human and computer vision communities remain sharply distinct. Here, we make three contributions aimed at bridging this gap. First, we complement three existing state-of-the-art large-scale static and dynamic annotated computer vision datasets (Hollywood-2, UCF Sports and Pascal VOC Actions) with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks. To our knowledge, these are the first large human eye tracking datasets for video and images to be collected and made publicly available, unique in terms of (a) their large scale and computer vision relevance (over 2 million fixations), (b) dynamic, video stimuli (Hollywood-2 and UCF Sports) and (c) task control as well as free viewing. Second, we perform quantitative analyses of inter-subject consistency and of the influence of task on eye movements. To this end, we propose novel algorithms for the automatic discovery of areas of interest (AOI) and introduce several sequential consistency metrics. Our findings underline the stability of visual search patterns among subjects in the experimental conditions we consider and show that task instructions can influence visual search patterns. Third, we leverage the significant amount of collected data to conduct studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatiotemporal interest point sampling strategies and human fixations, and on their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted and, when used in an end-to-end automatic system that leverages advanced computer vision practice, can lead to state-of-the-art results.
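
The record itself contains no code; as a rough illustration of the kind of inter-subject consistency analysis the abstract mentions, the Python sketch below scores each subject's fixations against a saliency map built from the remaining subjects using a leave-one-subject-out AUC. The function names, Gaussian blur width, and AUC formulation are assumptions for illustration only, not the algorithms or metrics used in the thesis.

# Illustrative sketch only (assumed names and parameters, not the thesis's
# actual algorithms): leave-one-subject-out inter-subject fixation
# consistency, scored with an ROC-style AUC.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_map(fixations, shape, sigma=25.0):
    # Build a Gaussian-blurred fixation density map from (row, col) fixations.
    m = np.zeros(shape, dtype=float)
    for r, c in fixations:
        m[int(r), int(c)] += 1.0
    m = gaussian_filter(m, sigma)
    return m / (m.max() + 1e-12)

def inter_subject_auc(subject_fixations, shape, n_random=1000, seed=0):
    # Score each held-out subject's fixations against a map built from the
    # other subjects; an AUC near 1 indicates highly consistent gaze patterns.
    rng = np.random.default_rng(seed)
    aucs = []
    for i, held_out in enumerate(subject_fixations):
        others = [f for j, subj in enumerate(subject_fixations) if j != i for f in subj]
        smap = fixation_map(others, shape)
        pos = np.array([smap[int(r), int(c)] for r, c in held_out])
        neg = smap[rng.integers(0, shape[0], n_random),
                   rng.integers(0, shape[1], n_random)]
        # AUC estimated by comparing fixated vs. random saliency values.
        greater = (pos[:, None] > neg[None, :]).mean()
        ties = (pos[:, None] == neg[None, :]).mean()
        aucs.append(greater + 0.5 * ties)
    return float(np.mean(aucs))

# Synthetic example: three subjects fixating near the centre of a 480x640 frame.
rng = np.random.default_rng(1)
subjects = [[(240 + dr, 320 + dc) for dr, dc in rng.integers(-40, 40, (20, 2))]
            for _ in range(3)]
print(inter_subject_auc(subjects, (480, 640)))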
Other link(s)
tspace.library.utoronto.ca
hdl.handle.net
Subject
action recognition.
computer vision.
consistency analysis.
human eye movements.
machine learning.
saliency prediction.