Adrià Recasens*, Aditya Khosla*, Carl Vondrick, Antonio Torralba
Massachusetts Institute of Technology

Humans have the remarkable ability to follow the gaze of other people to identify what they are looking at. Following eye gaze, or gaze-following, is an important skill that allows us to understand what other people are thinking, the actions they are performing, and even to predict what they might do next. Despite its importance, gaze-following has only been studied in limited scenarios within the computer vision community.


In this paper, we propose a deep neural network-based approach for gaze-following and a new benchmark dataset, GazeFollow, for thorough evaluation. Given an image and the location of a head, our approach follows the gaze of the person and identifies the object being looked at. Our deep network is able to discover how to extract head pose and gaze orientation, and to select objects in the scene that are in the predicted line of sight and likely to be looked at (such as televisions, balls and food). The quantitative evaluation shows that our approach produces reliable results, even when viewing only the back of the head. While our method outperforms several baseline approaches, we are still far from reaching human performance on this task. Overall, we believe that gaze-following is a challenging and important problem that deserves more attention from the community.
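To make the task concrete, the sketch below shows one plausible way a model with this input/output interface could be wired up in PyTorch: one pathway processes the full scene, another processes a close-up of the head together with its normalized position, and their combination is read out as a distribution over a coarse grid of candidate gaze locations. All layer sizes, names, and the combination scheme are illustrative assumptions, not the architecture from the paper; see the paper for the actual model and training details.

    import torch
    import torch.nn as nn

    class GazeFollowSketch(nn.Module):
        """Illustrative two-pathway model: one pathway looks at the full scene,
        the other at a close-up of the head and its position, and the combined
        features are read out as a distribution over a coarse grid of candidate
        gaze locations. This is a sketch, not the architecture from the paper."""

        def __init__(self, grid_size: int = 5):
            super().__init__()
            # Scene pathway: full image -> spatial feature map.
            self.scene = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5, stride=4, padding=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(grid_size),
            )
            # Head pathway: close-up of the head -> same-sized feature map.
            self.head = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5, stride=4, padding=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(grid_size),
            )
            self.position = nn.Linear(2, 32)        # normalized head location (x, y)
            self.readout = nn.Conv2d(32, 1, kernel_size=1)

        def forward(self, image, head_crop, head_xy):
            scene_feat = self.scene(image)                    # (B, 32, G, G)
            head_feat = self.head(head_crop)                  # (B, 32, G, G)
            pos_feat = self.position(head_xy)[:, :, None, None]
            # Modulate scene features by who is looking and from where.
            combined = scene_feat * (head_feat + pos_feat)
            logits = self.readout(combined).flatten(1)        # (B, G*G)
            return logits.softmax(dim=1)                      # probability per grid cell

    # Example: one 224x224 scene, one 64x64 head crop, head centered at (0.3, 0.2).
    model = GazeFollowSketch()
    probs = model(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 64, 64),
                  torch.tensor([[0.3, 0.2]]))
    print(probs.shape)  # torch.Size([1, 25])

A real model would be trained on annotated gaze points from the benchmark; the sketch above only fixes the input/output interface.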
Download our paper

Please cite the following paper if you use this service:

A. Recasens*, A. Khosla*, C. Vondrick and A. Torralba
Where are they looking?
Advances in Neural Information Processing Systems (NIPS), 2015
(* - indicates equal contribution)

GazeFollow API

Usage: http://gazefollow.csail.mit.edu/cgi-bin/image.py?url=IMG_URL

Example: http://gazefollow.csail.mit.edu/cgi-bin/image.py?url=http://gazefollow.csail.mit.edu/imgs/1.jpg
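For programmatic use, a minimal Python sketch of calling this endpoint is shown below. The `requests` dependency, the helper name, and the assumption that the result comes back directly in the HTTP response body are illustrative; the exact response format is not documented here, so save the raw bytes and inspect them.

    import requests

    # Endpoint from the Usage line above; the service takes a publicly
    # reachable image URL via the `url` query parameter.
    API_ENDPOINT = "http://gazefollow.csail.mit.edu/cgi-bin/image.py"

    def query_gazefollow(image_url: str, timeout: float = 30.0) -> bytes:
        """Send one image URL to the service and return the raw response body.
        The response format (image vs. JSON) is an assumption left to the caller."""
        response = requests.get(API_ENDPOINT, params={"url": image_url}, timeout=timeout)
        response.raise_for_status()
        return response.content

    if __name__ == "__main__":
        # The example image from the Example line above.
        result = query_gazefollow("http://gazefollow.csail.mit.edu/imgs/1.jpg")
        with open("gazefollow_result.bin", "wb") as f:   # inspect to see what the API returns
            f.write(result)

Please respect the notice below and rate-limit any batch of queries.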

Notice: Please do not overload our server by querying repeatedly in a short period of time. This is a free service intended for academic research and education purposes only, and it comes with no guarantee of any kind. For any questions or comments regarding this API, please contact Adrià Recasens or Aditya Khosla.


Acknowledgements

We thank Andrew Owens for helpful discussions. This research was partially supported by an Obra Social “la Caixa” Fellowship for Post-Graduate Studies to Adrià Recasens and a Google PhD Fellowship to Carl Vondrick.