In this paper, we propose a deep neural network-based approach for gaze-following and a new benchmark dataset, GazeFollow, for thorough evaluation. Given an image and the location of a head, our approach follows the gaze of the person and identifies the object being looked at. Our deep network is able to discover how to extract head pose and gaze orientation, and to select objects in the scene that are in the predicted line of sight and likely to be looked at (such as televisions, balls and food). The quantitative evaluation shows that our approach produces reliable results, even when viewing only the back of the head. While our method outperforms several baseline approaches, we are still far from reaching human performance on this task. Overall, we believe that gaze-following is a challenging and important problem that deserves more attention from the community.
Please cite the following paper if you use this service:
A. Recasens*, A. Khosla*, C. Vondrick and A. Torralba
"Where are they looking?"
Advances in Neural Information Processing Systems (NIPS), 2015
(* - indicates equal contribution)
Usage: http://gazefollow.csail.mit.edu/cgi-bin/image.py?url=IMG_URL
Example:
http://gazefollow.csail.mit.edu/cgi-bin/image.py?url=http://gazefollow.csail.mit.edu/imgs/1.jpg
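Based on the usage pattern above, a query URL can be built programmatically. The sketch below is only an illustration of constructing the request string; the service's response format is not documented here, so parsing the result is left out, and the `build_query` helper is a hypothetical name, not part of the API.

```python
# Hypothetical helper for building a GazeFollow API query URL.
# The endpoint and the `url` parameter follow the documented usage:
#   http://gazefollow.csail.mit.edu/cgi-bin/image.py?url=IMG_URL
API_ENDPOINT = "http://gazefollow.csail.mit.edu/cgi-bin/image.py"

def build_query(image_url: str) -> str:
    """Return the full API URL for a given image URL."""
    return f"{API_ENDPOINT}?url={image_url}"

print(build_query("http://gazefollow.csail.mit.edu/imgs/1.jpg"))
```

The resulting string can be fetched with any HTTP client (e.g. `urllib.request.urlopen`); per the notice below, please avoid issuing many requests in a short period.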
Notice: Please do not overload our server by querying repeatedly in a short period of time. This is a free service for academic research and education purposes only, and it comes with no guarantee of any kind. For any questions or comments regarding this API, please contact Adrià Recasens or Aditya Khosla.
We thank Andrew Owens for helpful discussions. Funding for this research was partially supported by the Obra Social “la Caixa” Fellowship for Post-Graduate Studies to Adrià Recasens and a Google PhD Fellowship to Carl Vondrick.