Abstract
Gaze interaction is essential for social communication in many scenarios; therefore, interpreting people's gaze direction is helpful for natural human-robot interaction and interaction with virtual characters. In this study, we first adopt a residual neural network (ResNet) structure with an embedding layer of personal identity (ID-ResNet) that outperformed the current best result of 2.51° on MPIIGaze, a benchmark dataset for gaze estimation. To avoid using manually labelled data, we used UnityEyes synthetic images, with and without style transformation, as the training data. We exceeded the previously reported best results on MPIIGaze (from 2.76° to 2.55°) and UT-Multiview (from 4.01° to 3.40°). In addition, the model only needs to be fine-tuned with a few "calibration" examples for a new person to yield significant performance gains. Finally, we present the KLBS-eye dataset, which contains 15,350 images collected from 12 participants looking in nine known directions, and achieved a state-of-the-art result of 0.59 ± 1.69°.
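The core idea behind ID-ResNet — conditioning the gaze regressor on a learned personal-identity embedding alongside the image features — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the layer sizes, the linear regression head, and the names (`gaze_head`, `id_embedding`) are hypothetical stand-ins for the ResNet feature extractor and its output head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: number of known identities, embedding width,
# and the dimensionality of the CNN eye-image features.
N_IDS, EMB_DIM, FEAT_DIM = 12, 8, 128

# Hypothetical parameters: a per-person embedding table and a linear head
# that regresses a 2-D gaze angle (yaw, pitch).
id_embedding = rng.normal(size=(N_IDS, EMB_DIM))
W = rng.normal(size=(FEAT_DIM + EMB_DIM, 2)) * 0.01
b = np.zeros(2)

def gaze_head(eye_features: np.ndarray, person_id: int) -> np.ndarray:
    """Concatenate image features with the person's identity embedding,
    then regress the gaze angle with a single linear layer."""
    z = np.concatenate([eye_features, id_embedding[person_id]])
    return z @ W + b

feat = rng.normal(size=FEAT_DIM)   # stand-in for ResNet eye features
angles = gaze_head(feat, person_id=3)
print(angles.shape)                # (2,)
```

Under this scheme, "calibrating" for a new person corresponds to fitting only that person's embedding row from a handful of labelled examples while the shared feature extractor stays fixed, which is consistent with the few-example fine-tuning described in the abstract.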
Acknowledgements
The research was supported by the Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences; the Key Laboratory of Biomedical Spectroscopy of Xi'an; the Outstanding Award for Talent Project of the Chinese Academy of Sciences; the "From 0 to 1" Original Innovation Project of the Basic Frontier Scientific Research Program of the Chinese Academy of Sciences; the Institute Supported Project of the Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences under grant numbers Y855W31213 and Y955061213; Dongguan Dongquan Intelligent Technology Co., Ltd.; and Dongguan Entrepreneur Leadership 2018. We thank Li-Yao Song, Chi Gao, Xin-Ming Zhang, Shao-Kang Yin, and Chao Li for helpful discussions and for editing the paper.
Ethics declarations
The authors declare no conflicts of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Quan Wang and Hui Wang contributed equally to this work.
Cite this article
Wang, Q., Wang, H., Dang, RC. et al. Style transformed synthetic images for real world gaze estimation by using residual neural network with embedded personal identities. Appl Intell 53, 2026–2041 (2023). https://doi.org/10.1007/s10489-022-03481-9