
Style transformed synthetic images for real world gaze estimation by using residual neural network with embedded personal identities

Published in: Applied Intelligence

Abstract

Gaze interaction is essential for social communication in many scenarios; therefore, interpreting people's gaze direction is helpful for natural human-robot interaction and for interaction between humans and virtual characters. In this study, we first adopt a residual neural network (ResNet) structure with an embedding layer for personal identity (ID-ResNet), which outperformed the previous best result of 2.51° on MPIIGaze, a benchmark dataset for gaze estimation. To avoid using manually labelled data, we trained on UnityEye synthetic images, both with and without style transformation. This exceeded the previously reported best results on MPIIGaze (from 2.76° to 2.55°) and on UT-Multiview (from 4.01° to 3.40°). Moreover, the model needs only fine-tuning with a few "calibration" examples from a new person to yield significant performance gains. Finally, we present the KLBS-eye dataset, which contains 15,350 images collected from 12 participants looking in nine known directions, and on which we achieved a state-of-the-art result of 0.59° ± 1.69°.
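The abstract describes the core idea but no implementation. A minimal sketch of an ID-ResNet-style estimator, assuming a PyTorch setup, is shown below: a ResNet feature extractor is combined with a learned per-person identity embedding, and the two are concatenated before a gaze regression head. The class name, the ResNet-18 backbone, the embedding size, and the (pitch, yaw) output are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch only; not the authors' published implementation.
# Assumptions: PyTorch, ResNet-18 backbone, a learned per-person ID
# embedding concatenated with pooled image features before regression.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class IDResNet(nn.Module):
    def __init__(self, num_ids: int, id_dim: int = 16):
        super().__init__()
        backbone = resnet18(weights=None)
        feat_dim = backbone.fc.in_features      # 512 for ResNet-18
        backbone.fc = nn.Identity()             # expose pooled features
        self.backbone = backbone
        self.id_embedding = nn.Embedding(num_ids, id_dim)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + id_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 2),                  # gaze as (pitch, yaw)
        )

    def forward(self, eye_img: torch.Tensor, person_id: torch.Tensor):
        feats = self.backbone(eye_img)          # (B, 512)
        ids = self.id_embedding(person_id)      # (B, id_dim)
        return self.head(torch.cat([feats, ids], dim=1))

# Few-shot "calibration" for a new person, as the abstract describes:
# freeze the shared backbone and fit only the new identity embedding
# (and optionally the head) on a handful of labelled samples.
model = IDResNet(num_ids=12)
for p in model.backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(model.id_embedding.parameters(), lr=1e-3)
```

The design choice this sketch illustrates is that person-specific appearance variation is absorbed by a small embedding vector rather than by the convolutional weights, which is what makes calibration to a new person cheap: only a few parameters need updating.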





Acknowledgements

The research was supported by the Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences; the Key Laboratory of Biomedical Spectroscopy of Xi'an; the Outstanding Award for Talent Project of the Chinese Academy of Sciences; the "From 0 to 1" Original Innovation Project of the Basic Frontier Scientific Research Program of the Chinese Academy of Sciences; the Institute Supported Project of the Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences under grant numbers Y855W31213 and Y955061213; Dongguan Dongquan Intelligent Technology Co., Ltd.; and Dongguan Entrepreneur Leadership 2018. We thank Li-Yao Song, Chi Gao, Xin-Ming Zhang, Shao-Kang Yin, and Chao Li for helpful discussions and for editing the paper.

Author information


Corresponding author

Correspondence to Quan Wang.

Ethics declarations

The authors declare no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Quan Wang and Hui Wang contributed equally to this work.


About this article


Cite this article

Wang, Q., Wang, H., Dang, RC. et al. Style transformed synthetic images for real world gaze estimation by using residual neural network with embedded personal identities. Appl Intell 53, 2026–2041 (2023). https://doi.org/10.1007/s10489-022-03481-9

