Real-Time Face Features Localization with Recurrent Refined Dense CNN Architectures

Livet, Nicolas

doi:10.1007/978-3-030-33720-9_35

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11844)

Included in the following conference series:

International Symposium on Visual Computing

Abstract

Based on an innovative, efficient recurrent deep learning architecture, we present a highly stable and robust technique to localize face features on still images and on captured and live video sequences. This dense (Fully Convolutional) CNN architecture, referred to as the Refined Dense Mobilenet (RDM), is composed of (1) a main encoder-decoder block that approximates face feature locations and (2) a sequence of refiners that robustly converge to the vicinity of the features. On video sequences, the architecture is adapted into a Recurrent RDM (R-RDM), in which a shape prior is re-injected in the form of temporal heatmaps obtained from inference on the previous frame.
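
The recurrent re-injection described above can be sketched as follows. The abstract does not say how the temporal heatmaps are built, so this dependency-free Python sketch assumes one Gaussian heatmap per landmark, centred on the previous frame's estimate and stacked as extra channels after the image channels; for the first frame, where no estimate exists yet, it assumes all-zero heatmaps. The functions `render_heatmaps`, `build_recurrent_input` and `track_sequence`, the `sigma` parameter and the `model` callable are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Grid = List[List[float]]               # one H x W channel, indexed grid[y][x]
Landmarks = List[Tuple[float, float]]  # (x, y) pixel coordinates


def render_heatmaps(landmarks: Sequence[Tuple[float, float]],
                    height: int, width: int, sigma: float = 1.5) -> List[Grid]:
    """Render one Gaussian heatmap (peak value 1.0) per (x, y) landmark."""
    two_s2 = 2.0 * sigma ** 2
    return [[[math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / two_s2)
              for x in range(width)]
             for y in range(height)]
            for lx, ly in landmarks]


def build_recurrent_input(image: List[Grid],
                          prev_landmarks: Optional[Landmarks],
                          num_landmarks: int, sigma: float = 1.5) -> List[Grid]:
    """Stack image channels with heatmaps from the previous frame (channel-first)."""
    height, width = len(image[0]), len(image[0][0])
    if prev_landmarks is None:
        # First frame: no temporal prior yet, so use all-zero heatmaps (assumption).
        prior = [[[0.0] * width for _ in range(height)]
                 for _ in range(num_landmarks)]
    else:
        if len(prev_landmarks) != num_landmarks:
            raise ValueError("expected %d landmarks" % num_landmarks)
        prior = render_heatmaps(prev_landmarks, height, width, sigma)
    return list(image) + prior


def track_sequence(frames: Sequence[List[Grid]],
                   model: Callable[[List[Grid]], Landmarks],
                   num_landmarks: int, sigma: float = 1.5) -> List[Landmarks]:
    """Run a landmark model frame by frame, re-injecting its last output as a prior."""
    prev: Optional[Landmarks] = None
    results: List[Landmarks] = []
    for frame in frames:
        inputs = build_recurrent_input(frame, prev, num_landmarks, sigma)
        prev = model(inputs)
        results.append(prev)
    return results
```

Here `model` stands in for a trained network that maps the stacked channels to landmark coordinates; a practical implementation would operate on array tensors rather than nested lists.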

The accuracy and stability of the RDM/R-RDM architectures are compared with state-of-the-art Random Forest and CNN based approaches. Combining a holistic feature localizer, which takes advantage of large receptive fields to minimize large errors, with refiners, which work at higher resolution to converge to the feature vicinities, proves highly accurate for localizing face features. We demonstrate that the RDM/R-RDM architectures improve localization scores on the 300W and AFLW datasets. Moreover, by relying on modern, efficient convolutional blocks and on our recurrent architecture, we deliver the first stable and accurate real-time implementation of face feature localization on low-end mobile devices.
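
To illustrate the holistic-then-refine idea, the sketch below decodes one landmark in two stages: an argmax over a low-resolution heatmap (standing in for the encoder-decoder output) picks the neighbourhood, and a weighted centroid over a small window of a higher-resolution heatmap (standing in for a refiner output) gives the final sub-pixel position. In the paper the refiners are CNN stages; this decoding scheme, the `radius` window and the function names `argmax_2d`, `local_centroid` and `coarse_to_fine` are assumptions made for illustration only.

```python
from typing import List, Tuple

Grid = List[List[float]]  # heatmap indexed as grid[y][x]


def argmax_2d(heatmap: Grid) -> Tuple[int, int]:
    """Return the (x, y) cell holding the largest value."""
    _, x, y = max((v, x, y) for y, row in enumerate(heatmap)
                  for x, v in enumerate(row))
    return x, y


def local_centroid(heatmap: Grid, cx: int, cy: int,
                   radius: int) -> Tuple[float, float]:
    """Weighted centroid of the non-negative responses in a window around (cx, cy)."""
    height, width = len(heatmap), len(heatmap[0])
    total = sum_x = sum_y = 0.0
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            weight = max(heatmap[y][x], 0.0)
            total += weight
            sum_x += weight * x
            sum_y += weight * y
    if total == 0.0:
        # No response in the window: keep the coarse estimate.
        return float(cx), float(cy)
    return sum_x / total, sum_y / total


def coarse_to_fine(coarse: Grid, fine: Grid,
                   radius: int = 2) -> Tuple[float, float]:
    """Locate a landmark on the coarse map, then refine it on the fine map."""
    scale_x = len(fine[0]) / len(coarse[0])
    scale_y = len(fine) / len(coarse)
    cx, cy = argmax_2d(coarse)
    # Map the centre of the coarse cell to a pixel inside the matching fine cell.
    fx = min(int((cx + 0.5) * scale_x), len(fine[0]) - 1)
    fy = min(int((cy + 0.5) * scale_y), len(fine) - 1)
    return local_centroid(fine, fx, fy, radius)
```

Because the second stage only looks inside a window around the coarse estimate, a strong but distant response in the high-resolution map cannot pull the landmark away; avoiding large errors is left to the coarse stage.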

Notes

1. Accompanying video available at https://vimeo.com/348063383.

Author information

Authors and Affiliations

XZIMG Ltd. Research Lab, Kowloon, Hong Kong
Nicolas Livet

Corresponding author

Correspondence to Nicolas Livet.

Editor information

Editors and Affiliations

University of Nevada, Reno, NV, USA
George Bebis
NASA Ames Research Center, Moffett Field, CA, USA
Richard Boyle
University of Nevada, Reno, NV, USA
Bahram Parvin
Desert Research Institute, Reno, NV, USA
Darko Koracin
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Daniela Ushizima
Latent AI, Palo Alto, CA, USA
Sek Chai
Texas A&M University, College Station, TX, USA
Shinjiro Sueda
Louisiana State University, Baton Rouge, LA, USA
Xin Lin
University of North Carolina at Charlotte, Charlotte, NC, USA
Aidong Lu
École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Daniel Thalmann
Notre Dame University, Notre Dame, IN, USA
Chaoli Wang
Bosch Research North America, Palo Alto, CA, USA
Panpan Xu

About this paper

Cite this paper

Livet, N. (2019). Real-Time Face Features Localization with Recurrent Refined Dense CNN Architectures. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2019. Lecture Notes in Computer Science, vol 11844. Springer, Cham. https://doi.org/10.1007/978-3-030-33720-9_35

DOI: https://doi.org/10.1007/978-3-030-33720-9_35
Published: 21 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33719-3
Online ISBN: 978-3-030-33720-9
eBook Packages: Computer Science, Computer Science (R0)
