Abstract
Based on an innovative, efficient recurrent deep learning architecture, we present a highly stable and robust technique to localize face features on still images, captured and live video sequences. This dense (Fully Convolutional) CNN architecture, referred as the Refined Dense Mobilenet (RDM), is composed of (1) a main encoder-decoder block which aims to approximate face feature locations and, (2) a sequence of refiners which aims to robustly converge at the vicinity of the features. On video sequences, architecture is adapted into a Recurrent RDM where a shape prior component is re-injected in the form of temporal heatmaps obtained at previous frame inference.
Accuracy and stability of RDM/R-RDM architectures are compared with state-of-the-art Random Forest and CNN based approaches. The idea of combining a holistic feature localizer – taking advantage of large receptive fields to minimize large error – and refiners – working at higher resolution to converge at feature vicinities – is proving high accuracy in localizing face features. We demonstrate RDM/R-RDM architectures improve localization scores on 300W and AFLW datasets. Moreover, by relying on modern, efficient convolutional blocks and based on our recurrent architecture, we deliver the first stable and accurate real-time implementation of face feature localization on low-end Mobile devices.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Accompanying video available at https://vimeo.com/348063383.
References
Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of faces using a consensus of exemplars. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2930–2940 (2013)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Bulat, A., Tzimiropoulos, G.: Binarized convolutional landmark localizers for human pose estimation and face alignment with limited resources, March 2017
Bulat, A., Tzimiropoulos, G.: How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230, 000 3D facial landmarks). CoRR abs/1703.07332 (2017)
Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2887–2894, June 2012
Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models: their training and application. Comput. Vis. Image Underst. 61(1), 38–59 (1995)
Cootes, T.F., Ionita, M.C., Lindner, C., Sauer, P.: Robust and accurate shape model fitting using random forest regression voting. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7578, pp. 278–291. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33786-4_21
Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
Cristinacce, D., Cootes, T.F.: Feature detection and tracking with constrained local models, vol. 41, pp. 929–938, January 2006
Dollár, P., Welinder, P., Perona, P.: Cascaded pose regression. In: CVPR (2010)
Feng, Z.H., Kittler, J., Awais, M., Huber, P., Wu, X.J.: Wing loss for robust facial landmark localisation with convolutional neural networks. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2235–2245. IEEE (2018)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, June 2016
He, Z., Kan, M., Zhang, J., Chen, X., Shan, S.: A fully end-to-end cascaded CNN for facial landmark detection. In: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), pp. 200–207, May 2017
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861 (2017)
Jacob, B., et al.: Quantization and training of neural networks for efficient integer-arithmetic-only inference. CoRR abs/1712.05877 (2017)
King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009)
Le, V., Brandt, J., Lin, Z., Bourdev, L., Huang, T.S.: Interactive facial feature localization. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 679–692. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33712-3_49
Livet, N., Berkowski, G.: Shape and appearance based sequenced convnets to detect real-time face attributes on mobile devices. In: Perales, F.J., Kittler, J. (eds.) AMDO 2018. LNCS, vol. 10945, pp. 73–84. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94544-6_8
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440, June 2015
Luo, C., Wang, Z., Wang, S., Zhang, J., Yu, J.: Locating facial landmarks using probabilistic random forest. IEEE Signal Process. Lett. 22(12), 2324–2328 (2015)
Koestinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In: Proceedings of the First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies (2011)
Peng, X., Feris, R.S., Wang, X., Metaxas, D.N.: A recurrent encoder-decoder network for sequential face alignment. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part I. LNCS, vol. 9905, pp. 38–56. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_3
Ranjan, R., Patel, V.M., Chellappa, R.: HyperFace: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. CoRR abs/1603.01249 (2016)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28. Available on arXiv:1505.04597 [cs.CV]
Sagonas, C., Antonakos, E., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge. Image Vision Comput. 47(C), 3–18 (2016)
Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. CoRR abs/1801.04381 (2018)
Saragih, J.M., Lucey, S., Cohn, J.F.: Deformable model fitting by regularized landmark mean-shift. Int. J. Comput. Vision 91(2), 200–215 (2011)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)
Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3476–3483, June 2013
Szegedy, C., et al.: Going deeper with convolutions. CoRR abs/1409.4842 (2014)
Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708, June 2014
Trigeorgis, G., Snape, P., Nicolaou, M., Antonakos, E., Zafeiriou, S.: Mnemonic descent method: a recurrent process applied for end-to-end face alignment, June 2016. https://doi.org/10.1109/CVPR.2016.453
Viola, P., Jones, M.: Robust real-time object detection. Int. J. Comput. Vision 4(34–47), 4 (2001)
Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. CoRR abs/1602.00134 (2016)
XZIMG: Magic face - face features tracker for augmented reality apps (2016). http://www.xzimg.com
Zhou, E., Fan, H., Cao, Z., Jiang, Y., Yin, Q.: Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In: 2013 IEEE International Conference on Computer Vision Workshops, pp. 386–391, December 2013
Zhu, S., Li, C., Loy, C.C., Tang, X.: Unconstrained face alignment via cascaded compositional learning, pp. 3409–3417, June 2016
Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2879–2886, June 2012
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Livet, N. (2019). Real-Time Face Features Localization with Recurrent Refined Dense CNN Architectures. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2019. Lecture Notes in Computer Science(), vol 11844. Springer, Cham. https://doi.org/10.1007/978-3-030-33720-9_35
Download citation
DOI: https://doi.org/10.1007/978-3-030-33720-9_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33719-3
Online ISBN: 978-3-030-33720-9
eBook Packages: Computer ScienceComputer Science (R0)