Fine-grained facial image-to-image translation with an attention based pipeline generative adversarial framework

Multimedia Tools and Applications

Abstract

Detecting and recognizing fine-grained features is important but difficult because of limited resolution and noisy representations. Synthesizing images with a specified tiny feature is even more challenging. Existing image-to-image generation studies usually focus on improving the resolution of generated images and on strengthening representation learning for coarse features; generating images with fine-grained attributes within an image-to-image framework remains difficult. In this paper, we propose an attention-based pipeline generative adversarial network (Atten-Pip-GAN) that generates a variety of facial images with multi-label fine-grained attributes from only a neutral facial image. First, we use a pipeline adversarial structure to generate images with multiple features step by step. Second, we use an independent image-to-image framework as a preprocessing method to detect small fine-grained features and provide an attention map that improves the generation of delicate features. Third, we propose an attention-based location loss to improve generation performance on small fine-grained features. We apply this method to the open facial image database RaFD and demonstrate the efficiency of Atten-Pip-GAN in generating facial images with fine-grained attributes.
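
The abstract does not give the form of the attention-based location loss, so the following is only an illustrative sketch of the general idea of weighting a pixel-wise error by an attention map so that small regions dominate the objective. The function name `attention_weighted_l1`, the choice of an L1 error, and the normalization by the total attention mass are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def attention_weighted_l1(generated, target, attention, eps=1e-8):
    """Attention-weighted mean absolute error (hypothetical form, not the paper's loss).

    generated, target: arrays of shape (H, W, C)
    attention: array of shape (H, W) or (H, W, C), non-negative weights
    """
    generated = np.asarray(generated, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    attention = np.asarray(attention, dtype=np.float64)

    # A (H, W) map gets a trailing axis so it broadcasts over the channels.
    if attention.ndim == generated.ndim - 1:
        attention = attention[..., None]
    weights = np.broadcast_to(attention, generated.shape)

    abs_err = np.abs(generated - target)
    # Normalizing by the attention mass keeps the loss scale independent
    # of how large the attended region is.
    return float((weights * abs_err).sum() / (weights.sum() + eps))


if __name__ == "__main__":
    target = np.zeros((4, 4, 3))
    generated = target.copy()
    generated[0, 0, :] = 1.0  # an error confined to a single pixel

    focused = np.zeros((4, 4))
    focused[0, 0] = 1.0       # attention on the erroneous pixel
    uniform = np.ones((4, 4))  # no spatial preference

    print(attention_weighted_l1(generated, target, focused))
    print(attention_weighted_l1(generated, target, uniform))
```

With a uniform map the single-pixel error is averaged over the whole image, while a map focused on that pixel makes the same error dominate the loss, which is the behaviour one would want for features that occupy only a few pixels.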

Acknowledgments

This work was funded by the National Natural Science Foundation of China under Grant Number 61701463, the Natural Science Foundation of Shandong Province of China under Grant Number ZR2017BF011, and the Fundamental Research Funds for the Central Universities under Grant Number 201822014.

Author information

Corresponding author

Correspondence to Zhibin Yu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, Y., Zheng, Z., Wang, C. et al. Fine-grained facial image-to-image translation with an attention based pipeline generative adversarial framework. Multimed Tools Appl 79, 14981–15000 (2020). https://doi.org/10.1007/s11042-019-08346-x

  • DOI: https://doi.org/10.1007/s11042-019-08346-x
