Abstract
Fine-grained feature detection and recognition are difficult tasks because such features occupy few pixels and have noisy representations. Synthesizing images with a specified tiny feature is even more challenging. Existing image-to-image generation studies usually focus on improving output resolution and strengthening representation learning for coarse features, while generating images with fine-grained attributes within an image-to-image framework remains difficult. In this paper, we propose an attention-based pipeline generative adversarial network (Atten-Pip-GAN) that generates facial images with multi-label fine-grained attributes from only a neutral facial image. First, we use a pipeline adversarial structure to generate images with multiple attributes step by step. Second, we use an independent image-to-image framework as a preprocessing step that detects small fine-grained features and provides an attention map, improving the generation of delicate features. Third, we propose an attention-based location loss that further improves generation quality on small fine-grained features. We apply the method to the open facial image database RaFD and demonstrate the effectiveness of Atten-Pip-GAN in generating facial images with fine-grained attributes.
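The abstract does not specify the form of the attention-based location loss. Purely as an illustration, the sketch below shows one plausible reading: an L1 reconstruction error weighted by the attention map over the fine-grained regions. The function name attention_location_loss, the tensor shapes, and the normalisation term are assumptions, not the paper's definition.

```python
import torch

def attention_location_loss(generated, target, attention_map, eps=1e-8):
    """Illustrative attention-weighted L1 loss (assumed form, not from the paper).

    generated, target: (N, C, H, W) image tensors.
    attention_map:     (N, 1, H, W) map in [0, 1] highlighting small
                       fine-grained feature regions (e.g. eyes, mouth).
    """
    # Weight the per-pixel reconstruction error by the attention map so that
    # errors inside the small highlighted regions dominate the loss.
    weighted_err = attention_map * torch.abs(generated - target)
    # Normalise by the attended area so the loss scale does not depend on
    # how large the highlighted region happens to be.
    return weighted_err.sum() / (attention_map.sum() + eps)
```

In a training loop this term would typically be added to the usual adversarial and reconstruction objectives with its own weighting coefficient.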
Acknowledgments
This work was funded by the National Natural Science Foundation of China under Grant Number 61701463, the Natural Science Foundation of Shandong Province of China under Grant Number ZR2017BF011, and the Fundamental Research Funds for the Central Universities under Grant Number 201822014.
Cite this article
Zhao, Y., Zheng, Z., Wang, C. et al. Fine-grained facial image-to-image translation with an attention based pipeline generative adversarial framework. Multimed Tools Appl 79, 14981–15000 (2020). https://doi.org/10.1007/s11042-019-08346-x