Translation of Real-World Photographs into Artistic Images via Conditional CycleGAN and StarGAN

  • Original Research
  • Published in: SN Computer Science

Abstract

To translate a real-world photograph into an artistic image in the style of a famous artist, the selection of colors and brushstrokes should reflect those of the artist. CycleGAN, a one-to-one domain translation architecture trained on an unpaired dataset, can translate a real-world photograph into an artistic image. However, translating images into N different artistic styles with this approach requires training N separate CycleGANs, one per style. Here, we develop a single deep learning architecture that can be steered toward multiple artistic styles by adding a conditional vector. The overall architecture comprises a one-to-N domain translation architecture, namely a conditional CycleGAN, and an N-to-N domain translation architecture, namely StarGAN, for translation into five different artistic styles. An evaluation of the trained models reveals that multiple artistic styles can be produced from a single real-world photograph simply by adjusting the conditional input.
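
As a minimal sketch of the conditioning mechanism described in the abstract, the PyTorch-style snippet below shows one common way to steer a single generator with a one-hot style vector: the vector is broadcast to a per-pixel map and concatenated with the input image along the channel axis, as in StarGAN-style conditioning. All class names, layer sizes, and the style indexing are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

NUM_STYLES = 5  # five artistic styles, as in the paper

class ConditionalGenerator(nn.Module):
    """Toy generator that consumes an RGB image plus a style condition."""

    def __init__(self, num_styles: int = NUM_STYLES):
        super().__init__()
        # 3 RGB channels + num_styles one-hot condition channels.
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_styles, 64, kernel_size=7, padding=3),
            nn.InstanceNorm2d(64),
            nn.ReLU(inplace=True),
            # A real generator would add downsampling, residual blocks,
            # and upsampling here; omitted to keep the sketch short.
            nn.Conv2d(64, 3, kernel_size=7, padding=3),
            nn.Tanh(),  # outputs in [-1, 1], the usual GAN image range
        )

    def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # Broadcast the (B, num_styles) one-hot vector to a per-pixel
        # condition map and concatenate it along the channel axis.
        b, _, h, w = x.shape
        cond = style.view(b, -1, 1, 1).expand(b, style.size(1), h, w)
        return self.net(torch.cat([x, cond], dim=1))

# One set of weights, many styles: only the condition changes.
gen = ConditionalGenerator()
photo = torch.randn(1, 3, 256, 256)  # stand-in for a photograph
style = torch.zeros(1, NUM_STYLES)
style[0, 2] = 1.0                    # hypothetical style index
stylized = gen(photo, style)         # -> (1, 3, 256, 256)
```

Concatenating a broadcast label map is the conditioning scheme used by the StarGAN generator; a conditional CycleGAN may inject the vector differently (for example, into intermediate features), so this should be read as a sketch of the general idea rather than a reproduction of either network.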

Funding

This study has received no funding.

Author information

Corresponding author

Correspondence to Tad Gonsalves.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Visualized Results with Testing Images

[Figures c–f]

Visualized Results with Author’s Photos

[Figures g–j]

About this article

Cite this article

Komatsu, R., Gonsalves, T. Translation of Real-World Photographs into Artistic Images via Conditional CycleGAN and StarGAN. SN COMPUT. SCI. 2, 489 (2021). https://doi.org/10.1007/s42979-021-00884-2
