Abstract
This work focuses on synthesizing human poses from human-level text descriptions. We propose a model that is based on a conditional generative adversarial network. It is designed to generate 2D human poses conditioned on human-written text descriptions. The model is trained and evaluated using the COCO dataset, which consists of images capturing complex everyday scenes with various human poses. We show through qualitative and quantitative results that the model is capable of synthesizing plausible poses matching the given text, indicating that it is possible to generate poses that are consistent with the given semantic features, especially for actions with distinctive poses.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning (2017)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Ling. 5 23–25 (2017)
Borji, A.: Pros and cons of gan evaluation measures. Computer Vision and Image Understanding 179 (2019)
Chu, C., Zhmoginov, A., Sandler, M.: Cyclegan, a master of steganography. arXiv preprint arXiv:1712.02950 (2017)
Dai, B., Fidler, S., Urtasun, R., Lin, D.: Towards diverse and natural image descriptions via a conditional gan. In: IEEE International Conference on Computer Vision (2017)
Goodfellow, I., et al.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27 501–505 (2014)
Grabner, H., Gall, J., Van Gool, L.: What makes a chair a chair? In: IEEE Conference on Computer Vision and Pattern Recognition (2011)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. In: Advances in Neural Information Processing Systems (2017)
Gupta, A., Satkin, S., Efros, A.A., Hebert, M.: From 3d scene geometry to human workspace. In: IEEE Conference on Computer Vision and Pattern Recognition (2011)
Hinton, G.E.: Deep belief networks. Scholarpedia 4(5) (2009)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
Lassner, C., Pons-Moll, G., Gehler, P.V.: A generative model of people in clothing. In: IEEE International Conference on Computer Vision (2017)
Li, B., Qi, X., Lukasiewicz, T., Torr, P.: Controllable text-to-image generation. In: Advances in Neural Information Processing Systems (2019)
Li, W., Zhang, P., Zhang, L., Huang, Q., He, X., Lyu, S., Gao, J.: Object-driven text-to-image synthesis via adversarial training. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)
Li, X., Liu, S., Kim, K., Wang, X., Yang, M.H., Kautz, J.: Putting humans in a scene: learning affordance in 3d indoor environments. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)
Li, Y., et al.: Storygan: a sequential conditional gan for story visualization. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)
Lin, T.Y., et al.: Microsoft coco: common objects in context. In: European Conference on Computer Vision (2014)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34, 432 (2015)
Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. In: Advances in Neural Information Processing Systems (2017)
Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A.: Advances in pre-training distributed word representations. In: International Conference on Language Resources and Evaluation (2018)
Petzka, H., Fischer, A., Lukovnicov, D.: On the regularization of wasserstein gans. In: International Conference on Learning Representations (2018)
Qiao, T., Zhang, J., Xu, D., Tao, D.: Learn, imagine and create: Text-to-image generation from prior knowledge. Advances in Neural Information Processing Systems 32 (2019)
Qiao, T., Zhang, J., Xu, D., Tao, D.: Mirrorgan: learning text-to-image generation by redescription. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: International Conference on Learning Representations (2016)
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: International Conference on Machine Learning (2016)
Reed, S.E., Akata, Z., Mohan, S., Tenka, S., Schiele, B., Lee, H.: Learning what and where to draw. Advances in Neural Information Processing Systems29 (2016)
Salakhutdinov, R.: Learning deep generative models. Ann. Rev. Stat. Appl. 2 (2015)
Tan, H., Liu, X., Li, X., Zhang, Y., Yin, B.: Semantics-enhanced adversarial nets for text-to-image synthesis. In: IEEE International Conference on Computer Vision (2019)
Tompson, J.J., Jain, A., LeCun, Y., Bregler, C.: Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems (2014)
Villani, C.: Optimal transport: old and new. Springer, Cham (2008)
Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He, X.: Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)
Zhang, H., et al.: Stackgan: text to photo-realistic image synthesis with stacked generative adversarial networks. In: IEEE International Conference on Computer Vision (2017)
Zhang, Y., Hassan, M., Neumann, H., Black, M.J., Tang, S.: Generating 3d people in scenes without people. In: IEEE Conference on Computer Vision and Pattern Recognition (2020)
Zhou, X., Huang, S., Li, B., Li, Y., Li, J., Zhang, Z.: Text guided person image synthesis. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)
Zhu, S., Urtasun, R., Fidler, S., Lin, D., Change Loy, C.: Be your own prada: fashion synthesis with structural coherence. In: IEEE International Conference on Computer Vision (2017)
Acknowledgement
The work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) GA 1927/5-1 and the ERC Starting Grant ARCA (677650).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Briq, R., Tanke, J., Gall, J. (2021). Adversarial Synthesis of Human Pose from Text. In: Akata, Z., Geiger, A., Sattler, T. (eds) Pattern Recognition. DAGM GCPR 2020. Lecture Notes in Computer Science(), vol 12544. Springer, Cham. https://doi.org/10.1007/978-3-030-71278-5_11
Download citation
DOI: https://doi.org/10.1007/978-3-030-71278-5_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71277-8
Online ISBN: 978-3-030-71278-5
eBook Packages: Computer ScienceComputer Science (R0)