{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,1]],"date-time":"2025-04-01T05:45:13Z","timestamp":1743486313579,"version":"3.37.3"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,12,10]]},"DOI":"10.1145\/3610548.3618249","type":"proceedings-article","created":{"date-parts":[[2023,12,11]],"date-time":"2023-12-11T17:28:40Z","timestamp":1702315720000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["Face0: Instantaneously Conditioning a Text-to-Image Model on a Face"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4211-2866","authenticated-orcid":false,"given":"Dani","family":"Valevski","sequence":"first","affiliation":[{"name":"Google Research, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2875-3629","authenticated-orcid":false,"given":"Danny","family":"Lumen","sequence":"additional","affiliation":[{"name":"Google Research, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3960-6002","authenticated-orcid":false,"given":"Yossi","family":"Matias","sequence":"additional","affiliation":[{"name":"Google Research, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4080-4845","authenticated-orcid":false,"given":"Yaniv","family":"Leviathan","sequence":"additional","affiliation":[{"name":"Google Research, United States of America"}]}],"member":"320","published-online":{"date-parts":[[2023,12,11]]},"reference":[{"doi-asserted-by":"publisher","unstructured":"Rameen Abdal Peihao Zhu John Femiani Niloy\u00a0J. Mitra and Peter Wonka. 2021. CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions. 
https:\/\/doi.org\/10.48550\/ARXIV.2112.05219","key":"e_1_3_2_1_1_1","DOI":"10.48550\/ARXIV.2112.05219"},{"unstructured":"Abeba Birhane Vinay\u00a0Uday Prabhu and Emmanuel Kahembwe. 2021. Multimodal datasets: misogyny pornography and malignant stereotypes. arxiv:2110.01963\u00a0[cs.CY]","key":"e_1_3_2_1_2_1"},{"doi-asserted-by":"crossref","unstructured":"Qiong Cao Li Shen Weidi Xie Omkar\u00a0M. Parkhi and Andrew Zisserman. 2018. VGGFace2: A dataset for recognising faces across pose and age. arxiv:1710.08092\u00a0[cs.CV]","key":"e_1_3_2_1_3_1","DOI":"10.1109\/FG.2018.00020"},{"key":"e_1_3_2_1_4_1","volume-title":"Muse: Text-To-Image Generation via Masked Generative Transformers. arxiv:2301.00704\u00a0[cs.CV]","author":"Chang Huiwen","year":"2023","unstructured":"Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William\u00a0T. Freeman, Michael Rubinstein, Yuanzhen Li, and Dilip Krishnan. 2023. Muse: Text-To-Image Generation via Masked Generative Transformers. arxiv:2301.00704\u00a0[cs.CV]"},{"unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arxiv:2010.11929\u00a0[cs.CV]","key":"e_1_3_2_1_5_1"},{"doi-asserted-by":"publisher","unstructured":"Patrick Esser Robin Rombach and Bj\u00f6rn Ommer. 2020. Taming Transformers for High-Resolution Image Synthesis. https:\/\/doi.org\/10.48550\/ARXIV.2012.09841","key":"e_1_3_2_1_6_1","DOI":"10.48550\/ARXIV.2012.09841"},{"doi-asserted-by":"publisher","unstructured":"Rinon Gal Yuval Alaluf Yuval Atzmon Or Patashnik Amit\u00a0H. Bermano Gal Chechik and Daniel Cohen-Or. 2022. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. 
https:\/\/doi.org\/10.48550\/ARXIV.2208.01618","key":"e_1_3_2_1_7_1","DOI":"10.48550\/ARXIV.2208.01618"},{"doi-asserted-by":"crossref","unstructured":"Rinon Gal Moab Arar Yuval Atzmon Amit\u00a0H. Bermano Gal Chechik and Daniel Cohen-Or. 2023. Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models. arxiv:2302.12228\u00a0[cs.CV]","key":"e_1_3_2_1_8_1","DOI":"10.1145\/3592133"},{"doi-asserted-by":"publisher","unstructured":"Ian\u00a0J. Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative Adversarial Networks. https:\/\/doi.org\/10.48550\/ARXIV.1406.2661","key":"e_1_3_2_1_9_1","DOI":"10.48550\/ARXIV.1406.2661"},{"doi-asserted-by":"publisher","unstructured":"Jonathan Ho Ajay Jain and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. https:\/\/doi.org\/10.48550\/ARXIV.2006.11239","key":"e_1_3_2_1_10_1","DOI":"10.48550\/ARXIV.2006.11239"},{"doi-asserted-by":"publisher","unstructured":"Jonathan Ho and Tim Salimans. 2022. Classifier-Free Diffusion Guidance. https:\/\/doi.org\/10.48550\/ARXIV.2207.12598","key":"e_1_3_2_1_11_1","DOI":"10.48550\/ARXIV.2207.12598"},{"unstructured":"Edward\u00a0J. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arxiv:2106.09685\u00a0[cs.CL]","key":"e_1_3_2_1_12_1"},{"key":"e_1_3_2_1_13_1","volume-title":"Workshop on faces in \u2018Real-Life\u2019 Images: detection, alignment, and recognition.","author":"Huang B","year":"2008","unstructured":"Gary\u00a0B Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. 2008. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on faces in \u2018Real-Life\u2019 Images: detection, alignment, and recognition."},{"doi-asserted-by":"publisher","unstructured":"Tero Karras Samuli Laine and Timo Aila. 2018. 
A Style-Based Generator Architecture for Generative Adversarial Networks. https:\/\/doi.org\/10.48550\/ARXIV.1812.04948","key":"e_1_3_2_1_14_1","DOI":"10.48550\/ARXIV.1812.04948"},{"key":"e_1_3_2_1_15_1","volume-title":"Dreamix: Video Diffusion Models are General Video Editors","author":"Molad Eyal","year":"2023","unstructured":"Eyal Molad, Eliahu Horwitz, Dani Valevski, Alex\u00a0Rav Acha, Yossi Matias, Yael Pritch, Yaniv Leviathan, and Yedid Hoshen. 2023. Dreamix: Video Diffusion Models are General Video Editors. arxiv:2302.01329\u00a0[cs.CV]"},{"doi-asserted-by":"crossref","unstructured":"Yotam Nitzan Kfir Aberman Qiurui He Orly Liba Michal Yarom Yossi Gandelsman Inbar Mosseri Yael Pritch and Daniel Cohen-or. 2022. MyStyle: A Personalized Generative Prior. arxiv:2203.17272\u00a0[cs.CV]","key":"e_1_3_2_1_16_1","DOI":"10.1145\/3550454.3555436"},{"doi-asserted-by":"publisher","unstructured":"Aaron van\u00a0den Oord Oriol Vinyals and Koray Kavukcuoglu. 2017. Neural Discrete Representation Learning. https:\/\/doi.org\/10.48550\/ARXIV.1711.00937","key":"e_1_3_2_1_17_1","DOI":"10.48550\/ARXIV.1711.00937"},{"doi-asserted-by":"publisher","unstructured":"Or Patashnik Zongze Wu Eli Shechtman Daniel Cohen-Or and Dani Lischinski. 2021. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. https:\/\/doi.org\/10.48550\/ARXIV.2103.17249","key":"e_1_3_2_1_18_1","DOI":"10.48550\/ARXIV.2103.17249"},{"doi-asserted-by":"publisher","unstructured":"Alec Radford Jong\u00a0Wook Kim Chris Hallacy Aditya Ramesh Gabriel Goh Sandhini Agarwal Girish Sastry Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. https:\/\/doi.org\/10.48550\/ARXIV.2103.00020","key":"e_1_3_2_1_19_1","DOI":"10.48550\/ARXIV.2103.00020"},{"doi-asserted-by":"publisher","unstructured":"Aditya Ramesh Prafulla Dhariwal Alex Nichol Casey Chu and Mark Chen. 2022. 
Hierarchical Text-Conditional Image Generation with CLIP Latents. https:\/\/doi.org\/10.48550\/ARXIV.2204.06125","key":"e_1_3_2_1_20_1","DOI":"10.48550\/ARXIV.2204.06125"},{"doi-asserted-by":"publisher","unstructured":"Aditya Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen and Ilya Sutskever. 2021. Zero-Shot Text-to-Image Generation. https:\/\/doi.org\/10.48550\/ARXIV.2102.12092","key":"e_1_3_2_1_21_1","DOI":"10.48550\/ARXIV.2102.12092"},{"unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2021a. High-Resolution Image Synthesis with Latent Diffusion Models. arxiv:2112.10752\u00a0[cs.CV]","key":"e_1_3_2_1_22_1"},{"doi-asserted-by":"publisher","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2021b. High-Resolution Image Synthesis with Latent Diffusion Models. https:\/\/doi.org\/10.48550\/ARXIV.2112.10752","key":"e_1_3_2_1_23_1","DOI":"10.48550\/ARXIV.2112.10752"},{"doi-asserted-by":"crossref","unstructured":"Nataniel Ruiz Yuanzhen Li Varun Jampani Yael Pritch Michael Rubinstein and Kfir Aberman. 2022. DreamBooth: Fine Tuning Text-to-image Diffusion Models for Subject-Driven Generation.","key":"e_1_3_2_1_24_1","DOI":"10.1109\/CVPR52729.2023.02155"},{"unstructured":"Simo Ryu. 2023. Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning. https:\/\/github.com\/cloneofsimo\/lora.","key":"e_1_3_2_1_25_1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_26_1","DOI":"10.48550\/ARXIV.2205.11487"},{"unstructured":"Christoph Schuhmann Romain Beaumont Richard Vencu Cade Gordon Ross Wightman Mehdi Cherti Theo Coombes Aarush Katta Clayton Mullis Mitchell Wortsman Patrick Schramowski Srivatsa Kundurthy Katherine Crowson Ludwig Schmidt Robert Kaczmarczyk and Jenia Jitsev. 2022. LAION-5B: An open large-scale dataset for training next generation image-text models. 
arxiv:2210.08402\u00a0[cs.CV]","key":"e_1_3_2_1_27_1"},{"doi-asserted-by":"crossref","unstructured":"Christian Szegedy Sergey Ioffe Vincent Vanhoucke and Alex Alemi. 2016. Inception-v4 Inception-ResNet and the Impact of Residual Connections on Learning. arxiv:1602.07261\u00a0[cs.CV]","key":"e_1_3_2_1_28_1","DOI":"10.1609\/aaai.v31i1.11231"},{"doi-asserted-by":"crossref","unstructured":"Christian Szegedy Wei Liu Yangqing Jia Pierre Sermanet Scott Reed Dragomir Anguelov Dumitru Erhan Vincent Vanhoucke and Andrew Rabinovich. 2014. Going Deeper with Convolutions. arxiv:1409.4842\u00a0[cs.CV]","key":"e_1_3_2_1_29_1","DOI":"10.1109\/CVPR.2015.7298594"},{"unstructured":"Dani Valevski Matan Kalman Yossi Matias and Yaniv Leviathan. 2022. UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image. arxiv:2210.09477\u00a0[cs.CV]","key":"e_1_3_2_1_30_1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_31_1","DOI":"10.1109\/CVPR52688.2022.00749"},{"doi-asserted-by":"publisher","unstructured":"Jiahui Yu Yuanzhong Xu Jing\u00a0Yu Koh Thang Luong Gunjan Baid Zirui Wang Vijay Vasudevan Alexander Ku Yinfei Yang Burcu\u00a0Karagol Ayan Ben Hutchinson Wei Han Zarana Parekh Xin Li Han Zhang Jason Baldridge and Yonghui Wu. 2022. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation. 
https:\/\/doi.org\/10.48550\/ARXIV.2206.10789","key":"e_1_3_2_1_32_1","DOI":"10.48550\/ARXIV.2206.10789"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_33_1","DOI":"10.1109\/lsp.2016.2603342"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_34_1","DOI":"10.1109\/CVPR46437.2021.00480"}],"event":{"sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"],"acronym":"SA '23","name":"SA '23: SIGGRAPH Asia 2023","location":"Sydney NSW Australia"},"container-title":["SIGGRAPH Asia 2023 Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3610548.3618249","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,10]],"date-time":"2024-12-10T11:53:46Z","timestamp":1733831626000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3610548.3618249"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,10]]},"references-count":34,"alternative-id":["10.1145\/3610548.3618249","10.1145\/3610548"],"URL":"https:\/\/doi.org\/10.1145\/3610548.3618249","relation":{},"subject":[],"published":{"date-parts":[[2023,12,10]]},"assertion":[{"value":"2023-12-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}