Xintao Wang

I am currently a Senior Staff Researcher at KwaiVGI, Kuaishou Technology, leading an effort on multimodal generation, including image/video generation and 3D generation. Previously, I was a Senior Staff Researcher at Tencent ARC Lab and Tencent AI Lab.
We are actively looking for research interns and full-time researchers to work on cutting-edge research topics. If you're interested in exploring these opportunities, please reach out to me at xintao.wang@outlook.com.

I received my Ph.D. from Multimedia Lab (MMLab), the Chinese University of Hong Kong, advised by Prof. Xiaoou Tang and Prof. Chen Change Loy. I obtained my bachelor's degree from Zhejiang University.

News

GFPGAN

Practical face restoration

Real-ESRGAN

Practical algorithms for image restoration

BasicSR

Open source image and video restoration toolbox

T2I-Adapter

Dig out controllable ability for text-to-image diffusion models

VideoCrafter

Open sourced large models for video generation

HandyView

Handy image viewer

Publications [Full List]

(* equal contribution, # corresponding author)

Selected Preprint

Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang

arXiv preprint: 2501.13918.
  Project Page  Paper (arXiv) 

GameFactory: Creating New Games with Generative Interactive Videos

Jiwen Yu, Yiran Qin, Xintao Wang#, Pengfei Wan, Di Zhang, Xihui Liu#

arXiv preprint: 2501.08325.
  Project Page  Paper (arXiv)  Codes

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Zixuan Ye, Huijuan Huang, Xintao Wang, Pengfei Wan, Di Zhang, Wenhan Luo

arXiv preprint: 2412.07744
  Project Page  Paper (arXiv)  Codes

2024

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Jianhong Bai, Menghan Xia, Xintao Wang, Ziyang Yuan, Xiao Fu, Zuozhu Liu, Haoji Hu, Pengfei Wan, Di Zhang

arXiv preprint: 2412.07760
ICLR, 2025.  Project Page  Paper (arXiv)  Codes

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

Xiao Fu, Xian Liu, Xintao Wang#, Sida Peng, Menghan Xia, Xiaoyu Shi, Ziyang Yuan, Pengfei Wan, Di Zhang, Dahua Lin#

arXiv preprint: 2412.07759
ICLR, 2025.  Project Page  Paper (arXiv)  Codes

MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan

arXiv preprint: 2407.06358
NeurIPS (Datasets & Benchmarks Track), 2024.  Project Page  Paper (arXiv)  Codes

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Muyao Niu, Xiaodong Cun, Xintao Wang, Yong Zhang, Ying Shan, Yinqiang Zheng

arXiv preprint: 2405.20222
ECCV, 2024.  Project Page  Paper (arXiv)  Codes

ToonCrafter: Generative Cartoon Interpolation

Jinbo Xing, Hanyuan Liu, Menghan Xia, Yong Zhang, Xintao Wang, Ying Shan, Tien-Tsin Wong

arXiv preprint: 2405.17933
SIGGRAPH Asia, 2024.  Project Page  Paper (arXiv)  Codes

ReVideo: Remake a Video with Motion and Content Control

Chong Mou, Mingdeng Cao, Xintao Wang#, Zhaoyang Zhang, Ying Shan, Jian Zhang#

arXiv preprint: 2405.13865
NeurIPS, 2024.  Project Page  Paper (arXiv)  Codes

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

Xuan Ju, Xian Liu, Xintao Wang#, Yuxuan Bian, Ying Shan, Qiang Xu#

arXiv preprint: 2403.06976
ECCV, 2024.  Project Page  Paper (arXiv)  Codes

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong

arXiv preprint: 2401.13627
CVPR, 2024.  Project Page  Paper (arXiv)  Codes

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Wen, Ying Shan

arXiv preprint: 2401.09047
CVPR, 2024.  Project Page  Paper (arXiv)  Codes

2023

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

Yuzhou Huang, Liangbin Xie, Xintao Wang#, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang#, Ying Shan

arXiv preprint: 2312.06739
CVPR, 2024 (highlight).  Project Page  Paper (arXiv)  Codes

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Zhen Li, Mingdeng Cao, Xintao Wang#, Zhongang Qi, Ming-Ming Cheng#, Ying Shan

arXiv preprint: 2312.04461
CVPR, 2024.  Project Page  Paper (arXiv)  Codes

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Zhouxia Wang, Ziyang Yuan, Xintao Wang#, Tianshui Chen, Menghan Xia, Ping Luo#, Ying Shan

arXiv preprint: 2312.03641
SIGGRAPH, 2024.  Project Page  Paper (arXiv)  Codes

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

Gongye Liu, Menghan Xia, Yong Zhang, Haoxin Chen, Jinbo Xing, Yibo Wang, Xintao Wang, Yujiu Yang, Ying Shan

arXiv preprint: 2312.00330
SIGGRAPH Asia, 2024.  Project Page  Paper (arXiv)  Codes

CustomNet: Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models

Ziyang Yuan, Mingdeng Cao, Xintao Wang#, Zhongang Qi, Chun Yuan#, Ying Shan

arXiv preprint: 2310.19784
ACM MM, 2024.   Project Page Paper (arXiv) 

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu

arXiv preprint: 2310.15169.
ICLR, 2024.  Project Page  Paper (arXiv)  Codes

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Gongye Liu, Xintao Wang, Ying Shan, Tien-Tsin Wong

arXiv preprint: 2310.12190
ECCV, 2024 (oral).  Project Page  Paper (arXiv)  Codes

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Yaofang Liu, Xiaodong Cun, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond H. Chan, Ying Shan

arXiv preprint: 2310.11440
CVPR, 2024.  Project Page  Paper (arXiv)  Codes

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan

arXiv preprint: 2310.07702.
ICLR, 2024 (spotlight).  Project Page  Paper (arXiv)  Codes

Making LLaMA SEE and Draw with SEED Tokenizer

Yuying Ge, Sijie Zhao, Ziyun Zeng, Yixiao Ge, Chen Li, Xintao Wang, Ying Shan

arXiv preprint: 2310.01218.
ICLR, 2024.  Project Page  Paper (arXiv)  Codes

StyleAdapter: A Unified Stylized Image Generation Model

Zhouxia Wang, Xintao Wang#, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo#

arXiv preprint: 2309.01770
IJCV, 2024.   Paper (arXiv) 

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models

Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang

arXiv preprint: 2307.02421.
ICLR, 2024 (spotlight).  Project Page  Paper (arXiv)  Codes

DreamDiffusion: Generating High-Quality Images from Brain EEG Signals

Yunpeng Bai, Xintao Wang, Yan-Pei Cao, Yixiao Ge, Chun Yuan, Ying Shan

arXiv preprint: 2306.16934
ECCV, 2024.  Paper (arXiv)  Codes

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

Mingdeng Cao, Xintao Wang#, Zhongang Qi, Ying Shan, Xiaohu Qie, Yinqiang Zheng#

arXiv preprint: 2304.08465
ICCV, 2023.  Project Page  Paper (arXiv)  Codes

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

Yue Ma, Yingqing He, Xiaodong Cun, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen

arXiv preprint: 2304.01186.
AAAI, 2024.  Project Page  Paper (arXiv)  Codes

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, Qifeng Chen

arXiv preprint: 2303.09535
ICCV, 2023 (oral).  Project Page  Paper (arXiv)  Codes

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

Chong Mou, Xintao Wang#, Liangbin Xie, Yanze Wu, Jian Zhang#, Zhongang Qi, Ying Shan, Xiaohu Qie

arXiv preprint: 2302.08453
AAAI, 2023.  Paper (arXiv)  Codes

2022

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou

arXiv preprint: 2212.11565
ICCV, 2023.  Project Page  Paper (arXiv)  Codes

DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models

Liangbin Xie*, Xintao Wang*, Xiangyu Chen*, Gen Li, Ying Shan, Jiantao Zhou, Chao Dong

ICML, 2023.  Paper (arXiv)  Codes

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models

Jiale Xu, Xintao Wang#, Weihao Cheng, Yan-Pei Cao, Ying Shan, Xiaohu Qie, Shenghua Gao#

CVPR, 2023.   Project Page  Paper (arXiv)  Codes (Coming Soon) 

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

Yuchao Gu, Xintao Wang, Yixiao Ge, Ying Shan, Mike Zheng Shou

arXiv preprint: 2212.03185
CVPR, 2024.   Paper (arXiv)

OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer

Fanghua Yu*, Xintao Wang*, Mingdeng Cao, Gen Li, Ying Shan, Chao Dong#

CVPR, 2023.  Paper (arXiv)  Codes

Mitigating Artifacts in Real-World Video Super-Resolution Models

Liangbin Xie, Xintao Wang, Shuwei Shi, Jinjin Gu, Chao Dong, Ying Shan

AAAI, 2022.  Paper (arXiv)  Codes

Accelerating the Training of Video Super-resolution Models

Lijian Lin, Xintao Wang#, Zhongang Qi, Ying Shan

AAAI, 2022.  Paper (arXiv)  Codes

Rethinking Alignment in Video Super-Resolution Transformers

Shuwei Shi, Jinjin Gu, Liangbin Xie, Xintao Wang, Yujiu Yang, Chao Dong

NeurIPS, 2022.  Paper (arXiv)  Codes

VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder

Yuchao Gu, Xintao Wang, Liangbin Xie, Chao Dong, Gen Li, Ying Shan, Ming-Ming Cheng

Selected as oral (2.7%)
ECCV, 2022.  Paper (arXiv)  Codes

MM-RealSR: Metric Learning based Interactive Modulation for Real-World Super-Resolution

Chong Mou, Yanze Wu, Xintao Wang, Chao Dong, Jian Zhang, Ying Shan

ECCV, 2022.  Paper (arXiv)  Codes

2021

2020 and before

To be updated