CogVideoX ControlNet Extension

stacked_ship_video.mp4

This repo contains the code for a simple ControlNet module for the CogVideoX model.

ComfyUI

ComfyUI-CogVideoXWrapper supports the ControlNet pipeline. See the example file.

Models

Supported models for 5B:

Canny (HF Model Link)
Hed (HF Model Link)

Supported models for 2B:

Canny (HF Model Link)
Hed (HF Model Link)

How to

Clone repo

git clone https://github.com/TheDenk/cogvideox-controlnet.git
cd cogvideox-controlnet

Create venv

python -m venv venv
source venv/bin/activate

Install requirements

pip install -r requirements.txt

Simple examples

Inference with cli

python -m inference.cli_demo \
    --video_path "resources/car.mp4" \
    --prompt "The camera follows behind red car. Car is surrounded by a panoramic view of the vast, azure ocean. Seagulls soar overhead, and in the distance, a lighthouse stands sentinel, its beam cutting through the twilight. The scene captures a perfect blend of adventure and serenity, with the car symbolizing freedom on the open sea." \
    --controlnet_type "canny" \
    --base_model_path THUDM/CogVideoX-5b \
    --controlnet_model_path TheDenk/cogvideox-5b-controlnet-canny-v1
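To run the same command over several control videos, the flags above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_cli_command` is a hypothetical helper, and it only mirrors the flags shown in the command above.

```python
import shlex
import sys


def build_cli_command(video_path, prompt, controlnet_type="canny",
                      base_model_path="THUDM/CogVideoX-5b",
                      controlnet_model_path="TheDenk/cogvideox-5b-controlnet-canny-v1"):
    """Return the argument list for `python -m inference.cli_demo`.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    return [
        sys.executable, "-m", "inference.cli_demo",
        "--video_path", video_path,
        "--prompt", prompt,
        "--controlnet_type", controlnet_type,
        "--base_model_path", base_model_path,
        "--controlnet_model_path", controlnet_model_path,
    ]


cmd = build_cli_command("resources/car.mp4", "The camera follows behind red car.")
# Print a copy-pastable shell line; shlex.join handles the quoting.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would launch the run. Using an argument list rather than a single string avoids shell-quoting problems with long prompts.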

Inference with Gradio

python -m inference.gradio_web_demo \
    --controlnet_type "canny" \
    --base_model_path THUDM/CogVideoX-5b \
    --controlnet_model_path TheDenk/cogvideox-5b-controlnet-canny-v1

Detailed inference

CUDA_VISIBLE_DEVICES=0 python -m inference.cli_demo \
    --video_path "resources/car.mp4" \
    --prompt "The camera follows behind red car. Car is surrounded by a panoramic view of the vast, azure ocean. Seagulls soar overhead, and in the distance, a lighthouse stands sentinel, its beam cutting through the twilight. The scene captures a perfect blend of adventure and serenity, with the car symbolizing freedom on the open sea." \
    --controlnet_type "canny" \
    --base_model_path THUDM/CogVideoX-5b \
    --controlnet_model_path TheDenk/cogvideox-5b-controlnet-canny-v1 \
    --num_inference_steps 50 \
    --guidance_scale 6.0 \
    --controlnet_weights 1.0 \
    --controlnet_guidance_start 0.0 \
    --controlnet_guidance_end 0.5 \
    --output_path "./output.mp4" \
    --seed 42
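The `--controlnet_guidance_start` and `--controlnet_guidance_end` values appear to be fractions of the denoising schedule. As a rough illustration, the sketch below assumes a common convention in which step `i` of `N` uses the ControlNet only when `i / N >= start` and `(i + 1) / N <= end`. This convention is an assumption, and the repository's pipeline code remains the authority. `controlnet_active_steps` is a hypothetical name.

```python
def controlnet_active_steps(num_inference_steps, guidance_start, guidance_end):
    """List the step indices at which ControlNet conditioning would be applied.

    Assumed convention (not confirmed by this README): step i is active when
    i / N >= guidance_start and (i + 1) / N <= guidance_end.
    """
    n = num_inference_steps
    return [
        i for i in range(n)
        if i / n >= guidance_start and (i + 1) / n <= guidance_end
    ]


# Values taken from the detailed inference command above.
active = controlnet_active_steps(50, 0.0, 0.5)
print(f"ControlNet active on {len(active)} of 50 steps: {active[0]}..{active[-1]}")
```

Under this assumption, the settings above apply control during the first half of the 50 steps and leave the second half to the base model alone.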

Training

Training the 2B model requires 48 GB of VRAM (for example, an A6000), and the 5B model requires 80 GB. The exact requirement depends on the number of transformer blocks, which defaults to 8 (the controlnet_transformer_num_layers parameter in the config).

Dataset

The OpenVid-1M dataset was used as the base variant. CSV files for the dataset can be found here.
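Before training, it can help to confirm that every clip listed in the CSV exists on disk. The sketch below is an illustration under assumptions: the column name `video` is a guess about the CSV layout, and `find_missing_videos` is a hypothetical helper, not part of this repository.

```python
import csv
from pathlib import Path


def find_missing_videos(csv_path, video_root_dir, column="video"):
    """Return file names listed in the CSV that are absent from video_root_dir.

    The `column` default is an assumption about the CSV header; adjust it
    to match the actual file.
    """
    root = Path(video_root_dir)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return [row[column] for row in reader if not (root / row[column]).is_file()]
```

Calling `find_missing_videos(csv_path, video_root_dir)` with the same values used for training returns an empty list when every listed file is present.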

Train script

To start training, fill in the config files accelerate_config_machine_single.yaml and finetune_single_rank.sh.
In accelerate_config_machine_single.yaml, set the parameter num_processes (1 by default) to your GPU count.
In finetune_single_rank.sh:

Set MODEL_PATH to the base CogVideoX model. The default is THUDM/CogVideoX-2b.
Set CUDA_VISIBLE_DEVICES (the default is 0).
(For the OpenVid dataset) Set video_root_dir to the directory with the video files, and set csv_path to the dataset CSV file.
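Keeping num_processes in step with CUDA_VISIBLE_DEVICES can be scripted. The following is a minimal sketch that assumes the YAML file contains a `num_processes:` line, as the text above suggests. `count_visible_gpus` and `set_num_processes` are hypothetical helpers, and the sample YAML text is illustrative rather than the file's real contents.

```python
import re


def count_visible_gpus(cuda_visible_devices):
    """Count device IDs in a CUDA_VISIBLE_DEVICES-style string such as "0,1"."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def set_num_processes(yaml_text, num_processes):
    """Rewrite the value on the `num_processes:` line, leaving other lines untouched."""
    return re.sub(
        r"^([ \t]*num_processes:[ \t]*)\d+",
        rf"\g<1>{num_processes}",
        yaml_text,
        flags=re.MULTILINE,
    )


# Illustrative content only; the real config file has more keys.
sample = "compute_environment: LOCAL_MACHINE\nnum_processes: 1\n"
updated = set_num_processes(sample, count_visible_gpus("0,1"))
print(updated)
```

A line-level substitution is used here instead of a YAML parser so that comments and key order in the config file are preserved.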

Run training

cd training
bash finetune_single_rank.sh

Acknowledgements

Original code and models: CogVideoX.

Contacts

Issues should be raised directly in the repository. For professional support and recommendations, please email welcomedenk@gmail.com.
