Authors: Zhening Xing, Gereon Fox, Yanhong Zeng, Xingang Pan, Mohamed Elgharib, Christian Theobalt, Kai Chen † (†: corresponding author)
- [2024/07/18] We release the HuggingFace space, code, and checkpoints.
- [2024/07/22] We release the Colab demo.
 
- Support Colab
 
- Uni-directional Temporal Attention with Warmup Mechanism
- Multi-timestep KV-Cache for Temporal Attention during Inference (see the sketch below)
 - Depth Prior for Better Structure Consistency
 - Compatible with DreamBooth and LoRA for Various Styles
 - TensorRT Supported
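
To make the first two features above more concrete, here is a minimal, hypothetical sketch of uni-directional temporal attention with a warmup window and a multi-timestep KV-cache. Every name and dimension in it (`CausalTemporalAttention`, `warmup_len`, `window`) is an illustrative assumption, not the repository's actual implementation:

```python
# Illustrative sketch only -- NOT the actual Live2Diff code.
import torch
import torch.nn.functional as F


class CausalTemporalAttention(torch.nn.Module):
    def __init__(self, dim: int, warmup_len: int = 8, window: int = 16):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, dim * 3)
        self.warmup_len = warmup_len   # warmup frames every frame may attend to
        self.window = window           # rolling window of recent frames
        self.kv_cache = {}             # denoising timestep -> (K, V) of past frames

    @torch.no_grad()
    def forward(self, x: torch.Tensor, timestep: int) -> torch.Tensor:
        # x: (batch, 1, dim) -- temporal tokens of the newest frame only.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if timestep in self.kv_cache:
            past_k, past_v = self.kv_cache[timestep]
            k = torch.cat([past_k, k], dim=1)
            v = torch.cat([past_v, v], dim=1)
        # Keep the warmup frames plus a rolling window of the most recent frames.
        if k.shape[1] > self.warmup_len + self.window:
            k = torch.cat([k[:, : self.warmup_len], k[:, -self.window:]], dim=1)
            v = torch.cat([v[:, : self.warmup_len], v[:, -self.window:]], dim=1)
        self.kv_cache[timestep] = (k, v)
        # Only warmup and past keys are present, so attention is uni-directional:
        # the newest frame never sees future frames.
        return F.scaled_dot_product_attention(q, k, v)
```

During streaming, a module like this would be called once per new frame and per denoising timestep; because keys and values of past frames are cached per timestep, earlier frames never have to be re-encoded.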
 
The speed evaluation is conducted on Ubuntu 20.04.6 LTS with PyTorch 2.2.2, an RTX 4090 GPU, and an Intel(R) Xeon(R) Platinum 8352V CPU. The number of denoising steps is set to 2.
| Resolution | TensorRT | FPS | 
|---|---|---|
| 512 x 512 | On | 16.43 | 
| 512 x 512 | Off | 6.91 | 
| 768 x 512 | On | 12.15 | 
| 768 x 512 | Off | 6.29 | 
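
These numbers were measured by the authors on the hardware above. If you want a rough figure for your own machine, a simple wall-clock loop like the hypothetical sketch below is usually enough; `process_frame` is a placeholder for whatever per-frame translation call you run, not an actual Live2Diff API:

```python
# Rough wall-clock FPS measurement sketch; `process_frame` is a placeholder.
import time

def measure_fps(process_frame, frames, warmup: int = 10) -> float:
    for frame in frames[:warmup]:          # let CUDA kernels / engines warm up
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed
```

Skipping a few warmup frames matters because the first iterations include kernel compilation and, with TensorRT, engine loading.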
```bash
# via https
git clone https://github.com/open-mmlab/Live2Diff.git
# or via ssh
git clone git@github.com:open-mmlab/Live2Diff.git

cd Live2Diff
git submodule update --init --recursive
```

Create a virtual environment via conda:
```bash
conda create -n live2diff python=3.10
conda activate live2diff
```

Install PyTorch and xformers. Select the appropriate version for your system:
```bash
# CUDA 11.8
pip install torch torchvision xformers --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision xformers --index-url https://download.pytorch.org/whl/cu121
```

Please refer to https://pytorch.org/ for more details.
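
Before installing the project itself, it can be worth confirming that the CUDA build of PyTorch was picked up. This is just a generic PyTorch sanity check, not a Live2Diff-specific step:

```python
# Quick sanity check that the CUDA-enabled PyTorch build is installed.
import torch

print(torch.__version__)             # e.g. 2.2.2+cu121
print(torch.cuda.is_available())     # should print True on a GPU machine
print(torch.cuda.get_device_name())  # e.g. NVIDIA GeForce RTX 4090
```
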
If you want to use TensorRT acceleration (we recommend it), you can install it with the following commands:
```bash
# for cuda 11.x
pip install ."[tensorrt_cu11]"
# for cuda 12.x
pip install ."[tensorrt_cu12]"
```

Otherwise, you can install it via:

```bash
pip install .
```

If you want to install it in development mode (a.k.a. "Editable Installs"), you can add the `-e` option:

```bash
# for cuda 11.x
pip install -e ."[tensorrt_cu11]"
# for cuda 12.x
pip install -e ."[tensorrt_cu12]"
# or
pip install -e .
```

- Download StableDiffusion-v1-5:
 
  ```bash
  huggingface-cli download runwayml/stable-diffusion-v1-5 --local-dir ./models/Model/stable-diffusion-v1-5
  ```

- Download the checkpoint from HuggingFace and put it under the `models` folder.
- Download the depth detector from MiDaS's official release and put it under the `models` folder.
- Apply for a download token from CivitAI and then download DreamBooths and LoRAs via the script:

  ```bash
  # download all DreamBooth/LoRA
  bash scripts/download.sh all YOUR_TOKEN
  # or download the one you want to use
  bash scripts/download.sh disney YOUR_TOKEN
  ```

- Download demo data from OneDrive.
 
Then the data structure of the `models` folder should look like this:

```
./
|-- models
|   |-- LoRA
|   |   |-- MoXinV1.safetensors
|   |   `-- ...
|   |-- Model
|   |   |-- 3Guofeng3_v34.safetensors
|   |   |-- ...
|   |   `-- stable-diffusion-v1-5
|   |-- live2diff.ckpt
|   `-- dpt_hybrid_384.pt
`-- data
    |-- 1.mp4
    |-- 2.mp4
    |-- 3.mp4
    `-- 4.mp4
```

The above installation steps (e.g. the download script) are for Linux users and are not well tested on Windows. If you face any difficulties, please feel free to open an issue 🤗.
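
If you want to double-check that everything landed in the right place, a small hypothetical helper like this can verify the layout (the file list mirrors the tree above; extend it with the DreamBooths/LoRAs you actually downloaded):

```python
# Hypothetical sanity check for the expected layout shown above.
from pathlib import Path

REQUIRED = [
    "models/live2diff.ckpt",
    "models/dpt_hybrid_384.pt",
    "models/Model/stable-diffusion-v1-5",
    "data/1.mp4",
]

missing = [p for p in REQUIRED if not Path(p).exists()]
print("All files found!" if not missing else f"Missing: {missing}")
```
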
You can try the examples under the `data` directory. For example:
```bash
# with TensorRT acceleration; please be patient the first time, it may take more than 20 minutes
python test.py ./data/1.mp4 ./configs/disneyPixar.yaml --max-frames -1 --prompt "1man is talking" --output work_dirs/1-disneyPixar.mp4 --height 512 --width 512 --acceleration tensorrt

# without TensorRT acceleration
python test.py ./data/2.mp4 ./configs/disneyPixar.yaml --max-frames -1 --prompt "1man is talking" --output work_dirs/1-disneyPixar.mp4 --height 512 --width 512 --acceleration none
```

You can adjust the denoising strength via `--num-inference-steps`, `--strength`, and `--t-index-list`. Please refer to `test.py` for more details.
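
The exact semantics of these flags are defined in `test.py`. As a rough mental model only (an assumption based on StreamDiffusion-style interfaces, not a guaranteed description of Live2Diff's internals), `--t-index-list` picks a few entries out of the scheduler's timestep schedule and `--strength` controls how much of that schedule is used, so fewer or lower indices mean weaker edits and less work per frame:

```python
# Illustrative sketch of how a t-index list and strength could map to denoising
# timesteps (an assumption, not Live2Diff's actual logic).
def select_timesteps(num_train_timesteps: int = 1000,
                     num_inference_steps: int = 50,
                     strength: float = 0.8,
                     t_index_list=(12, 25)):
    # Evenly spaced descending schedule over the training timesteps.
    step = num_train_timesteps // num_inference_steps
    schedule = list(range(num_train_timesteps - 1, -1, -step))
    # Lower strength truncates the schedule, starting closer to the input video.
    start = round(num_inference_steps * (1 - strength))
    schedule = schedule[start:]
    # The t-index list then picks the few timesteps actually denoised per frame.
    return [schedule[i] for i in t_index_list if i < len(schedule)]

print(select_timesteps())  # e.g. two timesteps for a 2-step denoising run
```
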
- If you face a CUDA out-of-memory error with TensorRT, please try to reduce `t-index-list` or `strength`. When running inference with TensorRT, we maintain a group of buffers for the KV-cache, which consumes more memory. Reducing `t-index-list` or `strength` shrinks the KV-cache and saves GPU memory.
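
To see why this helps, here is a hedged back-of-envelope estimate of the KV-cache footprint. Every dimension below (layer count, window length, token count, fp16) is an assumed placeholder rather than Live2Diff's real configuration, but it shows that the cache grows linearly with the number of cached denoising timesteps, which is exactly what `t-index-list` and `strength` control:

```python
# Back-of-envelope KV-cache size estimate; all dimensions are assumed
# placeholders, not Live2Diff's real configuration.
def kv_cache_bytes(num_timesteps: int,            # entries in t-index-list
                   num_layers: int = 16,          # temporal attention layers
                   window: int = 16,              # cached past frames per layer
                   tokens_per_frame: int = 4096,  # e.g. a 64x64 latent
                   channels: int = 320,
                   bytes_per_elem: int = 2):      # fp16
    per_frame = tokens_per_frame * channels * bytes_per_elem
    return num_timesteps * num_layers * window * 2 * per_frame  # 2 = K and V

for steps in (2, 3, 4):
    print(steps, f"{kv_cache_bytes(steps) / 2**30:.2f} GiB")
```
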
There is an interactive txt2img demo in the `demo` directory!
Please refer to `demo/README.md` for more details.
| Human Face (Web Camera Input) | Anime Character (Screen Video Input) |
|---|---|
| online-demo.mp4 | arknight-old-woman-v3.mp4 |
The video and image demos in this GitHub repository were generated using LCM-LoRA. The stream batch technique from StreamDiffusion is used for model acceleration. The design of the video diffusion model is adopted from AnimateDiff. We use a third-party MiDaS implementation that supports ONNX export. Our online demo is modified from Real-Time-Latent-Consistency-Model.
If you find it helpful, please consider citing our work:
```bibtex
@article{xing2024live2diff,
  title={Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models},
  author={Zhening Xing and Gereon Fox and Yanhong Zeng and Xingang Pan and Mohamed Elgharib and Christian Theobalt and Kai Chen},
  journal={arXiv preprint arXiv:2407.08701},
  year={2024}
}
```

