Wan-AI/Wan2.2-T2V-A14B-Diffusers
Wan2.2 video generation models — T2V/I2V MoE (14B active) and unified TI2V (5B dense), served via vLLM-Omni
Overview
Wan2.2 is a video generation family served via vLLM-Omni with optional Cache-DiT acceleration:
- `Wan-AI/Wan2.2-T2V-A14B-Diffusers` — Text-to-Video (MoE, 14B active)
- `Wan-AI/Wan2.2-I2V-A14B-Diffusers` — Image-to-Video (MoE, 14B active)
- `Wan-AI/Wan2.2-TI2V-5B-Diffusers` — Unified Text+Image-to-Video (dense, 5B)
Prerequisites
- vLLM-Omni on top of vLLM 0.12.0
- diffusers (bundled in vLLM-Omni CLI scripts)
Installation
```shell
uv venv
source .venv/bin/activate
uv pip install vllm==0.12.0
uv pip install git+https://github.com/vllm-project/vllm-omni.git@ef01223c42be10ee260b9f6e5ec31894cd09d86e
```
Text-to-Video (T2V)
```python
from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="Wan-AI/Wan2.2-T2V-A14B-Diffusers")
frames = omni.generate(
    "Two anthropomorphic cats in comfy boxing gear fight on a spotlighted stage.",
    height=720,
    width=1280,
    num_frames=81,
    num_inference_steps=40,
    guidance_scale=4.0,
)
```
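Wan's temporal VAE compresses time 4×, so frame counts are typically of the form 4k + 1 (hence the default 81 = 4·20 + 1). A minimal sanity-check sketch in plain Python (the helper `clip_seconds` is illustrative, not part of the vLLM-Omni API):

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Return the exported clip duration in seconds, validating the 4k + 1 frame-count convention."""
    if (num_frames - 1) % 4 != 0:
        raise ValueError("num_frames should be of the form 4k + 1, e.g. 81")
    return num_frames / fps

print(clip_seconds(81, 24))  # 81 frames at 24 fps -> 3.375 s
```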
CLI:
```shell
python examples/offline_inference/text_to_video/text_to_video.py \
    --model Wan-AI/Wan2.2-T2V-A14B-Diffusers \
    --prompt "A serene lakeside sunrise with mist over the water." \
    --height 720 --width 1280 \
    --num_frames 81 --num_inference_steps 40 \
    --guidance_scale 4.0 --fps 24 \
    --output t2v_output.mp4
```
Image-to-Video (I2V)
```python
import PIL.Image

from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="Wan-AI/Wan2.2-I2V-A14B-Diffusers")
image = PIL.Image.open("input.jpg").convert("RGB")
frames = omni.generate(
    "A cat playing with yarn",
    pil_image=image,
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=50,
    guidance_scale=5.0,
)
```
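I2V derives the output size from the input image by default; if you pass `height`/`width` yourself, keep them multiples of 16 (see the parameter table below). A small helper to snap an arbitrary size to the nearest valid resolution (`snap16` is illustrative, not a vLLM-Omni function):

```python
def snap16(height: int, width: int) -> tuple[int, int]:
    """Round each dimension to the nearest multiple of 16 (minimum 16)."""
    def snap(x: int) -> int:
        return max(16, round(x / 16) * 16)
    return snap(height), snap(width)

print(snap16(480, 832))  # already valid -> (480, 832)
print(snap16(473, 829))  # snapped       -> (480, 832)
```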
TI2V (unified 5B) CLI:
```shell
python examples/offline_inference/image_to_video/image_to_video.py \
    --model Wan-AI/Wan2.2-TI2V-5B-Diffusers \
    --image input.jpg --prompt "A cat playing with yarn" \
    --num_frames 81 --num_inference_steps 50 \
    --guidance_scale 5.0 --fps 16 --output ti2v_output.mp4
```
Cache-DiT Acceleration
Cache-DiT speeds up inference by reusing transformer-block outputs across denoising steps when their residuals change little (controlled by `residual_diff_threshold`):
```python
from vllm_omni.entrypoints.omni import Omni

omni = Omni(
    model="Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    cache_backend="cache_dit",
    cache_config={
        "Fn_compute_blocks": 8,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 4,
        "residual_diff_threshold": 0.12,
    },
)
```
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| `height` | 720 (T2V) / auto (I2V) | Video height (multiple of 16) |
| `width` | 1280 (T2V) / auto (I2V) | Video width (multiple of 16) |
| `num_frames` | 81 | Number of frames to generate |
| `num_inference_steps` | 40–50 | Denoising steps |
| `guidance_scale` | 4.0–5.0 | Classifier-free guidance scale |
| `boundary_ratio` | 0.875 | Timestep ratio at which the MoE hands off between its high-noise and low-noise experts |
| `flow_shift` | 5.0 (720p) / 12.0 (480p) | Scheduler flow shift |
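Since `flow_shift` depends on resolution, the table's defaults can be encoded in a tiny helper when scripting over multiple resolutions (the name `default_flow_shift` and the 720-pixel threshold are illustrative; only the 5.0/12.0 values come from the table above):

```python
def default_flow_shift(height: int) -> float:
    """Scheduler flow-shift defaults from the table: 5.0 at 720p and above, 12.0 below (e.g. 480p)."""
    return 5.0 if height >= 720 else 12.0

print(default_flow_shift(720))  # 5.0
print(default_flow_shift(480))  # 12.0
```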