LTX-Video
Official repository for LTX-Video
Introduction
LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 30 FPS videos at 1216×704 resolution, faster than it takes to watch them. The model is trained on a large-scale dataset of diverse videos and can generate high-resolution videos with realistic and diverse content.
The model supports text-to-video, image-to-video, keyframe-based animation, video extension (both forward and backward), video-to-video transformations, and any combination of these features.
Models & Workflows
| Name | Notes | inference.py config | ComfyUI workflow (Recommended) |
|---|---|---|---|
| ltxv-13b-0.9.7-dev | Highest quality, requires more VRAM | ltxv-13b-0.9.7-dev.yaml | ltxv-13b-i2v-base.json |
| ltxv-13b-0.9.7-mix | Mixes ltxv-13b-dev and ltxv-13b-distilled in the same multi-scale rendering workflow for a balanced speed-quality trade-off | N/A | ltxv-13b-i2v-mixed-multiscale.json |
| ltxv-13b-0.9.7-distilled | Faster and uses less VRAM, with a slight quality reduction compared to 13b; ideal for rapid iterations | ltxv-13b-0.9.7-distilled.yaml | ltxv-13b-dist-i2v-base.json |
| ltxv-13b-0.9.7-distilled-lora128 | LoRA that makes ltxv-13b-dev behave like the distilled model | N/A | N/A |
| ltxv-13b-0.9.7-fp8 | Quantized version of ltxv-13b | Coming soon | ltxv-13b-i2v-base-fp8.json |
| ltxv-13b-0.9.7-distilled-fp8 | Quantized version of ltxv-13b-distilled | Coming soon | ltxv-13b-dist-i2v-base-fp8.json |
| ltxv-2b-0.9.6 | Good quality, lower VRAM requirement than ltxv-13b | ltxv-2b-0.9.6-dev.yaml | ltxvideo-i2v.json |
| ltxv-2b-0.9.6-distilled | 15× faster, real-time capable, fewer steps needed, no STG/CFG required | ltxv-2b-0.9.6-distilled.yaml | ltxvideo-i2v-distilled.json |
Quick Start Guide
Online inference
The model is accessible right away via the following links:
- LTX-Studio image-to-video (13B-mix)
- LTX-Studio image-to-video (13B distilled)
- Fal.ai text-to-video
- Fal.ai image-to-video
- Replicate text-to-video and image-to-video
Run locally
Installation
The codebase was tested with Python 3.10.5, CUDA version 12.2, and supports PyTorch >= 2.1.2. On macOS, MPS was tested with PyTorch 2.3.0 and should support PyTorch == 2.3 or >= 2.6.
git clone https://github.com/Lightricks/LTX-Video.git
cd LTX-Video
# create env
python -m venv env
source env/bin/activate
python -m pip install -e .\[inference-script\]
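Before running inference, it can help to confirm that PyTorch actually sees a CUDA or MPS device in the new environment. The snippet below is a minimal, optional check and is not part of the repository.

```python
# Optional environment check (not part of the LTX-Video repo): verify that
# PyTorch can see a CUDA or Apple MPS device before running inference.
import torch

if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
elif torch.backends.mps.is_available():
    print("Apple MPS backend is available")
else:
    print("No CUDA/MPS device found; generation will be very slow on CPU")
```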
Inference
📝 Note: For best results, we recommend using our ComfyUI workflow. We’re working on updating the inference.py script to match the high quality and output fidelity of ComfyUI.
To use our model, please follow the inference code in inference.py:
For text-to-video generation:
python inference.py --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
For image-to-video generation:
python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_PATH --conditioning_start_frames 0 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
Extending a video:
📝 Note: Input video segments must contain a multiple of 8 frames plus 1 (e.g., 9, 17, 25, etc.), and the target frame number should be a multiple of 8.
python inference.py --prompt "PROMPT" --conditioning_media_paths VIDEO_PATH --conditioning_start_frames START_FRAME --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
For video generation with multiple conditions:
You can now generate a video conditioned on a set of images and/or short video segments. Simply provide a list of paths to the images or video segments you want to condition on, along with their target frame numbers in the generated video. You can also specify the conditioning strength for each item (default: 1.0).
python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_OR_VIDEO_PATH_1 IMAGE_OR_VIDEO_PATH_2 --conditioning_start_frames TARGET_FRAME_1 TARGET_FRAME_2 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
ComfyUI Integration
To use our model with ComfyUI, please follow the instructions at https://github.com/Lightricks/ComfyUI-LTXVideo/.
Diffusers Integration
To use our model with the Diffusers Python library, check out the official documentation.
Diffusers also supports an 8-bit version of LTX-Video; see the official documentation for details.
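As a rough sketch of the Diffusers route, the snippet below uses the LTXPipeline class from recent Diffusers releases for text-to-video. The model ID, resolution, frame count, and step count are illustrative values and may need adjusting for the 13B checkpoints; treat the official Diffusers documentation as authoritative.

```python
# Sketch of text-to-video with the Diffusers LTXPipeline (model ID and
# parameters are illustrative; see the Diffusers docs for current usage).
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A woman walks along a windswept beach at sunset, waves rolling in behind her",
    width=704,
    height=480,
    num_frames=161,        # 8 * 20 + 1 frames
    num_inference_steps=50,
).frames[0]

export_to_video(video, "output.mp4", fps=24)
```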
Model User Guide
📝 Prompt Engineering
When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details, all in a single flowing paragraph. Start directly with the action, and keep descriptions literal and precise. Think like a cinematographer describing a shot list. Keep prompts within 200 words. For best results, build your prompts using this structure (an illustrative example follows the list):
- Start with main action in a single sentence
- Add specific details about movements and gestures
- Describe character/object appearances precisely
- Include background and environment details
- Specify camera angles and movements
- Describe lighting and colors
- Note any changes or sudden events
- See examples for more inspiration.
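For example, a prompt following this structure might read (illustrative, not an official example): "A man in a gray wool coat walks briskly across a rain-slicked city street at dusk, holding a black umbrella that tilts against the wind. He steps around a puddle reflecting neon shop signs, glances over his shoulder, and quickens his pace toward a waiting taxi. The camera tracks him from the side at street level in a smooth dolly shot, with shallow depth of field blurring passing cars behind him. Cool blue ambient light mixes with warm orange reflections from the storefronts, and halfway through the shot a bus crosses the foreground, briefly hiding him from view."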
Automatic Prompt Enhancement
When using inference.py, short prompts (below prompt_enhancement_words_threshold words) are automatically enhanced by a language model. This is supported for text-to-video and image-to-video (first-frame conditioning) generation.
When using LTXVideoPipeline directly, you can enable prompt enhancement by setting enhance_prompt=True.
🎮 Parameter Guide
- Resolution Preset: Use higher resolutions for detailed scenes and lower resolutions for faster generation and simpler scenes. The model works on resolutions divisible by 32 and frame counts of the form 8 × k + 1 (e.g. 257); if the requested resolution or frame count does not satisfy these constraints, the input is padded with -1 and then cropped back to the requested size. The model works best at resolutions under 720 × 1280 and frame counts below 257; see the rounding sketch after this list.
- Seed: Save seed values to recreate specific styles or compositions you like
- Guidance Scale: 3-3.5 are the recommended values
- Inference Steps: More steps (40+) for quality, fewer steps (20-30) for speed
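As a small arithmetic sketch of the resolution constraint above, the helper below rounds a requested size up to the nearest valid dimensions (spatial sides divisible by 32, frame count of the form 8 × k + 1). It is illustrative only and not part of the repo; the pipeline performs the equivalent padding internally.

```python
# Illustrative helper (not part of the repo): round a requested size up to
# dimensions the model works on: sides divisible by 32, frames of form 8k + 1.
import math

def round_up_dims(height: int, width: int, num_frames: int) -> tuple[int, int, int]:
    height = math.ceil(height / 32) * 32
    width = math.ceil(width / 32) * 32
    num_frames = math.ceil((num_frames - 1) / 8) * 8 + 1
    return height, width, num_frames

print(round_up_dims(720, 1280, 250))  # -> (736, 1280, 257)
```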
📝 For advanced parameter usage, please see python inference.py --help