LTX-Video

Official repository for LTX-Video

Introduction

LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 30 FPS videos at 1216×704 resolution, faster than it takes to watch them. The model is trained on a large-scale dataset of diverse videos and can generate high-resolution videos with realistic and diverse content.

The model supports text-to-video, image-to-video, keyframe-based animation, video extension (both forward and backward), video-to-video transformations, and any combination of these features.

Models & Workflows

| Name | Notes | inference.py config | ComfyUI workflow (Recommended) |
| --- | --- | --- | --- |
| ltxv-13b-0.9.7-dev | Highest quality, requires more VRAM | ltxv-13b-0.9.7-dev.yaml | ltxv-13b-i2v-base.json |
| ltxv-13b-0.9.7-mix | Mix ltxv-13b-dev and ltxv-13b-distilled in the same multi-scale rendering workflow for balanced speed-quality | N/A | ltxv-13b-i2v-mixed-multiscale.json |
| ltxv-13b-0.9.7-distilled | Faster, less VRAM usage, slight quality reduction compared to 13b. Ideal for rapid iterations | ltxv-13b-0.9.7-distilled.yaml | ltxv-13b-dist-i2v-base.json |
| ltxv-13b-0.9.7-distilled-lora128 | LoRA to make ltxv-13b-dev behave like the distilled model | N/A | N/A |
| ltxv-13b-0.9.7-fp8 | Quantized version of ltxv-13b | Coming soon | ltxv-13b-i2v-base-fp8.json |
| ltxv-13b-0.9.7-distilled-fp8 | Quantized version of ltxv-13b-distilled | Coming soon | ltxv-13b-dist-i2v-base-fp8.json |
| ltxv-2b-0.9.6 | Good quality, lower VRAM requirement than ltxv-13b | ltxv-2b-0.9.6-dev.yaml | ltxvideo-i2v.json |
| ltxv-2b-0.9.6-distilled | 15× faster, real-time capable, fewer steps needed, no STG/CFG required | ltxv-2b-0.9.6-distilled.yaml | ltxvideo-i2v-distilled.json |
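
The checkpoints above are published on the Hugging Face Hub. A minimal sketch of fetching one programmatically is shown below; the repo id and filename are assumptions, so check the model page for the exact names of the release you want.

```python
# Hedged sketch: download a single checkpoint file from the Hugging Face Hub.
# Repo id and filename are assumptions; verify them on the model page before use.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Lightricks/LTX-Video",
    filename="ltxv-13b-0.9.7-distilled.safetensors",
    local_dir="models",
)
print("Downloaded to:", ckpt_path)
```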

Quick Start Guide

Online inference

The model is accessible right away via the following links:

Run locally

Installation

The codebase was tested with Python 3.10.5 and CUDA 12.2, and supports PyTorch >= 2.1.2. On macOS, MPS was tested with PyTorch 2.3.0 and should support PyTorch == 2.3 or >= 2.6.

Terminal window
git clone https://github.com/Lightricks/LTX-Video.git
cd LTX-Video
# create env
python -m venv env
source env/bin/activate
python -m pip install -e .\[inference-script\]
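
After installation, a quick sanity check (a minimal sketch, not part of the repository) can confirm the PyTorch build and which accelerator backend is visible:

```python
# Optional post-install check: prints the PyTorch version and available accelerators.
import torch

print("torch:", torch.__version__)
print("CUDA build:", torch.version.cuda, "| CUDA available:", torch.cuda.is_available())
print("MPS available:", torch.backends.mps.is_available())  # relevant on macOS
```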

Inference

📝 Note: For best results, we recommend using our ComfyUI workflow. We’re working on updating the inference.py script to match the high quality and output fidelity of ComfyUI.

To use our model, please follow the inference code in inference.py:

For text-to-video generation:
Terminal window
python inference.py --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
For image-to-video generation:
Terminal window
python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_PATH --conditioning_start_frames 0 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
Extending a video:

📝 Note: Input video segments must contain a multiple of 8 frames plus 1 (e.g., 9, 17, 25, etc.), and the target frame number should be a multiple of 8.

Terminal window
python inference.py --prompt "PROMPT" --conditioning_media_paths VIDEO_PATH --conditioning_start_frames START_FRAME --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
For video generation with multiple conditions:

You can now generate a video conditioned on a set of images and/or short video segments. Simply provide a list of paths to the images or video segments you want to condition on, along with their target frame numbers in the generated video. You can also specify the conditioning strength for each item (default: 1.0).

Terminal window
python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_OR_VIDEO_PATH_1 IMAGE_OR_VIDEO_PATH_2 --conditioning_start_frames TARGET_FRAME_1 TARGET_FRAME_2 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml

ComfyUI Integration

To use our model with ComfyUI, please follow the instructions at https://github.com/Lightricks/ComfyUI-LTXVideo/.

Diffusers Integration

To use our model with the Diffusers Python library, check out the official documentation.

Diffusers also supports an 8-bit version of LTX-Video; see the details below.
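
As a minimal text-to-video sketch along the lines of the Diffusers documentation (the resolution, frame count, and step count here are illustrative values, not official recommendations):

```python
# Hedged sketch of the Diffusers text-to-video path for LTX-Video.
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A vintage red car drives along a coastal road at sunset, waves crashing below",
    width=704,
    height=480,
    num_frames=161,          # a multiple of 8 plus 1
    num_inference_steps=50,
).frames[0]

export_to_video(video, "output.mp4", fps=24)
```

For image-to-video, Diffusers exposes a separate LTXImageToVideoPipeline with an analogous interface.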

Model User Guide

📝 Prompt Engineering

When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details - all in a single flowing paragraph. Start directly with the action, and keep descriptions literal and precise. Think like a cinematographer describing a shot list. Keep within 200 words. For best results, build your prompts using this structure:

  • Start with main action in a single sentence
  • Add specific details about movements and gestures
  • Describe character/object appearances precisely
  • Include background and environment details
  • Specify camera angles and movements
  • Describe lighting and colors
  • Note any changes or sudden events
  • See examples for more inspiration.

Automatic Prompt Enhancement

When using inference.py, short prompts (below prompt_enhancement_words_threshold words) are automatically enhanced by a language model. This is supported for text-to-video and image-to-video (first-frame conditioning).

When using LTXVideoPipeline directly, you can enable prompt enhancement by setting enhance_prompt=True.
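
A minimal sketch, assuming a pipeline has already been assembled the way inference.py assembles it; apart from enhance_prompt, the keyword names below mirror the CLI flags and may differ from the actual call signature:

```python
# Hedged sketch: `pipeline` is assumed to be an LTXVideoPipeline built as in inference.py.
# Only enhance_prompt is the switch documented above; other argument names are illustrative.
result = pipeline(
    prompt="A hot air balloon drifts over a misty valley at dawn",
    height=704,
    width=1216,
    num_frames=121,
    enhance_prompt=True,  # expand short prompts with the bundled language model
)
```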

🎮 Parameter Guide

  • Resolution Preset: Higher resolutions for detailed scenes, lower for faster generation and simpler scenes. The model works on resolutions divisible by 32 and frame counts of the form 8k + 1 (e.g. 257); otherwise the input is padded with -1 and then cropped to the requested resolution and frame count (see the helper sketched after this list). The model works best at resolutions below 720 × 1280 and with fewer than 257 frames.
  • Seed: Save seed values to recreate specific styles or compositions you like
  • Guidance Scale: 3-3.5 are the recommended values
  • Inference Steps: More steps (40+) for quality, fewer steps (20-30) for speed
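
As a concrete illustration of the divisibility rules above, a small helper (not part of the repository) can snap a requested size to values the model accepts:

```python
# Illustrative helper: round a requested resolution and frame count to values the model
# accepts - each side divisible by 32, frame count of the form 8*k + 1.
def snap_to_model_grid(height: int, width: int, num_frames: int):
    snap32 = lambda x: max(32, round(x / 32) * 32)
    frames = max(9, round((num_frames - 1) / 8) * 8 + 1)
    return snap32(height), snap32(width), frames

print(snap_to_model_grid(720, 1280, 250))  # -> (704, 1280, 249)
```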

📝 For advanced parameters usage, please see python inference.py --help

