Video Generation
LocalAI can generate videos from text prompts and optional reference images via the /video endpoint. Supported backends include diffusers, stablediffusion, and vllm-omni.
API
- Method: POST
- Endpoint: /video
Request
The request body is JSON with the following fields:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | | Model name to use |
| prompt | string | Yes | | Text description of the video to generate |
| negative_prompt | string | No | | What to exclude from the generated video |
| start_image | string | No | | Starting image as base64 string or URL |
| end_image | string | No | | Ending image for guided generation |
| width | int | No | 512 | Video width in pixels |
| height | int | No | 512 | Video height in pixels |
| num_frames | int | No | | Number of frames |
| fps | int | No | | Frames per second |
| seconds | string | No | | Duration in seconds |
| size | string | No | | Size specification (alternative to width/height) |
| input_reference | string | No | | Input reference for the generation |
| seed | int | No | | Random seed for reproducibility |
| cfg_scale | float | No | | Classifier-free guidance scale |
| step | int | No | | Number of inference steps |
| response_format | string | No | url | url to return a file URL, b64_json for base64 output |
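
The request fields can be mirrored on the client side to catch missing required values before sending. The following is a minimal Python sketch, not part of LocalAI itself: the `VideoRequest` class and its `to_payload` method are hypothetical helpers, and the model name is a placeholder. Unset optional fields are left out of the payload so that server-side defaults apply.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VideoRequest:
    """Hypothetical client-side mirror of the /video request fields."""

    model: str
    prompt: str
    negative_prompt: Optional[str] = None
    start_image: Optional[str] = None
    end_image: Optional[str] = None
    width: Optional[int] = None
    height: Optional[int] = None
    num_frames: Optional[int] = None
    fps: Optional[int] = None
    seconds: Optional[str] = None  # documented as a string, not a number
    size: Optional[str] = None
    input_reference: Optional[str] = None
    seed: Optional[int] = None
    cfg_scale: Optional[float] = None
    step: Optional[int] = None
    response_format: Optional[str] = None

    def to_payload(self) -> dict:
        """Return a JSON-ready dict, omitting fields that were not set."""
        if not self.model or not self.prompt:
            raise ValueError("model and prompt are required")
        return {k: v for k, v in asdict(self).items() if v is not None}


# "my-video-model" is a placeholder; use a model installed on your server.
example_payload = VideoRequest(
    model="my-video-model",
    prompt="A cat playing in a garden",
    num_frames=16,
    fps=8,
).to_payload()
```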
Response
Returns an OpenAI-compatible JSON response:
| Field | Type | Description |
|---|---|---|
| created | int | Unix timestamp of generation |
| id | string | Unique identifier (UUID) |
| data | array | Array of generated video items |
| data[].url | string | URL path to video file (if response_format is url) |
| data[].b64_json | string | Base64-encoded video (if response_format is b64_json) |
Usage
Generate a video from a text prompt
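A minimal Python sketch of a text-to-video request, using only the standard library. The base URL `http://localhost:8080` and the model name `my-video-model` are assumptions for illustration; replace them with the address and model of your own deployment. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: adjust to your LocalAI address


def build_video_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /video endpoint with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/video",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_video(payload: dict, base_url: str = BASE_URL) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_video_request(payload, base_url)) as resp:
        return json.load(resp)


payload = {
    "model": "my-video-model",  # placeholder model name
    "prompt": "A cat playing in a garden on a sunny day",
    "width": 512,
    "height": 512,
    "num_frames": 16,
    "fps": 8,
}
```

With a server running, `generate_video(payload)` returns the parsed response; with the default `response_format`, the video location is in `data[0]["url"]`.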
Example response
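The sketch below shows the documented response shape and one way to read it in Python. The `id`, `created`, and `url` values are made-up placeholders, not output captured from a real server, and `extract_video_urls` is a hypothetical helper.

```python
import json

# Illustrative body only: the timestamp, UUID, and path are placeholders.
sample_response = """
{
  "created": 1700000000,
  "id": "00000000-0000-0000-0000-000000000000",
  "data": [
    {"url": "/generated-videos/example.mp4"}
  ]
}
"""


def extract_video_urls(body: str) -> list[str]:
    """Return the url of every item in data[] that has one."""
    parsed = json.loads(body)
    return [item["url"] for item in parsed.get("data", []) if "url" in item]


urls = extract_video_urls(sample_response)
```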
Generate with a starting image
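A minimal Python sketch of the two ways to supply `start_image`: as a URL or as a base64 string. It assumes the server accepts plain base64 without a `data:` URI prefix; check your backend if that does not work. The model name, image URL, and helper name are placeholders.

```python
import base64


def with_start_image(payload: dict, image_bytes: bytes) -> dict:
    """Return a copy of payload with start_image set to base64-encoded bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {**payload, "start_image": encoded}


base_payload = {
    "model": "my-video-model",  # placeholder model name
    "prompt": "The scene slowly comes to life",
}

# Option 1: point the server at an image URL (placeholder address).
url_payload = {**base_payload, "start_image": "https://example.com/first-frame.png"}

# Option 2: embed a local image, for example:
#   from pathlib import Path
#   image_payload = with_start_image(base_payload, Path("first-frame.png").read_bytes())
```

Either payload is then sent as the JSON body of a POST to /video.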
Get base64-encoded output
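Setting `response_format` to `b64_json` returns the video inline in `data[].b64_json` instead of as a URL. The Python sketch below decodes such a response and writes each video to disk. The `.mp4` suffix is an assumption, since the container format depends on the backend, and `save_b64_videos` is a hypothetical helper.

```python
import base64
from pathlib import Path

payload = {
    "model": "my-video-model",  # placeholder model name
    "prompt": "A cat playing in a garden",
    "response_format": "b64_json",
}


def save_b64_videos(response: dict, out_dir: str, suffix: str = ".mp4") -> list[Path]:
    """Decode every data[].b64_json entry and write it into out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, item in enumerate(response.get("data", [])):
        encoded = item.get("b64_json")
        if not encoded:
            continue  # this item was returned as a URL instead
        path = target / f"video_{index}{suffix}"
        path.write_bytes(base64.b64decode(encoded))
        written.append(path)
    return written
```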
Error Responses
| Status Code | Description |
|---|---|
| 400 | Missing or invalid model or request parameters |
| 500 | Backend error during video generation |
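
A client can map these status codes to readable messages. The following Python sketch is one possible approach using the standard library. The base URL is an assumption, the function names are hypothetical, and the format of the error body is not specified here, so the sketch relies on the status code alone.

```python
import json
import urllib.error
import urllib.request


def explain_status(status: int) -> str:
    """Translate a documented /video status code into a short message."""
    if status == 400:
        return "bad request: missing or invalid model or request parameters"
    if status == 500:
        return "backend error during video generation"
    return f"unexpected status {status}"


def post_video(payload: dict, base_url: str = "http://localhost:8080") -> dict:
    """POST payload to /video; raise RuntimeError with a readable message on failure."""
    request = urllib.request.Request(
        url=f"{base_url}/video",  # base_url default is an assumption
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        raise RuntimeError(explain_status(err.code)) from err
```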