Sound Generation
LocalAI supports generating audio from text descriptions via the /v1/sound-generation endpoint. This endpoint is compatible with the ElevenLabs sound generation API and can produce music, sound effects, and other audio content.
API
- Method: POST
- Endpoint: /v1/sound-generation
Request
The request body is JSON. There are two usage modes: simple and advanced.
Simple mode
| Parameter | Type | Required | Description |
|---|---|---|---|
| model_id | string | Yes | Model identifier |
| text | string | Yes | Audio description or prompt |
| instrumental | bool | No | Generate instrumental audio (no vocals) |
| vocal_language | string | No | Language code for vocals (e.g. bn, ja) |
Advanced mode
| Parameter | Type | Required | Description |
|---|---|---|---|
| model_id | string | Yes | Model identifier |
| text | string | Yes | Text prompt or description |
| duration_seconds | float | No | Target duration in seconds |
| prompt_influence | float | No | Temperature / prompt influence parameter |
| do_sample | bool | No | Enable sampling |
| think | bool | No | Enable extended thinking for generation |
| caption | string | No | Caption describing the audio |
| lyrics | string | No | Lyrics for the generated audio |
| bpm | int | No | Beats per minute |
| keyscale | string | No | Musical key/scale (e.g. Ab major) |
| language | string | No | Language code |
| vocal_language | string | No | Vocal language (fallback if language is empty) |
| timesignature | string | No | Time signature (e.g. 4) |
| instrumental | bool | No | Generate instrumental audio (no vocals) |
Response
Returns a binary audio file with the appropriate Content-Type header (e.g. audio/wav, audio/mpeg, audio/flac, audio/ogg).
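
Because the audio format is signaled through the Content-Type header, a client may want to derive a file extension from it before saving. The following is a minimal Python sketch of that idea; the helper names `extension_for` and `save_audio` and the `.bin` fallback are illustrative choices, not part of LocalAI.

```python
# Map the Content-Type values listed above to conventional file extensions.
EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/flac": ".flac",
    "audio/ogg": ".ogg",
}


def extension_for(content_type: str) -> str:
    """Return a file extension for a Content-Type header value.

    Header parameters (anything after ";") are ignored, and unknown types
    fall back to ".bin", which is an arbitrary choice for this sketch.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, ".bin")


def save_audio(body: bytes, content_type: str, stem: str = "output") -> str:
    """Write the response body to "<stem><ext>" and return the file name."""
    path = stem + extension_for(content_type)
    with open(path, "wb") as f:
        f.write(body)
    return path
```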
Usage
Generate a sound effect
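
A minimal sketch of a simple-mode request in Python, using only the standard library. The base URL (`http://localhost:8080`), the model name `my-sound-model`, and the helper names are assumptions for illustration; substitute the address and model of your own instance.

```python
import json
import urllib.request

# Assumption: LocalAI is reachable at this address; adjust for your deployment.
BASE_URL = "http://localhost:8080"


def build_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for the /v1/sound-generation endpoint."""
    return urllib.request.Request(
        base_url + "/v1/sound-generation",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_to_file(payload: dict, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(build_request(payload)) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)


# Simple mode with only the two required fields.
# "my-sound-model" is a placeholder for a model installed on your instance.
sound_effect = {
    "model_id": "my-sound-model",
    "text": "heavy rain on a tin roof with distant thunder",
}
```

Calling `generate_to_file(sound_effect, "rain.wav")` would perform the request. The `.wav` extension is a guess here and should follow the Content-Type the server actually returns.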
Generate a song with vocals
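
For a song with vocals, the simple-mode options `instrumental` and `vocal_language` come into play. The sketch below only builds and prints the JSON body; `simple_payload` is a hypothetical helper and `my-sound-model` is a placeholder model name.

```python
import json


def simple_payload(model_id, text, instrumental=None, vocal_language=None):
    """Build a simple-mode request body, leaving out options that are not set."""
    body = {"model_id": model_id, "text": text}
    if instrumental is not None:
        body["instrumental"] = instrumental
    if vocal_language is not None:
        body["vocal_language"] = vocal_language
    return body


# "my-sound-model" is a placeholder model name.
song = simple_payload(
    "my-sound-model",
    "an upbeat city-pop song about summer evenings",
    instrumental=False,   # keep vocals
    vocal_language="ja",  # language code for the vocals
)
print(json.dumps(song, indent=2))
```

The resulting body can be sent as a POST request to /v1/sound-generation with a `Content-Type: application/json` header.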
Generate music with advanced parameters
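
An advanced-mode body might look like the one below. All values are illustrative, `my-sound-model` is a placeholder, and the `type_problems` helper is a hypothetical client-side sanity check against the parameter table, not something LocalAI provides.

```python
# Advanced-mode body; every value here is an example, not a recommendation.
advanced = {
    "model_id": "my-sound-model",  # placeholder model name
    "text": "mellow lo-fi hip hop beat",
    "caption": "warm, dusty, relaxed",
    "duration_seconds": 30.0,
    "bpm": 85,
    "keyscale": "Ab major",
    "timesignature": "4",
    "instrumental": True,
    "do_sample": True,
}

# Types from the advanced-mode parameter table; floats also accept ints.
EXPECTED_TYPES = {
    "model_id": str, "text": str, "caption": str, "lyrics": str,
    "keyscale": str, "language": str, "vocal_language": str,
    "timesignature": str,
    "duration_seconds": (int, float), "prompt_influence": (int, float),
    "bpm": int,
    "do_sample": bool, "think": bool, "instrumental": bool,
}


def type_problems(body: dict) -> list:
    """Return a list of human-readable problems found in a request body."""
    problems = []
    for name in ("model_id", "text"):
        if name not in body:
            problems.append(f"missing required field: {name}")
    for name, value in body.items():
        expected = EXPECTED_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown field: {name}")
        elif expected is bool:
            if not isinstance(value, bool):
                problems.append(f"wrong type for {name}: {type(value).__name__}")
        # bool is a subclass of int in Python, so reject it for numeric fields.
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    return problems
```

Running `type_problems(advanced)` before sending the request can catch typos in field names or mismatched types locally, before the server responds with a 400.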
Error Responses
| Status Code | Description |
|---|---|
| 400 | Missing or invalid model or request parameters |
| 500 | Backend error during sound generation |
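
A client can translate these status codes into readable errors. The sketch below is one possible approach using Python's standard library. The default base URL and the function names are assumptions, and because the format of the error body is not specified here, it is treated as plain text.

```python
import json
import urllib.error
import urllib.request

# Meanings taken from the table above.
STATUS_MEANINGS = {
    400: "Missing or invalid model or request parameters",
    500: "Backend error during sound generation",
}


def describe_status(code: int) -> str:
    """Return the documented meaning of a status code, if there is one."""
    return STATUS_MEANINGS.get(code, f"Unexpected HTTP status {code}")


def generate(payload: dict, base_url: str = "http://localhost:8080") -> bytes:
    """POST the payload and return audio bytes, or raise RuntimeError on failure.

    The default base URL is an assumption about a local deployment.
    """
    request = urllib.request.Request(
        base_url + "/v1/sound-generation",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # Error body format is unspecified, so decode it leniently as text.
        detail = err.read().decode("utf-8", errors="replace")
        message = f"{err.code}: {describe_status(err.code)} ({detail})"
        raise RuntimeError(message) from err
```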