Sound Generation

LocalAI supports generating audio from text descriptions via the /v1/sound-generation endpoint. This endpoint is compatible with the ElevenLabs sound generation API and can produce music, sound effects, and other audio content.

API

  • Method: POST
  • Endpoint: /v1/sound-generation

Request

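The request body is JSON. It can be used in two modes: a simple mode with a handful of fields, and an advanced mode that exposes more detailed control over the generated audio.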

Simple mode

| Parameter | Type | Required | Description |
|---|---|---|---|
| model_id | string | Yes | Model identifier |
| text | string | Yes | Audio description or prompt |
| instrumental | bool | No | Generate instrumental audio (no vocals) |
| vocal_language | string | No | Language code for vocals (e.g. bn, ja) |
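
As a rough illustration of simple mode, the following Python sketch builds the JSON body from the fields in the table above using only the standard library. The helper names (build_simple_request, generate_to_file) and the BASE_URL value are assumptions made for this example, not part of LocalAI; the URL simply mirrors the curl examples in the Usage section.

```python
import json
import urllib.request

# Assumption: a LocalAI instance listening here, as in the curl examples.
BASE_URL = "http://localhost:8080"


def build_simple_request(model_id, text, instrumental=None, vocal_language=None):
    """Build (but do not send) a simple-mode POST request (hypothetical helper)."""
    if not model_id or not text:
        raise ValueError("model_id and text are required")
    payload = {"model_id": model_id, "text": text}
    # Optional fields are included only when the caller sets them.
    if instrumental is not None:
        payload["instrumental"] = instrumental
    if vocal_language is not None:
        payload["vocal_language"] = vocal_language
    return urllib.request.Request(
        BASE_URL + "/v1/sound-generation",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_to_file(request, path):
    """Send the request and write the binary audio response to path."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling generate_to_file(build_simple_request("sound-model", "rain falling on a tin roof"), "rain.wav") would then correspond to the first curl example, assuming a model named sound-model is installed.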

Advanced mode

| Parameter | Type | Required | Description |
|---|---|---|---|
| model_id | string | Yes | Model identifier |
| text | string | Yes | Text prompt or description |
| duration_seconds | float | No | Target duration in seconds |
| prompt_influence | float | No | Temperature / prompt influence parameter |
| do_sample | bool | No | Enable sampling |
| think | bool | No | Enable extended thinking for generation |
| caption | string | No | Caption describing the audio |
| lyrics | string | No | Lyrics for the generated audio |
| bpm | int | No | Beats per minute |
| keyscale | string | No | Musical key/scale (e.g. Ab major) |
| language | string | No | Language code |
| vocal_language | string | No | Vocal language (fallback if language is empty) |
| timesignature | string | No | Time signature (e.g. 4) |
| instrumental | bool | No | Generate instrumental audio (no vocals) |
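
Because every advanced field except model_id and text is optional, a client may want to send only the fields it has actually set. The sketch below models the table as a Python dataclass; the class name SoundRequest and its methods are hypothetical, and effective_language only mirrors the documented fallback on the client side. It makes no claim about how the server implements that fallback.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SoundRequest:
    """Client-side model of the advanced-mode fields (hypothetical helper)."""

    model_id: str
    text: str
    duration_seconds: Optional[float] = None
    prompt_influence: Optional[float] = None
    do_sample: Optional[bool] = None
    think: Optional[bool] = None
    caption: Optional[str] = None
    lyrics: Optional[str] = None
    bpm: Optional[int] = None
    keyscale: Optional[str] = None
    language: Optional[str] = None
    vocal_language: Optional[str] = None
    timesignature: Optional[str] = None
    instrumental: Optional[bool] = None

    def to_payload(self) -> dict:
        # Drop unset fields so the JSON body carries only explicit values.
        return {k: v for k, v in asdict(self).items() if v is not None}

    def effective_language(self) -> Optional[str]:
        # Mirrors the table: vocal_language is used when language is empty.
        return self.language or self.vocal_language
```

The dictionary returned by to_payload() can be passed to json.dumps and sent as the request body.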

Response

Returns a binary audio file with the appropriate Content-Type header (e.g. audio/wav, audio/mpeg, audio/flac, audio/ogg).
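
Since the audio format is reported only through the Content-Type header, a client that saves the response may want to derive the file extension from it. A minimal sketch, assuming only the four media types listed above; the function name extension_for and the .bin default are choices made for this example.

```python
# Media types named in this section, mapped to conventional file extensions.
EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/flac": ".flac",
    "audio/ogg": ".ogg",
}


def extension_for(content_type, default=".bin"):
    """Return a file extension for a Content-Type header value."""
    # Ignore any parameters after ';' and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, default)
```

Unknown media types fall back to the default rather than raising, so a response in an unlisted format can still be written to disk and inspected.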

Usage

Generate a sound effect

curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "rain falling on a tin roof"
  }' \
  --output rain.wav

Generate a song with vocals

curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "a soft Bengali love song for a quiet evening",
    "instrumental": false,
    "vocal_language": "bn"
  }' \
  --output song.wav

Generate music with advanced parameters

curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "upbeat pop",
    "caption": "A funky Japanese disco track",
    "lyrics": "[Verse 1]\nDancing in the neon lights",
    "think": true,
    "bpm": 120,
    "duration_seconds": 225,
    "keyscale": "Ab major",
    "language": "ja",
    "timesignature": "4"
  }' \
  --output disco.wav

Error Responses

| Status Code | Description |
|---|---|
| 400 | Missing or invalid model or request parameters |
| 500 | Backend error during sound generation |
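
A client can translate these status codes into readable errors. The sketch below uses urllib from the Python standard library; SoundGenerationError, describe_status, and fetch_audio are hypothetical names, and the format of the error response body is not assumed here, so only the status code is inspected.

```python
import urllib.error
import urllib.request


class SoundGenerationError(Exception):
    """Raised when the endpoint answers with an HTTP error status."""


def describe_status(status):
    """Map a status code to the description from the table above."""
    if status == 400:
        return "400: missing or invalid model or request parameters"
    if status == 500:
        return "500: backend error during sound generation"
    return f"{status}: unexpected status"


def fetch_audio(request):
    """Send a prepared urllib request and return the audio bytes."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        raise SoundGenerationError(describe_status(err.code)) from err
```

A 400 usually points at the request itself (for example a missing model_id), so retrying unchanged is unlikely to help; whether a 500 is worth retrying depends on the backend.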