🗣 Text to audio (TTS)

The /tts endpoint can be used to generate speech from text.

Input: input, model

For example, to generate an audio file, send a POST request to the /tts endpoint with the text to synthesize in the JSON request body:

curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "input": "Hello world",
  "model": "tts"
}'

Returns an audio/wav file.
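
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not part of LocalAI: the helper names build_tts_request and save_speech and the default output path speech.wav are hypothetical, and the URL assumes the server from the curl example above is listening on localhost:8080.

```python
import json
import urllib.request

# Assumption: LocalAI is reachable at the address used in the curl example.
TTS_URL = "http://localhost:8080/tts"


def build_tts_request(text, model, url=TTS_URL):
    """Build (but do not send) a POST request for the /tts endpoint."""
    payload = {"input": text, "model": model}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text, model, out_path="speech.wav"):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_tts_request(text, model)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With a server running, save_speech("Hello world", "tts") would write the returned audio to speech.wav in the current directory.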


LocalAI supports the bark, piper and vall-e-x backends.


The piper backend is used for ONNX models and requires the models to be downloaded first.

To install the piper audio models manually:

To use the tts endpoint, run the following command. You can specify a backend with the backend parameter. For example, to use the piper backend:

curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "backend": "piper",
  "input": "Ciao, sono Ettore"
}' | aplay


  • aplay is a Linux command. You can use other tools to play the audio file.
  • The model name is the filename with the extension.
  • The model name is case sensitive.
  • LocalAI must be compiled with the GO_TAGS=tts flag.
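
As a rough sketch of how a client might assemble the request body with an optional backend, the Python helper below leaves out any field that is not set. The function name tts_body is hypothetical; the field names backend, model and input are the ones used in the curl examples above.

```python
import json


def tts_body(text, model=None, backend=None):
    """Return the JSON body for /tts, leaving out fields that are not set."""
    fields = {"backend": backend, "model": model, "input": text}
    return json.dumps({k: v for k, v in fields.items() if v is not None})


# Same fields as the piper curl example above: backend and input only.
body = tts_body("Ciao, sono Ettore", backend="piper")
```

The resulting string can be sent as the POST data with any HTTP client, with the Content-Type header set to application/json.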


Audio models can be configured via YAML files, which makes it possible to set backend-specific options. For instance, a backend might let you choose a voice or might support voice cloning, and either must be specified in the configuration file.

name: tts
backend: vall-e-x
parameters: ...