Model compatibility table
Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the backends, the compatible model families, and the associated repositories.
LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See the advanced section for more details.
Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
---|---|---|---|---|---|---|
llama.cpp | LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes | yes | CUDA, OpenCL, cuBLAS, Metal |
whisper | whisper | no | Audio | no | no | N/A |
langchain-huggingface | Any text generator available on HuggingFace through the API | yes | GPT | no | no | N/A |
piper (binding) | Any piper onnx model | no | Text to voice | no | no | N/A |
sentencetransformers | BERT | no | Embeddings only | yes | no | N/A |
bark | bark | no | Audio generation | no | no | yes |
autogptq | GPTQ | yes | GPT | yes | no | N/A |
diffusers | SD,… | no | Image generation | no | no | N/A |
vllm | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
exllama2 | GPTQ | yes | GPT only | no | no | N/A |
transformers-musicgen | MusicGen | no | Audio generation | no | no | N/A |
stablediffusion | stablediffusion | no | Image | no | no | N/A |
coqui | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
rerankers | Reranking API | no | Reranking | no | no | CPU/CUDA |
transformers | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CPU/CUDA/XPU |
bark-cpp | bark | no | Audio-Only | no | no | yes |
stablediffusion-cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | N/A |
silero-vad with Golang bindings | Silero VAD | no | Voice Activity Detection | no | no | CPU |
Note: any backend name listed above can be used in the `backend` field of the model configuration file (see the advanced section).
\* Token stream support in the `transformers` backend is available only with CUDA and OpenVINO CPU/XPU acceleration.
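For example, a minimal configuration pinning a model to the `llama.cpp` backend might look like the sketch below. The model name and file name are illustrative placeholders, not shipped defaults:

```yaml
# models/my-model.yaml -- a minimal sketch of a model configuration file
name: my-model                 # name the model is served under by the API
backend: llama.cpp             # any backend name from the table above
parameters:
  model: my-model.Q4_K_M.gguf  # model file to load from the models directory
```

With this file in place, requests addressed to `my-model` are routed to the `llama.cpp` backend instead of relying on automatic backend detection.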