Model compatibility table
Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the backends, the compatible model families, and the associated repository.
LocalAI will attempt to automatically load models that are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file; see the advanced section for more details.
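As a sketch, a minimal model configuration YAML that pins a model to a specific backend might look like the following (the model name and file here are placeholders, and the exact backend identifier should match your LocalAI version — see the advanced section for the full set of options):

```yaml
# Hypothetical file: models/my-model.yaml
# "name" is the model name exposed by the API; "backend" selects one of
# the backends from the table below instead of relying on auto-detection.
name: my-model
backend: llama-cpp
parameters:
  # Placeholder model file, expected under the models directory.
  model: my-model.Q4_K_M.gguf
```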
Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
---|---|---|---|---|---|---|
llama.cpp | LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes** | yes | CUDA, OpenCL, cuBLAS, Metal |
llama.cpp’s ggml model (backward compatibility with old format, before GGUF) (binding) | LLaMA, GPT-2, and many others | yes | GPT and Functions | yes** | yes | CUDA, OpenCL, cuBLAS, Metal |
whisper | whisper | no | Audio | no | no | N/A |
stablediffusion (binding) | stablediffusion | no | Image | no | no | N/A |
langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
piper (binding) | Any piper onnx model | no | Text to voice | no | no | N/A |
sentencetransformers | BERT | no | Embeddings only | yes | no | N/A |
bark | bark | no | Audio generation | no | no | yes |
autogptq | GPTQ | yes | GPT | yes | no | N/A |
exllama | GPTQ | yes | GPT only | no | no | N/A |
diffusers | SD,… | no | Image generation | no | no | N/A |
vall-e-x | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
vllm | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
mamba | Mamba models architecture | yes | GPT | no | no | CPU/CUDA |
exllama2 | GPTQ | yes | GPT only | no | no | N/A |
transformers-musicgen | MusicGen | no | Audio generation | no | no | N/A |
tinydream | stablediffusion | no | Image | no | no | N/A |
coqui | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
openvoice | Open voice | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
parler-tts | Parler-TTS | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
rerankers | Reranking API | no | Reranking | no | no | CPU/CUDA |
transformers | Various GPTs and quantization formats | yes | GPT, embeddings | yes | yes**** | CPU/CUDA/XPU |
bark-cpp | bark | no | Audio-Only | no | no | yes |
stablediffusion-cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | N/A |
silero-vad with Golang bindings | Silero VAD | no | Voice Activity Detection | no | no | CPU |
Note: any backend name listed above can be used in the backend field of the model configuration file (see the advanced section).
- \* 7b ONLY
- \*\* doesn’t seem to be accurate
- \*\*\* 7b and 40b with the ggccv format, for instance: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML
- \*\*\*\* Only for CUDA and OpenVINO CPU/XPU acceleration.
Last updated 05 Dec 2024, 16:57 +0100.