# Model compatibility table
Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all compatible model families and their associated binding repositories.
LocalAI will attempt to automatically load models that are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file; see the advanced section for more details. A minimal example follows.
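For instance, a model configuration pinning a model to a specific backend might look like the sketch below. The file name, model name, model file, and backend identifier are placeholders, and exact backend identifiers can vary between releases, so treat this as an illustration rather than a canonical config:

```yaml
# models/my-model.yaml (hypothetical example; adjust names and paths to your setup)
name: my-model          # name the model is served under via the API
backend: llama-cpp      # a backend name from the table below
parameters:
  model: my-model.gguf  # model file placed in the models directory
```

With such a file in place, requests that reference my-model are routed to the configured backend instead of relying on automatic detection.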
Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
---|---|---|---|---|---|---|
llama.cpp | Vicuna, Alpaca, LLaMa, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes** | yes | CUDA, openCL, cuBLAS, Metal |
gpt4all-llama | Vicuna, Alpaca, LLaMa | yes | GPT | no | yes | N/A |
gpt4all-mpt | MPT | yes | GPT | no | yes | N/A |
gpt4all-j | GPT4ALL-J | yes | GPT | no | yes | N/A |
falcon-ggml (binding) | Falcon (*) | yes | GPT | no | no | N/A |
dolly (binding) | Dolly | yes | GPT | no | no | N/A |
gptj (binding) | GPTJ | yes | GPT | no | no | N/A |
mpt (binding) | MPT | yes | GPT | no | no | N/A |
replit (binding) | Replit | yes | GPT | no | no | N/A |
gptneox (binding) | GPT NeoX, RedPajama, StableLM | yes | GPT | no | no | N/A |
bloomz (binding) | Bloom | yes | GPT | no | no | N/A |
rwkv (binding) | rwkv | yes | GPT | no | yes | N/A |
bert (binding) | bert | no | Embeddings only | yes | no | N/A |
whisper | whisper | no | Audio | no | no | N/A |
stablediffusion (binding) | stablediffusion | no | Image | no | no | N/A |
langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
piper (binding) | Any piper onnx model | no | Text to voice | no | no | N/A |
sentencetransformers | BERT | no | Embeddings only | yes | no | N/A |
bark | bark | no | Audio generation | no | no | yes |
autogptq | GPTQ | yes | GPT | yes | no | N/A |
exllama | GPTQ | yes | GPT only | no | no | N/A |
diffusers | SD,… | no | Image generation | no | no | N/A |
vall-e-x | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
vllm | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
exllama2 | GPTQ | yes | GPT only | no | no | N/A |
transformers-musicgen | MusicGen | no | Audio generation | no | no | N/A |
tinydream | stablediffusion | no | Image | no | no | N/A |
coqui | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
transformers | Various GPTs and quantization formats | yes | GPT, embeddings | yes | yes**** | CPU/CUDA/XPU |
Note: any backend name listed above can be used in the backend field of the model configuration file (see the advanced section). A sample request against the chat completion endpoint follows the notes below.
- * 7b ONLY
- ** doesn’t seem to be accurate
- *** 7b and 40b with the ggccv format, for instance: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML
- **** Only for CUDA and OpenVINO CPU/XPU acceleration.
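Backends marked "yes" under the Completion/Chat endpoint column can be driven through LocalAI's OpenAI-compatible API. A minimal request sketch, assuming LocalAI is listening on its default port 8080 and a model named my-model has been configured (both are placeholders for your setup):

```bash
# Query the OpenAI-compatible chat completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```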