Model compatibility

LocalAI is compatible with the models supported by llama.cpp, and also supports GPT4ALL-J and cerebras-GPT in the ggml format.


LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See the advanced section for more details.
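As a minimal sketch of such a YAML file (the model name, file name, and parameter values here are illustrative; adjust them to the model you actually downloaded), pinning a model to a specific backend might look like:

```yaml
# models/luna.yaml — hypothetical example configuration
name: luna            # the model name exposed through the API
backend: llama.cpp    # force a specific backend instead of auto-detection
parameters:
  model: luna-ai-llama2-uncensored.ggmlv3.q4_0.bin  # file inside the models directory
  temperature: 0.2
context_size: 2048
```

With a file like this in the models directory, requests addressed to the model name use the configured backend rather than relying on automatic loading.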

Hardware requirements

Depending on the model you are attempting to run, you might need more RAM or CPU resources; check the memory requirements for gguf-based backends as well. rwkv is less expensive on resources.

Model compatibility table

Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the compatible model families and the associated binding repository.

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| llama.cpp | Vicuna, Alpaca, LLaMa | yes | GPT and Functions | yes** | yes | CUDA, openCL, cuBLAS, Metal |
| gpt4all-llama | Vicuna, Alpaca, LLaMa | yes | GPT | no | yes | N/A |
| gpt4all-mpt | MPT | yes | GPT | no | yes | N/A |
| gpt4all-j | GPT4ALL-J | yes | GPT | no | yes | N/A |
| falcon-ggml (binding) | Falcon (*) | yes | GPT | no | no | N/A |
| gpt2 (binding) | GPT2, Cerebras | yes | GPT | no | no | N/A |
| dolly (binding) | Dolly | yes | GPT | no | no | N/A |
| gptj (binding) | GPTJ | yes | GPT | no | no | N/A |
| mpt (binding) | MPT | yes | GPT | no | no | N/A |
| replit (binding) | Replit | yes | GPT | no | no | N/A |
| gptneox (binding) | GPT NeoX, RedPajama, StableLM | yes | GPT | no | no | N/A |
| starcoder (binding) | Starcoder | yes | GPT | no | no | N/A |
| bloomz (binding) | Bloom | yes | GPT | no | no | N/A |
| rwkv (binding) | rwkv | yes | GPT | no | yes | N/A |
| bert (binding) | bert | no | Embeddings only | yes | no | N/A |
| whisper | whisper | no | Audio | no | no | N/A |
| stablediffusion (binding) | stablediffusion | no | Image | no | no | N/A |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
| piper (binding) | Any piper onnx model | no | Text to voice | no | no | N/A |
| falcon (binding) | Falcon *** | yes | GPT | no | yes | CUDA |
| huggingface-embeddings | sentence-transformers BERT | no | Embeddings only | yes | no | N/A |
| bark | bark | no | Audio generation | no | no | yes |
| AutoGPTQ | GPTQ | yes | GPT | yes | no | N/A |
| exllama | GPTQ | yes | GPT only | no | no | N/A |
| diffusers | SD,… | no | Image generation | no | no | N/A |
| vall-e-x | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| vllm | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |

Note: any backend name listed above can be used in the backend field of the model configuration file (see the advanced section).

Tested with:

Note: You might need to convert some older models to the new format; for instructions, see the README in llama.cpp, for instance to run gpt4all.