Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the backends, the compatible model families, and the associated repositories.

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| llama.cpp | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes** | yes | CUDA, openCL, cuBLAS, Metal |
| llama.cpp's ggml model (backward compatibility with old format, before GGUF) (binding) | LLama, GPT-2, and many others | yes | GPT and Functions | yes** | yes | CUDA, openCL, cuBLAS, Metal |
| whisper | whisper | no | Audio | no | no | N/A |
| stablediffusion (binding) | stablediffusion | no | Image | no | no | N/A |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
| piper (binding) | Any piper onnx model | no | Text to voice | no | no | N/A |
| sentencetransformers | BERT | no | Embeddings only | yes | no | N/A |
| bark | bark | no | Audio generation | no | no | yes |
| autogptq | GPTQ | yes | GPT | yes | no | N/A |
| exllama | GPTQ | yes | GPT only | no | no | N/A |
| diffusers | SD, … | no | Image generation | no | no | N/A |
| vall-e-x | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| vllm | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
| mamba | Mamba models architecture | yes | GPT | no | no | CPU/CUDA |
| exllama2 | GPTQ | yes | GPT only | no | no | N/A |
| transformers-musicgen | | no | Audio generation | no | no | N/A |
| tinydream | stablediffusion | no | Image | no | no | N/A |
| coqui | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| openvoice | Open voice | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| parler-tts | Open voice | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| rerankers | Reranking API | no | Reranking | no | no | CPU/CUDA |
| transformers | Various GPTs and quantization formats | yes | GPT, embeddings | yes | yes**** | CPU/CUDA/XPU |
| bark-cpp | bark | no | Audio-Only | no | no | yes |
| stablediffusion-cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | N/A |
| silero-vad with Golang bindings | Silero VAD | no | Voice Activity Detection | no | no | CPU |

Note: any backend name listed above can be used in the `backend` field of the model configuration file (see the advanced section).
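As an illustrative sketch, a model configuration file selecting one of the backends above might look like the following (the file name, model name, and model file here are placeholders, not values from this page; check the advanced section for the full set of options):

```yaml
# models/my-model.yaml — minimal sketch; names and paths are placeholders
name: my-model
# any backend name from the table above, as known to your LocalAI installation
backend: llama-cpp
parameters:
  # model file placed in the models directory
  model: my-model-file.gguf
```

With such a file in place, requests addressed to the model name (`my-model` in this sketch) are routed to the configured backend.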

Last updated 05 Dec 2024, 16:57 +0100.