Model compatibility

LocalAI is compatible with the models supported by llama.cpp, and also supports GPT4ALL-J and cerebras-GPT in the ggml format.


LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See the advanced section for more details.
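As a minimal sketch of such a YAML file (the model name, file name, and parameter values here are illustrative; adjust them to the model you actually downloaded), pinning a model to a specific backend might look like:

```yaml
# models/luna.yaml — hypothetical example configuration
name: luna            # the model name exposed through the API
backend: llama.cpp    # force a specific backend instead of auto-detection
parameters:
  model: luna-ai-llama2-uncensored.ggmlv3.q4_0.bin  # file inside the models directory
  temperature: 0.2
context_size: 2048
```

With a file like this in the models directory, requests addressed to the model name use the configured backend rather than relying on automatic loading.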

Hardware requirements

Depending on the model you are attempting to run, you might need more RAM or CPU resources; check the memory requirements for gguf-based backends as well. rwkv is less expensive on resources.

Model compatibility table

Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the compatible model families and the associated binding repository.

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| llama.cpp | Vicuna, Alpaca, LLaMa | yes | GPT and Functions | yes** | yes | CUDA, openCL, cuBLAS, Metal |
| gpt4all-llama | Vicuna, Alpaca, LLaMa | yes | GPT | no | yes | N/A |
| gpt4all-mpt | MPT | yes | GPT | no | yes | N/A |
| gpt4all-j | GPT4ALL-J | yes | GPT | no | yes | N/A |
| falcon-ggml (binding) | Falcon (*) | yes | GPT | no | no | N/A |
| gpt2 (binding) | GPT2, Cerebras | yes | GPT | no | no | N/A |
| dolly (binding) | Dolly | yes | GPT | no | no | N/A |
| gptj (binding) | GPTJ | yes | GPT | no | no | N/A |
| mpt (binding) | MPT | yes | GPT | no | no | N/A |
| replit (binding) | Replit | yes | GPT | no | no | N/A |
| gptneox (binding) | GPT NeoX, RedPajama, StableLM | yes | GPT | no | no | N/A |
| starcoder (binding) | Starcoder | yes | GPT | no | no | N/A |
| bloomz (binding) | Bloom | yes | GPT | no | no | N/A |
| rwkv (binding) | rwkv | yes | GPT | no | yes | N/A |
| bert (binding) | bert | no | Embeddings only | yes | no | N/A |
| whisper | whisper | no | Audio | no | no | N/A |
| stablediffusion (binding) | stablediffusion | no | Image | no | no | N/A |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
| piper (binding) | Any piper onnx model | no | Text to voice | no | no | N/A |
| falcon (binding) | Falcon *** | yes | GPT | no | yes | CUDA |
| huggingface-embeddings | sentence-transformers BERT | no | Embeddings only | yes | no | N/A |
| bark | bark | no | Audio generation | no | no | yes |
| AutoGPTQ | GPTQ | yes | GPT | yes | no | N/A |
| exllama | GPTQ | yes | GPT only | no | no | N/A |
| diffusers | SD,… | no | Image generation | no | no | N/A |
| vall-e-x | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| vllm | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |

Note: any backend name listed above can be used in the backend field of the model configuration file (see the advanced section).

Tested with:

Note: You might need to convert some older models to the new format; for instructions, see the README in llama.cpp, for instance to run gpt4all.