LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API compatible with the OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images and audio (and more) locally or on-prem with consumer-grade hardware, supporting multiple model families and architectures.

LocalAI is available as a container image and binary, compatible with various container engines like Docker, Podman, and Kubernetes. Container images are published on quay.io and Docker Hub. Binaries can be downloaded from GitHub.


Before you begin, ensure you have a container engine installed if you are not using the binaries. Suitable options include Docker or Podman; refer to their official installation guides.

Running LocalAI with All-in-One (AIO) Images

Already have a model file? Skip to Run models manually or Run other models to use an already-configured model.

LocalAI’s All-in-One (AIO) images are pre-configured with a set of models and backends to fully leverage almost the entire LocalAI feature set.

These images are available for both CPU and GPU environments. The AIO images are designed to be easy to use and require no configuration.

Use the AIO images if you don’t want to configure the models to run on LocalAI yourself. If you want to run specific models, use the manual method instead.

The AIO images come pre-configured with the following features:

  • Text to Speech (TTS)
  • Speech to Text
  • Function calling
  • Large Language Models (LLM) for text generation
  • Image generation
  • Embedding server

Start the image with Docker:

  docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu
# For Nvidia GPUs:
# docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-11
# docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-12

Or with a docker-compose file:

  version: "3.9"
  services:
    api:
      image: localai/localai:latest-aio-cpu
      # For a specific version:
      # image: localai/localai:v2.12.4-aio-cpu
      # For Nvidia GPUs decomment one of the following (cuda11 or cuda12):
      # image: localai/localai:v2.12.4-aio-gpu-nvidia-cuda-11
      # image: localai/localai:v2.12.4-aio-gpu-nvidia-cuda-12
      # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
      # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
      healthcheck:
        test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
        interval: 1m
        timeout: 20m
        retries: 5
      ports:
        - 8080:8080
      environment:
        - DEBUG=true
        # ...
      volumes:
        - ./models:/build/models:cached
      # decomment the following piece if running with Nvidia GPUs
      # deploy:
      #   resources:
      #     reservations:
      #       devices:
      #         - driver: nvidia
      #           count: 1
      #           capabilities: [gpu]

For a list of all the container-images available, see Container images. To learn more about All-in-one images instead, see All-in-one Images.

Try it out

LocalAI does not ship a web UI by default, but you can use third-party projects to interact with it (see also Integrations). You can also test the API endpoints using curl; a few examples follow.

Text Generation

Creates a model response for the given chat conversation. OpenAI documentation.

  curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?"}], "temperature": 0.1 }'
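Because the API is OpenAI-compatible, the same request can be made from any HTTP client. A minimal Python sketch using only the standard library (the helper names here are illustrative, not part of LocalAI):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default port from the docker run above

def build_chat_request(prompt, model="gpt-4", temperature=0.1):
    """Build the same JSON payload as the curl example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat_completion(prompt, **kwargs):
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

With a running AIO container, `chat_completion("How are you doing?")` returns the model's reply as a string.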

GPT Vision

Understand images.

  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-4-vision-preview",
        "messages": [{
            "role": "user",
            "content": [
              {"type": "text", "text": "What is in the image?"},
              {"type": "image_url",
               "image_url": {
                 "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
               }}
            ]
        }],
        "temperature": 0.9
    }'

Function calling

Call functions

  curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
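When the model decides to call a tool, the response carries `tool_calls` entries in the OpenAI format; your application runs the named function with the JSON-encoded arguments. A sketch in Python (the `get_current_weather` stub and the sample response below are hypothetical stand-ins, shaped like the OpenAI tool-calling spec):

```python
import json

# Hypothetical stub for the tool declared above; a real app would
# call a weather API here.
def get_current_weather(location, unit="fahrenheit"):
    return {"location": location, "temperature": "72", "unit": unit}

AVAILABLE_TOOLS = {"get_current_weather": get_current_weather}

def dispatch_tool_calls(response):
    """Run each tool call in an OpenAI-style chat response and
    collect the results keyed by call id."""
    results = {}
    for call in response["choices"][0]["message"].get("tool_calls", []):
        fn = AVAILABLE_TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results[call["id"]] = fn(**args)
    return results

# Example response shaped like a tool-calling reply:
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": "{\"location\": \"Boston, MA\"}"
                }
            }]
        }
    }]
}
```

The results would then be sent back to the model as `tool` role messages so it can compose a final answer.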

Image Generation

Creates an image given a prompt. OpenAI documentation.

  curl http://localhost:8080/v1/images/generations \
      -H "Content-Type: application/json" -d '{
          "prompt": "A cute baby sea otter",
          "size": "256x256"
        }'

Text to speech

Generates audio from the input text. OpenAI documentation.

  curl http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3

Audio Transcription

Transcribes audio into the input language. OpenAI Documentation.

Download first a sample to transcribe:

  wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg 

Send the example audio file to the transcriptions endpoint:

  curl http://localhost:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@$PWD/gb1.ogg" -F model="whisper-1"
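curl's `-F` flags assemble a `multipart/form-data` body with one part per field. The same body can be built by hand from Python's standard library; a sketch (the helper name is illustrative):

```python
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Assemble a multipart/form-data body like curl's -F flags:
    plain text fields plus one file part.
    Returns (content_type_header, body_bytes)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

POSTing this body to `/v1/audio/transcriptions` with the returned `Content-Type` header is equivalent to the curl command above.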

Embeddings Generation

Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms. OpenAI Embeddings.

  curl http://localhost:8080/embeddings \
    -X POST -H "Content-Type: application/json" \
    -d '{
        "input": "Your text string goes here",
        "model": "text-embedding-ada-002"
      }'
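The returned vectors are typically compared with cosine similarity, e.g. to rank documents against a query. A minimal sketch (the toy vectors in the usage example stand in for real model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors:
    dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0 and orthogonal vectors score 0.0, so higher scores mean semantically closer texts.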

What’s next?

There is much more to explore! Run any model from Hugging Face, generate video, and clone voices with LocalAI; check out the features section for a full overview.

Explore further resources and community contributions:

Last updated 07 Apr 2024, 11:06 +0200.