Containers

LocalAI supports Docker, Podman, and other OCI-compatible container engines. This guide covers the common aspects of running LocalAI in containers.

Prerequisites

Before you begin, ensure you have a container engine installed:

Install Docker (Mac, Windows, Linux)
Install Podman (Linux, macOS, Windows WSL2)

Quick Start

The fastest way to get started is with the CPU image:

docker run -p 8080:8080 --name local-ai -ti localai/localai:latest
# Or with Podman:
podman run -p 8080:8080 --name local-ai -ti localai/localai:latest

This will:

Start LocalAI (you’ll need to install models separately)
Make the API available at http://localhost:8080

Image Types

LocalAI provides several image types to suit different needs. These images work with both Docker and Podman.

Standard Images

Standard images don’t include pre-configured models. Use these if you want to configure models manually.

CPU Image

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest

GPU Images

NVIDIA CUDA 13:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-13

NVIDIA CUDA 12:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-12

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-gpu-hipblas

Intel GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-gpu-intel

Vulkan:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

NVIDIA Jetson (L4T ARM64):

CUDA 12 (for Nvidia AGX Orin and similar platforms):

docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64

CUDA 13 (for Nvidia DGX Spark):

docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

Using Compose

For a more manageable setup, especially with persistent volumes, use Docker Compose or Podman Compose:

Using CDI (Container Device Interface) - Recommended for NVIDIA Container Toolkit 1.14+

The CDI approach is recommended for newer versions of the NVIDIA Container Toolkit (1.14 and later). It provides better compatibility and is the future-proof method:

version: "3.9"
services:
  api:
    image: localai/localai:latest-gpu-nvidia-cuda-12
    # For CUDA 13, use: localai/localai:latest-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=false
    volumes:
      - ./models:/models:cached
    # CDI driver configuration (recommended for NVIDIA Container Toolkit 1.14+)
    # This uses the nvidia.com/gpu resource API
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia.com/gpu
              count: all
              capabilities: [gpu]

Save this as compose.yaml and run:

docker compose up -d
# Or with Podman:
podman-compose up -d

Using Legacy NVIDIA Driver - For Older NVIDIA Container Toolkit

If you are using an older version of the NVIDIA Container Toolkit (before 1.14), or need backward compatibility, use the legacy approach:

version: "3.9"
services:
  api:
    image: localai/localai:latest-gpu-nvidia-cuda-12
    # For CUDA 13, use: localai/localai:latest-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=false
    volumes:
      - ./models:/models:cached
    # Legacy NVIDIA driver configuration (for older NVIDIA Container Toolkit)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Persistent Storage

The container exposes the following volumes:

Volume	Description	CLI Flag	Environment Variable
`/models`	Model files used for inferencing	`--models-path`	`$LOCALAI_MODELS_PATH`
`/backends`	Custom backends for inferencing	`--backends-path`	`$LOCALAI_BACKENDS_PATH`
`/configuration`	Dynamic config files (api_keys.json, external_backends.json, runtime_settings.json)	`--localai-config-dir`	`$LOCALAI_CONFIG_DIR`
`/data`	Persistent data (collections, agent state, tasks, jobs)	`--data-path`	`$LOCALAI_DATA_PATH`

To persist models and data, mount volumes:

docker run -ti --name local-ai -p 8080:8080 \
  -v $PWD/models:/models \
  -v $PWD/data:/data \
  localai/localai:latest
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 \
  -v $PWD/models:/models \
  -v $PWD/data:/data \
  localai/localai:latest

Or use named volumes:

docker volume create localai-models
docker volume create localai-data
docker run -ti --name local-ai -p 8080:8080 \
  -v localai-models:/models \
  -v localai-data:/data \
  localai/localai:latest
# Or with Podman:
podman volume create localai-models
podman volume create localai-data
podman run -ti --name local-ai -p 8080:8080 \
  -v localai-models:/models \
  -v localai-data:/data \
  localai/localai:latest

Next Steps

After installation:

Access the WebUI at http://localhost:8080
Check available models: curl http://localhost:8080/v1/models
Install additional models
Try out examples

Troubleshooting

Container won’t start

Check container engine is running: docker ps or podman ps
Check port 8080 is available: netstat -an | grep 8080 (Linux/Mac)
View logs: docker logs local-ai or podman logs local-ai

GPU not detected

Ensure Docker has GPU access: docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
For Podman, see the Podman installation guide
For NVIDIA: Install NVIDIA Container Toolkit
For AMD: Ensure devices are accessible: ls -la /dev/kfd /dev/dri

NVIDIA Container fails to start with “Auto-detected mode as ’legacy’” error

If you encounter this error:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: invalid expression

This indicates a Docker/NVIDIA Container Toolkit configuration issue. The container runtime’s prestart hook fails before LocalAI starts. This is not a LocalAI code bug.

Solutions:

Use CDI mode (recommended): Update your docker-compose.yaml to use the CDI driver configuration:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia.com/gpu
          count: all
          capabilities: [gpu]

Upgrade NVIDIA Container Toolkit: Ensure you have version 1.14 or later, which has better CDI support.
Check NVIDIA Container Toolkit configuration: Run nvidia-container-cli --query-gpu to verify your installation is working correctly outside of containers.
Verify Docker GPU access: Test with docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

Models not downloading

Check internet connection
Verify disk space: df -h
Check container logs for errors: docker logs local-ai or podman logs local-ai