API Discovery & Instructions

LocalAI exposes a set of discovery endpoints that let external agents, coding assistants, and automation tools programmatically learn what the instance can do and how to control it — without reading documentation ahead of time.

Quick start

# 1. Discover what's available
curl http://localhost:8080/.well-known/localai.json

# 2. Browse instruction areas
curl http://localhost:8080/api/instructions

# 3. Get an API guide for a specific instruction
curl http://localhost:8080/api/instructions/config-management

Well-Known Discovery Endpoint

GET /.well-known/localai.json

Returns the instance version, all available endpoint URLs (flat and categorized), and runtime capabilities.

Example response (abbreviated):

{
  "version": "v2.28.0",
  "endpoints": {
    "chat_completions": "/v1/chat/completions",
    "models": "/v1/models",
    "config_metadata": "/api/models/config-metadata",
    "instructions": "/api/instructions",
    "swagger": "/swagger/index.html"
  },
  "endpoint_groups": {
    "openai_compatible": { "chat_completions": "/v1/chat/completions", "..." : "..." },
    "config_management": { "config_metadata": "/api/models/config-metadata", "..." : "..." },
    "model_management": { "..." : "..." },
    "monitoring": { "..." : "..." }
  },
  "capabilities": {
    "config_metadata": true,
    "config_patch": true,
    "vram_estimate": true,
    "mcp": true,
    "agents": false,
    "p2p": false
  }
}

The capabilities object reflects the current runtime configuration: for example, mcp is true only when MCP is enabled, and agents is true only when the agent pool is running.
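An agent should gate optional features on these flags rather than assuming them. A minimal Python sketch, using the abbreviated response above as sample data (the supports helper is illustrative, not part of any client library):

```python
# "capabilities" object taken from the abbreviated discovery response above.
discovery = {
    "version": "v2.28.0",
    "capabilities": {
        "config_metadata": True,
        "config_patch": True,
        "vram_estimate": True,
        "mcp": True,
        "agents": False,
        "p2p": False,
    },
}

def supports(discovery: dict, capability: str) -> bool:
    """Return True only if the instance reports the capability as enabled.

    Missing keys are treated as unsupported, which keeps the check safe
    against older instances that predate a given capability flag.
    """
    return bool(discovery.get("capabilities", {}).get(capability, False))

assert supports(discovery, "mcp")                       # MCP is enabled here
assert not supports(discovery, "agents")                # agent pool not running
assert not supports(discovery, "some_future_capability")  # unknown => unsupported
```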

Instructions API

Instructions are curated groups of related API endpoints. Each instruction maps to one or more Swagger tags and provides a focused, LLM-readable guide.

List all instructions

GET /api/instructions

curl http://localhost:8080/api/instructions

Returns a compact list of instruction areas:

{
  "instructions": [
    {
      "name": "chat-inference",
      "description": "OpenAI-compatible chat completions, text completions, and embeddings",
      "tags": ["inference", "embeddings"],
      "url": "/api/instructions/chat-inference"
    },
    {
      "name": "config-management",
      "description": "Discover, read, and modify model configuration fields with VRAM estimation",
      "tags": ["config"],
      "url": "/api/instructions/config-management"
    }
  ],
  "hint": "Fetch GET {url} for a markdown API guide. Add ?format=json for a raw OpenAPI fragment."
}

Available instructions:

Instruction        Description
chat-inference     Chat completions, text completions, embeddings (OpenAI-compatible)
audio              Text-to-speech, transcription, voice activity detection, sound generation
images             Image generation and inpainting
model-management   Browse gallery, install, delete, manage models and backends
config-management  Discover, read, and modify model config fields with VRAM estimation
monitoring         System metrics, backend status, system information
mcp                Model Context Protocol: tool-augmented chat with MCP servers
agents             Agent task and job management
video              Video generation from text prompts

Get an instruction guide

GET /api/instructions/:name

By default, returns a markdown guide suitable for LLMs and humans:

curl http://localhost:8080/api/instructions/config-management

Add ?format=json to get a raw OpenAPI fragment (filtered Swagger spec with only the relevant paths and definitions):

curl "http://localhost:8080/api/instructions/config-management?format=json"

Configuration Management APIs

These endpoints let agents discover model configuration fields, read current settings, modify them, and estimate VRAM usage.

Config metadata

GET /api/models/config-metadata

Returns structured metadata for all model configuration fields, organized by section. Each field includes its YAML path, Go type, UI type, label, description, default value, validation constraints, and available options.

# All fields
curl http://localhost:8080/api/models/config-metadata

# Filter by section
curl "http://localhost:8080/api/models/config-metadata?section=parameters"
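An agent can use the per-field constraints to validate a value before patching it. The sketch below assumes a hypothetical shape for one field entry (the key names "type", "min", "max", and "options" are assumptions; the real schema may differ), but the documented ingredients, type, default, constraints, and options, are what such a check would consume:

```python
# Hypothetical metadata entry for a single field; key names are assumed.
field = {
    "path": "context_size",
    "type": "int",
    "default": 4096,
    "min": 1,
}

def valid(field: dict, value) -> bool:
    """Check a proposed value against a field's declared constraints."""
    if field.get("type") == "int" and not isinstance(value, int):
        return False
    if "min" in field and value < field["min"]:
        return False
    if "max" in field and value > field["max"]:
        return False
    if "options" in field and value not in field["options"]:
        return False
    return True

assert valid(field, 16384)       # in range
assert not valid(field, 0)       # below the declared minimum
assert not valid(field, "big")   # wrong type
```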

Autocomplete values

GET /api/models/config-metadata/autocomplete/:provider

Returns runtime values for dynamic fields. Providers include backends, models, models:chat, models:tts, models:transcript, models:vad.

# List available backends
curl http://localhost:8080/api/models/config-metadata/autocomplete/backends

# List chat-capable models
curl http://localhost:8080/api/models/config-metadata/autocomplete/models:chat

Read model config

GET /api/models/config-json/:name

Returns the full model configuration as JSON:

curl http://localhost:8080/api/models/config-json/my-model

Update model config

PATCH /api/models/config-json/:name

Deep-merges a JSON patch into the existing model configuration. Only include the fields you want to change:

curl -X PATCH http://localhost:8080/api/models/config-json/my-model \
  -H "Content-Type: application/json" \
  -d '{"context_size": 16384, "gpu_layers": 40}'

The endpoint validates the merged config and writes it to disk as YAML.
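Deep-merge means nested objects are merged key by key rather than replaced wholesale, so a patch touching one nested field leaves its siblings intact. A Python sketch of that semantics (an illustration of the merge behavior, not the server's actual code; the nested field names are examples):

```python
import copy

def deep_merge(base: dict, patch: dict) -> dict:
    """Recursively merge patch into a copy of base.

    Nested dicts are merged key by key; any other patch value
    (including lists) replaces the corresponding base value.
    """
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

config = {"context_size": 4096, "parameters": {"temperature": 0.7, "top_p": 0.9}}
patched = deep_merge(config, {"context_size": 16384, "parameters": {"temperature": 0.2}})

assert patched["context_size"] == 16384
assert patched["parameters"] == {"temperature": 0.2, "top_p": 0.9}  # top_p survives
```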

Authentication

Config management endpoints require admin authentication when API keys are configured. The discovery and instructions endpoints are unauthenticated.

VRAM estimation

POST /api/models/vram-estimate

Estimates VRAM usage for an installed model based on its weight files, context size, and GPU layer offloading:

curl -X POST http://localhost:8080/api/models/vram-estimate \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "context_size": 8192}'

Example response:

{
  "sizeBytes": 4368438272,
  "sizeDisplay": "4.4 GB",
  "vramBytes": 6123456789,
  "vramDisplay": "6.1 GB",
  "context_note": "Estimate used default context_size=8192. The model's trained maximum context is 131072; VRAM usage will be higher at larger context sizes.",
  "model_max_context": 131072
}

Optional parameters: gpu_layers (number of layers to offload, 0 = all), kv_quant_bits (KV cache quantization, 0 = fp16).
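The sizeDisplay and vramDisplay strings in the example appear to be decimal gigabytes rounded to one decimal place; a sketch of that formatting (the server's exact rounding rules are an assumption here):

```python
def display_gb(size_bytes: int) -> str:
    """Format a byte count as decimal gigabytes with one decimal place,
    matching the "4.4 GB" style shown in the example response."""
    return f"{size_bytes / 1_000_000_000:.1f} GB"

assert display_gb(4368438272) == "4.4 GB"   # sizeBytes from the example
assert display_gb(6123456789) == "6.1 GB"   # vramBytes from the example
```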

Integration guide

A recommended workflow for agent/tool builders:

  1. Discover: Fetch /.well-known/localai.json to learn available endpoints and capabilities
  2. Browse instructions: Fetch /api/instructions for an overview of instruction areas
  3. Deep dive: Fetch /api/instructions/:name for a markdown API guide on a specific area
  4. Explore config: Use /api/models/config-metadata to understand configuration fields
  5. Interact: Use the standard OpenAI-compatible endpoints for inference, and the config management endpoints for runtime tuning
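The five steps above map onto a handful of HTTP calls. A sketch of the request plan an agent might build before interacting (pure URL construction, no network access; the base URL and model name "my-model" are examples):

```python
BASE = "http://localhost:8080"  # example base URL

# One (method, url) pair per workflow step.
plan = [
    ("GET", f"{BASE}/.well-known/localai.json"),            # 1. discover
    ("GET", f"{BASE}/api/instructions"),                    # 2. browse instructions
    ("GET", f"{BASE}/api/instructions/config-management"),  # 3. deep dive
    ("GET", f"{BASE}/api/models/config-metadata"),          # 4. explore config
    ("PATCH", f"{BASE}/api/models/config-json/my-model"),   # 5. interact / tune
]

assert [method for method, _ in plan] == ["GET", "GET", "GET", "GET", "PATCH"]
assert plan[0][1].endswith("/.well-known/localai.json")
```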

Swagger UI

The full interactive API documentation is available at /swagger/index.html. All annotated endpoints can be explored and tested directly from the browser.