API Discovery & Instructions

LocalAI exposes a set of discovery endpoints that let external agents, coding assistants, and automation tools programmatically learn what the instance can do and how to control it — without reading documentation ahead of time.

Quick start

# 1. Discover what's available
curl http://localhost:8080/.well-known/localai.json

# 2. Browse instruction areas
curl http://localhost:8080/api/instructions

# 3. Get an API guide for a specific instruction
curl http://localhost:8080/api/instructions/config-management

Well-Known Discovery Endpoint

GET /.well-known/localai.json

Returns the instance version, all available endpoint URLs (flat and categorized), and runtime capabilities.

Example response (abbreviated):

{
  "version": "v2.28.0",
  "endpoints": {
    "chat_completions": "/v1/chat/completions",
    "models": "/v1/models",
    "config_metadata": "/api/models/config-metadata",
    "instructions": "/api/instructions",
    "swagger": "/swagger/index.html"
  },
  "endpoint_groups": {
    "openai_compatible": { "chat_completions": "/v1/chat/completions", "..." : "..." },
    "config_management": { "config_metadata": "/api/models/config-metadata", "..." : "..." },
    "model_management": { "..." : "..." },
    "monitoring": { "..." : "..." }
  },
  "capabilities": {
    "config_metadata": true,
    "config_patch": true,
    "vram_estimate": true,
    "mcp": true,
    "agents": false,
    "p2p": false
  }
}

The capabilities object reflects the current runtime configuration: for example, mcp is true only when MCP is enabled, and agents is true only when the agent pool is running.
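An agent should gate optional features on these flags rather than assuming them. A minimal Python sketch, using the abbreviated response above as sample data (the supports helper is illustrative, not part of any client library):

```python
# "capabilities" object taken from the abbreviated discovery response above.
discovery = {
    "version": "v2.28.0",
    "capabilities": {
        "config_metadata": True,
        "config_patch": True,
        "vram_estimate": True,
        "mcp": True,
        "agents": False,
        "p2p": False,
    },
}

def supports(discovery: dict, capability: str) -> bool:
    """Return True only if the instance reports the capability as enabled.

    Missing keys are treated as unsupported, which keeps the check safe
    against older instances that predate a given capability flag.
    """
    return bool(discovery.get("capabilities", {}).get(capability, False))

assert supports(discovery, "mcp")                       # MCP is enabled here
assert not supports(discovery, "agents")                # agent pool not running
assert not supports(discovery, "some_future_capability")  # unknown => unsupported
```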

Instructions API

Instructions are curated groups of related API endpoints. Each instruction maps to one or more Swagger tags and provides a focused, LLM-readable guide.

List all instructions

GET /api/instructions

curl http://localhost:8080/api/instructions

Returns a compact list of instruction areas:

{
  "instructions": [
    {
      "name": "chat-inference",
      "description": "OpenAI-compatible chat completions, text completions, and embeddings",
      "tags": ["inference", "embeddings"],
      "url": "/api/instructions/chat-inference"
    },
    {
      "name": "config-management",
      "description": "Discover, read, and modify model configuration fields with VRAM estimation",
      "tags": ["config"],
      "url": "/api/instructions/config-management"
    }
  ],
  "hint": "Fetch GET {url} for a markdown API guide. Add ?format=json for a raw OpenAPI fragment."
}

Available instructions:

Instruction        Description
chat-inference     Chat completions, text completions, embeddings (OpenAI-compatible)
audio              Text-to-speech, transcription, voice activity detection, sound generation
images             Image generation and inpainting
model-management   Browse gallery, install, delete, manage models and backends
config-management  Discover, read, and modify model config fields with VRAM estimation
monitoring         System metrics, backend status, system information
mcp                Model Context Protocol: tool-augmented chat with MCP servers
agents             Agent task and job management
video              Video generation from text prompts

Get an instruction guide

GET /api/instructions/:name

By default, returns a markdown guide suitable for LLMs and humans:

curl http://localhost:8080/api/instructions/config-management

Add ?format=json to get a raw OpenAPI fragment (filtered Swagger spec with only the relevant paths and definitions):

curl "http://localhost:8080/api/instructions/config-management?format=json"

Configuration Management APIs

These endpoints let agents discover model configuration fields, read current settings, modify them, and estimate VRAM usage.

Config metadata

GET /api/models/config-metadata

Returns structured metadata for all model configuration fields, organized by section. Each field includes its YAML path, Go type, UI type, label, description, default value, validation constraints, and available options.

# All fields
curl http://localhost:8080/api/models/config-metadata

# Filter by section
curl "http://localhost:8080/api/models/config-metadata?section=parameters"
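An agent can use the per-field constraints to validate a value before patching it. The sketch below assumes a hypothetical shape for one field entry (the key names "type", "min", "max", and "options" are assumptions; the real schema may differ), but the documented ingredients, type, default, constraints, and options, are what such a check would consume:

```python
# Hypothetical metadata entry for a single field; key names are assumed.
field = {
    "path": "context_size",
    "type": "int",
    "default": 4096,
    "min": 1,
}

def valid(field: dict, value) -> bool:
    """Check a proposed value against a field's declared constraints."""
    if field.get("type") == "int" and not isinstance(value, int):
        return False
    if "min" in field and value < field["min"]:
        return False
    if "max" in field and value > field["max"]:
        return False
    if "options" in field and value not in field["options"]:
        return False
    return True

assert valid(field, 16384)       # in range
assert not valid(field, 0)       # below the declared minimum
assert not valid(field, "big")   # wrong type
```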

Autocomplete values

GET /api/models/config-metadata/autocomplete/:provider

Returns runtime values for dynamic fields. Providers include backends, models, models:chat, models:tts, models:transcript, models:vad.

# List available backends
curl http://localhost:8080/api/models/config-metadata/autocomplete/backends

# List chat-capable models
curl http://localhost:8080/api/models/config-metadata/autocomplete/models:chat

Read model config

GET /api/models/config-json/:name

Returns the full model configuration as JSON:

curl http://localhost:8080/api/models/config-json/my-model

Update model config

PATCH /api/models/config-json/:name

Deep-merges a JSON patch into the existing model configuration. Only include the fields you want to change:

curl -X PATCH http://localhost:8080/api/models/config-json/my-model \
  -H "Content-Type: application/json" \
  -d '{"context_size": 16384, "gpu_layers": 40}'

The endpoint validates the merged config and writes it to disk as YAML.
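Deep-merge means nested objects are merged key by key rather than replaced wholesale, so a patch touching one nested field leaves its siblings intact. A Python sketch of that semantics (an illustration of the merge behavior, not the server's actual code; the nested field names are examples):

```python
import copy

def deep_merge(base: dict, patch: dict) -> dict:
    """Recursively merge patch into a copy of base.

    Nested dicts are merged key by key; any other patch value
    (including lists) replaces the corresponding base value.
    """
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

config = {"context_size": 4096, "parameters": {"temperature": 0.7, "top_p": 0.9}}
patched = deep_merge(config, {"context_size": 16384, "parameters": {"temperature": 0.2}})

assert patched["context_size"] == 16384
assert patched["parameters"] == {"temperature": 0.2, "top_p": 0.9}  # top_p survives
```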

Authentication

Config management endpoints require admin authentication when API keys are configured. The discovery and instructions endpoints are unauthenticated.

VRAM estimation

POST /api/models/vram-estimate

Estimates VRAM usage for an installed model based on its weight files, context size, and GPU layer offloading:

curl -X POST http://localhost:8080/api/models/vram-estimate \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "context_size": 8192}'

Example response:

{
  "sizeBytes": 4368438272,
  "sizeDisplay": "4.4 GB",
  "vramBytes": 6123456789,
  "vramDisplay": "6.1 GB",
  "context_note": "Estimate used default context_size=8192. The model's trained maximum context is 131072; VRAM usage will be higher at larger context sizes.",
  "model_max_context": 131072
}

Optional parameters: gpu_layers (number of layers to offload, 0 = all), kv_quant_bits (KV cache quantization, 0 = fp16).
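The sizeDisplay and vramDisplay strings in the example appear to be decimal gigabytes rounded to one decimal place; a sketch of that formatting (the server's exact rounding rules are an assumption here):

```python
def display_gb(size_bytes: int) -> str:
    """Format a byte count as decimal gigabytes with one decimal place,
    matching the "4.4 GB" style shown in the example response."""
    return f"{size_bytes / 1_000_000_000:.1f} GB"

assert display_gb(4368438272) == "4.4 GB"   # sizeBytes from the example
assert display_gb(6123456789) == "6.1 GB"   # vramBytes from the example
```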

Integration guide

A recommended workflow for agent/tool builders:

  1. Discover: Fetch /.well-known/localai.json to learn available endpoints and capabilities
  2. Browse instructions: Fetch /api/instructions for an overview of instruction areas
  3. Deep dive: Fetch /api/instructions/:name for a markdown API guide on a specific area
  4. Explore config: Use /api/models/config-metadata to understand configuration fields
  5. Interact: Use the standard OpenAI-compatible endpoints for inference, and the config management endpoints for runtime tuning
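The five steps above map onto a handful of HTTP calls. A sketch of the request plan an agent might build before interacting (pure URL construction, no network access; the base URL and model name "my-model" are examples):

```python
BASE = "http://localhost:8080"  # example base URL

# One (method, url) pair per workflow step.
plan = [
    ("GET", f"{BASE}/.well-known/localai.json"),            # 1. discover
    ("GET", f"{BASE}/api/instructions"),                    # 2. browse instructions
    ("GET", f"{BASE}/api/instructions/config-management"),  # 3. deep dive
    ("GET", f"{BASE}/api/models/config-metadata"),          # 4. explore config
    ("PATCH", f"{BASE}/api/models/config-json/my-model"),   # 5. interact / tune
]

assert [method for method, _ in plan] == ["GET", "GET", "GET", "GET", "PATCH"]
assert plan[0][1].endswith("/.well-known/localai.json")
```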

Swagger UI

The full interactive API documentation is available at /swagger/index.html. All annotated endpoints can be explored and tested directly from the browser.