Backend Monitor

LocalAI provides endpoints to monitor and manage running backends. The /backend/monitor endpoint reports the status and resource usage of loaded models, and /backend/shutdown allows stopping a model’s backend process.

Monitor API

  • Method: GET
  • Endpoints: /backend/monitor, /v1/backend/monitor

Request

The request body is JSON:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Name of the model to monitor |

Response

Returns a JSON object with the backend status:

| Field | Type | Description |
|-------|------|-------------|
| `state` | int | Backend state: `0` = uninitialized, `1` = busy, `2` = ready, `-1` = error |
| `memory` | object | Memory usage information |
| `memory.total` | uint64 | Total memory usage in bytes |
| `memory.breakdown` | object | Per-component memory breakdown (key-value pairs) |

If the gRPC status call fails, the endpoint falls back to local process metrics:

| Field | Type | Description |
|-------|------|-------------|
| `memory_info` | object | Process memory info (RSS, VMS) |
| `memory_percent` | float | Memory usage percentage |
| `cpu_percent` | float | CPU usage percentage |
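For illustration, a fallback response might look like the following (the values and the exact key casing are assumptions, not taken from LocalAI's output):

```json
{
  "memory_info": {
    "RSS": 524288000,
    "VMS": 1073741824
  },
  "memory_percent": 3.2,
  "cpu_percent": 1.5
}
```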

Usage

curl -X GET http://localhost:8080/backend/monitor \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model"}'

Note the explicit `-X GET`: without it, curl switches to POST whenever `-d` is used.

Example response

{
  "state": 2,
  "memory": {
    "total": 1073741824,
    "breakdown": {
      "weights": 536870912,
      "kv_cache": 268435456
    }
  }
}
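Because `state` is numeric, a small helper can make monitoring scripts easier to read. A minimal sketch — the `state_name` function is hypothetical, not part of LocalAI, and the commented usage assumes `jq` is installed:

```shell
# Hypothetical helper: translate the numeric backend state into a label,
# following the state codes documented above.
state_name() {
  case "$1" in
    0)  echo "uninitialized" ;;
    1)  echo "busy" ;;
    2)  echo "ready" ;;
    -1) echo "error" ;;
    *)  echo "unknown" ;;
  esac
}

# Usage against a running server (assumes jq is installed):
# state=$(curl -s -X GET http://localhost:8080/backend/monitor \
#   -H "Content-Type: application/json" \
#   -d '{"model": "my-model"}' | jq '.state')
# echo "backend is $(state_name "$state")"
```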

Shutdown API

  • Method: POST
  • Endpoints: /backend/shutdown, /v1/backend/shutdown

Request

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Name of the model to shut down |

Usage

curl -X POST http://localhost:8080/backend/shutdown \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model"}'

Response

Returns `200 OK` with a shutdown confirmation message on success.

Error Responses

| Status Code | Description |
|-------------|-------------|
| 400 | Invalid or missing model name |
| 500 | Backend error or model not loaded |
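In scripts, the HTTP status code can be captured with curl's `-w '%{http_code}'` and branched on. A minimal sketch — the `handle_status` function and its labels are hypothetical, not part of LocalAI:

```shell
# Hypothetical helper: map the HTTP status codes documented above to a
# suggested next step for the caller.
handle_status() {
  case "$1" in
    200) echo "ok" ;;            # shutdown confirmed
    400) echo "fix-request" ;;   # invalid or missing model name
    500) echo "check-backend" ;; # backend error or model not loaded
    *)   echo "unexpected" ;;
  esac
}

# Usage against a running server:
# code=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#   http://localhost:8080/backend/shutdown \
#   -H "Content-Type: application/json" -d '{"model": "my-model"}')
# handle_status "$code"
```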