Distribution

LocalAI supports distributing inference workloads across multiple machines. There are two approaches, each suited to different use cases:

Distributed Mode (PostgreSQL + NATS)

Production-grade horizontal scaling with centralized management. Frontends are stateless LocalAI instances behind a load balancer; workers self-register and receive backends dynamically via NATS. State lives in PostgreSQL.

Best for: production deployments, Kubernetes, managed infrastructure.

Read more
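A minimal sketch of what such a deployment could look like with Docker Compose. This is an illustrative assumption, not the documented setup: the environment variable names (`DATABASE_URL`, `NATS_URL`, `FRONTEND_URL`) and service wiring are placeholders — consult the distributed-mode documentation for the actual configuration keys.

```yaml
# Hypothetical compose sketch: one stateless frontend, one worker,
# with PostgreSQL for state and NATS for coordination.
# Variable names below are assumptions for illustration only.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: localai
  nats:
    image: nats:latest
  frontend:
    # Stateless; scale this service horizontally behind a load balancer.
    image: localai/localai:latest
    environment:
      DATABASE_URL: postgres://postgres:localai@postgres:5432/postgres
      NATS_URL: nats://nats:4222
    ports:
      - "8080:8080"
  worker:
    # Self-registers with the frontend and receives backends via NATS.
    image: localai/localai:latest
    environment:
      FRONTEND_URL: http://frontend:8080
      NATS_URL: nats://nats:4222
```

Scaling out is then a matter of adding worker replicas; because frontends keep no local state, any of them can serve any request.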

P2P / Federated Inference

Peer-to-peer networking via libp2p. Share a token to form a cluster with automatic discovery — no central server required. Supports federated load balancing and worker-mode weight sharding.

Best for: ad-hoc clusters, community sharing, quick experimentation.

Read more
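The token-sharing flow can be sketched in two commands. This is a hedged sketch: flag and subcommand names may differ between LocalAI versions, so verify them with `local-ai --help` on your build.

```shell
# 1. Start the first node with P2P networking enabled.
#    It generates and prints a cluster token to share with peers.
local-ai run --p2p

# 2. On each additional machine, join the cluster as a worker using the
#    shared token (passed here via the TOKEN environment variable).
#    Peers discover each other automatically over libp2p; no central
#    server is involved.
TOKEN=<paste-token-here> local-ai worker p2p-llama-cpp-rpc
```

Once joined, workers participate in federated load balancing and can shard llama.cpp model weights across the cluster.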

Quick Comparison

|                  | P2P / Federation           | Distributed Mode                  |
|------------------|----------------------------|-----------------------------------|
| Discovery        | Automatic via libp2p token | Self-registration to frontend URL |
| State storage    | In-memory / ledger         | PostgreSQL                        |
| Coordination     | Gossip protocol            | NATS messaging                    |
| Node management  | Automatic                  | REST API + WebUI                  |
| Setup complexity | Minimal (share a token)    | Requires PostgreSQL + NATS        |