Fine-Tuning

LocalAI supports fine-tuning LLMs directly through the API and Web UI. Fine-tuning is powered by pluggable backends that implement a generic gRPC interface, allowing support for different training frameworks and model types.

Supported Backends

| Backend | Domain | GPU Required | Training Methods | Adapter Types |
|---|---|---|---|---|
| trl | LLM fine-tuning | No (CPU or GPU) | SFT, DPO, GRPO, RLOO, Reward, KTO, ORPO | LoRA, Full |

Enabling Fine-Tuning

Fine-tuning is disabled by default. Enable it with:

LOCALAI_ENABLE_FINETUNING=true local-ai

When authentication is enabled, fine-tuning is a per-user feature (default OFF). Admins can enable it for specific users via the user management API.

Quick Start

1. Start a fine-tuning job

curl -X POST http://localhost:8080/api/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "backend": "trl",
    "training_method": "sft",
    "training_type": "lora",
    "dataset_source": "yahma/alpaca-cleaned",
    "num_epochs": 1,
    "batch_size": 2,
    "learning_rate": 0.0002,
    "adapter_rank": 16,
    "adapter_alpha": 16,
    "extra_options": {
      "max_seq_length": "512"
    }
  }'

2. Monitor progress (SSE stream)

curl -N http://localhost:8080/api/fine-tuning/jobs/{job_id}/progress

3. List checkpoints

curl http://localhost:8080/api/fine-tuning/jobs/{job_id}/checkpoints

4. Export model

curl -X POST http://localhost:8080/api/fine-tuning/jobs/{job_id}/export \
  -H "Content-Type: application/json" \
  -d '{
    "export_format": "gguf",
    "quantization_method": "q4_k_m",
    "output_path": "/models/my-finetuned-model"
  }'

API Reference

Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /api/fine-tuning/jobs | Start a fine-tuning job |
| GET | /api/fine-tuning/jobs | List all jobs |
| GET | /api/fine-tuning/jobs/:id | Get job details |
| DELETE | /api/fine-tuning/jobs/:id | Stop a running job |
| GET | /api/fine-tuning/jobs/:id/progress | SSE progress stream |
| GET | /api/fine-tuning/jobs/:id/checkpoints | List checkpoints |
| POST | /api/fine-tuning/jobs/:id/export | Export model |
| POST | /api/fine-tuning/datasets | Upload dataset file |
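The progress endpoint streams Server-Sent Events. The sketch below parses SSE framing (standard `data:` lines terminated by a blank line) into JSON payloads; the payload field names (`step`, `loss`) are illustrative, not the documented schema:

```python
import json

def parse_sse(stream_lines):
    """Parse Server-Sent Events lines into decoded JSON payloads.

    Follows SSE framing: each event's payload arrives on 'data:' lines,
    and a blank line terminates the event.
    """
    events, data_buf = [], []
    for line in stream_lines:
        if line.startswith("data:"):
            data_buf.append(line[len("data:"):].strip())
        elif line == "" and data_buf:
            events.append(json.loads("\n".join(data_buf)))
            data_buf = []
    return events

# Canned two-event stream with a hypothetical progress payload:
sample = [
    'data: {"step": 10, "loss": 1.93}',
    "",
    'data: {"step": 20, "loss": 1.71}',
    "",
]
for event in parse_sse(sample):
    print(event["step"], event["loss"])
```

To consume the live endpoint, feed the same parser with a streaming HTTP client (e.g. `requests.get(url, stream=True).iter_lines(decode_unicode=True)`), or simply use `curl -N` as in the quick start.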

Job Request Fields

| Field | Type | Description |
|---|---|---|
| model | string | HuggingFace model ID or local path (required) |
| backend | string | Backend name (default: trl) |
| training_method | string | sft, dpo, grpo, rloo, reward, kto, orpo |
| training_type | string | lora or full |
| dataset_source | string | HuggingFace dataset ID or local file path (required) |
| adapter_rank | int | LoRA rank (default: 16) |
| adapter_alpha | int | LoRA alpha (default: 16) |
| num_epochs | int | Number of training epochs (default: 3) |
| batch_size | int | Per-device batch size (default: 2) |
| learning_rate | float | Learning rate (default: 2e-4) |
| gradient_accumulation_steps | int | Gradient accumulation steps (default: 4) |
| warmup_steps | int | Warmup steps (default: 5) |
| optimizer | string | adamw_torch, adamw_8bit, sgd, adafactor, prodigy |
| extra_options | map | Backend-specific options (see below) |
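As a convenience, a small client-side helper (not part of LocalAI) can mirror the defaults above and validate the two required fields before posting the request:

```python
REQUIRED = ("model", "dataset_source")

# Defaults as documented in the job request fields table.
DEFAULTS = {
    "backend": "trl",
    "training_type": "lora",
    "adapter_rank": 16,
    "adapter_alpha": 16,
    "num_epochs": 3,
    "batch_size": 2,
    "learning_rate": 2e-4,
    "gradient_accumulation_steps": 4,
    "warmup_steps": 5,
}

def build_job_request(**fields):
    """Merge caller-supplied fields over the documented defaults,
    checking that required fields are present."""
    missing = [f for f in REQUIRED if f not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return {**DEFAULTS, **fields}

req = build_job_request(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset_source="yahma/alpaca-cleaned",
    training_method="sft",
    num_epochs=1,
)
print(req["num_epochs"], req["learning_rate"])  # caller value wins; defaults fill the rest
```

The resulting dict can be sent as the JSON body of `POST /api/fine-tuning/jobs`.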

Backend-Specific Options (extra_options)

TRL

| Key | Description | Default |
|---|---|---|
| max_seq_length | Maximum sequence length | 512 |
| packing | Enable sequence packing | false |
| trust_remote_code | Trust remote code in model | false |
| load_in_4bit | Enable 4-bit quantization (GPU only) | false |

DPO-specific (training_method=dpo)

| Key | Description | Default |
|---|---|---|
| beta | KL penalty coefficient | 0.1 |
| loss_type | Loss type: sigmoid, hinge, ipo | sigmoid |
| max_length | Maximum sequence length | 512 |
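For reference, `beta` is the β in the textbook DPO objective with the sigmoid loss (shown here as background; this is the standard formulation, not LocalAI-specific code):

```latex
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
  \log \sigma\!\left( \beta \left(
    \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right)
\right]
```

A larger β penalizes deviation from the reference model more strongly, keeping the fine-tuned policy closer to it; the default of 0.1 is a common starting point.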

GRPO-specific (training_method=grpo)

| Key | Description | Default |
|---|---|---|
| num_generations | Number of generations per prompt | 4 |
| max_completion_length | Max completion token length | 256 |

GRPO Reward Functions

GRPO training requires reward functions to evaluate model completions. Specify them via the reward_functions field (a typed array) or via extra_options["reward_funcs"] (a JSON string).

Built-in Reward Functions

| Name | Description | Parameters |
|---|---|---|
| format_reward | Checks `<think>...</think>` then answer format (1.0/0.0) | |
| reasoning_accuracy_reward | Extracts `<answer>` content, compares to dataset's answer column | |
| length_reward | Score based on proximity to target length [0, 1] | target_length (default: 200) |
| xml_tag_reward | Scores properly opened/closed `<think>` and `<answer>` tags | |
| no_repetition_reward | Penalizes n-gram repetition [0, 1] | |
| code_execution_reward | Checks Python code block syntax validity (1.0/0.0) | |
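To make the reward-function contract concrete, here is a plausible sketch of `length_reward` (the shipped implementation may differ; only the interface is documented: a list of completion strings in, a `list[float]` in [0, 1] out, with a `target_length` parameter):

```python
def length_reward(completions, target_length=200, **kwargs):
    """Sketch of a proximity-to-target length reward: 1.0 at the
    target length, decaying linearly to 0.0 as the length diverges."""
    scores = []
    for c in completions:
        # Normalized distance from the target, clamped into [0, 1].
        distance = abs(len(c) - target_length) / target_length
        scores.append(max(0.0, 1.0 - distance))
    return scores

print(length_reward(["x" * 200, "x" * 100, ""], target_length=200))
# → [1.0, 0.5, 0.0]
```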

Inline Custom Reward Functions

You can provide custom reward function code as a Python function body. The function receives completions (list of strings) and **kwargs, and must return list[float].

Security restrictions for inline code:

  • Allowed builtins: len, int, float, str, list, dict, range, enumerate, zip, map, filter, sorted, min, max, sum, abs, round, any, all, isinstance, print, True, False, None
  • Available modules: re, math, json, string
  • Blocked: open, __import__, exec, eval, compile, os, subprocess, getattr, setattr, delattr, globals, locals
  • Functions are compiled and validated at job start (fail-fast on syntax errors)
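The restrictions above amount to compiling the submitted body into a function whose namespace exposes only the allowed builtins and modules. A minimal sketch of that mechanism (illustrative; LocalAI's actual sandbox may differ in detail):

```python
import builtins
import re, math, json, string

# Only the documented builtins are exposed to inline code.
ALLOWED_BUILTINS = {
    name: getattr(builtins, name)
    for name in ("len", "int", "float", "str", "list", "dict", "range",
                 "enumerate", "zip", "map", "filter", "sorted", "min",
                 "max", "sum", "abs", "round", "any", "all",
                 "isinstance", "print")
}

def compile_inline_reward(name, body):
    """Wrap a function body in the documented signature and compile it
    inside a restricted namespace; compile() fails fast on syntax errors."""
    src = "def {}(completions, **kwargs):\n".format(name)
    src += "".join("    " + line + "\n" for line in body.splitlines())
    namespace = {"__builtins__": ALLOWED_BUILTINS,
                 "re": re, "math": math, "json": json, "string": string}
    exec(compile(src, "<inline-reward>", "exec"), namespace)
    return namespace[name]

fn = compile_inline_reward(
    "think_presence",
    'return [1.0 if "<think>" in c else 0.0 for c in completions]',
)
print(fn(["<think>hmm</think> 42", "no tags"]))
# → [1.0, 0.0]
```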

Example API Request

curl -X POST http://localhost:8080/api/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "backend": "trl",
    "training_method": "grpo",
    "training_type": "lora",
    "dataset_source": "my-reasoning-dataset",
    "num_epochs": 1,
    "batch_size": 2,
    "learning_rate": 5e-6,
    "reward_functions": [
      {"type": "builtin", "name": "reasoning_accuracy_reward"},
      {"type": "builtin", "name": "format_reward"},
      {"type": "builtin", "name": "length_reward", "params": {"target_length": "200"}},
      {"type": "inline", "name": "think_presence", "code": "return [1.0 if \"<think>\" in c else 0.0 for c in completions]"}
    ],
    "extra_options": {
      "num_generations": "4",
      "max_completion_length": "256"
    }
  }'

Export Formats

| Format | Description | Notes |
|---|---|---|
| lora | LoRA adapter files | Smallest, requires base model |
| merged_16bit | Full model in 16-bit | Large but standalone |
| merged_4bit | Full model in 4-bit | Smaller, standalone |
| gguf | GGUF format | For llama.cpp, requires quantization_method |

GGUF Quantization Methods

q4_k_m, q5_k_m, q8_0, f16, q4_0, q5_0

Web UI

When fine-tuning is enabled, a “Fine-Tune” page appears in the sidebar under the Agents section. The UI provides:

  1. Job Configuration — Select backend, model, training method, adapter type, and hyperparameters
  2. Dataset Upload — Upload local datasets or reference HuggingFace datasets
  3. Training Monitor — Real-time loss chart, progress bar, metrics display
  4. Export — Export trained models in various formats

Dataset Formats

Datasets should follow standard HuggingFace formats:

  • SFT: Alpaca format (instruction, input, output fields) or ChatML/ShareGPT
  • DPO: Preference pairs (prompt, chosen, rejected fields)
  • GRPO: Prompts with reward signals

Supported file formats: .json, .jsonl, .csv
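As an illustration, this snippet writes one Alpaca-format SFT record to a `.jsonl` file and one DPO preference pair to a `.csv` file, matching the field names above (file names and record contents are made up for the example):

```python
import csv
import json

sft_record = {
    "instruction": "Summarize the text.",
    "input": "LocalAI supports fine-tuning through the API.",
    "output": "LocalAI can fine-tune models via its API.",
}
dpo_record = {
    "prompt": "What is 2 + 2?",
    "chosen": "4",
    "rejected": "5",
}

# One JSON object per line (JSONL) for the SFT dataset.
with open("sft.jsonl", "w") as f:
    f.write(json.dumps(sft_record) + "\n")

# Header row plus one preference pair per row for the DPO dataset.
with open("dpo.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "chosen", "rejected"])
    writer.writeheader()
    writer.writerow(dpo_record)
```

Either file can then be uploaded via `POST /api/fine-tuning/datasets` or referenced by local path in `dataset_source`.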

Architecture

Fine-tuning uses the same gRPC backend architecture as inference:

  1. Proto layer: FineTuneRequest, FineTuneProgress (streaming), StopFineTune, ListCheckpoints, ExportModel
  2. Python backends: Each backend implements the gRPC interface with its specific training framework
  3. Go service: Manages job lifecycle, routes API requests to backends
  4. REST API: HTTP endpoints with SSE progress streaming
  5. React UI: Configuration form, real-time training monitor, export panel