Fine-Tuning

LocalAI supports fine-tuning LLMs directly through the API and Web UI. Fine-tuning is powered by pluggable backends that implement a generic gRPC interface, allowing support for different training frameworks and model types.

Supported Backends

| Backend | Domain | GPU Required | Training Methods | Adapter Types |
|---------|--------|--------------|------------------|---------------|
| trl | LLM fine-tuning | No (CPU or GPU) | SFT, DPO, GRPO, RLOO, Reward, KTO, ORPO | LoRA, Full |

Availability

When authentication is disabled, fine-tuning is always available. When authentication is enabled, fine-tuning becomes a per-user feature (disabled by default), and admins can enable it for specific users via the user management API.

Note

This feature is experimental and may change in future releases.

Quick Start

1. Start a fine-tuning job

curl -X POST http://localhost:8080/api/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "backend": "trl",
    "training_method": "sft",
    "training_type": "lora",
    "dataset_source": "yahma/alpaca-cleaned",
    "num_epochs": 1,
    "batch_size": 2,
    "learning_rate": 0.0002,
    "adapter_rank": 16,
    "adapter_alpha": 16,
    "extra_options": {
      "max_seq_length": "512"
    }
  }'

2. Monitor progress (SSE stream)

curl -N http://localhost:8080/api/fine-tuning/jobs/{job_id}/progress
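The progress endpoint streams Server-Sent Events. Below is a minimal Python sketch of parsing such a stream, assuming each event carries one `data:` line holding a JSON object; the field names (`step`, `loss`) are illustrative, not a documented schema:

```python
import json

def parse_sse_events(lines):
    """Parse raw SSE lines into JSON payloads.

    Assumes one "data: ..." line per event, each carrying a JSON object;
    blank lines (event separators) and other fields are skipped.
    """
    events = []
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Example input, as it might arrive from the /progress endpoint
# (field names are assumptions for illustration).
raw = [
    'data: {"step": 10, "loss": 1.92}',
    "",
    'data: {"step": 20, "loss": 1.41}',
]
print(parse_sse_events(raw))  # → [{'step': 10, 'loss': 1.92}, {'step': 20, 'loss': 1.41}]
```

In a real client you would iterate over the HTTP response line by line (for example with `urllib.request.urlopen`) and stop when the stream closes or the job finishes.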

3. List checkpoints

curl http://localhost:8080/api/fine-tuning/jobs/{job_id}/checkpoints

4. Export model

curl -X POST http://localhost:8080/api/fine-tuning/jobs/{job_id}/export \
  -H "Content-Type: application/json" \
  -d '{
    "export_format": "gguf",
    "quantization_method": "q4_k_m",
    "output_path": "/models/my-finetuned-model"
  }'

API Reference

Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /api/fine-tuning/jobs | Start a fine-tuning job |
| GET | /api/fine-tuning/jobs | List all jobs |
| GET | /api/fine-tuning/jobs/:id | Get job details |
| DELETE | /api/fine-tuning/jobs/:id | Stop a running job |
| GET | /api/fine-tuning/jobs/:id/progress | SSE progress stream |
| GET | /api/fine-tuning/jobs/:id/checkpoints | List checkpoints |
| POST | /api/fine-tuning/jobs/:id/export | Export model |
| POST | /api/fine-tuning/datasets | Upload dataset file |

Job Request Fields

| Field | Type | Description |
|-------|------|-------------|
| model | string | HuggingFace model ID or local path (required) |
| backend | string | Backend name (default: trl) |
| training_method | string | sft, dpo, grpo, rloo, reward, kto, orpo |
| training_type | string | lora or full |
| dataset_source | string | HuggingFace dataset ID or local file path (required) |
| adapter_rank | int | LoRA rank (default: 16) |
| adapter_alpha | int | LoRA alpha (default: 16) |
| num_epochs | int | Number of training epochs (default: 3) |
| batch_size | int | Per-device batch size (default: 2) |
| learning_rate | float | Learning rate (default: 2e-4) |
| gradient_accumulation_steps | int | Gradient accumulation (default: 4) |
| warmup_steps | int | Warmup steps (default: 5) |
| optimizer | string | adamw_torch, adamw_8bit, sgd, adafactor, prodigy |
| extra_options | map | Backend-specific options (see below) |
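The documented defaults can be captured in a small helper that builds a job request body; only `model` and `dataset_source` are required. This helper is illustrative, not part of LocalAI, and the `training_method` default of `sft` is an assumption (it is what the quick start uses):

```python
def build_job_request(model, dataset_source, **overrides):
    """Build a fine-tuning job request using the documented defaults.

    Any keyword override (e.g. num_epochs=1, extra_options={...})
    replaces the corresponding default.
    """
    payload = {
        "model": model,                      # required
        "dataset_source": dataset_source,    # required
        "backend": "trl",
        "training_method": "sft",            # assumed default
        "training_type": "lora",
        "adapter_rank": 16,
        "adapter_alpha": 16,
        "num_epochs": 3,
        "batch_size": 2,
        "learning_rate": 2e-4,
        "gradient_accumulation_steps": 4,
        "warmup_steps": 5,
    }
    payload.update(overrides)
    return payload

# Reproduces the quick-start request body.
req = build_job_request(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "yahma/alpaca-cleaned",
    num_epochs=1,
    extra_options={"max_seq_length": "512"},
)
```

Serialize `req` with `json.dumps` and POST it to `/api/fine-tuning/jobs`.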

Backend-Specific Options (extra_options)

TRL

| Key | Description | Default |
|-----|-------------|---------|
| max_seq_length | Maximum sequence length | 512 |
| packing | Enable sequence packing | false |
| trust_remote_code | Trust remote code in model | false |
| load_in_4bit | Enable 4-bit quantization (GPU only) | false |
DPO-specific (training_method=dpo)

| Key | Description | Default |
|-----|-------------|---------|
| beta | KL penalty coefficient | 0.1 |
| loss_type | Loss type: sigmoid, hinge, ipo | sigmoid |
| max_length | Maximum sequence length | 512 |

GRPO-specific (training_method=grpo)

| Key | Description | Default |
|-----|-------------|---------|
| num_generations | Number of generations per prompt | 4 |
| max_completion_length | Max completion token length | 256 |

GRPO Reward Functions

GRPO training requires reward functions to evaluate model completions. Specify them via the reward_functions field (a typed array) or via extra_options["reward_funcs"] (a JSON string).

Built-in Reward Functions

| Name | Description | Parameters |
|------|-------------|------------|
| format_reward | Checks `<think>...</think>` then answer format (1.0/0.0) | |
| reasoning_accuracy_reward | Extracts `<answer>` content, compares to dataset's answer column | |
| length_reward | Score based on proximity to target length [0, 1] | target_length (default: 200) |
| xml_tag_reward | Scores properly opened/closed `<think>` and `<answer>` tags | |
| no_repetition_reward | Penalizes n-gram repetition [0, 1] | |
| code_execution_reward | Checks Python code block syntax validity (1.0/0.0) | |
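To illustrate the shape these functions take, here is one plausible implementation of a length-proximity reward mapping completions to scores in [0, 1]. This is a sketch only; the built-in `length_reward` may use different units (tokens vs. characters) and a different falloff:

```python
def length_reward(completions, target_length=200, **kwargs):
    """Score each completion by how close its length is to target_length:
    1.0 at the target, decreasing linearly to 0.0.

    Illustrative sketch -- the actual built-in formula is not documented here.
    """
    scores = []
    for c in completions:
        diff = abs(len(c) - target_length)
        scores.append(max(0.0, 1.0 - diff / target_length))
    return scores

print(length_reward(["x" * 200, "x" * 100], target_length=200))  # → [1.0, 0.5]
```

Whatever the scoring rule, the contract is the same as for inline functions: take `completions` (plus `**kwargs`) and return one float per completion.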

Inline Custom Reward Functions

You can provide custom reward function code as a Python function body. The function receives completions (list of strings) and **kwargs, and must return list[float].

Security restrictions for inline code:

  • Allowed builtins: len, int, float, str, list, dict, range, enumerate, zip, map, filter, sorted, min, max, sum, abs, round, any, all, isinstance, print, True, False, None
  • Available modules: re, math, json, string
  • Blocked: open, __import__, exec, eval, compile, os, subprocess, getattr, setattr, delattr, globals, locals
  • Functions are compiled and validated at job start (fail-fast on syntax errors)
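A minimal inline reward that stays within these restrictions (only allowed builtins plus the `re` module), written here as a full function so it can be tested standalone; in the API request, only the function *body* goes in the `code` field:

```python
import re  # `re` is one of the modules available to inline reward code

def answer_tag_reward(completions, **kwargs):
    """Give 1.0 to completions containing exactly one well-formed
    <answer>...</answer> block, else 0.0."""
    scores = []
    for c in completions:
        matches = re.findall(r"<answer>.*?</answer>", c, flags=re.DOTALL)
        scores.append(1.0 if len(matches) == 1 else 0.0)
    return scores

print(answer_tag_reward(["<answer>42</answer>", "no tags here"]))  # → [1.0, 0.0]
```

As an inline function, the equivalent `code` field would contain just the body, similar to the `think_presence` example in the request below, e.g. a loop over `completions` ending with a `return` of the score list.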

Example API Request

curl -X POST http://localhost:8080/api/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "backend": "trl",
    "training_method": "grpo",
    "training_type": "lora",
    "dataset_source": "my-reasoning-dataset",
    "num_epochs": 1,
    "batch_size": 2,
    "learning_rate": 5e-6,
    "reward_functions": [
      {"type": "builtin", "name": "reasoning_accuracy_reward"},
      {"type": "builtin", "name": "format_reward"},
      {"type": "builtin", "name": "length_reward", "params": {"target_length": "200"}},
      {"type": "inline", "name": "think_presence", "code": "return [1.0 if \"<think>\" in c else 0.0 for c in completions]"}
    ],
    "extra_options": {
      "num_generations": "4",
      "max_completion_length": "256"
    }
  }'

Export Formats

| Format | Description | Notes |
|--------|-------------|-------|
| lora | LoRA adapter files | Smallest, requires base model |
| merged_16bit | Full model in 16-bit | Large but standalone |
| merged_4bit | Full model in 4-bit | Smaller, standalone |
| gguf | GGUF format | For llama.cpp, requires quantization_method |

GGUF Quantization Methods

q4_k_m, q5_k_m, q8_0, f16, q4_0, q5_0

Web UI

When fine-tuning is enabled, a “Fine-Tune” page appears in the sidebar under the Agents section. The UI provides:

  1. Job Configuration — Select backend, model, training method, adapter type, and hyperparameters
  2. Dataset Upload — Upload local datasets or reference HuggingFace datasets
  3. Training Monitor — Real-time loss chart, progress bar, metrics display
  4. Export — Export trained models in various formats

Dataset Formats

Datasets should follow standard HuggingFace formats:

  • SFT: Alpaca format (instruction, input, output fields) or ChatML/ShareGPT
  • DPO: Preference pairs (prompt, chosen, rejected fields)
  • GRPO: Prompts with reward signals

Supported file formats: .json, .jsonl, .csv
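For example, one SFT record in Alpaca format and one DPO preference pair, written from Python as `.jsonl` (one JSON object per line); the field values are illustrative:

```python
import json

# One Alpaca-format SFT record: instruction / input / output fields.
sft_record = {
    "instruction": "Summarize the following text.",
    "input": "LocalAI supports fine-tuning through its API.",
    "output": "LocalAI offers API-driven fine-tuning.",
}

# One DPO preference pair: prompt / chosen / rejected fields.
dpo_record = {
    "prompt": "What is 2 + 2?",
    "chosen": "4",
    "rejected": "5",
}

# Write the SFT dataset as .jsonl (each dataset file uses one format).
with open("sft_train.jsonl", "w") as f:
    f.write(json.dumps(sft_record) + "\n")
```

The resulting file can be uploaded via `POST /api/fine-tuning/datasets` or referenced by local path in `dataset_source`.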

Architecture

Fine-tuning uses the same gRPC backend architecture as inference:

  1. Proto layer: FineTuneRequest, FineTuneProgress (streaming), StopFineTune, ListCheckpoints, ExportModel
  2. Python backends: Each backend implements the gRPC interface with its specific training framework
  3. Go service: Manages job lifecycle, routes API requests to backends
  4. REST API: HTTP endpoints with SSE progress streaming
  5. React UI: Configuration form, real-time training monitor, export panel