Object Detection

LocalAI supports object detection and image segmentation through several backends. This feature identifies and locates objects within images, returning bounding boxes with confidence scores and, for segmentation models, per-object masks. Available backends include RF-DETR (Python) and rf-detr.cpp (native C++/ggml) for object detection and segmentation, and sam3.cpp for image segmentation (SAM 3/2/EdgeTAM).

For detecting faces specifically, see the dedicated Face Recognition feature. Its /v1/detection support is tuned for face bounding boxes and ships with commercially-safe model options.

Overview

Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.

Key Features:

Real-time object detection
High accuracy detection with bounding boxes
Image segmentation with binary masks (SAM backends and rfdetr-cpp segmentation models)
Text-prompted, point-prompted, and box-prompted segmentation
Support for multiple hardware targets (CPU, NVIDIA GPU, Intel GPU, AMD GPU)
Structured detection results with confidence scores
Easy integration through the /v1/detection endpoint

Usage

Detection Endpoint

LocalAI provides a dedicated /v1/detection endpoint for object detection and segmentation tasks. It returns structured results: bounding boxes, class names, and confidence scores, plus segmentation masks when the model produces them.

API Reference

To perform object detection, send a POST request to the /v1/detection endpoint:

```
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rfdetr-base",
    "image": "https://media.roboflow.com/dog.jpeg"
  }'
```
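The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes LocalAI is listening on localhost:8080, as in the curl example. The helper names make_detection_request and detect are hypothetical and are not part of any LocalAI client library.

```python
import json
import urllib.request

# Assumption: LocalAI listens on the default address used in the curl example.
DETECTION_URL = "http://localhost:8080/v1/detection"


def make_detection_request(model: str, image: str, url: str = DETECTION_URL) -> urllib.request.Request:
    """Build a POST request equivalent to the curl call above."""
    body = json.dumps({"model": model, "image": image}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect(model: str, image: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running LocalAI)."""
    req = make_detection_request(model, image)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, once the model is installed, detect("rfdetr-base", "https://media.roboflow.com/dog.jpeg")["detections"] returns the list described under Response Format.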

Request Format

The request body should contain:

model: The name of the object detection model (e.g., “rfdetr-base”)
image: The image to analyze, which can be:
- A URL to an image
- A base64-encoded image (the examples on this page use a data URI such as data:image/jpeg;base64,...)
prompt (optional): Text prompt for text-prompted segmentation (SAM 3 only)
points (optional): Point prompts as a flat array of [x, y, label, ...] triples (label: 1 = positive, 0 = negative; SAM backends)
boxes (optional): Box prompts as a flat array of [x1, y1, x2, y2, ...] quads (SAM backends)
threshold (optional): Detection confidence threshold (default: 0.5)
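Because points and boxes are flat arrays, it is easy to misalign the triples or quads when writing them by hand. The sketch below assembles a request body from (x, y, label) tuples and (x1, y1, x2, y2) tuples and leaves out optional fields that are not set. The helper build_detection_body is hypothetical. Its client-side checks are assumptions drawn from the field descriptions above, not documented server-side validation: labels must be 0 or 1, boxes must satisfy x1 < x2 and y1 < y2, and the threshold must fall between 0.0 and 1.0.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, int]         # (x, y, label); label 1 = positive, 0 = negative
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def build_detection_body(
    model: str,
    image: str,
    prompt: Optional[str] = None,
    points: Optional[Sequence[Point]] = None,
    boxes: Optional[Sequence[Box]] = None,
    threshold: Optional[float] = None,
) -> Dict[str, object]:
    """Assemble a /v1/detection request body, flattening point and box prompts."""
    body: Dict[str, object] = {"model": model, "image": image}
    if prompt is not None:
        body["prompt"] = prompt
    if points:
        flat_points: List[float] = []
        for x, y, label in points:
            # Client-side check (assumption): labels are 1 (positive) or 0 (negative).
            if label not in (0, 1):
                raise ValueError(f"point label must be 0 or 1, got {label!r}")
            flat_points.extend([float(x), float(y), float(label)])
        body["points"] = flat_points
    if boxes:
        flat_boxes: List[float] = []
        for x1, y1, x2, y2 in boxes:
            # Client-side check (assumption): (x1, y1) is top-left, (x2, y2) bottom-right.
            if x2 <= x1 or y2 <= y1:
                raise ValueError(f"expected x1 < x2 and y1 < y2, got {(x1, y1, x2, y2)}")
            flat_boxes.extend([float(x1), float(y1), float(x2), float(y2)])
        body["boxes"] = flat_boxes
    if threshold is not None:
        # Confidence scores range from 0.0 to 1.0, so reject thresholds outside that range.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
        body["threshold"] = threshold
    return body
```

For instance, build_detection_body("sam3", image, points=[(256, 256, 1)], threshold=0.5) produces the same body as the point-prompted curl example in the SAM3 section. Serialize it with json.dumps before sending.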

Response Format

The API returns a JSON response with detected objects:

```
{
  "detections": [
    {
      "x": 100.5,
      "y": 150.2,
      "width": 200.0,
      "height": 300.0,
      "confidence": 0.95,
      "class_name": "dog"
    },
    {
      "x": 400.0,
      "y": 200.0,
      "width": 150.0,
      "height": 250.0,
      "confidence": 0.87,
      "class_name": "person"
    }
  ]
}
```

Each detection includes:

x, y: Coordinates of the bounding box top-left corner
width, height: Dimensions of the bounding box
confidence: Detection confidence score (0.0 to 1.0)
class_name: The detected object class
mask (optional): Base64-encoded PNG binary segmentation mask (SAM backends and rfdetr-cpp segmentation models only)
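Each box is described by its top-left corner plus a width and height, but many drawing and evaluation tools expect corner coordinates (x1, y1, x2, y2). The hypothetical helpers below do three things. They convert a box to corners, keep only detections above a client-side confidence cutoff, and count detections per class. They assume the response has already been decoded into a Python dict with the fields listed above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def to_corners(det: Dict) -> Tuple[float, float, float, float]:
    """Convert a detection's (x, y, width, height) box into (x1, y1, x2, y2) corners."""
    return (det["x"], det["y"], det["x"] + det["width"], det["y"] + det["height"])


def confident_detections(detections: Iterable[Dict], min_confidence: float) -> List[Dict]:
    """Keep detections scoring at least min_confidence, highest confidence first."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)


def count_by_class(detections: Iterable[Dict]) -> Counter:
    """Count detections per class_name."""
    return Counter(d["class_name"] for d in detections)
```

With the sample response above, a cutoff of 0.9 keeps only the dog detection. The person box has corners (400.0, 200.0, 550.0, 450.0).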

Backends

RF-DETR Backend

The RF-DETR backend (rfdetr) is a Python-based gRPC service that LocalAI manages like its other backends. It provides object detection using the RF-DETR model architecture and supports multiple hardware configurations:

CPU: Optimized for CPU inference
NVIDIA GPU: CUDA acceleration for NVIDIA GPUs
Intel GPU: Intel oneAPI optimization
AMD GPU: ROCm acceleration for AMD GPUs
NVIDIA Jetson: Optimized for ARM64 NVIDIA Jetson devices

Setup

Using the Model Gallery (Recommended)
The easiest way to get started is using the model gallery. The rfdetr-base model is available in the official LocalAI gallery:
```
# Install and run the rfdetr-base model
local-ai run rfdetr-base
```
You can also install it through the web interface by navigating to the Models section and searching for “rfdetr-base”.

Manual Configuration

Create a model configuration file in your models directory:
```
name: rfdetr
backend: rfdetr
parameters:
  model: rfdetr-base
```

Available Models

Currently, the following model is available in the Model Gallery:

rfdetr-base: Base model with balanced performance and accuracy

You can browse and install this model through the LocalAI web interface or using the command line.

RF-DETR Native Backend (rfdetr-cpp)

The rfdetr-cpp backend is a native C++/ggml implementation of RF-DETR inference based on rf-detr.cpp. It runs as a Go gRPC service that dlopens a per-CPU-variant shared library. There is no Python runtime on the inference path, so startup is fast and the binary is self-contained.

Compared to the Python rfdetr backend, the native backend:

Has no Python or PyTorch dependency at inference time
Loads GGUF models in F32, F16, or quantized (Q8_0, Q4_K) formats for a smaller footprint
Supports both detection and segmentation variants of RF-DETR
Returns segmentation masks as PNG bytes in Detection.mask

Setup

Install the backend
```
local-ai backends install rfdetr-cpp
```

Using the Model Gallery (Recommended)

The gallery ships ready-to-run entries for every published variant:

```
# Detection variants
local-ai run rfdetr-cpp-nano
local-ai run rfdetr-cpp-small
local-ai run rfdetr-cpp-base
local-ai run rfdetr-cpp-medium
local-ai run rfdetr-cpp-large

# Segmentation variants (return per-instance PNG masks)
local-ai run rfdetr-cpp-seg-nano
local-ai run rfdetr-cpp-seg-small
local-ai run rfdetr-cpp-seg-medium
local-ai run rfdetr-cpp-seg-large
local-ai run rfdetr-cpp-seg-xlarge
local-ai run rfdetr-cpp-seg-2xlarge
```

Manual Configuration
```
name: rfdetr-cpp-seg-nano
backend: rfdetr-cpp
parameters:
  model: rfdetr-seg-nano-f16.gguf
  threads: 4
known_usecases:
  - detection
```
Pre-quantized GGUFs are published under mudler/rfdetr-cpp-* on Hugging Face. Each repository carries F32, F16, Q8_0, and Q4_K files. F16 is the recommended default: it matches F32 accuracy and is about 1.86x smaller.

Segmentation Output

When running a segmentation model (any rfdetr-cpp-seg-* variant), each Detection in the response carries a mask field with a base64-encoded PNG of the per-instance binary mask. The mask is sized to the original image resolution and aligns with the corresponding bounding box.
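You can confirm that a returned mask has the original image resolution by reading its width and height from the PNG header, without decoding any pixels. This sketch assumes the mask field holds plain base64 PNG data. As a precaution, it also strips a data: URI prefix if one is present. It relies only on the PNG format rule that the IHDR chunk comes first and stores the width and height as big-endian 32-bit integers. The function names are hypothetical.

```python
import base64
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def mask_png_size(mask_b64: str) -> Tuple[int, int]:
    """Return (width, height) of a base64-encoded PNG mask from its IHDR chunk."""
    # Precaution (assumption): strip a data-URI prefix in case the server adds one.
    if mask_b64.startswith("data:"):
        mask_b64 = mask_b64.split(",", 1)[1]
    data = base64.b64decode(mask_b64)
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("mask is not a PNG image")
    # Bytes 8..24: IHDR length (13), chunk type b"IHDR", width, height (big-endian uint32).
    length, chunk_type, width, height = struct.unpack(">I4sII", data[8:24])
    if length != 13 or chunk_type != b"IHDR":
        raise ValueError("PNG does not start with a valid IHDR chunk")
    return width, height


def mask_matches_image(mask_b64: str, image_width: int, image_height: int) -> bool:
    """True if the mask has the same resolution as the original image."""
    return mask_png_size(mask_b64) == (image_width, image_height)
```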

SAM3 Backend (sam3-cpp)

The sam3-cpp backend provides image segmentation using sam3.cpp, a portable C++ implementation of Meta’s Segment Anything Model. It supports multiple model architectures:

SAM 3: Full model with text encoder for text-prompted detection and segmentation
SAM 2 / SAM 2.1: Hiera backbone models in multiple sizes
SAM 3 Visual-Only: Point/box segmentation without text encoder
EdgeTAM: Ultra-efficient mobile variant (~15MB quantized)

Setup

Manual Configuration

Create a model configuration file in your models directory:

```
name: sam3
backend: sam3-cpp
parameters:
  model: edgetam_q4_0.ggml
  threads: 4
known_usecases:
  - detection
```

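Download the model file referenced in the configuration (here edgetam_q4_0.ggml) from Hugging Face and place it in your models directory.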

Segmentation Modes

Point-prompted segmentation (all models):

```
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sam3",
    "image": "data:image/jpeg;base64,...",
    "points": [256.0, 256.0, 1.0],
    "threshold": 0.5
  }'
```

Box-prompted segmentation (all models):

```
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sam3",
    "image": "data:image/jpeg;base64,...",
    "boxes": [100.0, 100.0, 400.0, 400.0],
    "threshold": 0.5
  }'
```
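Box prompts use corner coordinates, whereas detection results use a top-left corner plus a width and height. One possible two-step pipeline, sketched here as an assumption rather than a documented workflow, first detects objects with an RF-DETR model and then passes the matching boxes to sam3 to get masks. The hypothetical helper below converts the coordinates and flattens the boxes into the [x1, y1, x2, y2, ...] layout. It assumes both requests use the same image, so that the pixel coordinates line up.

```python
from typing import Dict, Iterable, List


def detections_to_box_prompt(
    detections: Iterable[Dict], class_name: str, min_confidence: float = 0.5
) -> List[float]:
    """Flatten matching (x, y, width, height) detections into [x1, y1, x2, y2, ...] quads."""
    boxes: List[float] = []
    for d in detections:
        if d["class_name"] == class_name and d["confidence"] >= min_confidence:
            # Top-left corner stays as-is; bottom-right is corner plus size.
            boxes.extend([d["x"], d["y"], d["x"] + d["width"], d["y"] + d["height"]])
    return boxes
```

The resulting list goes into the boxes field of a sam3 request, as in the box-prompted example above.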

Text-prompted segmentation (SAM 3 full model only):

```
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sam3",
    "image": "data:image/jpeg;base64,...",
    "prompt": "cat",
    "threshold": 0.5
  }'
```

The response includes segmentation masks as base64-encoded PNGs in the mask field of each detection.

Examples

Basic Object Detection

```
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rfdetr-base",
    "image": "https://example.com/image.jpg"
  }'
```

Base64 Image Detection

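```
# Portable alternative to GNU `base64 -w 0`: strip line breaks with tr.
base64_image=$(base64 < image.jpg | tr -d '\n')
# Pipe the JSON body to curl on stdin: passing a large image inline with -d "..."
# can exceed the operating system's command-line argument length limit.
printf '{"model": "rfdetr-base", "image": "data:image/jpeg;base64,%s"}' "$base64_image" |
  curl -X POST http://localhost:8080/v1/detection \
    -H "Content-Type: application/json" \
    -d @-
```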
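In Python, you can build the data URI directly from a file instead of calling base64 in the shell. This sketch uses the standard mimetypes module to infer the MIME type from the file extension. The function name image_to_data_uri is hypothetical, and its result goes into the image field of the request body.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Read an image file and return it as a data URI for the "image" field."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Passing image_to_data_uri("image.jpg") as the image value is equivalent to the shell example above, where the file's extension maps to image/jpeg.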

Troubleshooting

Common Issues

Model Loading Errors
- Ensure the model file is properly downloaded
- Check available disk space
- Verify model compatibility with your backend version
Low Detection Accuracy
- Ensure good image quality and lighting
- Check if objects are clearly visible
- Consider using a larger model for better accuracy
Slow Performance
- Enable GPU acceleration if available
- Use a smaller model for faster inference
- Reduce the resolution of very large images to cut upload and decoding time
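For the rfdetr-cpp backend, one way to rule out a corrupt or incomplete model file is to check its GGUF header. GGUF files start with the four bytes GGUF, followed by the format version as a little-endian 32-bit integer. The sketch below, a hypothetical gguf_version helper, performs only that header check. A file that fails it should be downloaded again; common culprits are an HTML error page or a Git LFS pointer saved in place of the model. The check does not apply to the .ggml files used in the sam3-cpp example.

```python
import struct

GGUF_MAGIC = b"GGUF"


def gguf_version(path: str) -> int:
    """Return the GGUF format version of a model file, or raise if it is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        # For example, an interrupted download or a text file saved under the model's name.
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    # The 4 bytes after the magic hold the format version (little-endian uint32).
    (version,) = struct.unpack("<I", header[4:8])
    return version
```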

Debug Mode

Enable debug logging for troubleshooting:

```
local-ai run --debug rfdetr-base
```