Running on Nvidia ARM64
LocalAI can be run on Nvidia ARM64 devices, such as the Jetson Nano, Jetson Xavier NX, Jetson AGX Orin, and Nvidia DGX Spark. The following instructions will guide you through building and using the LocalAI container for Nvidia ARM64 devices.
Platform Compatibility
- CUDA 12 L4T images: Compatible with Nvidia AGX Orin and similar platforms (Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- CUDA 13 L4T images: Compatible with Nvidia DGX Spark
Prerequisites
- Docker engine installed (https://docs.docker.com/engine/install/ubuntu/)
- Nvidia container toolkit installed (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-ap)
Pre-built Images
Pre-built images are available on quay.io and Docker Hub:
CUDA 12 (for AGX Orin and similar platforms)
CUDA 13 (for DGX Spark)
Build the container
If you need to build the container yourself, use the following commands:
CUDA 12 (for AGX Orin and similar platforms)
CUDA 13 (for DGX Spark)
Usage
Run the LocalAI container on Nvidia ARM64 devices using the following commands, where /data/models is the directory containing the models:
CUDA 12 (for AGX Orin and similar platforms)
CUDA 13 (for DGX Spark)
Note: Replace /data/models with the path to your own models directory.
GPU reporting in distributed mode
If you run a worker on a Jetson, DGX Spark (GB10), or Thor and the Nodes page in the frontend shows the node as fully used, check two things:
- NVIDIA_DRIVER_CAPABILITIES must include utility so nvidia-smi / NVML work inside the container. With --gpus all alone (or --runtime nvidia without extra flags) only compute is wired in on some driver versions. Add -e NVIDIA_DRIVER_CAPABILITIES=compute,utility to your docker run, or capabilities: [gpu, utility] in compose / Kubernetes device reservations.
- Pass --init to docker run (or init: true in compose) so the container has a proper PID 1 reaper; otherwise short-lived child processes like nvidia-smi can intermittently fail with waitid: no child processes.
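The two checks above can be combined into a single docker run invocation. The following is a minimal sketch, not the project's official command. The image name your-localai-l4t-image is a hypothetical placeholder and the /models mount target is an assumption, so substitute the image tag and paths you actually use, and append any worker arguments your setup needs after the image name. The function prints the command for review by default and only executes it when RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: LOCALAI_IMAGE and the /models mount target are assumptions,
# not values taken from this page. Override them through the environment.
localai_worker_cmd() {
  set -- docker run \
    --init \
    --gpus all \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    -v "${MODELS_DIR:-/data/models}:/models" \
    "${LOCALAI_IMAGE:-your-localai-l4t-image}"

  # Print the assembled command by default; run it only when RUN=1.
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

localai_worker_cmd
```

Here --init supplies the PID 1 reaper and the environment variable exposes the utility capability that nvidia-smi / NVML need. Once the container is up, running nvidia-smi inside it (for example with docker exec) is a quick way to confirm both settings took effect.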
On unified-memory devices LocalAI auto-detects the SoC via
/sys/devices/soc0/{family,soc_id} and reports system RAM as VRAM, so
nvidia-smi is not strictly required for VRAM metrics. See
Distributed Mode → NVIDIA GPU support
for full context.
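To see what that detection has to work with on a given device, you can read the two sysfs files by hand. The snippet below is only an inspection aid, not LocalAI's detection code. The helper name soc_field and its optional base-directory argument are hypothetical, introduced here for illustration. It prints unknown when a file is missing, which is expected on machines that do not expose /sys/devices/soc0.

```shell
#!/bin/sh
# Print one SoC identity field, or "unknown" if the file is not readable.
# $1: field name (family or soc_id)
# $2: optional base directory, defaults to /sys/devices/soc0
soc_field() {
  f="${2:-/sys/devices/soc0}/$1"
  if [ -r "$f" ]; then
    cat "$f"
  else
    echo "unknown"
  fi
}

echo "family: $(soc_field family)"
echo "soc_id: $(soc_field soc_id)"
```

Run it inside the container as well as on the host: if the host reports values but the container prints unknown, the sysfs path is not visible to the container.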