anime_girl (Generated with AnimagineXL)

LocalAI supports generating images with Stable diffusion, running on CPU using C++ and Python implementations.

Usage

OpenAI docs: https://platform.openai.com/docs/api-reference/images/create

To generate an image you can send a POST request to the /v1/images/generations endpoint with the instruction as the request body:

# 512x512 is supported too
curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "A cute baby sea otter",
  "size": "256x256"
}'
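
The same request can be built from Python with only the standard library. This is a minimal sketch, assuming LocalAI listens on localhost:8080 as in the curl example; the helper names build_generation_request and send are hypothetical and not part of LocalAI.

```python
import json
import urllib.request

# Assumed base URL, matching the curl example.
BASE_URL = "http://localhost:8080"


def build_generation_request(prompt, size="256x256", base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/images/generations."""
    payload = {"prompt": prompt, "size": size}
    return urllib.request.Request(
        url=f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling send(build_generation_request("A cute baby sea otter")) requires a running LocalAI instance with an image model installed.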
  

Available additional parameters: mode, step.

Note: To set a negative prompt, you can split the prompt with |, for instance: a cute baby sea otter|malformed.

curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "floating hair, portrait, ((one girl)), cute face, hidden hands, asymmetrical bangs, beautiful detailed eyes, eye shadow, hair ornament, ribbons, bowties, buttons, pleated skirt, (((masterpiece))), ((best quality)), colorful|((part of the head)), ((((mutated hands and fingers)))), deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, Octane renderer, lowres, bad anatomy, bad hands, text",
  "size": "256x256"
}'
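
The | convention can be wrapped in two small helpers. This is a sketch with hypothetical names (join_prompt, split_prompt); splitting at the first | is an assumption, since the text does not say how additional | characters are handled.

```python
def join_prompt(positive, negative=""):
    """Combine positive and negative prompts using the | separator."""
    if "|" in positive or "|" in negative:
        raise ValueError("prompts must not contain '|' themselves")
    return f"{positive}|{negative}" if negative else positive


def split_prompt(prompt):
    """Split a combined prompt into (positive, negative) at the first |.

    Assumption: only the first | acts as the separator.
    """
    positive, _, negative = prompt.partition("|")
    return positive.strip(), negative.strip()
```

For instance, join_prompt("a cute baby sea otter", "malformed") produces the combined prompt from the note above.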
  

Backends

stablediffusion-ggml

This backend is based on stable-diffusion.cpp. Every model supported by stable-diffusion.cpp is also supported by LocalAI.

Setup

Several models in the gallery are ready to install and run with this backend. For example, you can run flux by searching for it in the Model gallery (flux.1-dev-ggml), or start LocalAI with run:

  local-ai run flux.1-dev-ggml
  

To use a custom model, you can follow these steps:

  1. Create a model file stablediffusion.yaml in the models folder:
name: stablediffusion
backend: stablediffusion-ggml
parameters:
  model: gguf_model.gguf
step: 25
cfg_scale: 4.5
options:
- "clip_l_path:clip_l.safetensors"
- "clip_g_path:clip_g.safetensors"
- "t5xxl_path:t5xxl-Q5_0.gguf"
- "sampler:euler"
  
  2. Download the required assets to the models folder
  3. Start LocalAI
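
Before starting LocalAI, it can help to check that every file named in the model file is actually present. The sketch below assumes that the model field and any option whose key ends in _path name files relative to the models folder; that interpretation and the helper names are assumptions, not documented LocalAI behaviour.

```python
from pathlib import Path


def referenced_files(model, options):
    """Collect file names referenced by the model field and *_path options."""
    files = [model]
    for option in options:
        key, _, value = option.partition(":")
        # Assumption: keys ending in _path point at files in the models folder.
        if key.endswith("_path") and value:
            files.append(value)
    return files


def missing_assets(models_dir, model, options):
    """Return the referenced files that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in referenced_files(model, options)
            if not (root / name).is_file()]
```

An empty list from missing_assets means all referenced files were found; options such as sampler:euler are ignored because they do not name a file.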

Diffusers

Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. LocalAI has a diffusers backend which allows image generation using the diffusers library.

Model setup

Models are downloaded automatically from Hugging Face the first time you use the backend.

Create a model configuration file in the models directory, for instance to use Linaqruf/animagine-xl with CPU:

name: animagine-xl
parameters:
  model: Linaqruf/animagine-xl
backend: diffusers

# Force CPU usage - set to true for GPU
f16: false
diffusers:
  cuda: false # Enable for GPU usage (CUDA)
  scheduler_type: euler_a
  

Dependencies

This is an extra backend. It is already available in the container images, so no setup is required. Do not use core images (ending with -core). If you are building manually, see the build instructions.

Model setup (GPU)

To run the same model (Linaqruf/animagine-xl) on a GPU with CUDA, enable cuda and f16 in the model configuration file:

name: animagine-xl
parameters:
  model: Linaqruf/animagine-xl
backend: diffusers
cuda: true
f16: true
diffusers:
  scheduler_type: euler_a
  

Local models

You can also use local models or adjust parameters such as clip_skip and scheduler_type, for instance:

name: stablediffusion
parameters:
  model: toonyou_beta6.safetensors
backend: diffusers
step: 30
f16: true
cuda: true
diffusers:
  pipeline_type: StableDiffusionPipeline
  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
  scheduler_type: "k_dpmpp_sde"
  clip_skip: 11

cfg_scale: 8
  

Configuration parameters

The following parameters are available in the configuration file:

| Parameter | Description | Default |
| --- | --- | --- |
| f16 | Force the usage of float16 instead of float32 | false |
| step | Number of steps to run the model for | 30 |
| cuda | Enable CUDA acceleration | false |
| enable_parameters | Parameters to enable for the model | negative_prompt,num_inference_steps,clip_skip |
| scheduler_type | Scheduler type | k_dpmpp_sde |
| cfg_scale | Classifier-free guidance (CFG) scale | 8 |
| clip_skip | Clip skip | None |
| pipeline_type | Pipeline type | AutoPipelineForText2Image |
| lora_adapters | A list of lora adapters (file names relative to model directory) to apply | None |
| lora_scales | A list of lora scales (floats) to apply | None |

Several scheduler types are available:

| Scheduler | Description |
| --- | --- |
| ddim | DDIM |
| pndm | PNDM |
| heun | Heun |
| unipc | UniPC |
| euler | Euler |
| euler_a | Euler a |
| lms | LMS |
| k_lms | LMS Karras |
| dpm_2 | DPM2 |
| k_dpm_2 | DPM2 Karras |
| dpm_2_a | DPM2 a |
| k_dpm_2_a | DPM2 a Karras |
| dpmpp_2m | DPM++ 2M |
| k_dpmpp_2m | DPM++ 2M Karras |
| dpmpp_sde | DPM++ SDE |
| k_dpmpp_sde | DPM++ SDE Karras |
| dpmpp_2m_sde | DPM++ 2M SDE |
| k_dpmpp_2m_sde | DPM++ 2M SDE Karras |
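
In the scheduler names listed above, every k_ name is the Karras variant of the name without the prefix. The sketch below uses that naming pattern to validate a scheduler_type value before it goes into a configuration file. The helper name split_scheduler is hypothetical, and how the backend maps these names internally is not covered here.

```python
# Scheduler names taken from the list above.
BASE_SCHEDULERS = {
    "ddim", "pndm", "heun", "unipc", "euler", "euler_a", "lms",
    "dpm_2", "dpm_2_a", "dpmpp_2m", "dpmpp_sde", "dpmpp_2m_sde",
}
# Base names that also have a k_ (Karras) variant in the list.
KARRAS_CAPABLE = {
    "lms", "dpm_2", "dpm_2_a", "dpmpp_2m", "dpmpp_sde", "dpmpp_2m_sde",
}


def split_scheduler(name):
    """Return (base_name, uses_karras) or raise ValueError for unknown names."""
    if name.startswith("k_"):
        base = name[2:]
        if base in KARRAS_CAPABLE:
            return base, True
    elif name in BASE_SCHEDULERS:
        return name, False
    raise ValueError(f"unknown scheduler type: {name!r}")
```

A check like this catches misspelled names early, before a model configuration is loaded.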

Available pipeline types:

| Pipeline type | Description |
| --- | --- |
| StableDiffusionPipeline | Stable diffusion pipeline |
| StableDiffusionImg2ImgPipeline | Stable diffusion image to image pipeline |
| StableDiffusionDepth2ImgPipeline | Stable diffusion depth to image pipeline |
| DiffusionPipeline | Diffusion pipeline |
| StableDiffusionXLPipeline | Stable diffusion XL pipeline |
| StableVideoDiffusionPipeline | Stable video diffusion pipeline |
| AutoPipelineForText2Image | Automatic detection pipeline for text to image |
| VideoDiffusionPipeline | Video diffusion pipeline |
| StableDiffusion3Pipeline | Stable diffusion 3 pipeline |
| FluxPipeline | Flux pipeline |
| FluxTransformer2DModel | Flux transformer 2D model |
| SanaPipeline | Sana pipeline |

Advanced: Additional parameters

Additional arbitrary parameters can be specified in the options field as key/value pairs, with the key and value separated by a colon (:):

name: animagine-xl
# ...
options:
- "cfg_scale:6"
  

Note: There is no complete parameter list. Any parameter can be specified and is passed directly to the pipeline as an argument. Different pipelines/implementations support different parameters.

The example above results in the following Python call when generating images:

pipe(
    prompt="A cute baby sea otter", # Options passed via API
    size="256x256", # Options passed via API
    cfg_scale=6 # Additional parameter passed via configuration file
)
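
The translation from key:value option strings to keyword arguments can be sketched as follows. This is an illustration only, not the backend's code: the function names are hypothetical, and the type coercion (int, then float, then boolean, else string) is an assumption about how values might be interpreted.

```python
def coerce(raw):
    """Convert a raw option value to int, float, bool, or leave it as str."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw


def options_to_kwargs(options):
    """Turn ["key:value", ...] into a dict usable as pipeline kwargs."""
    kwargs = {}
    for option in options:
        # Split at the first colon only, so values may contain colons.
        key, sep, raw = option.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {option!r}")
        kwargs[key] = coerce(raw)
    return kwargs
```

With the configuration above, options_to_kwargs(["cfg_scale:6"]) yields a dict that could be expanded into the call as pipe(..., **kwargs).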
  

Usage

Text to Image

Use the image generation endpoint with the model name from the configuration file:

  curl http://localhost:8080/v1/images/generations \
    -H "Content-Type: application/json" \
    -d '{
      "prompt": "<positive prompt>|<negative prompt>", 
      "model": "animagine-xl", 
      "step": 51,
      "size": "1024x1024" 
    }'
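
Once the endpoint responds, the generated images have to be pulled out of the JSON body. The sketch below assumes the response follows the OpenAI images response shape (a data list whose items carry either url or b64_json); the helper name extract_images is hypothetical, and the exact fields LocalAI returns should be checked against a real response.

```python
import base64
import json


def extract_images(body):
    """Return a list of ("url", str) or ("bytes", bytes) entries.

    Assumes an OpenAI-style body: {"data": [{"url": ...} or {"b64_json": ...}]}.
    """
    document = json.loads(body)
    images = []
    for item in document.get("data", []):
        if item.get("url"):
            images.append(("url", item["url"]))
        elif item.get("b64_json"):
            images.append(("bytes", base64.b64decode(item["b64_json"])))
    return images
```

Entries tagged "url" can be downloaded separately, while entries tagged "bytes" can be written straight to a file.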
  

Image to Image

https://huggingface.co/docs/diffusers/using-diffusers/img2img

An example model (GPU):

name: stablediffusion-edit
parameters:
  model: nitrosocke/Ghibli-Diffusion
backend: diffusers
step: 25
cuda: true
f16: true
diffusers:
  pipeline_type: StableDiffusionImg2ImgPipeline
  enable_parameters: "negative_prompt,num_inference_steps,image"
  
Then send the base64-encoded source image in the file field of the request:

IMAGE_PATH=/path/to/your/image
(echo -n '{"file": "'; base64 $IMAGE_PATH; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-edit"}') |
curl -H "Content-Type: application/json" -d @-  http://localhost:8080/v1/images/generations
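
The shell pipeline above base64-encodes the image and places it in the file field. The same request body can be assembled in Python; this is a minimal sketch with a hypothetical helper name, and the field names are taken from the curl example.

```python
import base64
import json


def build_img2img_payload(image_bytes, prompt, model, size="512x512"):
    """Build the JSON body for an image-to-image request."""
    return {
        # The source image travels as a base64 string in the "file" field.
        "file": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": prompt,
        "size": size,
        "model": model,
    }


def to_json(payload):
    """Serialize the payload for use as an HTTP request body."""
    return json.dumps(payload).encode("utf-8")
```

In practice image_bytes would come from reading the image file in binary mode, for example open(path, "rb").read().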
  

Depth to Image

https://huggingface.co/docs/diffusers/using-diffusers/depth2img

name: stablediffusion-depth
parameters:
  model: stabilityai/stable-diffusion-2-depth
backend: diffusers
step: 50
# GPU usage (set f16 and cuda to false to force CPU usage)
f16: true
cuda: true
diffusers:
  pipeline_type: StableDiffusionDepth2ImgPipeline
  enable_parameters: "negative_prompt,num_inference_steps,image"

cfg_scale: 6
  
Then send the base64-encoded source image in the file field of the request:

(echo -n '{"file": "'; base64 ~/path/to/image.jpeg; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-depth"}') |
curl -H "Content-Type: application/json" -d @-  http://localhost:8080/v1/images/generations
  

img2vid

name: img2vid
parameters:
  model: stabilityai/stable-video-diffusion-img2vid
backend: diffusers
step: 25
# GPU usage (set f16 and cuda to false to force CPU usage)
f16: true
cuda: true
diffusers:
  pipeline_type: StableVideoDiffusionPipeline
  
Then send a request with the source image URL in the file field:

(echo -n '{"file": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true","size": "512x512","model":"img2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations
  

txt2vid

name: txt2vid
parameters:
  model: damo-vilab/text-to-video-ms-1.7b
backend: diffusers
step: 25
# GPU usage (set f16 and cuda to false to force CPU usage)
f16: true
cuda: true
diffusers:
  pipeline_type: VideoDiffusionPipeline
  cuda: true
  
Then send a request with the text prompt:

(echo -n '{"prompt": "spiderman surfing","size": "512x512","model":"txt2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations
  

Last updated 17 Feb 2025, 16:51 +0100