✍️ Constrained grammars

The chat completion endpoint accepts an additional grammar parameter, which takes a grammar defined in BNF.

This constrains the LLM's output to a user-defined schema, so you can generate JSON, YAML, or anything else that can be expressed as a BNF grammar.
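The following Python sketch shows how a client might build a chat request whose grammar admits only a small JSON object. It rests on several assumptions. The grammar text uses GBNF, the BNF dialect parsed by llama.cpp. The server listens at http://localhost:8080, as in the curl example. The model name gpt-4 and the helper names ANSWER_GRAMMAR, build_chat_request and send are illustrative only and are not part of LocalAI. Letting json.dumps encode the grammar means you don't have to hand-escape its quotes and backslashes, which is error-prone inside a shell curl command.

```python
import json
import urllib.request

# Hypothetical GBNF grammar (llama.cpp's BNF dialect): the output must be exactly
# {"answer": "yes"} or {"answer": "no"}, with optional whitespace between tokens.
ANSWER_GRAMMAR = r'''
root   ::= "{" ws "\"answer\"" ws ":" ws answer ws "}"
answer ::= "\"yes\"" | "\"no\""
ws     ::= [ \t\n]*
'''.strip()


def build_chat_request(prompt: str, grammar: str, model: str = "gpt-4") -> dict:
    """Return a /v1/chat/completions request body carrying the extra grammar field."""
    return {
        "model": model,  # assumed model name, as in the curl example
        "messages": [{"role": "user", "content": prompt}],
        "grammar": grammar,
    }


def send(body: dict, url: str = "http://localhost:8080/v1/chat/completions") -> dict:
    """POST the body as JSON and return the decoded JSON response (needs a running server)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # json.dumps escapes the grammar's quotes
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a running instance, send(build_chat_request("Do you like apples?", ANSWER_GRAMMAR)) is expected to return an OpenAI-style chat completion whose message content matches the grammar.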


This feature works only with models compatible with the llama.cpp backend (see Model compatibility). For details on how it works, see the upstream llama.cpp PRs: https://github.com/ggerganov/llama.cpp/pull/1773 and https://github.com/ggerganov/llama.cpp/pull/1887
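At a high level, grammar-constrained sampling removes candidate tokens at each step, keeping only those the grammar can still accept. The toy Python sketch below illustrates that idea. It is not llama.cpp's implementation. It replaces the parsed grammar with the finite set of strings that root ::= ("yes" | "no") accepts, and it uses a made-up vocabulary, made-up scores and greedy selection.

```python
# Toy illustration only: the "grammar" root ::= ("yes" | "no") is represented
# by the finite set of complete strings it accepts.
ALLOWED = {"yes", "no"}

# Hypothetical tokenizer vocabulary and per-step model scores (higher is better).
VOCAB = ["yes", "no", "maybe", "y", "es", "n", "o"]
STEP_SCORES = [
    {"maybe": 5.0, "y": 3.0, "no": 2.0},  # unconstrained greedy would pick "maybe"
    {"maybe": 9.0, "es": 1.0},
]


def allowed_tokens(prefix: str, vocab: list, allowed: set) -> list:
    """Keep tokens for which prefix + token is still a prefix of some accepted string."""
    return [tok for tok in vocab if any(s.startswith(prefix + tok) for s in allowed)]


def constrained_greedy(step_scores: list, vocab: list, allowed: set) -> str:
    """At each step, choose the highest-scoring token among the grammar-legal ones."""
    out = ""
    for scores in step_scores:
        if out in allowed:  # complete match; a real sampler would emit end-of-sequence
            break
        legal = allowed_tokens(out, vocab, allowed)
        if not legal:
            raise ValueError(f"no token can extend {out!r}")
        # Tokens without a score are treated as least likely.
        out += max(legal, key=lambda t: scores.get(t, float("-inf")))
    return out


print(constrained_greedy(STEP_SCORES, VOCAB, ALLOWED))  # prints "yes"
```

Even though "maybe" scores highest at both steps, it is never a legal continuation, so the output is forced into the language of the grammar.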


Follow the setup instructions from the LocalAI functions page.

💡 Usage example

For example, to constrain the output to either yes or no:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "Do you like apples?"}],
  "grammar": "root ::= (\"yes\" | \"no\")"