
Slot as model

hal0’s OpenAI-compatible API accepts two kinds of identifiers in the model field:

  • A registry ref — e.g. "qwen2.5-0.5b-instruct-q4_k_m" — picks that exact model file.
  • A slot name — e.g. "primary" — picks whatever model is currently loaded in that slot.

Both routes go through the dispatcher. Slot names are stable; registry refs change every time you pull a newer quant. Most clients should address slots, not registry refs.

Imagine you’re a Python script that wants “the chat model”:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

# Stable across model swaps:
resp = client.chat.completions.create(
    model="primary",
    messages=[{"role": "user", "content": "Hello!"}],
)

Tomorrow you swap primary from a 7B Qwen-Coder to a 70B Hermes-4 — the script still works. Address by slot, swap models independently of clients.

When you actually want a specific model — say you’re benchmarking two quants side-by-side — use the registry ref:

resp = client.chat.completions.create(
    model="qwen2.5-0.5b-instruct-q4_k_m",
    messages=[{"role": "user", "content": "Hello!"}],
)

The dispatcher resolves the ref, picks the slot that owns it, and proxies the request. If the model isn’t loaded in any slot, you get a structured model.not_loaded error.
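A minimal sketch of handling that error body on the client side. Only the `model.not_loaded` code comes from the docs above; the `message` text and the fallback choice are illustrative assumptions.

```python
# Sketch: reacting to the structured model.not_loaded error body.
# The "message" text here is an assumption; the docs only specify the code.
import json

body = json.loads(
    '{"error": {"code": "model.not_loaded", '
    '"message": "ref is not loaded in any slot"}}'
)
code = body["error"]["code"]
if code == "model.not_loaded":
    # Fall back to the stable slot alias instead of the exact registry ref.
    fallback_model = "primary"

print(code)  # model.not_loaded
```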

GET /v1/models returns both flavours so the client can discover either route:

{
  "object": "list",
  "data": [
    {"id": "primary", "object": "model", "owned_by": "hal0"},
    {"id": "embed", "object": "model", "owned_by": "hal0"},
    {"id": "stt", "object": "model", "owned_by": "hal0"},
    {"id": "tts", "object": "model", "owned_by": "hal0"},
    {"id": "qwen2.5-0.5b-instruct-q4_k_m", "object": "model", "owned_by": "hal0"}
  ]
}

OpenWebUI and most clients default to the first entry in the list. That’s why the slot aliases come first: they’re stable.
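A client can apply the same preference itself. This sketch parses the `/v1/models` payload shown above and prefers the stable `primary` alias over any registry ref; the fallback-to-first-entry rule is an assumption about what a reasonable client would do, not hal0 behavior.

```python
# Sketch: choosing a model id from GET /v1/models, preferring a stable
# slot alias over a registry ref. The payload mirrors the example above.
import json

payload = json.loads("""{
  "object": "list",
  "data": [
    {"id": "primary", "object": "model", "owned_by": "hal0"},
    {"id": "embed", "object": "model", "owned_by": "hal0"},
    {"id": "qwen2.5-0.5b-instruct-q4_k_m", "object": "model", "owned_by": "hal0"}
  ]
}""")

ids = [m["id"] for m in payload["data"]]
# Prefer the chat slot if the server exposes it; otherwise fall back to the first id.
chosen = "primary" if "primary" in ids else ids[0]
print(chosen)  # primary
```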

You sent → dispatcher resolves to:

  • "primary" → the model currently in the primary slot.
  • "embed" → the model currently in the embed slot.
  • "stt" → the model currently in the stt slot.
  • "tts" → the model currently in the tts slot.
  • A custom slot name → that slot’s current model.
  • A registry ref → the slot that currently owns that ref.
  • An external upstream ref → the upstream provider (OpenRouter, Anthropic, OpenAI, …).
  • Anything else → {"error": {"code": "model.not_found", ...}}
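The resolution order above can be sketched as a pure function. Everything here is illustrative: the slot table, the quant refs, and the "provider/model" prefix check for upstream refs are assumptions, not hal0’s actual internals.

```python
# Sketch of the dispatch order: slot name first, then registry ref,
# then external upstream ref; anything else is model.not_found.
SLOTS = {
    "primary": "qwen2.5-0.5b-instruct-q4_k_m",
    "embed": "example-embed-model-q8_0",  # hypothetical ref
}

def resolve(model: str):
    # 1. Slot name: the model currently loaded in that slot.
    if model in SLOTS:
        return ("slot", model, SLOTS[model])
    # 2. Registry ref: the slot that currently owns that ref.
    owners = {ref: slot for slot, ref in SLOTS.items()}
    if model in owners:
        return ("slot", owners[model], model)
    # 3. External upstream ref (assumed "provider/model" shape).
    if "/" in model:
        return ("upstream", model, model)
    # 4. Anything else: structured not-found error.
    return ("error", {"error": {"code": "model.not_found"}}, None)

print(resolve("primary")[0])                        # slot
print(resolve("qwen2.5-0.5b-instruct-q4_k_m")[1])   # primary
print(resolve("bogus")[0])                          # error
```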