Built-in slots
hal0 always ships four slots out of the box. They live in
BUILTIN_SLOTS (src/hal0/slots/manager.py) and cannot be deleted
from the dashboard — you can swap their model, unload them, or leave
them offline, but the slot itself is always present.
The four slots
| Slot | What it serves | Default backend |
|---|---|---|
| primary | Chat and general LLM: /v1/chat/completions, /v1/completions | llama.cpp (Vulkan) |
| embed | Embeddings (/v1/embeddings) and rerank (/v1/rerankings) | llama.cpp (Vulkan) |
| stt | Speech-to-text: /v1/audio/transcriptions | Moonshine |
| tts | Text-to-speech: /v1/audio/speech | Kokoro |
Why four
These map directly to the modalities OpenAI exposes through /v1/*.
Any client written against the OpenAI SDK can hit hal0 unmodified and
reach chat, embeddings, transcription, and speech. Rerank piggybacks
on the embed slot because it uses the same backend process.
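The slot-to-endpoint mapping can be written down as a small lookup table. This is only an illustrative sketch: the slot names and paths come from the table above, while the `SLOT_ENDPOINTS` dict and the `slot_for_endpoint` helper are hypothetical names and do not reflect how hal0's dispatcher is implemented.

```python
# Slot names and endpoint paths are copied from the table above.
# The dict and helper are hypothetical, not hal0 internals.
SLOT_ENDPOINTS = {
    "primary": ("/v1/chat/completions", "/v1/completions"),
    "embed": ("/v1/embeddings", "/v1/rerankings"),  # rerank shares the embed slot
    "stt": ("/v1/audio/transcriptions",),
    "tts": ("/v1/audio/speech",),
}


def slot_for_endpoint(path: str) -> str:
    """Return the built-in slot that serves an OpenAI-style endpoint path."""
    for slot, paths in SLOT_ENDPOINTS.items():
        if path in paths:
            return slot
    raise KeyError(f"no built-in slot serves {path}")
```

For example, `slot_for_endpoint("/v1/rerankings")` resolves to the embed slot, matching the note that rerank piggybacks on it.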
Addressing
All four bind to 127.0.0.1 on a port in the slot range
(8081–8099). Only the API (:8080) and OpenWebUI (:3001) bind
public interfaces. Clients should always talk to the API, never to a
slot directly — the API does authentication, single-flight, and
structured-error wrapping.
You address a slot by its name in the OpenAI model field:

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "primary",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

The dispatcher resolves "primary" to whichever model is currently
loaded in the primary slot. See
Slot as model for the full convention.
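The same request can be built from Python using only the standard library. The sketch below is an assumption-laden illustration: `build_chat_request` is a hypothetical helper, and the guard against slot ports simply encodes the advice above (API on 8080, slots on 8081–8099). It constructs the request without sending it.

```python
import json
import urllib.request
from urllib.parse import urlsplit

# Slot port range from the text above: 8081-8099 inclusive, loopback only.
SLOT_PORTS = range(8081, 8100)


def build_chat_request(
    slot: str,
    prompt: str,
    base_url: str = "http://localhost:8080",
) -> urllib.request.Request:
    """Build (but do not send) a chat request addressed to a slot by name."""
    # Refuse base URLs that point at a slot port instead of the API.
    if urlsplit(base_url).port in SLOT_PORTS:
        raise ValueError("talk to the API, not to a slot port")
    body = json.dumps({
        "model": slot,  # the slot name goes in the OpenAI model field
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With hal0 running locally, passing the result of `build_chat_request("primary", "Hello!")` to `urllib.request.urlopen` should be equivalent to the curl call above.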
Swapping the default model
Every slot has a default model picked at install time by the hardware probe. You can swap it at any time:

    hal0 slot swap primary --model qwen3-30b-a3b-instruct-2507-q4_k_m

The slot transitions through unloading → starting → warming → ready
without dropping the API socket; in-flight requests on other slots
keep flowing.
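The swap sequence can be treated as an ordered list of states. The sketch below is hypothetical and not hal0's own state machine: it assumes you have collected a list of state names by some means (how slot state is observed is not covered here) and checks that they never move backwards through the documented order.

```python
# State names come from the swap description above; the helper is hypothetical.
SWAP_STATES = ("unloading", "starting", "warming", "ready")


def is_valid_swap_progress(observed: list[str]) -> bool:
    """True if every observed state is known and the order never goes backwards."""
    if any(state not in SWAP_STATES for state in observed):
        return False
    indices = [SWAP_STATES.index(state) for state in observed]
    return indices == sorted(indices)
```

A check like this tolerates skipped or repeated states, which is what a polling observer would typically see, but rejects a sequence such as ready followed by unloading within a single swap.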
User-defined slots
Beyond the four built-ins, you can add custom slots, e.g. a vision
slot for a multimodal model, or an npu slot for the FLM provider on
AMD XDNA hardware. See Custom slots.