Skip to content

Custom slots

The four built-in slots cover the OpenAI-compatible surface. You can define additional slots — for a second chat model, a vision model, the FLM provider on AMD XDNA, ComfyUI, or anything else a provider supports.

  • Keep two chat models hot at once. A small fast model for autocomplete, a big slow model for reasoning. Different slot names, same /v1/chat/completions endpoint, addressed by model name.
  • Use the NPU alongside the iGPU. A custom npu slot binds the FLM provider to XDNA so the iGPU stays free for primary chat.
  • Dedicated vision slot. A multimodal model lives in its own slot so swapping the chat primary doesn’t unload it.

The dashboard’s Slots view has an Add slot form. It takes:

  • A name (must match ^[a-z][a-z0-9_-]*$).
  • A provider (llama.cpp, flm, moonshine, kokoro).
  • A model registry ref.
  • An idle-timeout (after which the slot transitions ready → idle and becomes an unload candidate).

The form writes the slot definition to the TOML config atomically (see Config), then stages the systemd unit. You start the slot with the Load button.

Slots are entries in hal0.toml under [slots.<name>]. The shape mirrors the dashboard form one-for-one. After editing:

Terminal window
hal0 config validate # schema check
hal0 config show # confirm the merged view
hal0 slot load my-slot # bring it up

See Config schema for the field list.

  • Slot names must be unique. The four built-in names (primary, embed, stt, tts) are reserved.
  • Total slot count is capped by the port range (80818099, so 19 concurrent slots).
  • A slot can only host one model at a time. To run two models on the same provider, define two slots.

The full custom-slot authoring guide will cover:

  • Choosing a provider for the workload you have in mind.
  • Sizing notes per provider — what fits in VRAM vs unified memory vs what needs paging.
  • Per-slot env overrides and provider-specific knobs.
  • Defining a slot that fronts an external upstream (OpenRouter, Anthropic, custom OpenAI-compatible endpoint).

Until then, copy the shape of one of the built-ins in hal0.toml and adjust the model + name fields.