Slot lifecycle

Every slot in hal0 is in exactly one state at any moment. The set of states is fixed; the transitions between them are validated against LEGAL_TRANSITIONS; every transition is persisted atomically to state.json and streamed over SSE to clients. The dashboard reflects real state, not just systemctl is-active snapshots.

The canonical enum lives in src/hal0/slots/state.py — a StrEnum whose values are wire-stable across versions.

State       Meaning
offline     No systemd unit active.
pulling     Model files downloading / verifying.
starting    systemd unit up; container booting.
warming     Container live; model loading into VRAM / GTT.
ready       Passed health probe (non-empty /v1/models + sentinel completion).
serving     Inference request in-flight.
idle        Ready but no traffic past the idle timeout — unload candidate.
unloading   Graceful stop in progress.
error       Failed; details in state.json + journald.

error is a sideband — it’s reachable from most other states when something goes wrong, and the slot returns to offline from there once the failure is acknowledged.

The canonical happy-path flow:

    ┌──────────────────────────────────────────────────────────────┐
    │                                                              │
    ▼                                                              │
┌────────┐  pull ┌─────────┐  done ┌──────────┐ spawn ┌─────────┐  │
│offline │──────▶│ pulling │──────▶│ starting │──────▶│ warming │  │
└────────┘       └─────────┘       └──────────┘       └────┬────┘  │
    ▲                                                      │       │
    │                                   health probe pass  ▼       │
    │                                                 ┌─────────┐  │
    │                                                 │  ready  │  │
    │   idle timer fires                              └────┬────┘  │
    │       ┌──────────────────────────────────────────────┤       │
    │       ▼                                              │       │
    │  ┌────────┐                                          │       │
    │  │  idle  │             inference                    │       │
    │  └────┬───┘              request                     │       │
    │       ▼                                              ▼       │
    │ ┌───────────┐              done                 ┌─────────┐  │
    │ │ unloading │◀──────────────────────────────────│ serving │  │
    │ └─────┬─────┘                                   └─────────┘  │
    │       │                                                      │
    └───────┘                                                      │
  error sideband — reachable from pulling / starting / warming     │
  / ready / serving / unloading; returns to offline when ack'd ────┘
From        Allowed to:
offline     pulling, starting, error
pulling     starting, error, offline
starting    warming, error
warming     ready, error
ready       serving, idle, unloading, error
serving     ready, error, unloading
idle        serving, unloading, ready
unloading   offline, error
error       offline

Any transition not listed is rejected by the manager with a slot.invalid_transition error. This is a hard invariant — there is no escape hatch.
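The table above could be encoded and enforced roughly like this (a sketch: LEGAL_TRANSITIONS and the slot.invalid_transition identifier come from the document, everything else is illustrative):

```python
# Transition table from the document; keys are current states,
# values are the states each one may legally move to.
LEGAL_TRANSITIONS: dict[str, set[str]] = {
    "offline":   {"pulling", "starting", "error"},
    "pulling":   {"starting", "error", "offline"},
    "starting":  {"warming", "error"},
    "warming":   {"ready", "error"},
    "ready":     {"serving", "idle", "unloading", "error"},
    "serving":   {"ready", "error", "unloading"},
    "idle":      {"serving", "unloading", "ready"},
    "unloading": {"offline", "error"},
    "error":     {"offline"},
}


class InvalidTransition(Exception):
    """Surfaced to clients as the slot.invalid_transition error."""


def transition(current: str, target: str) -> str:
    """Validate a requested transition; reject anything not in the table."""
    if target not in LEGAL_TRANSITIONS[current]:
        raise InvalidTransition(f"slot.invalid_transition: {current} -> {target}")
    return target
```

Note that every target state in the table is itself a key, so the machine can never step into a state it has no exits defined for (error always drains back to offline).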

  • Every transition writes /var/lib/hal0/slots/<name>/state.json through the atomic-write path (NamedTemporaryFile → fsync → os.replace()).
  • The slot manager emits one SSE event per transition on GET /api/slots/events.
  • The dashboard’s Slots view subscribes to that stream — what you see in the UI is the same wire format the daemon writes to disk.
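The atomic-write path named in the first bullet can be sketched as follows (the function name and state shape are hypothetical; the NamedTemporaryFile → fsync → os.replace() sequence is the one the document describes):

```python
import json
import os
import tempfile


def write_state_atomic(path: str, state: dict) -> None:
    """Persist state.json so readers never observe a partial file:
    write to a temp file in the same directory, fsync it, then
    atomically rename it over the destination."""
    dir_ = os.path.dirname(path) or "."
    with tempfile.NamedTemporaryFile("w", dir=dir_, delete=False) as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())  # data is durable before the rename
    os.replace(tmp.name, path)  # atomic on POSIX within one filesystem
```

The temp file must live in the same directory as the target, because os.replace() is only atomic within a single filesystem.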
Slot state   API behaviour
offline      Requests fail fast with slot.not_loaded.
pulling      Requests fail with slot.pulling and a progress hint.
starting     Requests block briefly; if the slot doesn't reach warming quickly, the request returns 503.
warming      Requests block on the adaptive cold-boot probe; they succeed if the slot reaches ready within the request deadline.
ready        Requests are served. The slot transitions to serving for the duration.
serving      Concurrent requests stack on the slot's queue.
idle         Requests are served as in ready; the first request resets the idle timer.
unloading    Requests fail with slot.unloading.
error        Requests fail with the structured error envelope from the failure.

A naive setup conflates “the systemd unit is up” with “the model is ready to serve”. They aren’t the same — the model still has to load into VRAM, the backend has to pass its first health probe, and the sentinel completion has to succeed. hal0’s state machine encodes that distinction explicitly, which is why the dashboard can show “warming — 12s elapsed” instead of “service is up” while the request still 503s.
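The two-part readiness check mentioned above (a non-empty /v1/models listing plus a successful sentinel completion) can be expressed as a pure function over already-fetched responses; a sketch in which is_ready and the payload shape are assumptions, not hal0's actual probe code:

```python
def is_ready(models_payload: dict, sentinel_text: str) -> bool:
    """Hypothetical readiness predicate: the backend must list at least
    one model AND have returned non-whitespace text for the sentinel
    completion. A unit that is merely 'up' passes neither check yet."""
    has_models = bool(models_payload.get("data"))
    return has_models and bool(sentinel_text.strip())
```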