OpenAI-compatible /v1/* API
shipped Chat, completions, embeddings, rerank, transcriptions, speech — every OpenAI SDK works unchanged against your local box.
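Because the surface is OpenAI-compatible, any client that can build a standard chat-completions request works against it. A minimal stdlib sketch — the base URL, port, and model name here are placeholders, not documented hal0 defaults:

```python
import json
import urllib.request

# Placeholder base URL — substitute the host/port your hal0 install listens on.
BASE_URL = "http://localhost:8080/v1"

def chat_request(model: str, messages: list, token: str = None) -> urllib.request.Request:
    """Build a standard OpenAI-style POST /v1/chat/completions request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Bearer tokens are how the API authenticates when auth is enabled.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions", data=body, headers=headers, method="POST"
    )

req = chat_request("llama-3.1-8b", [{"role": "user", "content": "hello"}])
# urllib.request.urlopen(req) would send it; the response follows the OpenAI schema.
```

The same shape is what every OpenAI SDK emits under the hood, which is why they work unchanged.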
Slot lifecycle state machine
shipped Real atomic transitions (offline → pulling → starting → warming → ready → serving), persisted and SSE-streamed so the dashboard reflects reality, not snapshots.
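The lifecycle reduces to a transition table over the six states above. A sketch of the forward path — the states come from the description; the strict single-step ordering (and the absence of failure fallbacks like a drop back to offline) is an assumption, not hal0's actual state machine:

```python
# Forward-only transition table for the slot lifecycle.
STATES = ["offline", "pulling", "starting", "warming", "ready", "serving"]
NEXT = {a: b for a, b in zip(STATES, STATES[1:])}

def advance(state: str) -> str:
    """Step a slot one state forward; reject unknown or terminal states."""
    if state not in NEXT:
        raise ValueError(f"no forward transition from {state!r}")
    return NEXT[state]
```

Persisting each transition and pushing it over SSE is what lets the dashboard mirror live state instead of polling snapshots.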
Hardware-aware probe
shipped Detects GPU, NPU, and unified memory; surfaces inline VRAM/RAM warnings before you load a model that won’t fit.
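At its core the fit warning is a comparison like the following — a hypothetical sketch; hal0's actual overhead model (KV cache, runtime buffers) isn't documented here, so the 1.2× factor is illustrative only:

```python
def fits(model_bytes: int, free_vram_bytes: int, overhead: float = 1.2) -> bool:
    """Hypothetical fit check: weights plus runtime overhead vs free VRAM/RAM."""
    return model_bytes * overhead <= free_vram_bytes

# An 8B model at Q4 (~4.7 GB) against 16 GB free: loads without a warning.
ok = fits(int(4.7e9), int(16e9))
```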
Hugging Face model pulls
shipped Streaming downloads from the dashboard or CLI with live progress. No manual GGUF wrangling.
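Under the hood a streaming pull is chunked reads plus progress accounting. A generic sketch (not hal0's downloader) that works against any URL, demonstrated here on a local `file://` URL so it runs offline:

```python
import tempfile
import urllib.request
from pathlib import Path

def stream_download(url: str, dest: Path, chunk: int = 1 << 20) -> int:
    """Stream a file to disk in chunks, printing live percentage progress."""
    done = 0
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        total = int(resp.headers.get("Content-Length") or 0)
        while block := resp.read(chunk):
            out.write(block)
            done += len(block)
            if total:
                print(f"\r{done * 100 // total}% ({done}/{total} bytes)", end="")
    print()
    return done

# Demo against a locally created file standing in for a remote GGUF.
src = Path(tempfile.mkdtemp()) / "model.gguf"
src.write_bytes(b"x" * 3_000_000)
dest = src.with_suffix(".copy")
n = stream_download(src.as_uri(), dest)
```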
Bundled OpenWebUI
shipped Prewired chat on :3001 pointed at the local API. Zero config — the installer wires it for you.
Cosign-signed self-update
shipped Atomic version swap with rollback. Stable and nightly channels, GitHub OIDC-verified release tarballs.
Five-provider stack
shipped llama.cpp (Vulkan / ROCm) for chat + embed, FLM for NPU multiplex, Moonshine for STT, Kokoro for TTS, ComfyUI for image generation — all first-class in v1.
Image generation
shipped POST /v1/images/generations served by a ComfyUI provider on ROCm. Curated SDXL Turbo / SD 1.5 / Flux Schnell with license badges; the img slot runs inside the same lifecycle as chat, embed, and audio.
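The endpoint mirrors the OpenAI image-generation request shape. A payload sketch — the exact model identifier (`sdxl-turbo`) and supported sizes are assumptions, not confirmed hal0 values:

```python
import json

# Request body for POST /v1/images/generations, OpenAI-compatible shape.
payload = {
    "model": "sdxl-turbo",          # assumed id for the curated SDXL Turbo entry
    "prompt": "a lighthouse at dusk, oil painting",
    "n": 1,
    "size": "1024x1024",            # size support is an assumption
}
body = json.dumps(payload).encode()
```

Because the img slot lives in the same lifecycle as chat/embed/audio, the request only succeeds once that slot reaches serving.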
Caddy + auth + HTTPS + mDNS
shipped install.sh --auth=basic brings up Caddy at the edge with basic_auth, Bearer tokens for the OpenAI API, automatic HTTPS (internal CA for .local, Let’s Encrypt for real domains), and a best-effort Avahi service file so hal0.local resolves on the LAN. Off by default — opt in when you’re ready to expose the box.
Installer overhaul
shipped ASCII banner + step counter, extracted preflight.sh (disk + ports + python + systemd), hal0 doctor for re-running it post-install, hardware cards + auto-populated slots/primary.toml, contextual ERR-trap recovery hints, and a live-hello + QR + reachability finish.
Agent management & integration
in-progress First-class agents stored on hal0 with conversation orchestration and deterministic tool-use loops. The platform for agents to run on, not just chat with.
Unified memory system
in-progress Persistent, federated memory across local slots and (optionally) external models. Mem0-style API with sources, search, and scoping.
MCP support — host and server
in-progress hal0 speaks Model Context Protocol both directions. Compose tools across local slots and external MCP services; discover them from the dashboard.
Extensions framework
in-progress Third-party apps packaged as hal0 extensions: systemd unit + dashboard tile + healthcheck. Bring-your-own-app, lifecycle-managed.
Benchmarks & presets UI
in-progress In-dashboard tok/s + latency runs, plus curated loadout presets you can flash onto a fresh install.
AUR PKGBUILD & Ubuntu PPA
in-progress Native distro packages on top of the install script — pacman and apt as first-class install paths.
Multi-host federation
later A slot mesh across LAN boxes — primary on the Strix Halo, embed on the workstation, all behind one /v1/* surface.
Fine-tune & LoRA hot-swap
later Attach and rotate LoRAs against a warm base model without unloading the underlying weights.
Per-model rate limits & budgets
later Cost-style accounting for local inference — cap a chatty agent without taking the whole box down.
Voice mode end-to-end
later Moonshine + agent loop + Kokoro stitched into a hands-free streaming conversation, gated through the slot lifecycle.
ChatOps adapters
later Slack and Matrix bridges as extensions — talk to hal0 from the rooms you already live in.