roadmap

A platform for the local AI stack.

hal0 is not a llama-server wrapper. v1 ships a polished OpenAI-compatible inference platform — slots, dispatcher, hardware probe, bundled chat, signed self-update. v0.2 takes the same lifecycle discipline and extends it into the rest of the local AI stack: agents, persistent memory, MCP, tools, image, voice, third-party extensions. The goal is one place on your hardware that runs the whole stack, not seven daemons glued together.

No dates below. Items show direction — the closer an item sits to the Shipped column, the closer it is to running on your box.

Shipped

v1

OpenAI-compatible /v1/* API

shipped
Chat, completions, embeddings, rerank, transcriptions, speech — every OpenAI SDK works unchanged against your local box.
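
For example, any OpenAI client can be repointed with a base_url override. A minimal sketch, assuming an unauthenticated install; the hostname, port, API key, and slot/model name below are placeholders, not values from this roadmap:

```python
# Point the official OpenAI Python SDK at a local hal0 box instead of api.openai.com.
# base_url, api_key, and model are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://hal0.local:8080/v1",   # assumed local endpoint
    api_key="unused",                       # any string works when auth is off
)

resp = client.chat.completions.create(
    model="primary",                        # assumed slot / model name
    messages=[{"role": "user", "content": "Hello from the LAN"}],
)
print(resp.choices[0].message.content)
```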

Slot lifecycle state machine

shipped
Real atomic transitions (offline → pulling → starting → warming → ready → serving), persisted and SSE-streamed so the dashboard reflects reality, not snapshots.
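
As an illustration of that discipline (a sketch, not hal0's actual code), the lifecycle can be modeled as an explicit allow-list of transitions; anything outside it is rejected rather than silently coerced. The fall-back edges to offline are assumptions beyond the forward chain listed above:

```python
# Illustrative slot lifecycle: only whitelisted transitions are allowed.
from enum import Enum

class SlotState(str, Enum):
    OFFLINE = "offline"
    PULLING = "pulling"
    STARTING = "starting"
    WARMING = "warming"
    READY = "ready"
    SERVING = "serving"

# Forward chain from the description, plus assumed recovery edges back to OFFLINE.
ALLOWED = {
    SlotState.OFFLINE:  {SlotState.PULLING},
    SlotState.PULLING:  {SlotState.STARTING, SlotState.OFFLINE},
    SlotState.STARTING: {SlotState.WARMING,  SlotState.OFFLINE},
    SlotState.WARMING:  {SlotState.READY,    SlotState.OFFLINE},
    SlotState.READY:    {SlotState.SERVING,  SlotState.OFFLINE},
    SlotState.SERVING:  {SlotState.READY,    SlotState.OFFLINE},
}

class Slot:
    def __init__(self) -> None:
        self.state = SlotState.OFFLINE

    def transition(self, target: SlotState) -> None:
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition: {self.state.value} -> {target.value}")
        # the real system would persist the new state and emit an SSE event here
        self.state = target
```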

Hardware-aware probe

shipped
Detects GPU, NPU, and unified memory; surfaces VRAM/RAM fit warnings inline before you load a model that won’t fit.
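
The fit logic itself is conceptually simple. A rough sketch of the kind of check involved (not the actual probe; the 20% headroom figure and file-size heuristic are assumptions):

```python
# Rough VRAM fit check: GGUF file size plus headroom for KV-cache and activations.
import os

def fits_in_vram(model_path: str, vram_bytes: int, headroom: float = 0.20) -> bool:
    """Return True if the model file plus headroom fits in the detected VRAM."""
    model_bytes = os.path.getsize(model_path)
    return model_bytes * (1 + headroom) <= vram_bytes

if not fits_in_vram("qwen2.5-7b-instruct-q4_k_m.gguf", vram_bytes=16 * 1024**3):
    print("warning: model is unlikely to fit in VRAM; expect spill into system RAM")
```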

Hugging Face model pulls

shipped
Streaming downloads from the dashboard or CLI with live progress. No manual GGUF wrangling.
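
Under the hood this amounts to resolving a repo and file on the Hub and streaming it down. A hand-rolled equivalent with huggingface_hub (hal0 may do this differently; the repo and filename are just examples):

```python
# Fetch a single GGUF file from the Hugging Face Hub; hf_hub_download streams
# the file and shows progress when run in a terminal.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)
print("cached at", path)
```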

Bundled OpenWebUI

shipped
Prewired chat on :3001 pointed at the local API. Zero config — the installer wires it for you.

Cosign-signed self-update

shipped
Atomic version swap with rollback. Stable and nightly channels, GitHub OIDC-verified release tarballs.

Five-provider stack

shipped
llama.cpp (Vulkan / ROCm) for chat + embed, FLM for NPU multiplex, Moonshine for STT, Kokoro for TTS, ComfyUI for image generation — all first-class in v1.

Image generation

shipped
POST /v1/images/generations served by a ComfyUI provider on ROCm. Curated SDXL Turbo / SD 1.5 / Flux Schnell with license badges; the img slot runs inside the same lifecycle as chat, embed, and audio.
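
Because the endpoint follows the OpenAI shape, the same SDK covers images too. A sketch, with the base URL and model id assumed rather than taken from this page:

```python
# Generate an image against the local /v1/images/generations endpoint.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://hal0.local:8080/v1", api_key="unused")

img = client.images.generate(
    model="sdxl-turbo",                       # assumed id for the img slot
    prompt="a lighthouse at dusk, oil painting",
    size="1024x1024",
    response_format="b64_json",
)
with open("lighthouse.png", "wb") as f:
    f.write(base64.b64decode(img.data[0].b64_json))
```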

Caddy + auth + HTTPS + mDNS

shipped
install.sh --auth=basic brings up Caddy at the edge with basic_auth, Bearer tokens for the OpenAI API, automatic HTTPS (internal CA for .local, Let’s Encrypt for real domains), and a best-effort Avahi service file so hal0.local resolves on the LAN. Off by default — opt in when you’re ready to expose the box.
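
With auth on, API calls simply add a Bearer token and go through Caddy over HTTPS. A sketch with plain requests; the token, URL, and CA certificate path are placeholders:

```python
# Call the API through the Caddy edge with a Bearer token over HTTPS.
import requests

resp = requests.post(
    "https://hal0.local/v1/chat/completions",
    headers={"Authorization": "Bearer <your-token>"},
    json={
        "model": "primary",
        "messages": [{"role": "user", "content": "ping"}],
    },
    # .local names are signed by Caddy's internal CA; point verify= at its root cert.
    verify="/path/to/caddy-internal-root.crt",
)
print(resp.json()["choices"][0]["message"]["content"])
```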

Installer overhaul

shipped
ASCII banner + step counter, extracted preflight.sh (disk + ports + python + systemd), hal0 doctor for re-running it post-install, hardware cards + auto-populated slots/primary.toml, contextual ERR-trap recovery hints, and a live-hello + QR + reachability finish.

Soon

v0.2

Agent management & integration

in-progress
First-class agents stored on hal0 with conversation orchestration and deterministic tool-use loops. The platform for agents to run on, not just chat with.

Unified memory system

in-progress
Persistent, federated memory across local slots and (optionally) external models. Mem0-style API with sources, search, and scoping.

MCP support — host and server

in-progress
hal0 speaks Model Context Protocol both directions. Compose tools across local slots and external MCP services; discover them from the dashboard.

Extensions framework

in-progress
Third-party apps packaged as hal0 extensions: systemd unit + dashboard tile + healthcheck. Bring-your-own-app, lifecycle-managed.

Benchmarks & presets UI

in-progress
In-dashboard tok/s + latency runs, plus curated loadout presets you can flash onto a fresh install.

AUR PKGBUILD & Ubuntu PPA

in-progress
Native distro packages on top of the install script — pacman and apt as first-class install paths.

Exploring

v1.x +

Multi-host federation

later
A slot mesh across LAN boxes — primary on the Strix Halo, embed on the workstation, all behind one /v1/* surface.

Fine-tune & LoRA hot-swap

later
Attach and rotate LoRAs against a warm base model without unloading the underlying weights.

Per-model rate limits & budgets

later
Cost-style accounting for local inference — cap a chatty agent without taking the whole box down.

Voice mode end-to-end

later
Moonshine + agent loop + Kokoro stitched into a hands-free streaming conversation, gated through the slot lifecycle.

ChatOps adapters

later
Slack and Matrix bridges as extensions — talk to hal0 from the rooms you already live in.

Exploring items are bets we believe in, not promises. Scope and order will shift as v0.2 lands and we learn what people actually do with a local platform.

shape it

The roadmap is a conversation.

File issues, open discussions, send patches. hal0 is Apache-2.0 — the platform is yours to extend, fork, or steer. The agents / memory / MCP track in particular benefits from real-world use cases more than from our guesses.