OpenAI-compatible /v1/* API
shipped Chat, completions, embeddings, rerank, transcriptions, speech — every OpenAI SDK works unchanged against your local box.
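Because the surface is OpenAI-compatible, any client that can build a standard chat-completions request works against it. A minimal stdlib sketch — the base URL, port, and model name here are placeholders, not documented hal0 defaults:

```python
import json
import urllib.request

# Placeholder base URL — substitute the host/port your hal0 install listens on.
BASE_URL = "http://localhost:8080/v1"

def chat_request(model: str, messages: list, token: str = None) -> urllib.request.Request:
    """Build a standard OpenAI-style POST /v1/chat/completions request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Bearer tokens are how the API authenticates when auth is enabled.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions", data=body, headers=headers, method="POST"
    )

req = chat_request("llama-3.1-8b", [{"role": "user", "content": "hello"}])
# urllib.request.urlopen(req) would send it; the response follows the OpenAI schema.
```

The same shape is what every OpenAI SDK emits under the hood, which is why they work unchanged.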
Slot lifecycle state machine
shipped Real atomic transitions (offline → pulling → starting → warming → ready → serving), persisted and SSE-streamed so the dashboard reflects reality, not snapshots.
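The lifecycle reduces to a transition table over the six states above. A sketch of the forward path — the states come from the description; the strict single-step ordering (and the absence of failure fallbacks like a drop back to offline) is an assumption, not hal0's actual state machine:

```python
# Forward-only transition table for the slot lifecycle.
STATES = ["offline", "pulling", "starting", "warming", "ready", "serving"]
NEXT = {a: b for a, b in zip(STATES, STATES[1:])}

def advance(state: str) -> str:
    """Step a slot one state forward; reject unknown or terminal states."""
    if state not in NEXT:
        raise ValueError(f"no forward transition from {state!r}")
    return NEXT[state]
```

Persisting each transition and pushing it over SSE is what lets the dashboard mirror live state instead of polling snapshots.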
Hardware-aware probe
shipped Detects GPU, NPU, and unified memory; surfaces inline VRAM/RAM warnings before you load a model that won’t fit.
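At its core the fit warning is a comparison like the following — a hypothetical sketch; hal0's actual overhead model (KV cache, runtime buffers) isn't documented here, so the 1.2× factor is illustrative only:

```python
def fits(model_bytes: int, free_vram_bytes: int, overhead: float = 1.2) -> bool:
    """Hypothetical fit check: weights plus runtime overhead vs free VRAM/RAM."""
    return model_bytes * overhead <= free_vram_bytes

# An 8B model at Q4 (~4.7 GB) against 16 GB free: loads without a warning.
ok = fits(int(4.7e9), int(16e9))
```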
Hugging Face model pulls
shipped Streaming downloads from the dashboard or CLI with live progress. No manual GGUF wrangling.
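Under the hood a streaming pull is chunked reads plus progress accounting. A generic sketch (not hal0's downloader) that works against any URL, demonstrated here on a local `file://` URL so it runs offline:

```python
import tempfile
import urllib.request
from pathlib import Path

def stream_download(url: str, dest: Path, chunk: int = 1 << 20) -> int:
    """Stream a file to disk in chunks, printing live percentage progress."""
    done = 0
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        total = int(resp.headers.get("Content-Length") or 0)
        while block := resp.read(chunk):
            out.write(block)
            done += len(block)
            if total:
                print(f"\r{done * 100 // total}% ({done}/{total} bytes)", end="")
    print()
    return done

# Demo against a locally created file standing in for a remote GGUF.
src = Path(tempfile.mkdtemp()) / "model.gguf"
src.write_bytes(b"x" * 3_000_000)
dest = src.with_suffix(".copy")
n = stream_download(src.as_uri(), dest)
```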
Bundled OpenWebUI
shipped Prewired chat on :3001 pointed at the local API. Zero config — the installer wires it for you.
Cosign-signed self-update
shipped Atomic version swap with rollback. Stable and nightly channels, GitHub OIDC-verified release tarballs.
Five-provider stack
shipped llama.cpp (Vulkan / ROCm) for chat + embed, FLM for NPU multiplex, Moonshine for STT, Kokoro for TTS, ComfyUI for image generation — all first-class in v1.
Image generation
shipped POST /v1/images/generations served by a ComfyUI provider on ROCm. Curated SDXL Turbo / SD 1.5 / Flux Schnell with license badges; the img slot runs inside the same lifecycle as chat, embed, and audio.
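The endpoint mirrors the OpenAI image-generation request shape. A payload sketch — the exact model identifier (`sdxl-turbo`) and supported sizes are assumptions, not confirmed hal0 values:

```python
import json

# Request body for POST /v1/images/generations, OpenAI-compatible shape.
payload = {
    "model": "sdxl-turbo",          # assumed id for the curated SDXL Turbo entry
    "prompt": "a lighthouse at dusk, oil painting",
    "n": 1,
    "size": "1024x1024",            # size support is an assumption
}
body = json.dumps(payload).encode()
```

Because the img slot lives in the same lifecycle as chat/embed/audio, the request only succeeds once that slot reaches serving.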
Caddy + auth + HTTPS + mDNS
shipped install.sh --auth=basic brings up Caddy at the edge with basic_auth, Bearer tokens for the OpenAI API, automatic HTTPS (internal CA for .local, Let’s Encrypt for real domains), and a best-effort Avahi service file so hal0.local resolves on the LAN. Off by default — opt in when you’re ready to expose the box.
Installer overhaul
shipped ASCII banner + step counter, extracted preflight.sh (disk + ports + python + systemd), hal0 doctor for re-running it post-install, hardware cards + auto-populated slots/primary.toml, contextual ERR-trap recovery hints, and a live-hello + QR + reachability finish.
Agent management & integration
in-progress First-class agents stored on hal0 with conversation orchestration and deterministic tool-use loops. The platform for agents to run on, not just chat with.
Unified memory system
in-progress Persistent, federated memory across local slots and (optionally) external models. Mem0-style API with sources, search, and scoping.
MCP support — host and server
in-progress hal0 speaks Model Context Protocol both directions. Compose tools across local slots and external MCP services; discover them from the dashboard.
Extensions framework
in-progress Third-party apps packaged as hal0 extensions: systemd unit + dashboard tile + healthcheck. Bring-your-own-app, lifecycle-managed.
Benchmarks & presets UI
in-progress In-dashboard tok/s + latency runs, plus curated loadout presets you can flash onto a fresh install.
AUR PKGBUILD & Ubuntu PPA
in-progress Native distro packages on top of the install script — pacman and apt as first-class install paths.
Multi-host federation
later A slot mesh across LAN boxes — primary on the Strix Halo, embed on the workstation, all behind one /v1/* surface.
Fine-tune & LoRA hot-swap
later Attach and rotate LoRAs against a warm base model without unloading the underlying weights.
Per-model rate limits & budgets
later Cost-style accounting for local inference — cap a chatty agent without taking the whole box down.
Voice mode end-to-end
later Moonshine + agent loop + Kokoro stitched into a hands-free streaming conversation, gated through the slot lifecycle.
ChatOps adapters
later Slack and Matrix bridges as extensions — talk to hal0 from the rooms you already live in.