Architecture

The Scavenger Hunt

By Jason Waldrip

Tags: fsm · orchestrator · hooks · ai-dlc · context

Back in January we shipped AI-DLC — our first real attempt at making AI agents do actual engineering work, not just write code snippets. It was a methodology built on skills: big, structured prompts you'd load into the agent at the start of a task, describing the whole workflow in one pass. The agent would read the skill, internalize it, and run. Researcher hat, planner hat, builder hat, reviewer hat. Units with clear success criteria. Quality gates. It was an honest attempt to give a language model the scaffolding it needed to do sustained, multi-step engineering instead of one-shot tasks.

For a while, it worked. Then it didn't. Context compacted. The agent drifted. Six units in, it would forget what the skill told it in paragraph two. Someone hit /clear and the entire plan evaporated. We saw the same failure mode over and over: the agent couldn't hold the whole plan in its head, and the moment it tried, we lost the plan.

H·AI·K·U is what AI-DLC became when we stopped trying to fix that and rewrote the whole thing. Same goal — structured, multi-agent engineering work with quality gates, review cycles, and human checkpoints. Completely different mechanism. Instead of handing the agent a map, H·AI·K·U makes it run a scavenger hunt: ask for the next clue, execute it, come back, ask for the next clue. The agent never sees the whole map. It doesn't need to.

This post is about what falls out of that flip — the architecture, the state machine, the hooks, and the two places where the user is still in the loop.

Three things to remember
  1. The agent is dumb on purpose. It never decides what's next — it asks haiku_run_next.
  2. State lives on disk, not in context. Every turn starts fresh from .haiku/intents/{slug}/.... Context compaction, /clear, crashes — none of it matters.
  3. Hooks are seatbelts. They can't decide anything, but they can stop the agent from editing the wrong files or skipping quality gates.
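To make the second point concrete, here is a hypothetical sketch of what the on-disk state might look like. The path .haiku/intents/{slug}/ comes from the post; every field name below is an assumption, not H·AI·K·U's actual schema.

```typescript
// Hypothetical shape of .haiku/intents/{slug}/state.json.
// Field names are illustrative assumptions, not the real schema.
interface IntentState {
  slug: string;
  stage: string;
  phase: "elaborate" | "execute" | "review" | "gate";
  elaboration_turns: number;
  active_units: string[];
}

// Because every turn re-reads this file, the orchestrator can reconstruct
// the run after context compaction, /clear, or a crash.
const example: IntentState = {
  slug: "add-billing",
  stage: "implementation",
  phase: "execute",
  elaboration_turns: 3,
  active_units: ["unit-01-schema"],
};
```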

The download vs. the scavenger hunt

The easiest way to see the flip is to put the two side by side.

AI-DLC front-loaded everything. One big prompt, delivered once, and then the agent was on its own. H·AI·K·U inverts that. The agent sees one clue at a time — start_unit, gate_review, fix_quality_gates — executes it, and comes back for the next one. The map only exists in the orchestrator and the FSM files on disk.

That's why a H·AI·K·U run survives context compaction, /clear, crashes, and even a different agent picking it up mid-flight. The state isn't in the conversation. It's in the filesystem.
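The whole agent-side contract fits in a few lines. This is a minimal sketch, not H·AI·K·U's implementation: haiku_run_next is the real tool name from the post, but the action shapes and function signatures are assumptions.

```typescript
// Hypothetical action shapes; only the haiku_run_next concept is from the post.
type Action =
  | { kind: "start_unit"; unit: string }
  | { kind: "gate_review"; stage: string }
  | { kind: "fix_quality_gates"; failures: string[] }
  | { kind: "done" };

// The agent's whole job: ask for a clue, execute exactly that clue, ask again.
// It never plans ahead and never holds the map.
function agentLoop(runNext: () => Action, execute: (a: Action) => void): void {
  for (;;) {
    const action = runNext(); // one clue per tick
    if (action.kind === "done") return;
    execute(action);
  }
}
```

Note that the loop carries no state of its own: if the process dies between ticks, a fresh loop resumes from whatever runNext reads off disk.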

|  | AI-DLC skills | H·AI·K·U |
| --- | --- | --- |
| Instructions | Front-loaded — entire skill dumped into context at start | Drip-fed — one action per haiku_run_next call |
| State | Lives in the agent's head (conversation context) | Lives on disk; agent is stateless between ticks |
| Control flow | Agent decides what's next | Orchestrator decides what's next |
| Recovery | Context lost = start over | Context lost = next turn reads state.json and resumes |
| Enforcement | Trust the agent to follow the prompt | Hooks physically block violations; orchestrator refuses invalid transitions |
| Parallelism | One agent, one thread | Waves of units in isolated worktrees |
| User checkpoints | Wherever the skill happens to ask | Forced at elaboration + review gates, can't be skipped |
| Failure mode | Agent hallucinates a plan and runs off the rails | Agent gets told "no, do X" on the next tick |

The full architecture

Five layers: the user, the Claude Code harness (agent + hooks), the H·AI·K·U MCP server, the FSM state on disk, and external systems — git worktrees, the review UI, quality gates.

The orchestrator is the brain. Hooks are spinal reflexes. The orchestrator decides what should happen next. Hooks make sure the agent physically can't do anything else between orchestrator calls. The agent is the body that executes the current action. And everything the orchestrator knows, it knows from reading files on disk.

The per-stage phase machine

Every stage walks this state machine. The orchestrator enforces the transitions — the agent can only move forward when the preconditions are satisfied. Not when it thinks they are. When they actually are.

Four phases per stage, each with its own gate:

  • elaborate — the agent decomposes the work into unit files with depends_on. Collaborative stages force at least three user turns before the specs can be finalized. Then the DAG gets validated, unit naming gets validated, unit types get validated, declared inputs get validated, and required discovery artifacts have to exist. Miss any of that and you're stuck in elaborate.
  • execute — units run wave-by-wave based on a topological sort. Each unit gets its own git worktree. Hats rotate through the unit's bolts.
  • review — quality gates run first. Tests, lint, typecheck. If any of them fail, the agent bounces back to fix them before the adversarial review agents even look at the code. We don't waste review cycles on code that doesn't compile.
  • gate — reads the review: field from STAGE.md. auto advances. Everything else — ask, external, await, combos — opens the review UI and blocks until the user or an external event says go.
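Sketched as code, that phase machine is small. The phase names and the review: values come from the post; the precondition flags and function shapes below are assumptions about how an orchestrator like this could check "when they actually are."

```typescript
type Phase = "elaborate" | "execute" | "review" | "gate";

// Hypothetical precondition flags the orchestrator would derive from disk.
interface StageChecks {
  unitsValidated: boolean;   // DAG, naming, types, inputs, discovery artifacts
  allWavesComplete: boolean; // every unit in every wave finished
  qualityGatesPass: boolean; // tests, lint, typecheck
  gateApproved: boolean;     // user / external event said go
}

// The agent never calls this; the orchestrator does, and it simply refuses
// to advance while a precondition fails.
function nextPhase(current: Phase, c: StageChecks): Phase {
  switch (current) {
    case "elaborate": return c.unitsValidated ? "execute" : "elaborate";
    case "execute":   return c.allWavesComplete ? "review" : "execute";
    case "review":    return c.qualityGatesPass ? "gate" : "review";
    case "gate":      return c.gateApproved ? "elaborate" : "gate"; // next stage
  }
}

// Gate routing from the review: field in STAGE.md: auto advances,
// everything else opens the review UI and blocks.
function gateAction(review: string): "auto_advance" | "open_review" {
  return review === "auto" ? "auto_advance" : "open_review";
}
```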

Where the user actually touches the loop

Two places. Only two.

Seam 1

Elaboration conversation

Collaborative stages require at least three back-and-forth turns with the user before units can be finalized. elaboration_turns is tracked in state.json. Until it hits the threshold, the orchestrator returns elaboration_insufficient no matter what the agent does.

Seam 2

Review gates

The orchestrator returns gate_review and handleOrchestratorTool calls _openReviewAndWait, which literally blocks the MCP tool call until the user clicks Approve, Request Changes, or Open PR in the web UI.
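_openReviewAndWait is internal to H·AI·K·U, so this is only one plausible way to get that blocking behavior: poll the disk for a decision artifact the review UI writes. Every name and file path below is invented for illustration.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

type Decision = "approve" | "request_changes" | "open_pr";

// Hypothetical: the review UI writes { "decision": "approve" } when the
// user clicks a button.
function writeDecision(path: string, d: Decision): void {
  writeFileSync(path, JSON.stringify({ decision: d }));
}

function readDecision(path: string): Decision | null {
  if (!existsSync(path)) return null;
  return (JSON.parse(readFileSync(path, "utf8")) as { decision: Decision }).decision;
}

// The MCP tool call simply does not return until a decision exists,
// which is what makes the gate unskippable from the agent's side.
async function openReviewAndWait(decisionPath: string, pollMs = 500): Promise<Decision> {
  for (;;) {
    const d = readDecision(decisionPath);
    if (d !== null) return d;
    await new Promise((r) => setTimeout(r, pollMs));
  }
}
```

A file-based handshake also keeps the "state lives on disk" property: if the server restarts mid-review, the decision is still there to be read.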

Everything else is the agent in a tight loop: call haiku_run_next, do the work the action described, call haiku_run_next again. No meetings. No status updates. No "what should I do next?" Just clues.

Where hooks come in

Hooks live in packages/haiku/src/hooks/*.ts and run inside Claude Code's hook system — not inside the agent's reasoning. They exist because the agent can't be trusted to enforce its own rules. That's not an insult. It's a design decision. If you have to trust the agent to follow the rules, you'll find out it didn't the moment something goes wrong.

Guardrails

Block bad moves

  • prompt-guard — rejects bad prompts
  • workflow-guard — prevents edits outside the active unit/stage
  • redirect-plan-mode — routes plan mode through haiku
  • enforce-iteration — forces bolt counter increment
  • guard-fsm-fields — blocks direct edits to FSM-managed frontmatter
  • validate-unit-type — rejects wrong-typed unit files
  • ensure-deps — verifies deps before running gates
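The core of a guardrail like workflow-guard can be a pure decision function. This sketch invents the signature and omits the real hook wiring (Claude Code's hook events and their stdin/stdout protocol); it only shows the allow/block logic.

```typescript
// Hypothetical workflow-guard core: given the file an edit targets and the
// directories the active unit owns, allow or block the tool call.
interface GuardResult {
  decision: "allow" | "block";
  reason?: string;
}

function guardEdit(filePath: string, activeUnitDirs: string[]): GuardResult {
  const inScope = activeUnitDirs.some(
    (dir) => filePath === dir || filePath.startsWith(dir + "/"),
  );
  return inScope
    ? { decision: "allow" }
    : { decision: "block", reason: `edit outside active unit: ${filePath}` };
}
```

The reason string matters: a blocked agent gets told why, so the next tick corrects course instead of flailing.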
Context injection

Keep the agent honest

  • inject-context + inject-state-file — automatically inject FSM state so the agent never has to ask
  • subagent-context / subagent-hook — scopes context for spawned subagents
  • track-outputs — records which files the agent produced for a unit
  • context-monitor — watches token budget
  • quality-gate — hook wrapper around runQualityGates
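As a rough sketch of the state-injection idea (the wrapper tag and function names here are invented, not inject-state-file's actual output):

```typescript
import { readFileSync } from "node:fs";

// Hypothetical: wrap the FSM state so it can be prepended to the agent's
// context every turn. The agent starts each tick already knowing where it is.
function wrapState(stateJson: string): string {
  return `<haiku-state>\n${stateJson}\n</haiku-state>`;
}

function buildStateInjection(statePath: string): string {
  return wrapState(readFileSync(statePath, "utf8").trim());
}
```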

Why this works

There's a story I keep coming back to. A senior engineer once told me, "the train can only move as fast as the tracks that it's built on." You can put any engine you want on top of bad rails. It doesn't matter. The rails are the ceiling. Agent frameworks spent years trying to put a bigger engine on the same bad rails — longer context windows, better prompts, more clever reasoning. And the rails kept failing in the same places: memory, state, recovery, enforcement.

H·AI·K·U is a bet on the rails. The engine doesn't need to be smarter. It needs to be asked the right question at the right time, and it needs to be stopped when it tries to do the wrong thing. The filesystem is the memory. The orchestrator is the brain. The hooks are the reflexes. The agent is the body. And the user, when the user is needed, is called into exactly the right place — not because the agent remembered to ask, but because the FSM refuses to advance until they do.

The agent never sees the whole map. It doesn't need to. It just needs the next clue.