The Scavenger Hunt
By Jason Waldrip
Back in January we shipped AI-DLC — our first real attempt at making AI agents do actual engineering work, not just write code snippets. It was a methodology built on skills: big, structured prompts you'd load into the agent at the start of a task, describing the whole workflow in one pass. The agent would read the skill, internalize it, and run. Researcher hat, planner hat, builder hat, reviewer hat. Units with clear success criteria. Quality gates. It was an honest attempt to give a language model the scaffolding it needed to do sustained, multi-step engineering instead of one-shot tasks.
For a while, it worked. Then it didn't. Context compacted. The agent drifted. Six units in, it would forget what the skill told it in paragraph two. Someone hit /clear and the entire plan evaporated. We saw the same failure mode over and over: everything depended on the agent holding the whole plan in its head, and the moment context was lost, the plan was lost with it.
H·AI·K·U is what AI-DLC became when we stopped trying to fix that and rewrote the whole thing. Same goal — structured, multi-agent engineering work with quality gates, review cycles, and human checkpoints. Completely different mechanism. Instead of handing the agent a map, H·AI·K·U makes it run a scavenger hunt: ask for the next clue, execute it, come back, ask for the next clue. The agent never sees the whole map. It doesn't need to.
This post is about what falls out of that flip — the architecture, the state machine, the hooks, and the two places where the user is still in the loop.
- The agent is dumb on purpose. It never decides what's next — it asks `haiku_run_next`.
- State lives on disk, not in context. Every turn starts fresh from `.haiku/intents/{slug}/...`. Context compaction, `/clear`, crashes — none of it matters.
- Hooks are seatbelts. They can't decide anything, but they can stop the agent from editing the wrong files or skipping quality gates.
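That loop is small enough to sketch. Here is a minimal TypeScript sketch of the scavenger hunt, assuming a `haiku_run_next`-style call that hands back one action at a time; the type names and action strings are illustrative, not the real protocol:

```typescript
// A hypothetical "next clue" from the orchestrator. The real action set
// is whatever the FSM emits; these three appear in the post.
interface NextAction {
  action: "start_unit" | "gate_review" | "fix_quality_gates" | "done";
  detail: string;
}

// Stand-in for the MCP call. The real orchestrator rebuilds this answer
// from FSM files on disk every time it is asked.
type RunNext = () => NextAction;

// The entire agent loop: ask for a clue, execute it, ask again.
// The agent never holds the plan, only the current action.
function agentLoop(runNext: RunNext, execute: (a: NextAction) => void): number {
  let ticks = 0;
  for (;;) {
    const next = runNext();
    if (next.action === "done") return ticks;
    execute(next);
    ticks++;
  }
}
```

Nothing in the loop accumulates plan knowledge, which is the point: kill the process mid-loop and the next `runNext()` call answers from disk exactly as before.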
The download vs. the scavenger hunt
The easiest way to see the flip is to put the two side by side.
AI-DLC front-loaded everything. One big prompt, delivered once, and then the agent was on its own. H·AI·K·U inverts that. The agent sees one clue at a time — `start_unit`, `gate_review`, `fix_quality_gates` — executes it, and comes back for the next one. The map only exists in the orchestrator and the FSM files on disk.
That's why a H·AI·K·U run survives context compaction, /clear, crashes, and even a different agent picking it up mid-flight. The state isn't in the conversation. It's in the filesystem.
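A stateless tick might look like this sketch, where the `state.json` shape and the reader interface are assumptions based on the layout described above:

```typescript
// Hypothetical shape of .haiku/intents/{slug}/state.json -- illustrative only.
interface IntentState {
  phase: "elaborate" | "execute" | "review" | "gate";
  currentUnit: string | null;
}

// Every tick starts from disk, not from conversation context, so /clear,
// compaction, a crash, or a different agent picking up the run all look
// identical: re-read, re-parse, resume.
function resumeTick(readFile: (path: string) => string, slug: string): IntentState {
  // Nothing is cached between ticks.
  return JSON.parse(readFile(`.haiku/intents/${slug}/state.json`)) as IntentState;
}
```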
| | AI-DLC skills | H·AI·K·U |
|---|---|---|
| Instructions | Front-loaded — entire skill dumped into context at start | Drip-fed — one action per haiku_run_next call |
| State | Lives in the agent's head (conversation context) | Lives on disk; agent is stateless between ticks |
| Control flow | Agent decides what's next | Orchestrator decides what's next |
| Recovery | Context lost = start over | Context lost = next turn reads state.json and resumes |
| Enforcement | Trust the agent to follow the prompt | Hooks physically block violations; orchestrator refuses invalid transitions |
| Parallelism | One agent, one thread | Waves of units in isolated worktrees |
| User checkpoints | Wherever the skill happens to ask | Forced at elaboration + review gates, can't be skipped |
| Failure mode | Agent hallucinates a plan and runs off the rails | Agent gets told "no, do X" on the next tick |
The full architecture
Five layers: the user, the Claude Code harness (agent + hooks), the H·AI·K·U MCP server, the FSM state on disk, and external systems — git worktrees, the review UI, quality gates.
The orchestrator is the brain. Hooks are spinal reflexes. The orchestrator decides what should happen next. Hooks make sure the agent physically can't do anything else between orchestrator calls. The agent is the body that executes the current action. And everything the orchestrator knows, it knows from reading files on disk.
The per-stage phase machine
Every stage walks this state machine. The orchestrator enforces the transitions — the agent can only move forward when the preconditions are satisfied. Not when it thinks they are. When they actually are.
Four phases per stage, each with its own gate:
- `elaborate` — the agent decomposes the work into unit files with `depends_on`. Collaborative stages force at least three user turns before the specs can be finalized. Then the DAG gets validated, unit naming gets validated, unit types get validated, declared inputs get validated, and required discovery artifacts have to exist. Miss any of that and you're stuck in elaborate.
- `execute` — units run wave-by-wave based on a topological sort. Each unit gets its own git worktree. Hats rotate through the unit's bolts.
- `review` — quality gates run first. Tests, lint, typecheck. If any of them fail, the agent bounces back to fix them before the adversarial review agents even look at the code. We don't waste review cycles on code that doesn't compile.
- `gate` — reads the `review:` field from `STAGE.md`. `auto` advances. Everything else — `ask`, `external`, `await`, combos — opens the review UI and blocks until the user or an external event says go.
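The phase machine could be sketched as a single transition function. The precondition fields here are illustrative stand-ins for the real validations (DAG checks, unit naming, declared inputs, and so on), collapsed to booleans:

```typescript
type Phase = "elaborate" | "execute" | "review" | "gate";

// Illustrative precondition summary -- the real orchestrator derives these
// from the unit files and state on disk.
interface StageStatus {
  phase: Phase;
  elaborationTurns: number;
  dagValid: boolean;
  wavesRemaining: number;
  qualityGatesPass: boolean;
  gateApproved: boolean;
}

// The agent only moves forward when preconditions actually hold.
// "blocked" means: stay put, the next clue will say what to fix.
function nextPhase(s: StageStatus): Phase | "blocked" {
  switch (s.phase) {
    case "elaborate":
      return s.elaborationTurns >= 3 && s.dagValid ? "execute" : "blocked";
    case "execute":
      return s.wavesRemaining === 0 ? "review" : "blocked";
    case "review":
      return s.qualityGatesPass ? "gate" : "blocked";
    case "gate":
      // Approval rolls over into the next stage's elaboration.
      return s.gateApproved ? "elaborate" : "blocked";
  }
  return "blocked"; // unreachable; keeps noImplicitReturns happy
}
```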
Where the user actually touches the loop
Two places. Only two.
Elaboration conversation
Collaborative stages require at least three back-and-forth turns with the user before units can be finalized. `elaboration_turns` is tracked in `state.json`. Until it hits the threshold, the orchestrator returns `elaboration_insufficient` no matter what the agent does.
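A minimal sketch of that check; the constant name is assumed, and the threshold of three is the one the post states:

```typescript
// Assumed constant name; the threshold comes from the post.
const ELABORATION_MIN_TURNS = 3;

// No matter what the agent produces, the orchestrator refuses to finalize
// units until the user has had enough turns in the conversation.
function checkElaboration(elaborationTurns: number):
  | { ok: true }
  | { ok: false; error: "elaboration_insufficient" } {
  return elaborationTurns >= ELABORATION_MIN_TURNS
    ? { ok: true }
    : { ok: false, error: "elaboration_insufficient" };
}
```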
Review gates
The orchestrator returns `gate_review` and `handleOrchestratorTool` calls `_openReviewAndWait`, which literally blocks the MCP tool call until the user clicks Approve, Request Changes, or Open PR in the web UI.
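That blocking behavior can be sketched as a promise the tool call awaits. Everything here besides the `_openReviewAndWait` idea is illustrative, including the decision names:

```typescript
// The three buttons in the review UI, per the post.
type ReviewDecision = "approve" | "request_changes" | "open_pr";

// Sketch: the returned promise IS the block. The MCP tool handler awaits
// it, so the agent's loop is frozen until the UI calls resolve().
function openReviewAndWait(
  openUI: (resolve: (d: ReviewDecision) => void) => void
): Promise<ReviewDecision> {
  return new Promise<ReviewDecision>(resolve => openUI(resolve));
}
```

The agent is never asked to "remember to wait" — from its point of view, one tool call simply takes as long as the human takes.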
Everything else is the agent in a tight loop: call `haiku_run_next`, do the work the action described, call `haiku_run_next` again. No meetings. No status updates. No "what should I do next?" Just clues.
Where hooks come in
Hooks live in packages/haiku/src/hooks/*.ts and run inside Claude Code's hook system — not inside the agent's reasoning. They exist because the agent can't be trusted to enforce its own rules. That's not an insult. It's a design decision. If you have to trust the agent to follow the rules, you'll find out it didn't the moment something goes wrong.
Block bad moves
- `prompt-guard` — rejects bad prompts
- `workflow-guard` — prevents edits outside the active unit/stage
- `redirect-plan-mode` — routes plan mode through haiku
- `enforce-iteration` — forces bolt counter increment
- `guard-fsm-fields` — blocks direct edits to FSM-managed frontmatter
- `validate-unit-type` — rejects wrong-typed unit files
- `ensure-deps` — verifies deps before running gates
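A sketch of what a `workflow-guard`-style check might look like. The real hooks live in `packages/haiku/src/hooks/*.ts`; this signature and the path logic are assumptions for illustration:

```typescript
// A hook can only say yes or no -- it never decides what happens next.
// Assumed inputs: the path the agent wants to edit and the active unit's
// worktree root.
function guardEdit(editPath: string, activeWorktree: string):
  | { allow: true }
  | { allow: false; reason: string } {
  if (!editPath.startsWith(activeWorktree + "/")) {
    return { allow: false, reason: `edit outside active unit: ${editPath}` };
  }
  return { allow: true };
}
```

The deny branch is the whole mechanism: the agent's tool call fails with a reason, and the next `haiku_run_next` tick tells it what to do instead.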
Keep the agent honest
- `inject-context` + `inject-state-file` — automatically injects FSM state so the agent never has to ask
- `subagent-context` / `subagent-hook` — scopes context for spawned subagents
- `track-outputs` — records which files the agent produced for a unit
- `context-monitor` — watches token budget
- `quality-gate` — hook wrapper around `runQualityGates`
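The injection idea is simple enough to sketch: prepend fresh FSM state to every turn so the agent never has to ask for it. The tag format here is invented for illustration:

```typescript
// Sketch of the inject-state-file idea: the hook wraps the current FSM
// state (re-read from disk each turn) around the agent's prompt.
// The <haiku-state> tag is a made-up marker, not the real format.
function injectState(prompt: string, stateJson: string): string {
  return `<haiku-state>\n${stateJson}\n</haiku-state>\n\n${prompt}`;
}
```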
Why this works
There's a story I keep coming back to. A senior engineer once told me, "the train can only move as fast as the tracks that it's built on." You can put any engine you want on top of bad rails. It doesn't matter. The rails are the ceiling. Agent frameworks spent years trying to put a bigger engine on the same bad rails — longer context windows, better prompts, more clever reasoning. And the rails kept failing in the same places: memory, state, recovery, enforcement.
H·AI·K·U is a bet on the rails. The engine doesn't need to be smarter. It needs to be asked the right question at the right time, and it needs to be stopped when it tries to do the wrong thing. The filesystem is the memory. The orchestrator is the brain. The hooks are the reflexes. The agent is the body. And the user, when the user is needed, is called into exactly the right place — not because the agent remembered to ask, but because the FSM refuses to advance until they do.
The agent never sees the whole map. It doesn't need to. It just needs the next clue.