Architecture

One Instruction at a Time

By Jason Waldrip

v4 · cursor · workflow-engine · drift · feedback · subagents

Most agent harnesses today are a markdown file thousands of words long. The agent loads it, nods at the section called CRITICAL, and tries to run the whole flow in one head — discovery, build, review, merge — while keeping every MUST in working memory at once. Halfway through, the prompt is already losing resolution.

Brian Suh's piece on this says the quiet part out loud.

A longer prompt isn't the fix. The fix is to stop asking the model to be the runtime.

The category we're not in

Most harnesses we look at fall into one of two camps. Prompt-based ones use a single long system message describing the workflow, the roles, and the rules, and trust the model to execute it end to end. Skills-based ones use a library of named markdown files the agent loads on demand, each one again a self-contained program the model is asked to run start to finish. Same shape underneath: the workflow is prose, the model is the interpreter.

H·AI·K·U has never been that, and the post-mortems we wrote on AI-DLC are why. AI-DLC was prompts-as-runtime. We watched it walk over its own MUSTs more than once. One session ended with a model agreeing that a unit's quality gates had to be tightened, and then routing around the constraint by telling the human to do the write instead. The prompt was clear. The workflow simply had no surface that prevented the wrong move, and prose alone has never been able to provide one.

The reframe we landed on, then and now: the workflow is software. Hats, gates, drift, feedback — real components with real contracts. Prompts describe the work inside a step, not the shape of the program. v4 is where that idea finally has the scaffolding it needed.

1 — Drift track

The drift sweep

Signed slots stamp content hashes. Every tick re-hashes them and emits drift events before any handler runs. No "remember to verify" paragraph in a prompt.

2 — Feedback track

The feedback queue

Findings get dispatched one hat against one finding. The agent never picks which finding to address next — the queue is already ordered.

3 — Main track

The intent walk

Walks stages, units, and hats in order. When drift and feedback are clean, this is the track that moves the work forward — and it's where most of the cursor's vocabulary lives.

The cursor

The center of the v4 engine is a small pure function. It does one thing: read the disk, return the next instruction. No model in the loop, no prompt, no judgement call. Given the same disk state, it returns the same answer.

It walks three tracks, in priority order. The drift track re-hashes signed slots and emits drift events on any mismatch. The feedback track walks open feedback across every prior stage and the current one. The main track is the intent walk — the forward progression through stages, units, hats. The first track that has something to say wins, and the cursor returns one position: drift detected, feedback awaiting a fix-hat, units ready to execute, stage complete, and so on. The agent does the matching action, then ticks again.

Cursor priority order
Each tick, the cursor reports its position and the agent takes the matching action. The first track that has something to say wins.

Drift track

Did another agent or a human change something we already signed off?

  • Drift Detected (feedback can land here)

    The agent inspects the change. If it warrants attention, file feedback so the fix loop can address it. If it's cosmetic, clear the drift signal so the cursor resumes forward progress.

Feedback track

Is open feedback waiting on a fix?

  • Feedback Awaiting Fix-Hat

    Dispatch the next fix-hat against this finding. One hat, one finding, one tick.

  • Feedback Resolved

    Close the feedback record. Forward progress resumes next tick.

Main track

Otherwise — walk forward through the pipeline:

Per stage, in order
  • Human · Design Direction Required

    Present design options. Wait for the user to pick one.

  • Human · Clarifying Questions Pending

    Ask the user the stage's clarifying questions.

  • Discovery Required

    Run the configured discovery agent on units missing its artifact.

  • Stage Empty

    Elaborate — write the unit specs this stage will produce.

  • Spec Review Pending (feedback can land here)

    Dispatch a review-agent to read each unit's spec. They file feedback if anything's off.

  • Human · Spec Gate (feedback can land here)

    Surface the specs to the human reviewer for sign-off. Their feedback is the same shape as anyone else's.

  • Units Ready to Execute

    Run the next hat on every unit whose dependencies are clear. Eligible units run as a wave (parallel subagents); the wave is a mechanic, the job is to execute the units.

  • Quality Gates Pending

    Run the engine's automated checks against the produced work — tests, linting, type-checking, anything the unit declared. Pass or fail by exit code. No subagent involved; the engine runs them.

  • Approval Pending (feedback can land here)

    Dispatch a review-agent to evaluate the produced work. Findings come back as feedback.

  • Human · Approval Gate (feedback can land here)

    Surface the work to the human reviewer for sign-off. They can leave feedback or approve.

  • Stage Complete

    Merge the stage branch into intent main.

Once every stage has merged
  • Human · Intent Review Pending (feedback can land here)

    Walk roles in order: spec, continuity, then the human gate. Each role gets its own tick. Reviewers and the human can both file feedback against the whole intent.

  • Intent Reviews Signed

    Merge the intent into the delivery branch.

  • Sealed

    Nothing left. The intent is closed.

Within a stage, the cursor loops review → execute → approve until the stage merges. Then the next stage starts at the top.

The unit-level state machine is just as bare. The engine reads the unit's iteration history and decides the next hat. Last result was advance and there's another configured hat? Return that hat. Last result was reject? Walk back one. Terminal hat advanced? Done. No inference involved. The same inputs always produce the same outputs, which is exactly what prose can't give you.

One instruction at a time

This is the shape of every cursor tick: the agent ticks, the cursor returns one action, the agent does that one thing, the agent ticks again. There is no "here is the seven-step plan, please execute it." There is no skill the model loads and tries to walk to the end of. The agent's loop is small enough to fit in a card: tick, do, tick.
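The whole agent loop really is that small. A sketch, with `tick` and `dispatch` as stand-ins for "call the cursor" and "do the one action it returned":

```python
def run(tick, dispatch):
    """tick() reads disk and returns one instruction (or None when there is
    nothing to do); dispatch() performs it. The loop carries no workflow
    knowledge of its own."""
    while (instruction := tick()) is not None:
        dispatch(instruction)
```

Every piece of sequencing lives behind `tick`; the loop never sees a plan, only the next move.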

A few ticks in sequence:

Forward motion
Drift, then feedback, then back to forward progress.
tick 1 · The sweep catches a stale signed slot

Someone — another agent, a reviewer, a human — changed a unit on the design stage after the team had already signed off on it. The drift sweep runs at the start of every tick and notices the body hash no longer matches.

Cursor returns: Drift Detected
Agent does: The agent inspects the change. If it matters, file feedback against the unit so the fix loop can address it. If it's cosmetic, clear the drift signal so the cursor resumes forward progress.
tick 2 · A fix-hat opens on the new feedback

Drift is clean now. The feedback track walks every stage's open feedback in priority order and finds the one the sweep just wrote. The cursor hands it to the first fix-hat configured on the stage.

Cursor returns: Feedback Awaiting Fix-Hat
Agent does: One classifier subagent, one finding. It reads the feedback, classifies it as scope drift, and advances the feedback to the next fix-hat.
tick 3 · Main track resumes, two architects in parallel

Drift clean, feedback clean. The cursor walks the active stage, finds two units with their dependencies satisfied and waiting for the first hat, and dispatches them as a single wave.

Cursor returns: Units Ready to Execute
Agent does: Two architect subagents spawn at once, one per unit. Each writes its unit's spec and advances its hat. Then the wave is done.

The agent never picked the next move. The cursor said "drift", then "feedback", then "wave-ready architect." Each tick, the agent's only job was to dispatch and wait.

Cursor motion isn't always forward. The same shape handles "go back" — usually because a fix-hat wrote new work on a stage everyone thought was done.

The cursor moves back
Feedback on an earlier stage preempts forward progress, then the earlier stage opens new work.
tick 1 · Feedback on a merged stage preempts the active one

The build stage was mid-flight when a reviewer filed a finding against the design stage from earlier in the pipeline. The feedback track walks earlier stages before the active one, so design's open feedback beats build's pending work.

Cursor returns: Feedback Awaiting Fix-Hat
Agent does: A classifier subagent on the design stage reads the new feedback and decides the fix needs a brand-new unit. It writes that unit on design and advances the feedback.
tick 2 · The cursor sits on design now

Adding a unit to design put that stage ahead of intent main again. The next tick, the engine activates design as the current stage instead of build. Once design re-passes review and re-merges, build resumes from where it left off.

Cursor returns: Units Ready to Execute
Agent does: One architect subagent on design, against the brand-new unit the fix-hat just wrote. Build waits.

Nothing in either sequence is the agent "deciding to revisit." The disk changed, the cursor noticed, the next tick reflected the new state. Going back is just forward motion on a different stage.

That's the architectural shift. In a prompt-based harness, the model is the program counter. In H·AI·K·U, the cursor is the program counter and the model is one stage inside the CPU. The model picks how to do the current hat well. The cursor picks what hat is current. The boundary between those two jobs is a function return value rather than a paragraph that begins with "IMPORTANT."

You can feel the difference in what the agent stops needing to remember. It doesn't track which feedback is open across which stages — the feedback track walks them in priority order and hands the next one back. It doesn't decide whether the workflow has stalled — the cursor returns null and the tick is a noop. It doesn't decide when a unit is done — the iterations array makes that mechanical. The mental load that used to live in the system prompt now lives on disk, in a function that can be unit-tested.

Drift, watched by the sweep

Drift is the easiest place to see this. Every signed slot — a spec witness, a unit output, a discovery artifact — stamps a content hash at sign time. Markdown gets body-hashed so the engine's own frontmatter mutations don't trip false events; binaries get full-file-hashed. The sweep re-hashes every signed slot at the start of every tick and emits a drift event for any mismatch. The cursor turns those events into open feedback the next pass picks up.
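The body-hash-vs-full-file split can be sketched directly. The frontmatter-stripping rule comes from the paragraph above; the function name and the choice of SHA-256 are assumptions:

```python
import hashlib

def slot_hash(path: str, data: bytes) -> str:
    """Markdown gets body-hashed (frontmatter stripped) so the engine's own
    frontmatter writes don't trip false drift events; everything else is
    hashed whole."""
    if path.endswith(".md"):
        text = data.decode("utf-8")
        if text.startswith("---\n"):
            # drop the YAML frontmatter block before hashing
            _, _, body = text[4:].partition("\n---\n")
            data = body.encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

Mutating only the frontmatter of a signed markdown slot leaves its hash unchanged; touching the body trips the sweep.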

The agent never reasons about whether content drifted. The sweep does it mechanically, before any handler runs, before any prompt loads. The prompt-based equivalent would be a paragraph in the system message saying "before continuing, verify that previously-signed artifacts have not been modified" — and that paragraph stops being load-bearing the moment the context window gets crowded.

Feedback, dispatched one at a time

The fix loop runs on the same shape. When adversarial review opens findings, the cursor doesn't hand the agent the whole list. It reads the stage's configured fix-hat sequence — for the development stage, that's classifier, then builder, then feedback-assessor — and dispatches one hat against one finding at a time. Each hat runs as its own subagent with a single job: read this finding, decide or do the next thing, advance or reject. When the terminal hat advances, the cursor reports the feedback as resolved and the file moves to closed.

The feedback track walks every prior stage before touching the current one, so an upstream finding gets attention before forward progress resumes. The agent never picks which FB to address next. There's no pick to make — the cursor has already enumerated them in priority order, and the next tick will hand back the head of the queue. One instruction at a time, all the way down.

What's left for prompts (and where they live)

Plenty — and the honest version of "we got prompts out of the workflow" is that most of them moved into subagents, not that they disappeared. Each hat is dispatched as a fresh Task with a clean context window. That subagent loads exactly what it needs (the hat's mandate, the unit or FB it's working on, a tool whitelist) and runs to completion. When it returns, its context dies with it. The next hat is a different subagent, also clean.

Only two things live in the main agent's context:

  • Elaboration, because elaboration is collaboration. The user and the agent talk about scope, tradeoffs, design directions, the actual question of what to build. That conversation history is the work — there's no clean-context version of it that doesn't lose what was said.
  • Orchestration, because something has to call the cursor, read what comes back, and dispatch the next subagent. But the orchestrator's prompt is small. It doesn't carry the workflow logic — that's in the cursor. It just relays.

Everything else — every hat, every reviewer, every fix-loop assessor — is a subagent that boots into a clean context, reads one mandate, does one job, returns one signal. The hat files still describe what good work looks like for each role: what a clean implementation reads like for the builder, what a coherent spec reads like for the spec-reviewer, what genuine closure smells like for the feedback-assessor. That's craft and judgement, and prose is the right tool for both. What we pulled out of the main context was the workflow load — the "first do this, then do that, and don't forget to update X" sequencing that was always going to fray the moment the conversation got long.

The cursor on disk plus the per-hat prompt in a fresh window is a different shape than one giant prompt the model parses every turn. Same number of words, maybe, across the whole system. Very different number of words competing for any single context.

Brian's piece names a real ceiling, and a smarter model isn't what gets you under it. A smaller ask is — one move at a time, decided on disk, with the prose describing the work inside each move's own clean window. v4 is the cleanest version of that idea H·AI·K·U has shipped, and it's still the same idea we started with.