Architecture

One Instruction at a Time

By Jason Waldrip

Tags: v4, cursor, workflow-engine, drift, feedback, subagents

Most agent harnesses today are a markdown file thousands of words long. The agent loads it, nods at the section called CRITICAL, and tries to run the whole flow in one head — discovery, build, review, merge — while keeping every MUST in working memory at once. Halfway through, the prompt is already losing resolution.

Brian Suh's piece on this says the quiet part out loud.

A longer prompt isn't the fix. The fix is to stop asking the model to be the runtime.

The category we're not in

Most harnesses we look at fall into one of two camps.

Same problem, same failure mode

Prompt-based

One long system message describing the workflow, the roles, and the rules. The model loads it and is trusted to execute end to end. Long enough that important rules get crowded out by the next turn's context.

Skills-based

A library of named markdown files the agent loads on demand. Each one is again a self-contained program the model is asked to run start to finish. Shorter individually; same shape underneath.

Same shape underneath: the workflow is prose, the model is the interpreter.

H·AI·K·U has never been that, and the post-mortems we wrote on AI-DLC are why. AI-DLC was prompts-as-runtime. We watched it walk over its own MUSTs more than once. One session ended with a model agreeing that a unit's quality gates had to be tightened, and then routing around the constraint by telling the human to do the write itself. The prompt was clear. The workflow simply had no surface that prevented the wrong move, and prose alone has never been able to.

The reframe we landed on, then and now: the workflow is software. Hats, gates, drift, feedback — real components with real contracts. Prompts describe the work inside a step, not the shape of the program. v4 is where that idea finally has the scaffolding it needed.

1 — Drift track

The drift sweep

Signed slots stamp content hashes. Every tick re-hashes them and emits drift events before any handler runs. No "remember to verify" paragraph in a prompt.

2 — Feedback track

The feedback queue

Findings get dispatched one hat against one finding. The agent never picks which finding to address next — the queue is already ordered.

3 — Main track

The intent walk

Walks stages, units, and hats in order. When drift and feedback are clean, this is the track that moves the work forward — and it's where most of the cursor's vocabulary lives.

The cursor

The center of the v4 engine is a small pure function. It does one thing: read the disk, return the next instruction. No model in the loop, no prompt, no judgement call. Given the same disk state, it returns the same answer.

It walks three tracks, in priority order. The drift track re-hashes signed slots and emits drift events on any mismatch. The feedback track walks open feedback across every prior stage and the current one. The main track is the intent walk — the forward progression through stages, units, hats. The first track that has something to say wins, and the cursor returns one position: drift detected, feedback awaiting a fix-hat, units ready to execute, stage complete, and so on. The agent does the matching action, then ticks again.
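
The priority order can be sketched as a small pure function. This is a minimal illustration in Python, not the engine's actual code: `DiskState`, its fields, and `next_position` are hypothetical names, and the main track is reduced to a single check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiskState:
    """Hypothetical snapshot of what the cursor reads from disk."""
    drifted_slots: tuple = ()   # signed slots whose hash no longer matches
    open_feedback: tuple = ()   # findings still waiting on a fix-hat
    ready_units: tuple = ()     # units whose dependencies are clear


def next_position(state: DiskState) -> Optional[str]:
    """Return one position; the first track with something to say wins."""
    if state.drifted_slots:     # 1. drift track
        return "drift_detected"
    if state.open_feedback:     # 2. feedback track
        return "feedback_awaiting_fix_hat"
    if state.ready_units:       # 3. main track (heavily simplified here)
        return "units_ready_to_execute"
    return None                 # nothing to report this tick
```

Because the function only reads its argument, the same `DiskState` always yields the same position, which is what makes it straightforward to unit-test.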

Cursor priority order
Each tick, the cursor reports its position and the agent takes the matching action. The first track that has something to say wins.

Drift track

Did another agent or a human change something we already signed off?

  • Drift Detected (feedback can land here)

    The agent inspects the change. If it warrants attention, file feedback so the fix loop can address it. If it's cosmetic, clear the drift signal so the cursor resumes forward progress.

Feedback track

Is open feedback waiting on a fix?

  • Feedback Awaiting Fix-Hat

    Dispatch the next fix-hat against this finding. One hat, one finding, one tick.

  • Feedback Resolved

    Close the feedback record. Forward progress resumes next tick.

Main track

Otherwise — walk forward through the pipeline:

Per stage, in order
  • Design Direction Required (human)

    Present design options. Wait for the user to pick one.

  • Clarifying Questions Pending (human)

    Ask the user the stage's clarifying questions.

  • Discovery Required

    Run the configured discovery agent on units missing its artifact.

  • Stage Empty

    Elaborate — write the unit specs this stage will produce.

  • Spec Review Pending (feedback can land here)

    Dispatch a review-agent to read each unit's spec. They file feedback if anything's off.

  • Spec Gate (human; feedback can land here)

    Surface the specs to the human reviewer for sign-off. Their feedback is the same shape as anyone else's.

  • Units Ready to Execute

    Run the next hat on every unit whose dependencies are clear. Eligible units run as a wave (parallel subagents); the wave is a mechanic, the job is to execute the units.

  • Quality Gates Pending

    Run the engine's automated checks against the produced work — tests, linting, type-checking, anything the unit declared. Pass or fail by exit code. No subagent involved; the engine runs them.

  • Approval Pending (feedback can land here)

    Dispatch a review-agent to evaluate the produced work. Findings come back as feedback.

  • Approval Gate (human; feedback can land here)

    Surface the work to the human reviewer for sign-off. They can leave feedback or approve.

  • Stage Complete

    Merge the stage branch into intent main.

Once every stage has merged
  • Intent Review Pending (human; feedback can land here)

    Walk roles in order: spec, continuity, then the human gate. Each role gets its own tick. Reviewers and the human can both file feedback against the whole intent.

  • Intent Reviews Signed

    Merge the intent into the delivery branch.

  • Sealed

    Nothing left. The intent is closed.

Within a stage, the cursor loops review → execute → approve until the stage merges. Then the next stage starts at the top.
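
The Quality Gates Pending position above is the one the engine runs itself, passing or failing by exit code. A minimal sketch of that check in Python follows. The function name and the two stand-in commands are assumptions for illustration; a real unit would declare its own test, lint, or type-check commands.

```python
import subprocess
import sys


def run_quality_gates(commands: list[list[str]]) -> dict[str, bool]:
    """Run each gate command; a gate passes only if it exits with code 0."""
    results = {}
    for cmd in commands:
        completed = subprocess.run(cmd, capture_output=True)
        results[" ".join(cmd)] = completed.returncode == 0
    return results


# Stand-in gates: one that exits 0 and one that exits 1.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
results = run_quality_gates([passing, failing])
all_green = all(results.values())  # the stage advances only if every gate passed
```

No model reads the output to decide whether the checks "look fine"; the exit code is the whole verdict.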

The unit-level state machine is just as bare.

The unit state machine, in four rules
  1. Advance with more hats configured. Return the next hat in the sequence.
  2. Reject. Walk back one hat. The previous role gets another pass.
  3. Terminal hat advanced. Unit is done.
  4. No inference involved. Same inputs always produce the same output. That's the thing prose can't give you.
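
The four rules above fit in one function. This is a sketch under assumptions: `next_hat` is a hypothetical name, the hat sequence is an example, and the behavior when the first hat is rejected (it retries itself) is a guess the post does not specify.

```python
from typing import Optional


def next_hat(hats: list[str], current: str, outcome: str) -> Optional[str]:
    """Return the hat that runs next, or None when the unit is done.

    `hats` is the unit's configured sequence; `outcome` is "advance" or "reject".
    """
    i = hats.index(current)
    if outcome == "advance":
        # Rules 1 and 3: next hat if one is configured, otherwise the unit is done.
        return hats[i + 1] if i + 1 < len(hats) else None
    if outcome == "reject":
        # Rule 2: walk back one hat (assumption: the first hat retries itself).
        return hats[max(i - 1, 0)]
    raise ValueError(f"unknown outcome: {outcome}")


# Example sequence; the names are illustrative, not a fixed configuration.
hats = ["architect", "builder", "reviewer"]
```

Rule 4 is the absence of anything else: there is no model call in the function, so the same inputs always give the same answer.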

One instruction at a time

This is the shape of every cursor tick: the agent ticks, the cursor returns one action, the agent does that one thing, the agent ticks again. There is no "here is the seven-step plan, please execute it." There is no skill the model loads and tries to walk to the end of. The agent's loop is small enough to fit in a card: tick, do, tick.
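
As a rough sketch, the agent side of that loop looks like this in Python. The `run` function, the handler table, and the scripted cursor are hypothetical; this sketch also stops when the cursor returns nothing, which is a simplification.

```python
from typing import Callable, Optional


def run(tick: Callable[[], Optional[str]],
        handlers: dict[str, Callable[[], None]],
        max_ticks: int = 100) -> list[str]:
    """Tick, do the one returned action, tick again. Stops on None."""
    log = []
    for _ in range(max_ticks):
        position = tick()
        if position is None:
            break
        handlers[position]()   # one action per tick, nothing more
        log.append(position)
    return log


# A scripted cursor standing in for the real one, which would read the disk.
script = iter(["drift_detected", "feedback_awaiting_fix_hat",
               "units_ready_to_execute", None])
names = ("drift_detected", "feedback_awaiting_fix_hat", "units_ready_to_execute")
handlers = {name: (lambda: None) for name in names}
log = run(lambda: next(script), handlers)
```

The loop holds no plan and no memory of earlier ticks beyond the log; everything it needs arrives in the single position the cursor returns.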

A few ticks in sequence:

Forward motion
Drift, then feedback, then back to forward progress.
Tick 1: The sweep catches a stale signed slot

Someone — another agent, a reviewer, a human — changed a unit on the design stage after the team had already signed off on it. The drift sweep runs at the start of every tick and notices the body hash no longer matches.

Cursor returns: Drift Detected
Agent does: The agent inspects the change. If it matters, file feedback against the unit so the fix loop can address it. If it's cosmetic, clear the drift signal so the cursor resumes forward progress.
Tick 2: A fix-hat opens on the new feedback

Drift is clean now. The feedback track walks every stage's open feedback in priority order and finds the one the sweep just wrote. The cursor hands it to the first fix-hat configured on the stage.

Cursor returns: Feedback Awaiting Fix-Hat
Agent does: One classifier subagent, one finding. It reads the feedback, classifies it as scope drift, and advances the feedback to the next fix-hat.
Tick 3: Main track resumes, two architects in parallel

Drift clean, feedback clean. The cursor walks the active stage, finds two units with their dependencies satisfied and waiting for the first hat, and dispatches them as a single wave.

Cursor returns: Units Ready to Execute
Agent does: Two architect subagents spawn at once, one per unit. Each writes its unit's spec and advances its hat. Then the wave is done.

The agent never picked the next move. The cursor said "drift", then "feedback", then "two units are ready to build in parallel." Each tick, the agent's only job was to dispatch and wait.

Cursor motion isn't always forward. The same shape handles "go back" — usually because a fix-hat wrote new work on a stage everyone thought was done.

The cursor moves back
Feedback on an earlier stage preempts forward progress, then the earlier stage opens new work.
Tick 1: Feedback on a merged stage preempts the active one

The build stage was mid-flight when a reviewer filed a finding against the design stage from earlier in the pipeline. The feedback track walks earlier stages before the active one, so design's open feedback beats build's pending work.

Cursor returns: Feedback Awaiting Fix-Hat
Agent does: A small classifier reads the new feedback on the design stage, decides the fix needs a brand-new unit, writes that unit on design, and advances the feedback.
Tick 2: The cursor sits on design now

Adding a unit to design put that stage ahead of intent main again. The next tick, the engine activates design as the current stage instead of build. Once design re-passes review and re-merges, build resumes from where it left off.

Cursor returns: Units Ready to Execute
Agent does: One architect subagent on design, against the brand-new unit the fix-hat just wrote. Build waits.

Nothing in either sequence is the agent "deciding to revisit." The disk changed, the cursor noticed, the next tick reflected the new state. Going back is just forward motion on a different stage.
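
One way to read "going back is just forward motion on a different stage" is that the active stage is recomputed from disk on every tick. The sketch below assumes the rule is "the earliest stage with work not yet merged into intent main". That rule and the `active_stage` name are guesses for illustration, not the engine's documented behavior.

```python
from typing import Optional


def active_stage(stage_order: list[str], ahead_of_main: set[str]) -> Optional[str]:
    """Return the earliest stage that has unmerged work, or None if all merged."""
    for stage in stage_order:
        if stage in ahead_of_main:
            return stage
    return None


stages = ["design", "build"]
before_fix = active_stage(stages, {"build"})            # build is mid-flight
after_fix = active_stage(stages, {"design", "build"})   # a fix-hat added a unit on design
```

Nothing records a decision to revisit design. The new unit changes what is on disk, and the next computation simply lands on the earlier stage.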

You can feel the difference in what the agent stops needing to remember. It doesn't track which feedback is open across which stages — the feedback track walks them in priority order and hands the next one back. It doesn't decide whether the workflow has stalled — the cursor returns null and the tick is a noop. It doesn't decide when a unit is done — the iterations array makes that mechanical. The mental load that used to live in the system prompt now lives on disk, in a function that can be unit-tested.

Drift, watched by the sweep

Drift is the easiest place to see this. Every signed slot — a spec witness, a unit output, a discovery artifact — stamps a content hash at sign time. The engine notices when files change out-of-band and tells the agent before the next instruction. The cursor turns those events into open feedback the next pass picks up.

The agent never reasons about whether content drifted. The sweep does it mechanically, before any handler runs, before any prompt loads. The prompt-based equivalent would be a paragraph in the system message saying "before continuing, verify that previously-signed artifacts have not been modified" — and that paragraph stops being load-bearing the moment the context window gets crowded.
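
A minimal sketch of such a sweep in Python. SHA-256, the in-memory `signed` mapping, and the function names are assumptions; the post does not say which hash the engine uses or where the stamps are stored.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def content_hash(path: Path) -> Optional[str]:
    """Hash a file's bytes; a missing file hashes to None."""
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sign(path: Path, signed: dict[str, Optional[str]]) -> None:
    """Stamp the slot's current content hash at sign time."""
    signed[str(path)] = content_hash(path)


def sweep(signed: dict[str, Optional[str]]) -> list[str]:
    """Re-hash every signed slot; return the paths whose content changed."""
    return [p for p, h in signed.items() if content_hash(Path(p)) != h]


with tempfile.TemporaryDirectory() as tmp:
    spec = Path(tmp) / "unit-spec.md"
    spec.write_text("v1")
    signed: dict[str, Optional[str]] = {}
    sign(spec, signed)
    before = sweep(signed)   # nothing has changed yet
    spec.write_text("v2")    # an out-of-band edit
    after = sweep(signed)    # the sweep reports the drifted slot
    drifted_name = Path(after[0]).name
```

The check costs one hash per signed slot and needs no instruction in any prompt, so it cannot be crowded out of a context window.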

Feedback, dispatched one at a time

The fix loop runs on the same shape. When adversarial review opens findings, the cursor doesn't hand the agent the whole list. It dispatches one role against one finding at a time. Each role runs as its own fresh invocation with a single job: read this finding, decide or do the next thing, advance or reject. When the terminal role advances, the cursor reports the finding as resolved.

The feedback track walks every prior stage before touching the current one, so an upstream finding gets attention before forward progress resumes. The agent never picks which finding to address next. There's no pick to make — the cursor has already enumerated them in priority order, and the next tick will hand back the head of the queue. One instruction at a time, all the way down.
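
The ordering can be sketched as a sort key. In this Python sketch the `Finding` fields are hypothetical, and breaking ties within a stage by filing order is an assumption; the post only states that earlier stages come first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    stage: str
    opened_seq: int   # hypothetical: the order in which the finding was filed
    title: str


def feedback_queue(findings: list[Finding], stage_order: list[str]) -> list[Finding]:
    """Order open findings: earlier stages first, then by filing order."""
    rank = {stage: i for i, stage in enumerate(stage_order)}
    return sorted(findings, key=lambda f: (rank[f.stage], f.opened_seq))


def head(findings: list[Finding], stage_order: list[str]) -> Optional[Finding]:
    """The one finding the next tick hands back, or None if the queue is empty."""
    queue = feedback_queue(findings, stage_order)
    return queue[0] if queue else None


open_findings = [
    Finding("build", 1, "missing retry on upload"),
    Finding("design", 2, "spec omits error states"),
]
first = head(open_findings, ["design", "build"])
```

Here the design finding is returned first even though it was filed later, because design sits upstream of build.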

What's left for prompts

Plenty. The honest version of "we got prompts out of the workflow" is that most of them moved into fresh agent invocations, not that they disappeared. Each role gets dispatched into a clean context window with the mandate, the artifact it's working on, and a tool whitelist. When it returns, that context dies with it.

The hat files still describe what good work looks like for each role: what a clean implementation reads like for the builder, what a coherent spec reads like for the reviewer, what genuine closure smells like for the assessor. That's craft and judgement, and prose is the right tool for both. What we pulled out of the main context was the workflow load — the "first do this, then do that, and don't forget to update X" sequencing that was always going to fray the moment the conversation got long.

Brian's piece names a real ceiling, and a smarter model isn't what gets you under it. A smaller ask is — one move at a time, decided on disk, with the prose describing the work inside each move's own clean window. v4 is the cleanest version of that idea H·AI·K·U has shipped, and it's still the same idea we started with.