Compliance · stage 2 of 5

Assess

Ask gate

Evaluate current state against controls, identify gaps and risks

Take the scoped control set and produce a defensible picture of where the organization actually stands against each in-scope control. This is the stage that grades reality against the framework and turns the result into a prioritized gap list the rest of the lifecycle acts on.

Scope

Evaluating each in-scope control as met, partial, or unmet on cited evidence, then ranking the gaps by risk. Assess decides how well controls are satisfied today and which gaps matter most — it does not redefine what's in scope (that's scope) or close any gap (that's remediate).

What to do

  • Determine each control's status against concrete, cited evidence — never on assertion or assumption.
  • Separate likelihood from impact when ranking gaps, and apply the same scoring method consistently across findings.
  • Make every finding traceable back to the control it grades and forward to the evidence that supports it.
  • Surface contested or ambiguous determinations rather than rounding them to a convenient verdict.

What NOT to do

  • Don't change the in-scope boundary or reclassify systems — that's the scope stage.
  • Don't design or implement fixes — that belongs to remediate.
  • Don't grade a control without the evidence to back the grade.
  • Don't leave a gap unranked; an unprioritized finding gives remediation no order to work in.

How the engine runs this stage

1 · Elaborate

collaborative · plan the work, fan out discovery, declare outputs

Discovery fan-out

Gap Report

Assessment findings for all in-scope controls. This output drives the remediate stage's implementation work.

Content Guide

Organize findings by control area:

  • Control assessment summary — overview of met/partial/unmet counts
  • Per-control findings — each control with:
    • Determination (met, partially met, unmet)
    • Evidence reviewed
    • Gap description (for partial/unmet)
    • Specific deficiency detail
  • Risk scoring — likelihood and impact for each gap using consistent methodology
  • Prioritized gap list — gaps ranked by risk severity
  • Dependencies — gaps that must be addressed before others
  • Compensating controls — existing mitigations that reduce effective risk

Quality Signals

  • Every in-scope control has a determination with evidence references
  • Risk scores use a consistent, documented methodology
  • Gap descriptions are specific enough to drive remediation without re-assessment
  • Prioritization reflects actual risk, not alphabetical or arbitrary ordering

Phase guidance

phase override · ELABORATION

Assess Stage — Elaboration

Criteria Guidance

Good criteria — concrete and verifiable

  • "Gap analysis evaluates every in-scope control with current implementation status (met/partial/unmet) and supporting evidence"
  • "Risk assessment assigns likelihood and impact scores to each gap using a consistent methodology"
  • "Assessment documents the specific evidence reviewed for each control determination"

Bad criteria — vague (no clear check)

  • "Gaps are identified"
  • "Risks are assessed"
  • "Assessment is thorough"

Outputs produced

Gap Analysis

Assessment findings documenting compliance gaps, risk ratings, and remediation priorities.

Expected Artifacts

  • Gap report -- every in-scope control with determination (met, partially met, unmet) and supporting evidence
  • Risk assessment -- gaps ranked by severity using consistent scoring methodology
  • Evidence catalog -- specific evidence reviewed for each control determination
  • Remediation priorities -- gaps ordered by risk with recommended remediation approach

Quality Signals

  • Every in-scope control has a determination backed by specific evidence
  • Risk scoring uses a consistent methodology across all gaps
  • Each gap has a clear description of what is missing and what remediation looks like
  • No controls are left unassessed

2 · Review

pre-execute · agents audit the planned spec before any code lands
review agent · Accuracy

Mandate: The agent MUST verify that every status determination in GAP-REPORT.md accurately reflects the current state of the bound systems. Inaccurate findings mislead the remediate stage into fixing the wrong things and the certify stage into evidencing claims the auditor will challenge.

Check

The agent MUST verify, filing feedback for any violation:

  • Evidence currency — every cited piece of evidence is recent enough to reflect the system's current state. Exports from a prior cycle (dated outside the current audit period) are not evidence of the current state.
  • Evidence specificity — every status determination cites concrete artifacts (file path, command output, dated stakeholder confirmation), not "verified by inspection" or "team confirmed".
  • Met vs effective — controls marked met have evidence of operating effectiveness, not just existence. A documented policy nobody follows is unmet. A monitoring alert that's been firing-and-ignored is unmet.
  • Status vs evidence alignment — the status label matches the evidence presented. A control with named deficiencies cannot be met; a control with no deficiencies cannot be partially met.
  • Compensating-control attribution — where compensating controls are credited, they are described and their effect on the original control's gap is named. Vague "we have other safeguards" credit is not evidence.
  • Inherited-control attribution — inherited controls (third-party SOC 2, cloud-provider attestations) cite the specific inheritance artifact and confirm it covers the relevant period.
  • Per-system honesty — when a control is met on system A but unmet on system B, both are recorded separately. Aggregating across systems hides per-system gaps the auditor will sample.
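
Two of the checks above, evidence currency and status vs evidence alignment, are mechanical enough to sketch in code. This is a minimal illustration, not the engine's implementation: `check_finding` and its parameters are hypothetical, and treating "stale" as "dated before the audit period starts" is an assumption about where the boundary sits.

```python
from datetime import date
from typing import List


def check_finding(status: str, deficiencies: List[str],
                  evidence_dates: List[date], period_start: date) -> List[str]:
    """Return human-readable violations; an empty list means none were found."""
    violations = []
    if not evidence_dates:
        violations.append("no evidence cited")
    # Evidence currency: anything dated before the audit period is treated
    # as stale (an assumed reading of the audit-period boundary).
    for d in evidence_dates:
        if d < period_start:
            violations.append(f"stale evidence dated {d.isoformat()}")
    # Status vs evidence alignment.
    if status == "met" and deficiencies:
        violations.append("met with named deficiencies")
    if status == "partially met" and not deficiencies:
        violations.append("partially met without deficiencies")
    return violations
```

The remaining checks (operating effectiveness, compensating and inherited controls) call for judgment on the artifacts themselves and are not reduced to code here.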

Common failure modes to look for

  • A met determination citing only a stakeholder verbal confirmation ("the lead said it's done")
  • Evidence dated from a prior assessment cycle without re-verification
  • A documented policy treated as evidence the policy is operating (the policy is necessary; it is not sufficient)
  • A control evaluated on production but quietly extrapolated to staging or to other production accounts without independent evidence
  • Compensating controls used to upgrade an unmet to partially met without the compensating control itself being assessed and evidenced
  • An exception process invoked to justify a met status without the exception record being reviewed and counted
  • A finding worded as partially met to soften the politics when the evidence supports unmet
  • Inherited controls claimed for a service-provider attestation that doesn't cover the relevant audit period
review agent · Thoroughness

Mandate: The agent MUST verify the assessment covers every in-scope control with substantive evidence and that gap identification is comprehensive. Coverage gaps here are how known weaknesses survive to the external audit — partial assessment is not assessment.

Check

The agent MUST verify, filing feedback for any violation:

  • Universal control coverage — every applicable + bound (control, system) pair from CONTROL-MAPPING.md has a corresponding finding row in GAP-REPORT.md. No row is silently skipped.
  • Evidence-based gap identification — gaps are named from observed evidence (or observed absence of evidence), not from assumption, intuition, or prior-cycle carryover.
  • Risk-rating justification — every risk score (likelihood + impact + residual) has a rationale grounded in specific findings: threat surface, exposure window, data classification, compensating-control effect.
  • Severity honesty — material gaps (controls affecting restricted data classes, controls covering perimeter, controls satisfying multiple frameworks) are not minimized to keep the report short. Severity reflects observed risk, not political preference.
  • Per-control depth proportional to risk — easy / low-risk controls get a short, evidenced row; high-risk / partially-met controls get a deeper analysis with deficiency detail sufficient for remediation planning.
  • Dependencies surfaced — where one gap blocks another (e.g., identity unification blocks per-user audit logging), the dependency is named in the prioritized list. Hidden dependencies break the remediate-stage plan.
  • Multi-framework controls assessed once — controls that appear in multiple frameworks (per the scope mapping's overlap notes) are evaluated and cited once; not re-evaluated independently with different conclusions.
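
The universal-coverage check above reduces to a set difference once the (control, system) pairs have been extracted from both documents. The sketch below assumes that extraction has already happened; `missing_findings` is a hypothetical helper, and parsing `CONTROL-MAPPING.md` or `GAP-REPORT.md` is out of scope here.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (control_id, system)


def missing_findings(mapped: Iterable[Pair], reported: Iterable[Pair]) -> List[Pair]:
    """Pairs present in the mapping but absent from the report, sorted for stable output."""
    return sorted(set(mapped) - set(reported))
```

Any pair this returns is a silently skipped row and would warrant feedback.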

Common failure modes to look for

  • A "summary table only" assessment where individual controls don't appear in the body — the auditor will sample, and unsampled controls have no evidence
  • Risk scores assigned without rationale, or with rationale that doesn't justify the score (high with rationale "this is important")
  • Material gaps softened to medium because the team is uncomfortable owning them
  • Compensating controls credited generously to reduce gap counts without per-control evidence the compensating control actually applies
  • A partially met rating with no deficiency description — partial without specifics is unactionable
  • Open questions left unanswered in the assessment (e.g., "TBD: confirm Q3 access-review evidence") that quietly become findings the certify stage cannot close
  • Gaps without dependencies surfaced, so the remediate stage discovers mid-execution that prerequisite work was never planned
  • The same control evaluated separately for two frameworks with different conclusions, indicating the assessor did not check overlap

3 · Execute

per-unit baton · Auditor → Risk Assessor → Verifier
hat 1 · Auditor

Focus: Evaluate each in-scope control against the current state of the bound systems. For every control, determine whether the implementation is met, partially met, or unmet, and record the specific evidence reviewed. You produce the per-control findings section of the intent-scope GAP-REPORT.md. You do NOT score risk — that's the risk-assessor's baton in the next step.

You produce the assessment summary and per-control findings sections of GAP-REPORT.md.

Process

1. Read your inputs

  • The upstream CONTROL-MAPPING.md produced by the scope stage
  • The unit's success criteria
  • Any architectural diagrams, runbooks, or existing internal audit artifacts the user references

2. Collect evidence per control

For each in-scope control on each bound system, gather concrete artifacts. Acceptable evidence types include:

  • Configuration excerpts (IAM policies, security-group rules, encryption settings)
  • Code references (the function that enforces the rule, the migration that added the column, the schema definition)
  • Logs and metrics (auth logs showing MFA was required, monitoring alerts that fire on threshold breach)
  • Policy documents (with the section that names the practice)
  • Stakeholder confirmations (dated, named, with the question asked and the answer given)

Record the source, date, and where the artifact lives. The auditor will ask "where did this evidence come from?" — answer that in the artifact, not from memory.
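
One way to keep source, date, and location attached to every artifact is a small record type that reports what is missing. This is a sketch under stated assumptions: the `Evidence` class and its field names are illustrative and are not a schema the engine defines.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Evidence:
    # Field names are illustrative assumptions, not an engine schema.
    kind: str               # "config", "code", "log", "policy", or "confirmation"
    source: str             # where the artifact came from
    collected_on: date      # when it was gathered
    location: str           # where the artifact lives now
    confirmed_by: str = ""  # named person, needed for stakeholder confirmations

    def problems(self) -> List[str]:
        """List what is missing; an empty list means the record is citable."""
        issues = []
        if not self.source.strip():
            issues.append("missing source")
        if not self.location.strip():
            issues.append("missing location")
        if self.kind == "confirmation" and not self.confirmed_by.strip():
            issues.append("confirmation without a named person")
        return issues
```

Requiring `collected_on` at construction time means an undated artifact cannot be recorded at all.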

3. Determine implementation status

For each (control, system) pair, assign one of:

  • Met — concrete evidence that the control is implemented and operating effectively
  • Partially met — implemented but with named deficiencies (scope gap, exception handling, frequency miss)
  • Unmet — no implementation OR implementation that doesn't meet the control's intent

Don't conflate exists with effective. A documented policy that nobody follows is unmet, not met. A monitoring alert that's been firing-and-ignored for six months is unmet, not met.
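
The three-way rule can be sketched as a small function that refuses to treat existence as effectiveness. This is a minimal sketch, not the engine's logic: the `Finding` fields and `determine_status` are hypothetical names, and the two booleans stand in for judgments the auditor makes from the evidence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    # All field names are illustrative assumptions.
    control_id: str
    system: str
    implemented: bool            # something exists (policy, config, code)
    operating_effectively: bool  # evidence shows it works in practice
    deficiencies: List[str] = field(default_factory=list)


def determine_status(f: Finding) -> str:
    """Map a finding to met / partially met / unmet."""
    # Existence without operating effectiveness does not count.
    if not f.implemented or not f.operating_effectively:
        return "unmet"
    # Implemented and operating, but with named deficiencies.
    if f.deficiencies:
        return "partially met"
    return "met"
```

An ignored alert is modeled as `implemented=True, operating_effectively=False`, which yields `unmet` regardless of how complete the configuration looks.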

4. Write the per-control finding

Suggested shape per control:

### CC6.1 — Logical access controls (system: app-prod)

**Status:** Partially met

**Evidence reviewed:**
- IAM policy export from AWS account 12345 (2026-05-08)
- Okta group-membership export (2026-05-09)
- Code: `auth/middleware.ts:enforceRole`
- Confirmation: Sam B. (eng lead, 2026-05-10) — "MFA enforced for all production sign-ins"

**Implementation:**
[Concise description of what's in place]

**Deficiencies (for partial / unmet):**
- 14 service accounts have IAM access without corresponding MFA enrollment (see Okta export, page 3)
- Local-development bypass in `auth/middleware.ts:48` is gated on env var but no monitoring alerts on its use

**Control intent:**
[One sentence on what the control is trying to achieve — to make the deficiency interpretable]

The Control intent paragraph matters because risk-assessment depends on knowing what the gap actually risks.

5. Roll up the summary

At the top of GAP-REPORT.md, write the assessment summary: count of met / partial / unmet by framework, by system, by control family. This is the artifact the user opens first; it should answer "how big is the problem?" in one page.
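
The roll-up is a straightforward count. The sketch below tallies statuses overall and per system; `roll_up` is a hypothetical helper, the tuple layout is an assumption, and the same pattern would extend to framework or control family by adding those fields.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def roll_up(findings: Iterable[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """findings: (control_id, system, status). Returns status counts overall and per system.

    Assumes no system is literally named "overall".
    """
    summary: Dict[str, Counter] = {"overall": Counter()}
    for _control, system, status in findings:
        summary["overall"][status] += 1
        summary.setdefault(system, Counter())[status] += 1
    return summary
```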

6. Hand off

When every (control, system) pair from the scope mapping has a status + evidence + deficiency description (for non-met items), hand off to risk-assessor. Do not assign risk scores — that hat owns the methodology and the prioritization.

Anti-patterns (RFC 2119)

  • The agent MUST NOT mark a control met without reviewing actual artifacts — verbal assurances are not evidence
  • The agent MUST NOT accept stale evidence (from a prior assessment cycle) without re-confirming the implementation hasn't changed
  • The agent MUST NOT conflate "process exists" with "process is effective" — a documented procedure nobody follows is unmet
  • The agent MUST document the specific evidence reviewed for each determination, with source and date
  • The agent MUST NOT apply inconsistent standards across similar controls — if MFA required makes one access control met, it must make every comparable access control met
  • The agent MUST NOT skip "easy" controls because they "obviously pass" — every in-scope control gets an evidence-backed determination
  • The agent MUST name the deficiency precisely enough to drive remediation without a second assessment pass
  • The agent MUST NOT invent evidence or attribute claims to unnamed people; un-cited stakeholder confirmations are not evidence
hat 2 · Risk Assessor

Focus: Take the auditor's per-control findings and convert them into a prioritized risk picture. Assign consistent likelihood + impact scores, account for compensating controls, and surface dependencies between gaps. You produce the risk-scoring, prioritization, and dependencies sections of the intent-scope GAP-REPORT.md.

You DO NOT re-evaluate the auditor's status determinations — those are settled by the time you start. You translate the findings into the document the remediate stage uses to plan work.

Process

1. Read your inputs

  • The auditor's per-control findings (already in GAP-REPORT.md)
  • The upstream CONTROL-MAPPING.md (system inventory + data classifications)
  • The unit's success criteria
  • Any organizational risk methodology the user points you at (existing risk register, named scoring rubric)

2. Pick (or surface) the scoring methodology

Use the organization's existing risk methodology if one is documented. If not, propose a methodology and flag it for user confirmation before scoring any gap. A typical methodology:

  • Likelihood (1–5): how likely is the gap to be exploited / cause incident, given the threat environment and existing protections
  • Impact (1–5): if the gap is exploited, what's the cost (data loss, regulatory penalty, operational disruption, reputational damage)
  • Inherent risk = Likelihood × Impact
  • Residual risk = Inherent risk, reduced by compensating controls — score those separately

Document the scoring rubric in the artifact so the auditor (and the team next quarter) can reproduce the calls.
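
Writing the rubric down as code is one way to make it reproducible. The sketch below covers only the inherent-risk product on the 1–5 scales described above. The band thresholds are assumptions chosen for illustration and should be replaced by the organization's documented rubric. Residual risk is left as an assessor-entered value because the methodology above gives no formula for it.

```python
def inherent_risk(likelihood: int, impact: int) -> int:
    """Inherent risk = likelihood x impact, each on a 1-5 scale."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return likelihood * impact


def risk_band(score: int) -> str:
    # Thresholds are illustrative assumptions, not part of the methodology above.
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"
```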

3. Score every gap

For each partially met and unmet finding, assign likelihood and impact. The scoring rationale matters as much as the score:

### Gap: CC6.1 service-account MFA exemption

**Likelihood: 4 / 5** — public-internet-reachable IAM API, credentials in CI logs historically, no rate-limiting on auth attempts
**Impact: 5 / 5** — service accounts hold production write access; compromise = customer data exfiltration risk
**Inherent risk: 20 / 25 (high)**
**Compensating controls:**
- IP allowlist on CI runner egress (reduces likelihood)
- Daily IAM-key rotation policy (reduces likelihood + impact)
**Residual risk: 9 / 25 (medium)**
**Justification:** Compensating controls cap exposure window but don't close the structural gap.

Don't assign 5/5 to everything ("everything is critical"); don't assign 1/1 to everything ("we have compensating controls so it's fine"). The auditor will challenge both extremes.

4. Account for compensating controls

A compensating control is an existing mitigation that wasn't designed to satisfy the failed control but partially does. Document each one explicitly:

  • What the compensating control is
  • How it reduces likelihood OR impact (be specific)
  • Why it doesn't fully satisfy the original control (otherwise the auditor's question is "then why isn't this control met?")

5. Identify dependencies between gaps

Some gaps must be closed before others can be (or before remediation makes sense). Example: you can't enforce per-user audit logging if there's no per-user identity yet. Surface these dependencies:

| Gap A | Must close before | Gap B | Reason |
|-------|-------------------|-------|--------|
| Identity unification (no per-user IDs in app-prod) | → | Per-user audit logging | Audit logs need identifiers to log |

6. Prioritize

Produce the prioritized gap list. Default order: residual risk descending, with dependencies respected (a blocker comes before what it blocks even if its standalone score is lower). Tag each entry with framework, control id, system, and risk band (high / medium / low) so remediation planning can filter.
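
The default order can be sketched as a topological sort that prefers higher residual risk. In this sketch a blocker inherits the highest score among the gaps it blocks, so a low-scoring prerequisite does not hold back a high-risk gap; that inheritance rule is one interpretation of "dependencies respected", not something the stage prescribes. `prioritize` and its argument shapes are hypothetical, and every gap id in `blocks` is assumed to appear in `residual`.

```python
import heapq
from typing import Dict, List, Set


def prioritize(residual: Dict[str, int], blocks: Dict[str, Set[str]]) -> List[str]:
    """Order gaps by residual risk, descending, keeping blockers ahead of what they block.

    residual: gap id -> residual risk score
    blocks:   gap id -> ids of gaps that cannot start until this one closes
    """
    effective: Dict[str, int] = {}

    def effective_score(gap: str, path: frozenset) -> int:
        # A blocker takes the highest score found anywhere downstream of it.
        if gap in path:
            raise ValueError(f"dependency cycle through {gap}")
        if gap not in effective:
            downstream = [effective_score(b, path | {gap}) for b in blocks.get(gap, ())]
            effective[gap] = max([residual[gap], *downstream])
        return effective[gap]

    for gap in residual:
        effective_score(gap, frozenset())

    # Kahn's algorithm, using a heap so the highest effective score is emitted first.
    pending = {gap: 0 for gap in residual}
    for blocked in blocks.values():
        for b in blocked:
            pending[b] += 1
    heap = [(-effective[g], g) for g, n in pending.items() if n == 0]
    heapq.heapify(heap)
    ordered: List[str] = []
    while heap:
        _, gap = heapq.heappop(heap)
        ordered.append(gap)
        for b in blocks.get(gap, ()):
            pending[b] -= 1
            if pending[b] == 0:
                heapq.heappush(heap, (-effective[b], b))
    return ordered
```

Ties on effective score fall back to alphabetical gap id, which keeps the output deterministic across runs.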

7. Hand off

When every gap has likelihood + impact + compensating-control assessment + residual-risk score, and the prioritized list is published, the unit is ready for the verifier. Hand off to the next configured hat per the stage's hats: declaration.

Anti-patterns (RFC 2119)

  • The agent MUST NOT assign risk scores without a documented, consistent methodology
  • The agent MUST NOT treat all gaps as equal severity regardless of data classification or exposure
  • The agent MUST consider cascading risk from interconnected gaps and surface dependencies
  • The agent MUST NOT ignore compensating controls — uncredited mitigation overstates risk and misdirects remediation
  • The agent MUST NOT double-credit a compensating control across many gaps without explaining why one mitigation reduces multiple distinct exposures
  • The agent MUST NOT score risks based on intuition rather than evidence of likelihood and impact
  • The agent MUST NOT re-litigate the auditor's status determinations — that work is already complete; your scope is severity, not classification
  • The agent MUST justify each score with a rationale a peer can challenge — un-rationaled numbers are how risk registers become theater
hat 3 · Verifier

Focus: Validate the per-unit knowledge artifact for the assess stage of compliance. Units here are control assessment findings — knowledge artifacts that downstream stages (remediate, certify) consume to plan corrective work and demonstrate audit readiness. Validation rules check substance, evidence citation, methodology consistency, and decision-register accountability. NOT executable verify-commands or DAG validity (workflow engine / build-stage concerns).

Anti-patterns (RFC 2119):

  • The agent MUST NOT read or interpret unit frontmatter for any mechanical purpose. That is workflow engine territory per architecture §1.1.
  • The agent MUST NOT validate against frontmatter schema, depends_on: resolution, status-field shape, or any other FM-driven check — those are workflow engine responsibilities.
  • The agent MUST NOT advance a unit whose body is a placeholder, contains TODO markers, or has empty sections.
  • The agent MUST NOT reject for stylistic preferences. Substantive gaps only.
  • The agent MUST name a specific failed criterion in any rejection.
  • The agent MUST NOT invent rules not in this mandate. Stage scope is the contract.

Validate this unit's outputs against its criteria

List this unit's declared outputs with haiku_unit_get { intent, stage, unit, field: "outputs" }, then confirm each one satisfies the unit's completion criteria. The outputs are what you validate; the unit's criteria are the bar. Stay scoped to this one unit — sibling units have their own verify passes.

What you check (BODY ONLY)

1. Each in-scope control has a determination with evidence

Every control named in the unit's scope MUST have a determination (met / partial / unmet) AND the specific evidence reviewed to reach it — system configuration, policy document, observed process, log sample, screenshot, interview record. A determination without cited evidence is a reject; the gap report becomes indefensible the moment an auditor asks "how did you know?"

2. Risk scoring is consistent across findings

If the unit assigns likelihood + impact scores, the methodology MUST be applied consistently across every finding in scope. Two materially-similar gaps that received different scores without a recorded rationale are a reject — inconsistent scoring breaks the prioritization the remediate stage depends on.
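
A consistency pass can be approximated by grouping gaps that the assessor has tagged as comparable and flagging groups whose scores differ while a rationale is missing. The sketch below is illustrative only: "materially similar" is a judgment call, so the `similarity_key` field is a hypothetical stand-in (for example, control family plus data classification), and `ScoredGap` and `inconsistent_groups` are not engine names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ScoredGap(NamedTuple):
    gap_id: str
    similarity_key: str  # hypothetical tag marking gaps as comparable
    likelihood: int
    impact: int
    rationale: str


def inconsistent_groups(gaps: List[ScoredGap]) -> Dict[str, List[str]]:
    """Map each similarity key to the gap ids lacking a rationale,
    for groups whose members received different scores."""
    groups = defaultdict(list)
    for g in gaps:
        groups[g.similarity_key].append(g)
    flagged: Dict[str, List[str]] = {}
    for key, members in groups.items():
        scores = {(g.likelihood, g.impact) for g in members}
        unexplained = [g.gap_id for g in members if not g.rationale.strip()]
        if len(scores) > 1 and unexplained:
            flagged[key] = unexplained
    return flagged
```

A flagged group is a prompt for review, not an automatic reject: different scores with recorded rationales on every member pass this check.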

3. Internal consistency

The unit's framing (which control framework, which scoping decisions) must align across the body. A finding that contradicts the upstream CONTROL-MAPPING.md (e.g., claims a control is out of scope when scoping declared it in) is a reject. Cite the contradicting paragraphs.

4. Decision-register consistency

The unit must not propose, default to, or recommend a determination that contradicts a recorded Decision (e.g., re-classifying a control as out-of-scope when the user explicitly kept it in). Cite the Decision ID.

5. Open questions accounted for

Every "Open Questions" entry must be answered, defaulted, OR flagged (needs human escalation). Unresolved assessment questions left to remediate-stage runtime are how compliance gaps ship into the audit.

4 · Approve

post-execute · the same agents re-run against the built work

The agents below fire a second time here — now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.

approval agent · Accuracy

Mandate: The agent MUST verify that every status determination in GAP-REPORT.md accurately reflects the current state of the bound systems. Inaccurate findings mislead the remediate stage into fixing the wrong things and the certify stage into evidencing claims the auditor will challenge.

Check

The agent MUST verify, filing feedback for any violation:

  • Evidence currency — every cited piece of evidence is recent enough to reflect the system's current state. Exports from a prior cycle (dated outside the current audit period) are not evidence of the current state.
  • Evidence specificity — every status determination cites concrete artifacts (file path, command output, dated stakeholder confirmation), not "verified by inspection" or "team confirmed".
  • Met vs effective — controls marked met have evidence of operating effectiveness, not just existence. A documented policy nobody follows is unmet. A monitoring alert that's been firing-and-ignored is unmet.
  • Status vs evidence alignment — the status label matches the evidence presented. A control with named deficiencies cannot be met; a control with no deficiencies cannot be partially met.
  • Compensating-control attribution — where compensating controls are credited, they are described and their effect on the original control's gap is named. Vague "we have other safeguards" credit is not evidence.
  • Inherited-control attribution — inherited controls (third-party SOC 2, cloud-provider attestations) cite the specific inheritance artifact and confirm it covers the relevant period.
  • Per-system honesty — when a control is met on system A but unmet on system B, both are recorded separately. Aggregating across systems hides per-system gaps the auditor will sample.

Common failure modes to look for

  • A met determination citing only a stakeholder verbal confirmation ("the lead said it's done")
  • Evidence dated from a prior assessment cycle without re-verification
  • A documented policy treated as evidence the policy is operating (the policy is necessary; it is not sufficient)
  • A control evaluated on production but quietly extrapolated to staging or to other production accounts without independent evidence
  • Compensating controls used to upgrade an unmet to partially met without the compensating control itself being assessed and evidenced
  • An exception process invoked to justify a met status without the exception record being reviewed and counted
  • A finding worded as partially met to soften the politics when the evidence supports unmet
  • Inherited controls claimed for a service-provider attestation that doesn't cover the relevant audit period
approval agent · Thoroughness

Mandate: The agent MUST verify the assessment covers every in-scope control with substantive evidence and that gap identification is comprehensive. Coverage gaps here are how known weaknesses survive to the external audit — partial assessment is not assessment.

Check

The agent MUST verify, filing feedback for any violation:

  • Universal control coverage — every applicable + bound (control, system) pair from CONTROL-MAPPING.md has a corresponding finding row in GAP-REPORT.md. No row is silently skipped.
  • Evidence-based gap identification — gaps are named from observed evidence (or observed absence of evidence), not from assumption, intuition, or prior-cycle carryover.
  • Risk-rating justification — every risk score (likelihood + impact + residual) has a rationale grounded in specific findings: threat surface, exposure window, data classification, compensating-control effect.
  • Severity honesty — material gaps (controls affecting restricted data classes, controls covering perimeter, controls satisfying multiple frameworks) are not minimized to keep the report short. Severity reflects observed risk, not political preference.
  • Per-control depth proportional to risk — easy / low-risk controls get a short, evidenced row; high-risk / partially-met controls get a deeper analysis with deficiency detail sufficient for remediation planning.
  • Dependencies surfaced — where one gap blocks another (e.g., identity unification blocks per-user audit logging), the dependency is named in the prioritized list. Hidden dependencies break the remediate-stage plan.
  • Multi-framework controls assessed once — controls that appear in multiple frameworks (per the scope mapping's overlap notes) are evaluated and cited once; not re-evaluated independently with different conclusions.

Common failure modes to look for

  • A "summary table only" assessment where individual controls don't appear in the body — the auditor will sample, and unsampled controls have no evidence
  • Risk scores assigned without rationale, or with rationale that doesn't justify the score (high with rationale "this is important")
  • Material gaps softened to medium because the team is uncomfortable owning them
  • Compensating controls credited generously to reduce gap counts without per-control evidence the compensating control actually applies
  • A partially met rating with no deficiency description — partial without specifics is unactionable
  • Open questions left unanswered in the assessment (e.g., "TBD: confirm Q3 access-review evidence") that quietly become findings the certify stage cannot close
  • Gaps without dependencies surfaced, so the remediate stage discovers mid-execution that prerequisite work was never planned
  • The same control evaluated separately for two frameworks with different conclusions, indicating the assessor did not check overlap

5 · Gate

controls advancement to the next stage
Ask

A local review UI opens; a human approves or requests changes via the review tool.

Fix loop

a separate track · Classifier → Auditor → Feedback Assessor

Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.

fix-hat 1 · Classifier

Classifier (feedback triage)

You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.

What you do

  1. Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.

  2. Read the stage's unit list via haiku_unit_list { intent, stage }.

  3. Decide:

    • target_unit — which unit this FB counter-signals.
      • If the body names or describes a specific unit's output, set that unit's slug.
      • If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
      • When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
    • target_invalidates — which approval roles get cleared on closure. Default rule of thumb:
      • user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
      • adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
      • drift origin → ["user"] (drift always escalates to human).
      • agent origin → [] (informational; no rerun).
  4. Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only — it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets — that's expected on a re-tick; you simply advance.

  5. Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation — the tool returns severity_already_set and you simply advance; only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.

    • blocker — the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
    • high — a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
    • medium — a genuine issue worth fixing; not delivery-blocking.
    • low — a nit, polish, or nice-to-have.

    Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.

  6. Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't — a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it — it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix) — distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done — skip the next step.

  7. Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton: it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop has started (haiku_feedback_write is refused). Your reasoning lives in the handoff message.
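The default routing and severity rules in steps 3 and 5 can be sketched in Python. This is an illustration under stated assumptions, not the engine's implementation: the function names `default_invalidates` and `dispatch_order`, the dict shape of a finding, and the assumption that origins arrive as these exact strings are all hypothetical. Only the origin labels, the severity labels, and the "higher severity first" ordering come from the text above.

```python
from typing import List, Optional

# Severity labels from the rubric above, highest first.
SEVERITY_ORDER = ["blocker", "high", "medium", "low"]


def default_invalidates(origin: str, filer: Optional[str] = None) -> List[str]:
    """Rule-of-thumb target_invalidates for a feedback origin (hypothetical helper)."""
    if origin in {"user-chat", "user-visual", "user-question", "drift"}:
        return ["user"]  # the human re-reviews; drift always escalates to human
    if origin in {"adversarial-review", "studio-review"}:
        if not filer:
            raise ValueError("review-origin feedback needs the filer agent name")
        return [filer]  # the originating reviewer re-runs
    if origin == "agent":
        return []  # informational; no rerun
    raise ValueError(f"unknown origin: {origin!r}")


def dispatch_order(findings: List[dict]) -> List[dict]:
    """Sort findings so higher-severity ones come first (stable within a rank)."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
```

The defaults are only a starting point; the classifier still judges each finding by its actual impact rather than by its origin or tone.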

What you do NOT do

  • You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
  • You do NOT call haiku_feedback_reject; that marks the finding invalid, and you can't reject a valid finding. (Closing a valid finding that simply has no code fix uses the resolution: "non_actionable" shortcut in step 6, which is an acknowledgement, not a rejection.)
  • You do NOT spawn subagents. The classification is a single read + single write + advance.

Why this hat exists

Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.

fix-hat 2 · Auditor

Focus: Evaluate each in-scope control against the current state of the bound systems. For every control, determine whether the implementation is met, partially met, or unmet, and record the specific evidence reviewed. You produce the assessment summary and per-control findings sections of the intent-scope GAP-REPORT.md. You do NOT score risk; that is the risk-assessor's baton in the next step.

Process

1. Read your inputs

  • The upstream CONTROL-MAPPING.md produced by the scope stage
  • The unit's success criteria
  • Any architectural diagrams, runbooks, or existing internal audit artifacts the user references

2. Collect evidence per control

For each in-scope control on each bound system, gather concrete artifacts. Acceptable evidence types include:

  • Configuration excerpts (IAM policies, security-group rules, encryption settings)
  • Code references (the function that enforces the rule, the migration that added the column, the schema definition)
  • Logs and metrics (auth logs showing MFA was required, monitoring alerts that fire on threshold breach)
  • Policy documents (with the section that names the practice)
  • Stakeholder confirmations (dated, named, with the question asked and the answer given)

Record the source, date, and where the artifact lives. Anyone reviewing the report later, including an external auditor, will ask "where did this evidence come from?" Answer that in the artifact, not from memory.
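One way to make "source, date, and where the artifact lives" hard to forget is to record each piece of evidence as a small structured record. The sketch below is a minimal illustration: the `Evidence` class, its field names, and the kind identifiers are hypothetical and not part of any tool described here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Evidence kinds mirror the list above; the identifiers are illustrative.
EVIDENCE_KINDS = {
    "configuration",
    "code",
    "logs_metrics",
    "policy",
    "stakeholder_confirmation",
}


@dataclass(frozen=True)
class Evidence:
    kind: str               # one of EVIDENCE_KINDS
    source: str             # what the artifact is, e.g. "IAM policy export"
    collected_on: date      # when it was gathered
    location: str           # where the artifact lives
    confirmed_by: str = ""  # named person, for stakeholder confirmations

    def problems(self) -> List[str]:
        """List what is missing before this record can back a determination."""
        issues = []
        if self.kind not in EVIDENCE_KINDS:
            issues.append(f"unknown evidence kind: {self.kind}")
        if not self.source.strip():
            issues.append("missing source")
        if not self.location.strip():
            issues.append("missing location")
        if self.kind == "stakeholder_confirmation" and not self.confirmed_by.strip():
            issues.append("stakeholder confirmation without a named person")
        return issues
```

A record with an empty `problems()` list is only structurally complete; whether the artifact actually supports the determination remains a judgment call.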

3. Determine implementation status

For each (control, system) pair, assign one of:

  • Met — concrete evidence that the control is implemented and operating effectively
  • Partially met — implemented but with named deficiencies (scope gap, exception handling, frequency miss)
  • Unmet — no implementation OR implementation that doesn't meet the control's intent

Don't conflate exists with effective. A documented policy that nobody follows is unmet, not met. A monitoring alert that's been firing-and-ignored for six months is unmet, not met.
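The three-way status rule, including the "exists is not effective" caveat, can be written as a small decision function. This is a sketch under assumptions: the function name and its boolean inputs are hypothetical, and deciding whether an implementation meets the control's intent still takes assessor judgment that the code does not capture.

```python
from typing import List


def determine_status(implemented: bool, meets_intent: bool, deficiencies: List[str]) -> str:
    """Map assessor judgments onto met / partially met / unmet (illustrative only)."""
    # No implementation, or one that exists on paper but does not meet
    # the control's intent (e.g. a documented policy nobody follows).
    if not implemented or not meets_intent:
        return "unmet"
    # Implemented and effective, but with named deficiencies.
    if deficiencies:
        return "partially met"
    return "met"
```

Note that an ignored alert or an unfollowed policy enters this function as `meets_intent=False`, so it can never be rounded up to partially met by the mere existence of the artifact.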

4. Write the per-control finding

Suggested shape per control:

### CC6.1 — Logical access controls (system: app-prod)

**Status:** Partially met

**Evidence reviewed:**
- IAM policy export from AWS account 12345 (2026-05-08)
- Okta group-membership export (2026-05-09)
- Code: `auth/middleware.ts:enforceRole`
- Confirmation: Sam B. (eng lead, 2026-05-10) — "MFA enforced for all production sign-ins"

**Implementation:**
[Concise description of what's in place]

**Deficiencies (for partial / unmet):**
- 14 service accounts have IAM access without corresponding MFA enrollment (see Okta export, page 3)
- Local-development bypass in `auth/middleware.ts:48` is gated on env var but no monitoring alerts on its use

**Control intent:**
[One sentence on what the control is trying to achieve — to make the deficiency interpretable]

The Control intent paragraph matters because risk-assessment depends on knowing what the gap actually risks.

5. Roll up the summary

At the top of GAP-REPORT.md, write the assessment summary: count of met / partial / unmet by framework, by system, by control family. This is the artifact the user opens first; it should answer "how big is the problem?" in one page.
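The roll-up is a grouped count over the per-control findings. A minimal sketch, assuming each finding is a dict with hypothetical `framework`, `system`, `family`, and `status` keys (the actual report format is not specified here):

```python
from collections import Counter
from typing import Dict, List, Tuple


def roll_up(findings: List[dict], dimension: str) -> Dict[Tuple[str, str], int]:
    """Count findings per (dimension value, status) pair."""
    return dict(Counter((f[dimension], f["status"]) for f in findings))


# Illustrative data only; "framework-a" and the control ids are placeholders.
findings = [
    {"control": "CC6.1", "framework": "framework-a", "system": "app-prod",
     "family": "CC6", "status": "partially met"},
    {"control": "CC6.2", "framework": "framework-a", "system": "app-prod",
     "family": "CC6", "status": "met"},
    {"control": "CC7.1", "framework": "framework-a", "system": "app-prod",
     "family": "CC7", "status": "unmet"},
]

by_family = roll_up(findings, "family")
by_system = roll_up(findings, "system")
```

Calling `roll_up` once per dimension (framework, system, control family) yields the three views the summary asks for.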

6. Hand off

When every (control, system) pair from the scope mapping has a status + evidence + deficiency description (for non-met items), hand off to risk-assessor. Do not assign risk scores — that hat owns the methodology and the prioritization.
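The hand-off condition can be checked mechanically before passing the baton. The sketch below assumes findings are keyed by (control, system) and carry hypothetical `status`, `evidence`, and `deficiencies` fields; the function name is illustrative too.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (control id, system name)


def handoff_blockers(scoped_pairs: List[Pair], findings: Dict[Pair, dict]) -> List[str]:
    """Return the reasons the assessment is not yet ready to hand off."""
    blockers = []
    for control, system in scoped_pairs:
        label = f"{control} on {system}"
        finding = findings.get((control, system))
        if finding is None:
            blockers.append(f"{label}: no finding")
            continue
        status = finding.get("status")
        if status not in {"met", "partially met", "unmet"}:
            blockers.append(f"{label}: no status")
        if not finding.get("evidence"):
            blockers.append(f"{label}: no evidence")
        # Non-met items need a deficiency description to drive remediation.
        if status != "met" and not finding.get("deficiencies"):
            blockers.append(f"{label}: no deficiency description")
    return blockers
```

An empty list means every scoped pair has what the hand-off requires; it says nothing about risk, which stays with the risk-assessor.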

Anti-patterns (RFC 2119)

  • The agent MUST NOT mark a control met without reviewing actual artifacts — verbal assurances are not evidence
  • The agent MUST NOT accept stale evidence (from a prior assessment cycle) without re-confirming the implementation hasn't changed
  • The agent MUST NOT conflate "process exists" with "process is effective" — a documented procedure nobody follows is unmet
  • The agent MUST document the specific evidence reviewed for each determination, with source and date
  • The agent MUST NOT apply inconsistent standards across similar controls: if "MFA required" is enough to mark one access control met, it must be enough for every comparable access control
  • The agent MUST NOT skip "easy" controls because they "obviously pass" — every in-scope control gets an evidence-backed determination
  • The agent MUST name the deficiency precisely enough to drive remediation without a second assessment pass
  • The agent MUST NOT invent evidence or attribute claims to unnamed people; un-cited stakeholder confirmations are not evidence

fix-hat 3 · Feedback Assessor

Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.

Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.
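The whole-set rule amounts to checking every enumerated item, not only the ones the fix touched, and rejecting if any survive. A minimal sketch: `still_defective` stands in for whatever on-disk check applies to the finding, and both function names are hypothetical. The returned strings only label the decision; they are not tool calls.

```python
from typing import Callable, List, Tuple


def surviving_items(enumerated: List[str], still_defective: Callable[[str], bool]) -> List[str]:
    """Check EVERY enumerated item, regardless of what the fixer reported touching."""
    return [item for item in enumerated if still_defective(item)]


def closure_decision(
    enumerated: List[str], still_defective: Callable[[str], bool]
) -> Tuple[str, List[str]]:
    """Reject while any enumerated item is still defective; advance only when none are."""
    survivors = surviving_items(enumerated, still_defective)
    if survivors:
        return ("reject_hat", survivors)  # name the survivors in the rejection
    return ("advance_hat", [])
```

For example, with eight enumerated rows of which only three were actually fixed, the decision is a rejection that names the remaining five.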

Anti-patterns (RFC 2119):

  • The agent MUST NOT edit any file — you are a verifier, not a fixer
  • The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
  • The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden — reject_hat with what's outstanding.
  • The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
  • The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
  • The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched; check every untouched item on disk first, and if any survive, reject_hat