AI Coding Agent PR Review Rubric

A practical rubric for engineers and leads reviewing coding-agent pull requests after the agent runs and before the work merges.

The core rule

Review the agent's authority, not just the final diff. A coding-agent PR is merge-ready only when the approved intent, tool use, validation evidence, and human owner are all visible.

1. Approved intent matches the diff

Can a reviewer connect the final PR back to the pre-execution Goal Contract, meaning the outcome and blast radius that were approved before the agent ran?

  • The PR states the approved outcome in plain language, not just the agent transcript summary.
  • Changed files stay inside the agreed blast radius or clearly call out where scope expanded.
  • The author explains any agent-initiated detours, retries, or abandoned paths that affected the final diff.
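
The blast-radius check above can be partly automated. The following is a minimal sketch, assuming the approved blast radius is written down as a list of path patterns; the function name, the patterns, and the file paths are all hypothetical and not part of any Goal Contract format.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, allowed_patterns):
    """Return the changed files that match none of the approved patterns.

    Note: in fnmatch-style patterns, `*` also matches `/`, so
    "src/billing/*" covers nested directories as well.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical approved blast radius and hypothetical PR file list.
allowed = ["src/billing/*", "tests/billing/*"]
changed = [
    "src/billing/invoice.py",
    "tests/billing/test_invoice.py",
    "src/auth/session.py",
]

print(out_of_scope(changed, allowed))  # ['src/auth/session.py']
```

A non-empty result is not automatically a rejection. It is the list of files the PR description should call out as scope expansion.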

2. Authority and tool use are legible

Can the team see what authority the coding agent actually used?

  • The PR handoff lists high-effect tools used: shell commands, MCP writes, external sends, migrations, deploy steps, or credentialed APIs.
  • Permission expansions are linked to a human approval note instead of being buried in the transcript.
  • Instructions embedded in untrusted context, such as issues, webpages, docs, and comments, did not expand the agent's authority.
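
One way to make the handoff legible is to record each high-effect tool use as a small structured entry and flag any permission expansion that has no approval note attached. This is a sketch under stated assumptions: the `ToolUse` record, its field names, and the sample entries are hypothetical, not a format defined by this rubric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolUse:
    tool: str                            # e.g. "shell", "migration", "deploy"
    detail: str                          # what the agent actually ran or sent
    expanded_permissions: bool = False   # went beyond the pre-approved authority
    approval_note: Optional[str] = None  # reference to the human approval, if any


def missing_approvals(uses):
    """Return tool uses that expanded permissions without a linked approval."""
    return [u for u in uses if u.expanded_permissions and not u.approval_note]


# Hypothetical handoff entries.
handoff = [
    ToolUse("shell", "ran the billing test suite"),
    ToolUse("migration", "added a column to invoices", True, "approval comment in PR thread"),
    ToolUse("credentialed API", "called the payments sandbox", True),
]

for use in missing_approvals(handoff):
    print(f"needs approval note: {use.tool} ({use.detail})")
```

Here only the third entry is flagged, because it expanded permissions and carries no approval reference.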

3. Validation evidence is reviewable

Can a reviewer verify success without mentally replaying the entire agent session?

  • The PR includes targeted test, lint, typecheck, build, or manual verification output relevant to the change.
  • Skipped checks are labeled either as blockers or as intentionally not applicable; they are never treated as implicit passes.
  • Validation covers both the happy path and the failure mode the agent was asked to address.
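
The rule about skipped checks can be expressed as a small status model in which a check with no recorded result never counts as a pass. The sketch below is illustrative only: the `Status` values, the `Check` record, and the sample checks are assumptions, not an existing CI schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    BLOCKED = "blocked"            # skipped, and the skip blocks merge
    NOT_APPLICABLE = "n/a"         # skipped on purpose; needs a stated reason


@dataclass
class Check:
    name: str
    status: Optional[Status] = None  # None means nobody recorded a result
    note: str = ""


def unresolved(checks):
    """Return names of checks that cannot be counted as evidence of success."""
    problems = []
    for check in checks:
        if check.status is None or check.status in (Status.FAILED, Status.BLOCKED):
            problems.append(check.name)
        elif check.status is Status.NOT_APPLICABLE and not check.note.strip():
            # "Not applicable" without a reason is an implicit pass in disguise.
            problems.append(check.name)
    return problems


# Hypothetical validation summary for one PR.
checks = [
    Check("unit tests", Status.PASSED),
    Check("typecheck"),
    Check("migration dry run", Status.NOT_APPLICABLE),
    Check("e2e suite", Status.NOT_APPLICABLE, "change touches no UI paths"),
    Check("load test", Status.BLOCKED, "staging unavailable"),
]

print(unresolved(checks))  # ['typecheck', 'migration dry run', 'load test']
```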

4. Human ownership is explicit

Is a human taking responsibility for what merges?

  • A human author states what they reviewed in the diff, generated content, and tool output.
  • Risky areas such as auth, billing, secrets, data migrations, and production behavior get owner review before merge.
  • The final PR description separates agent-generated claims from evidence the team can inspect.

5. Future workflow signal is captured

Did this PR teach the team anything about better agent workflow defaults?

  • Repeated agent confusion, permission asks, or validation failures are captured as template or policy follow-up.
  • The reviewer notes whether this workflow should become low-friction, stay human-gated, or be redesigned.
  • The PR leaves behind reusable examples for similar future coding-agent work.

Simple score band

Give each of the five rubric sections 0-4 points (0 missing, 1 weak, 2 partial, 3 clear, 4 strong), for a maximum total of 20. Use the total to decide whether the PR is ready for normal code review or needs a workflow follow-up first.

0-8: Do not merge yet

The PR may contain useful work, but reviewers cannot yet see intent, authority, validation, or ownership clearly enough.

9-16: Needs reviewer follow-up

The change is probably reviewable, but at least one important dimension needs stronger evidence before merge.

17-20: Review-ready

The PR connects approved intent, bounded authority, validation evidence, and human accountability well enough for normal code review.
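
The score bands above reduce to simple arithmetic. The following sketch assumes one integer score per rubric section; the section keys and the function name are hypothetical labels chosen for illustration.

```python
# Hypothetical short keys for the five rubric sections.
SECTIONS = ("intent", "authority", "validation", "ownership", "workflow_signal")


def score_pr(scores):
    """Return (total, band) for a dict mapping each section to 0-4 points."""
    if set(scores) != set(SECTIONS):
        raise ValueError(f"expected exactly these sections: {SECTIONS}")
    for name, points in scores.items():
        if points not in range(5):
            raise ValueError(f"{name} must be an integer from 0 to 4")

    total = sum(scores.values())
    if total <= 8:
        band = "Do not merge yet"
    elif total <= 16:
        band = "Needs reviewer follow-up"
    else:
        band = "Review-ready"
    return total, band


example = {
    "intent": 4,
    "authority": 3,
    "validation": 3,
    "ownership": 4,
    "workflow_signal": 2,
}

print(score_pr(example))  # (16, 'Needs reviewer follow-up')
```

The function uses only the total, as the bands describe. In this example a single extra point in any section would move the PR into the review-ready band.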

Use this with the pre-run approval

The strongest review loop starts before the agent runs. Draft a Goal Contract in Caskade, approve the outcome and blast radius, then use this rubric to compare the final PR against the approved plan. Goal Contract generation in the current beta requires sign-up; limited anonymous generation is a later roadmap item.

FAQ

How should teams review coding-agent pull requests?

Review both the code and the agent workflow. A good review asks whether the diff matches the approved intent, whether the agent stayed inside its authority, what validation proves success, and which human owner is accountable for the merge.

Is this rubric only for AI-generated code?

No. It is most useful when a coding agent produced or edited the PR, but the same rubric helps any team that wants clearer intent, blast-radius control, validation evidence, and ownership in review.

Where does this fit with Caskade?

Caskade focuses on the pre-execution Goal Contract. This rubric is the matching post-run review habit: compare what was approved before execution with what the agent actually changed and proved before merge.