AG8

AG8 is the baseline chained protocol: steps run in one continuous, stateful session, and each downstream step builds on earlier outputs, so an early error can propagate to later steps. It is designed for regression testing and for diagnosing core failure modes (chain collapse, evidence misuse, citation integrity failures).

Mode

Chained (stateful)

Steps

8 total: seven deterministic steps (d*) plus one judged step (j*)

Outputs

Run artifacts + per-step scores

Protocol sketch

d1 → d2 → d3 → d4 → d5 → d6 → j7 → d8
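
The chain above can be sketched as a loop over one shared state. This is a minimal illustration, not the actual runner: `run_chain`, `StepHandler`, and `echo_handler` are hypothetical names, and the real step prompts and contracts come from the active run spec.

```python
from typing import Callable, Dict, List

STEP_ORDER: List[str] = ["d1", "d2", "d3", "d4", "d5", "d6", "j7", "d8"]

# A step handler receives all earlier outputs and returns its own output.
StepHandler = Callable[[Dict[str, str]], str]


def run_chain(handlers: Dict[str, StepHandler]) -> Dict[str, str]:
    """Run the steps in order inside one shared state (hypothetical runner)."""
    state: Dict[str, str] = {}
    for step_id in STEP_ORDER:
        # Pass a copy so a handler cannot rewrite earlier outputs.
        state[step_id] = handlers[step_id](dict(state))
    return state


def echo_handler(step_id: str) -> StepHandler:
    """Toy handler: records which earlier steps were visible to it."""
    def handler(prior: Dict[str, str]) -> str:
        return f"{step_id} saw [{', '.join(prior)}]"
    return handler


outputs = run_chain({s: echo_handler(s) for s in STEP_ORDER})
print(outputs["d3"])
```

Because every handler reads the accumulated state, a wrong output at an early step is visible to, and can mislead, every later step.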

Payload admissions

p1 (anchor) is admitted early, at d1; p2 (authorities) is admitted later, at j7, for open-book synthesis.
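
Staged admission can be pictured as a lookup from step to the payloads visible at that step. The sketch below takes the admission points from the Payload column of the step breakdown (p1 at d1, p2 at j7) and assumes that a payload stays visible for the rest of the session once admitted; that persistence rule and the names `ADMISSIONS` and `visible_payloads` are assumptions for illustration.

```python
from typing import Dict, List

STEP_ORDER: List[str] = ["d1", "d2", "d3", "d4", "d5", "d6", "j7", "d8"]

# Step at which each payload is first admitted (from the Payload column).
ADMISSIONS: Dict[str, str] = {"d1": "p1", "j7": "p2"}


def visible_payloads(step_id: str) -> List[str]:
    """Payloads admitted at or before step_id (assumes they stay visible)."""
    position = STEP_ORDER.index(step_id)
    return [ADMISSIONS[s] for s in STEP_ORDER[: position + 1] if s in ADMISSIONS]


print(visible_payloads("d6"))
print(visible_payloads("j7"))
```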

Integrity

The final step acts as an integrity check. A single fabricated authority is enough to mark the run's integrity status as failed.
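
A minimal sketch of this rule follows. It assumes that "fabricated" means "cited but absent from the admitted authorities"; the actual definition belongs to the run spec, and `integrity_status` and the authority names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def integrity_status(cited: Iterable[str], admitted: Set[str]) -> Tuple[str, List[str]]:
    """Return ("pass" | "fail", fabricated authorities).

    Assumption: an authority counts as fabricated when it is cited
    but does not appear in the admitted set.
    """
    fabricated = sorted({name for name in cited if name not in admitted})
    # One fabricated authority is enough to fail the whole run.
    status = "fail" if fabricated else "pass"
    return status, fabricated


admitted = {"Authority A", "Authority B"}
clean = integrity_status(["Authority A", "Authority B", "Authority A"], admitted)
tainted = integrity_status(["Authority A", "Authority Z"], admitted)
print(clean, tainted)
```

The check is all-or-nothing by design: the number of valid citations does not offset a single fabricated one.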

Tip: the exact step prompts/contracts are defined by the active run spec.

Step breakdown

Step	Purpose	Scoring	Payload
d1	Anchor / known authority grounding	Deterministic	p1
d2	Citation network retrieval	Deterministic	—
d3	Validate authority status	Deterministic	—
d4	Extract facts / posture	Deterministic	—
d5	Distinguish and reconcile authorities	Deterministic	—
d6	Draft synthesis for review	Deterministic	—
j7	Synthesis quality (rubric)	Isolated judge	p2
d8	Citation integrity / hard constraints	Deterministic	—
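
For tooling, the table above could be encoded as data and checked against the step count stated earlier. This is only one possible representation; the `Step` class, its field names, and the scoring labels are assumptions, not the run spec's schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Step:
    step_id: str
    purpose: str
    scoring: str                   # "deterministic" or "isolated_judge"
    payload: Optional[str] = None  # payload admitted at this step, if any


AG8_STEPS: Tuple[Step, ...] = (
    Step("d1", "Anchor / known authority grounding", "deterministic", "p1"),
    Step("d2", "Citation network retrieval", "deterministic"),
    Step("d3", "Validate authority status", "deterministic"),
    Step("d4", "Extract facts / posture", "deterministic"),
    Step("d5", "Distinguish and reconcile authorities", "deterministic"),
    Step("d6", "Draft synthesis for review", "deterministic"),
    Step("j7", "Synthesis quality (rubric)", "isolated_judge", "p2"),
    Step("d8", "Citation integrity / hard constraints", "deterministic"),
)


def scoring_counts(steps: Iterable[Step]) -> Dict[str, int]:
    """Count steps per scoring type."""
    counts: Dict[str, int] = {}
    for step in steps:
        counts[step.scoring] = counts.get(step.scoring, 0) + 1
    return counts


print(scoring_counts(AG8_STEPS))
```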

What it measures

Chained reliability (does the model maintain state across steps?)
Evidence-respecting behavior under staged admissions
Citation integrity (fabricated authority penalties)
Diagnosable failure modes via per-step outputs and scores
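
One way per-step scores support diagnosis is by locating the earliest step where a run breaks down, since later failures may only be downstream effects. The sketch below assumes scores in the range 0 to 1 and a pass threshold of 0.5; both are illustrative assumptions, as are the function name and the example values.

```python
from typing import Dict, List, Optional

STEP_ORDER: List[str] = ["d1", "d2", "d3", "d4", "d5", "d6", "j7", "d8"]


def first_failing_step(scores: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the earliest step scoring below the threshold, or None."""
    for step_id in STEP_ORDER:
        # A missing score is treated as a failure of that step.
        if scores.get(step_id, 0.0) < threshold:
            return step_id
    return None


# Invented scores for illustration only; not real benchmark results.
example_scores = {"d1": 1.0, "d2": 1.0, "d3": 0.0, "d4": 0.5,
                  "d5": 0.0, "d6": 0.0, "j7": 0.25, "d8": 0.0}
first_break = first_failing_step(example_scores)
print(first_break)
```

In this invented example the chain first breaks at d3, which suggests inspecting that step's output before attributing the later low scores to separate causes.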
