AG10
AG10 is the standard run for routine comparisons: a chained protocol that extends AG8 with additional deterministic checks and scoring surfaces (coverage and consistency) while staying practical to run.
Mode: Chained (stateful)
Steps: 10 total (nine deterministic d* steps plus one judge step, j9)
Best for: Default comparisons
Protocol sketch
d1 → d2 → d3 → d4 → d5 → d6 → d7 → d8 → j9 → d10
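The chained (stateful) ordering above can be sketched in a few lines. This is only an illustrative sketch: the step order comes from the protocol sketch, but `ChainState`, `run_chain`, and `echo_step` are hypothetical names, not part of any actual runner, and the real step behavior is defined by the active run spec.

```python
from dataclasses import dataclass, field

# Step order copied from the protocol sketch; everything else is illustrative.
AG10_STEPS = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "j9", "d10"]


@dataclass
class ChainState:
    """Hypothetical carrier for outputs accumulated across the chain."""
    outputs: dict = field(default_factory=dict)


def run_chain(steps, run_step):
    """Run steps in order; each step receives a copy of all earlier outputs."""
    state = ChainState()
    for step_id in steps:
        state.outputs[step_id] = run_step(step_id, dict(state.outputs))
    return state


def echo_step(step_id, prior):
    # Stand-in step (assumption): records how many earlier outputs were visible.
    return {"step": step_id, "seen": len(prior)}


state = run_chain(AG10_STEPS, echo_step)
```

In this sketch, "stateful" simply means that each step can read what earlier steps produced; how an actual runner passes state between steps may differ.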
What's new vs AG8
Additional deterministic steps (d7–d8) add coverage and consistency checks before the judge step (j9), and the final deterministic step (d10) tightens chain-level integrity signals.
Payload admissions
p1 (anchor) is admitted early; p2 (authorities) is admitted later, for open-book synthesis and consistency checks.
Tip: the exact step prompts/contracts are defined by the active run spec.
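One way to picture payload admission is as a per-step visibility check. The sketch below is hedged: the source only says p1 is admitted "early" and p2 "later", so the admission points in `ADMIT_AT` are placeholder assumptions, and `visible_payloads` is a hypothetical helper. The real admission steps are whatever the active run spec defines.

```python
AG10_STEPS = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "j9", "d10"]

# ASSUMPTION: placeholder admission points, chosen only for illustration.
# The source does not say at which steps p1 and p2 are admitted.
ADMIT_AT = {"p1": "d1", "p2": "d7"}


def visible_payloads(step_id, admit_at=ADMIT_AT, steps=AG10_STEPS):
    """Return the payloads already admitted when `step_id` runs."""
    position = steps.index(step_id)
    return sorted(
        payload
        for payload, admitted_step in admit_at.items()
        if steps.index(admitted_step) <= position
    )
```

Under these placeholder values, `visible_payloads("d6")` contains only `p1`, while the judge step `j9` sees both payloads.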
When to use AG10
- Default leaderboard comparisons (a stronger standard run than AG8)
- Regression testing when you want more scoring surface area than the baseline
- Diagnosing consistency issues that only appear with extra checks
Step breakdown (high level)
| Range | Role | Scoring |
|---|---|---|
| d1–d6 | Core chain (grounding → extraction → analysis) | Deterministic |
| d7–d8 | Additional consistency / coverage checks | Deterministic |
| j9 | Synthesis quality (rubric) | Isolated judge |
| d10 | Integrity constraints | Deterministic |
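The table above can also be read as a lookup from step id to role and scoring mode. The following is a minimal sketch of that mapping; `step_role` is a hypothetical helper, and the role and scoring strings are copied from the table rather than from any runner API.

```python
def step_role(step_id):
    """Map an AG10 step id (e.g. "d7", "j9") to its (role, scoring) pair."""
    kind, index = step_id[0], int(step_id[1:])
    if kind == "d" and 1 <= index <= 6:
        return ("Core chain (grounding → extraction → analysis)", "Deterministic")
    if kind == "d" and index in (7, 8):
        return ("Additional consistency / coverage checks", "Deterministic")
    if kind == "j" and index == 9:
        return ("Synthesis quality (rubric)", "Isolated judge")
    if kind == "d" and index == 10:
        return ("Integrity constraints", "Deterministic")
    raise ValueError(f"not an AG10 step: {step_id!r}")


AG10_STEPS = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "j9", "d10"]
deterministic_steps = [s for s in AG10_STEPS if step_role(s)[1] == "Deterministic"]
```

Filtering on the scoring column this way gives the nine deterministic steps, leaving j9 as the only judged step.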