AG10

AG10 is the standard run for routine comparisons: a chained protocol that extends AG8 with additional deterministic checks and scoring surfaces (coverage and consistency) while staying practical to run.

Mode: Chained (stateful)
Steps: 10 total (nine deterministic d* steps plus one judge step, j9)
Best for: Default comparisons

Protocol sketch

Step order: d1 → d2 → d3 → d4 → d5 → d6 → d7 → d8 → j9 → d10

What's new vs AG8: Additional deterministic steps add coverage/consistency checks before the judge step and tighten chain-level integrity signals at the end.

Payload admissions: p1 (anchor) admitted early; p2 (authorities) admitted later for open-book synthesis and consistency checks.

Tip: the exact step prompts/contracts are defined by the active run spec.
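
As an illustration only, the step order above can be written down as data and sanity-checked. This is a minimal sketch, not the runner's actual implementation: the names `AG10_STEPS`, `Step`, `parse_step`, and `build_chain` are hypothetical, and the only facts taken from this page are the step order and the convention that `d` marks a deterministic step and `j` marks the judge step. Payload admission points are left out because the page does not say at which steps p1 and p2 are admitted.

```python
from dataclasses import dataclass

# Step order copied from the protocol sketch. All names in this block are
# hypothetical; the real prompts and contracts come from the active run spec.
AG10_STEPS = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "j9", "d10"]


@dataclass(frozen=True)
class Step:
    step_id: str   # e.g. "d7" or "j9"
    kind: str      # "deterministic" or "judge"
    position: int  # 1-based position in the chain


def parse_step(step_id: str) -> Step:
    """Split an id such as 'd10' into its kind prefix and its position."""
    prefix, number = step_id[:1], step_id[1:]
    if prefix not in ("d", "j") or not number.isdigit():
        raise ValueError(f"unrecognised step id: {step_id!r}")
    kind = "deterministic" if prefix == "d" else "judge"
    return Step(step_id=step_id, kind=kind, position=int(number))


def build_chain(step_ids: list[str]) -> list[Step]:
    """Parse the ids and check that they are numbered 1..n in order."""
    steps = [parse_step(s) for s in step_ids]
    for expected, step in enumerate(steps, start=1):
        if step.position != expected:
            raise ValueError(f"{step.step_id} is out of order at slot {expected}")
    return steps


chain = build_chain(AG10_STEPS)
judge_steps = [s.step_id for s in chain if s.kind == "judge"]
deterministic_count = sum(1 for s in chain if s.kind == "deterministic")
```

With the order above, the chain has ten steps, `judge_steps` contains only `j9`, and `deterministic_count` is nine, which matches the summary at the top of the page.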

When to use AG10

  • Default leaderboard comparisons (a stronger standard run than AG8)
  • Regression testing when you want more scoring surface area than the baseline
  • Diagnosing consistency issues that only appear with extra checks

Step breakdown (high level)

Range    Role                                            Scoring
d1–d6    Core chain (grounding → extraction → analysis)  Deterministic
d7–d8    Additional consistency / coverage checks        Deterministic
j9       Synthesis quality (rubric)                      Isolated judge
d10      Integrity constraints                           Deterministic
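
The breakdown can also be read as a lookup from a step id to its role and scoring method. The following is a rough sketch under that reading: `STEP_BREAKDOWN` and `describe_step` are hypothetical names invented for this example, and the rows simply restate the breakdown rather than reflecting any real run spec format.

```python
# Rows restate the step breakdown: (first step, last step, role, scoring).
# The structure and names are hypothetical, chosen only for illustration.
STEP_BREAKDOWN = [
    (1, 6, "Core chain (grounding → extraction → analysis)", "Deterministic"),
    (7, 8, "Additional consistency / coverage checks", "Deterministic"),
    (9, 9, "Synthesis quality (rubric)", "Isolated judge"),
    (10, 10, "Integrity constraints", "Deterministic"),
]


def describe_step(step_id: str) -> tuple[str, str]:
    """Return (role, scoring) for a step id such as 'd7' or 'j9'."""
    position = int(step_id[1:])  # drop the one-letter kind prefix
    for first, last, role, scoring in STEP_BREAKDOWN:
        if first <= position <= last:
            return role, scoring
    raise KeyError(f"step {step_id!r} is outside the AG10 chain")


role, scoring = describe_step("j9")
```

Here `describe_step("j9")` yields the rubric-based synthesis role scored by the isolated judge, while every `d` step resolves to deterministic scoring.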

Results

Leaderboard: Pick an AG10 run spec from the dropdown to compare models.

How it works

Methodology: Runner semantics, artifacts, scoring, and integrity policies.