AG10

AG10 is the standard run for routine comparisons: a chained protocol that extends AG8 with additional deterministic checks and scoring surfaces (coverage and consistency) while staying practical to run.

Mode

Chained (stateful)

Steps

10 total: nine deterministic steps (d*) and one judge step (j9)

Best for

Default comparisons

Protocol sketch

d1 → d2 → d3 → d4 → d5 → d6 → d7 → d8 → j9 → d10
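As a rough illustration of how the sketch reads, the chain string can be parsed into ordered steps, with the `d` or `j` prefix deciding whether a step is deterministic or judged. This is a minimal sketch, not the runner's implementation: the names `Step` and `parse_chain` are hypothetical, and only the chain string itself comes from this page.

```python
from dataclasses import dataclass

# Chain string copied from the protocol sketch above.
CHAIN = "d1 → d2 → d3 → d4 → d5 → d6 → d7 → d8 → j9 → d10"


@dataclass(frozen=True)
class Step:
    name: str   # step label, e.g. "d7" or "j9"
    kind: str   # "deterministic" or "judge"
    index: int  # 1-based position in the chain


def parse_chain(chain: str) -> list[Step]:
    """Split a chain sketch into steps (hypothetical helper, for illustration)."""
    steps = []
    for position, token in enumerate(chain.split("→"), start=1):
        name = token.strip()
        prefix, number = name[0], int(name[1:])
        if prefix not in ("d", "j"):
            raise ValueError(f"unknown step prefix in {name!r}")
        if number != position:
            raise ValueError(f"{name!r} is out of order at position {position}")
        kind = "deterministic" if prefix == "d" else "judge"
        steps.append(Step(name=name, kind=kind, index=number))
    return steps


steps = parse_chain(CHAIN)
deterministic = [s.name for s in steps if s.kind == "deterministic"]
judge = [s.name for s in steps if s.kind == "judge"]
print(len(steps), len(deterministic), judge)  # 10 9 ['j9']
```

Because the mode is chained (stateful), order matters, which is why the sketch rejects a chain whose step numbers do not match their positions.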

What's new vs AG8

AG10 adds deterministic coverage and consistency checks (d7–d8) before the judge step (j9) and tightens chain-level integrity signals in the final step (d10).

Payload admissions

p1 (anchor) is admitted early in the chain; p2 (authorities) is admitted later, for open-book synthesis and consistency checks.

Tip: the exact step prompts/contracts are defined by the active run spec.
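One way to picture staged payload admission is a lookup from payload id to the first step that may see it. The sketch below is an assumption-laden illustration: `visible_payloads` is a hypothetical helper, and the admission steps in `example_admissions` are placeholders, since this page only says p1 is admitted "early" and p2 "later" and the real values come from the active run spec.

```python
def visible_payloads(step_index: int, admissions: dict[str, int]) -> list[str]:
    """Return payload ids admitted at or before `step_index`, in admission order.

    `admissions` maps a payload id to the 1-based step at which it is first
    admitted. Hypothetical helper, not part of any real runner API.
    """
    admitted = [(first, pid) for pid, first in admissions.items() if first <= step_index]
    return [pid for _, pid in sorted(admitted)]


# PLACEHOLDER values only: the actual admission steps are defined by the run spec.
example_admissions = {"p1": 1, "p2": 6}

for step_index in (1, 5, 6, 10):
    print(step_index, visible_payloads(step_index, example_admissions))
```

With these placeholder values, steps before 6 would see only the anchor, and later steps would see both payloads.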

When to use AG10

Default leaderboard comparisons (a stronger standard run than AG8)
Regression testing when you want more scoring surface area than the baseline
Diagnosing consistency issues that only appear with extra checks

Step breakdown (high level)

Range	Role	Scoring
d1–d6	Core chain (grounding → extraction → analysis)	Deterministic
d7–d8	Additional consistency / coverage checks	Deterministic
j9	Synthesis quality (rubric)	Isolated judge
d10	Integrity constraints	Deterministic
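The table above can also be expressed as a small lookup, which is handy when grouping per-step results by scoring surface. This is a sketch under stated assumptions: the `BREAKDOWN` layout and the `describe` function are hypothetical, and only the ranges, roles, and scoring labels are taken from the table.

```python
# Each entry: (step numbers, step prefix, role, scoring), copied from the table.
BREAKDOWN = [
    (range(1, 7), "d", "Core chain (grounding → extraction → analysis)", "Deterministic"),
    (range(7, 9), "d", "Additional consistency / coverage checks", "Deterministic"),
    (range(9, 10), "j", "Synthesis quality (rubric)", "Isolated judge"),
    (range(10, 11), "d", "Integrity constraints", "Deterministic"),
]


def describe(step: str) -> tuple[str, str]:
    """Return (role, scoring) for a step label such as "d7" (hypothetical helper)."""
    prefix, number = step[0], int(step[1:])
    for numbers, expected_prefix, role, scoring in BREAKDOWN:
        if number in numbers and prefix == expected_prefix:
            return role, scoring
    raise KeyError(f"{step!r} is not an AG10 step")


for label in ("d3", "d8", "j9", "d10"):
    role, scoring = describe(label)
    print(f"{label}: {role} [{scoring}]")
```

A label that does not appear in the table, such as `d9` or `j3`, raises `KeyError` rather than silently falling into a neighbouring range.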

Results

Leaderboard

Pick an AG10 run spec from the dropdown to compare models.

How it works

Methodology

Runner semantics, artifacts, scoring, and integrity policies.