Benchmarks

LegalChain is a family of sealed, stateful benchmarks (AG8, AG10, AG15...) executed by a single auditable runner. Each evaluation unit (EU) is an on-disk instance that contains staged payloads visible to the model and EU-private ground truth used only for scoring. Every run emits reproducible artifacts (e.g., run.jsonl, audit_log.jsonl, summary.json).
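The paragraph above does not specify the on-disk format, so the following is only a minimal Python sketch under stated assumptions. The `payload/` directory (model-visible), the private `ground_truth/answers.json` file, the `run_eu` function, and every record field are hypothetical names chosen for illustration, not LegalChain's actual layout or runner. The sketch shows one way a runner can keep ground truth out of the model's view while emitting `run.jsonl`, `audit_log.jsonl`, and `summary.json`.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical layout (assumption, not LegalChain's format):
#   <eu>/payload/                   -> files the model may see
#   <eu>/ground_truth/answers.json  -> EU-private answers, read only by the scorer
Model = Callable[[str, dict], str]


def run_eu(eu_dir: Path, model: Model, out_dir: Path) -> dict:
    payload = {p.name: p.read_text() for p in sorted((eu_dir / "payload").iterdir())}
    truth = json.loads((eu_dir / "ground_truth" / "answers.json").read_text())

    # Audit trail: hash every model-visible file so the run can be checked later.
    with (out_dir / "audit_log.jsonl").open("a") as audit:
        for name, text in payload.items():
            digest = hashlib.sha256(text.encode()).hexdigest()
            audit.write(json.dumps({"eu": eu_dir.name, "file": name, "sha256": digest}) + "\n")

    # One record per step; the model receives only the payload, never `truth`.
    records = []
    for step, expected in truth.items():
        answer = model(step, payload)
        records.append({"eu": eu_dir.name, "step": step, "answer": answer,
                        "correct": answer == expected})
    with (out_dir / "run.jsonl").open("a") as run_log:
        for record in records:
            run_log.write(json.dumps(record) + "\n")

    # Per-EU summary derived only from the run records.
    summary = {"eu": eu_dir.name, "steps": len(records),
               "correct": sum(r["correct"] for r in records)}
    (out_dir / "summary.json").write_text(json.dumps(summary))
    return summary


# Usage: build a toy EU in a temp directory and score a model that always answers "yes".
root = Path(tempfile.mkdtemp())
eu = root / "eu_001"
(eu / "payload").mkdir(parents=True)
(eu / "ground_truth").mkdir()
(eu / "payload" / "facts.txt").write_text("Toy payload text.")
(eu / "ground_truth" / "answers.json").write_text(json.dumps({"s1": "yes", "s2": "no"}))
out = root / "out"
out.mkdir()
summary = run_eu(eu, lambda step, payload: "yes", out)
print(summary)
```

Splitting the instance into a visible directory and a private one makes the "sealed" property easy to check: the model callable is never handed anything read from `ground_truth/`.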

AG8 (Baseline)
10-step chained baseline for regression testing and diagnosing core failure modes.

AG10 (Standard)
10-step chained standard benchmark with broader coverage and consistency checks.

L7 (Atomic)
Seven independent skills, each evaluated without chaining or error propagation.

Roadmap (Coming)
The AG15 flagship, practice-area variants, and infrastructure direction.
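The difference between the chained benchmarks (AG8, AG10) and the atomic L7 set comes down to whether one mistake propagates. The toy Python sketch below illustrates that idea. It is not LegalChain's scoring code: the string-valued steps, exact-match scoring, and the `score_chained` / `score_atomic` helpers are all hypothetical. In the chained scorer the model consumes its own previous output. In the atomic scorer every step receives the gold input, so each step is judged on its own.

```python
from typing import Callable, List

# Hypothetical step type: each step maps an input string to an output string.
Step = Callable[[str], str]


def score_chained(model_steps: List[Step], gold_steps: List[Step], x0: str) -> List[bool]:
    """The model feeds its own output forward, so one early error can
    make every later step wrong (error propagation)."""
    model_x, gold_x = x0, x0
    results = []
    for model_step, gold_step in zip(model_steps, gold_steps):
        model_x = model_step(model_x)
        gold_x = gold_step(gold_x)
        results.append(model_x == gold_x)
    return results


def score_atomic(model_steps: List[Step], gold_steps: List[Step], x0: str) -> List[bool]:
    """Every model step receives the gold input, so steps are scored independently."""
    gold_x = x0
    results = []
    for model_step, gold_step in zip(model_steps, gold_steps):
        model_out = model_step(gold_x)
        gold_x = gold_step(gold_x)
        results.append(model_out == gold_x)
    return results


# Toy three-step task: the gold pipeline appends "a", then "b", then "c".
gold = [lambda s: s + "a", lambda s: s + "b", lambda s: s + "c"]
# The toy model gets only the first step wrong.
model = [lambda s: s + "x", lambda s: s + "b", lambda s: s + "c"]

print(score_chained(model, gold, ""))  # [False, False, False]
print(score_atomic(model, gold, ""))   # [False, True, True]
```

With a single first-step error, the chained score drops to 0 of 3 while the atomic score stays at 2 of 3. This is why a chained result reflects both skill and robustness to upstream mistakes, whereas an atomic result isolates each skill.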