Start Here
The foundational 6-pillar framework for agentic legal reasoning.
Data & Corpus
A Sealed Evaluation Universe spanning 27k SCOTUS cases. Hard-gated to prevent contamination of the model's effective context window.
Evaluation Pipeline
The 10-skill execution chain measuring Reasoning Decay. We track how legal errors compound across 10 steps of stateful logic.
Scoring & Judging
Deterministic field matching combined with hybrid IRAC rubrics and calibrated LLM-judge synthesis scoring.
Artifacts & Reproducibility
Frozen Evaluation Units (EU) and ResearchPacks (RP) that ensure exact runtime state reconstruction.
Synthetic Traps & Integrity
50,000 deterministic synthetic traps across corpora (2 per evaluation): impossible citations that serve as mathematical proof of hallucination.
Problems We Solve
The divergence between surface fluency and reasoning, and the Compression Gap: failure modes that only show up in stateful, chained evaluation.
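
As a rough illustration of why chained evaluation can surface failures that single-step tests miss, the sketch below computes how often an entire 10-step chain succeeds. It rests on a simplifying assumption that is not taken from the framework itself: every step succeeds independently with the same probability. The function name `chain_success_rate` and the 0.95 per-step accuracy are hypothetical, chosen only for the example; this is not the framework's Reasoning Decay metric.

```python
def chain_success_rate(per_step_accuracy: float, steps: int = 10) -> float:
    """Probability that every step in a chain succeeds.

    Assumption (illustrative only): steps are independent and share
    the same per-step accuracy, so the chain rate is accuracy ** steps.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Show how the end-to-end success rate shrinks as the chain grows.
    for step in range(1, 11):
        print(f"step {step:2d}: {chain_success_rate(0.95, step):.3f}")
```

Under these assumptions, a model that is right 95% of the time on each isolated step completes all 10 steps correctly only about 60% of the time. Real chains are stateful, so errors are unlikely to be independent, and the actual decay may be steeper or shallower than this.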