BENCHMARK ACTIVE [v3.0]
Default View 2026.01.01
Benchmark Result Summary

82.4% PEAK
ACCURACY.

Legal-10 is the frontier benchmark for multi-step agentic planning in Common Law. We verify logic gates before scoring knowledge.

Tested Foundational Models
24
Leader: GPT-4o
91%
Technical Baseline
S1–S8 Design

AG8, Legal‑10's first chained/agentic legal benchmark, evaluates intermediate research skills through open‑book synthesis and then enforces a deterministic citation integrity gate keyed to U.S. Reports citations. AG8 extracts citations deterministically from SCOTUS opinion text and triangulates citation relevance using Shepard's treatment labels as a universal, human-curated oracle.
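Deterministic extraction of U.S. Reports citations can be sketched as a simple pattern match. This is a minimal illustration, assuming the standard "volume U.S. page" citation format; the pattern and function names here are illustrative, not AG8's actual extractor.

```python
import re

# Matches the standard U.S. Reports form, e.g. "410 U.S. 113".
# Illustrative only -- AG8's real extraction rules may differ.
US_REPORTS_PATTERN = re.compile(r"\b(\d{1,3})\s+U\.\s?S\.\s+(\d{1,4})\b")

def extract_citations(opinion_text: str) -> list[str]:
    """Return normalized 'vol U.S. page' citations in order of appearance."""
    return [f"{vol} U.S. {page}"
            for vol, page in US_REPORTS_PATTERN.findall(opinion_text)]

sample = "See Roe v. Wade, 410 U.S. 113 (1973), and Brown, 347 U.S. 483."
print(extract_citations(sample))  # ['410 U.S. 113', '347 U.S. 483']
```

Because extraction is a fixed pattern rather than a model judgment, the same opinion text always yields the same citation list, which is what makes the downstream gate deterministic.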

First Chained Legal Benchmark
Deterministic Reference Pack
Shepard's as Relevance Oracle
Chain-Faithful Evaluation
Citation Integrity Gate
Selection Manifest as Contract
Two-Layer Architecture
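The citation integrity gate listed above can be sketched as a deterministic set check against the reference pack, assuming the pack is a set of normalized U.S. Reports citations. The names below are hypothetical, not AG8's actual API.

```python
# Hedged sketch: a chain answer passes the integrity gate only if every
# authority it cites appears in the deterministic reference pack.
# 'integrity_gate' and 'reference_pack' are illustrative names.
def integrity_gate(cited: set[str], reference_pack: set[str]) -> bool:
    """Pass iff every cited authority is in the allowed reference pack."""
    return cited <= reference_pack

pack = {"410 U.S. 113", "347 U.S. 483"}
print(integrity_gate({"410 U.S. 113"}, pack))  # True
print(integrity_gate({"999 U.S. 1"}, pack))    # False: hallucinated citation
```

Gating before scoring means a chain that fabricates authority fails outright, rather than earning partial credit for otherwise plausible synthesis.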

Top Performance

FULL_LOGS ->
Model_ID Chain_Acc Integrity_Gate

Latest_Syntheses

"The transition from atomic prompts to autonomous legal reasoning requires a new standard of observability."