Our Identity

Who We Are

"Legal-10 is the frontier for agentic legal reasoning."

Legal-10 is a production-grade, chained benchmark designed to evaluate the limits of reasoning in Large Language Models. The project emerged from a critical realization: while models have become fluent in legal prose, their ability to maintain substantive logical integrity across complex, multi-step workflows remains fragile.

Unlike existing public benchmarks, which test skills in isolation (can the model identify a hearsay exception? summarize a clause?), Legal-10 measures the connective tissue of legal practice. We evaluate whether a model can chain ten dependent reasoning steps without succumbing to the "Cascade Penalty": the compounding of small per-step errors into chain-level failure.
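
To see why chaining is so punishing, consider a simplified independence model (our assumption for illustration, not part of the benchmark's scoring): if a model gets each step right with probability p, and every step depends on the one before it, the chance of completing all ten collapses to p^10.

```python
# Illustrative arithmetic for the "Cascade Penalty". Assumption (ours, for
# illustration only): the 10 steps succeed independently with per-step
# accuracy p, and a chain counts as solved only if every step is correct.

def chain_success(per_step_accuracy: float, steps: int = 10) -> float:
    """Probability of completing all `steps` dependent steps."""
    return per_step_accuracy ** steps

for p in (0.99, 0.95, 0.90, 0.80):
    print(f"per-step accuracy {p:.2f} -> 10-step chain success {chain_success(p):.3f}")

# per-step accuracy 0.99 -> 10-step chain success 0.904
# per-step accuracy 0.95 -> 10-step chain success 0.599
# per-step accuracy 0.90 -> 10-step chain success 0.349
# per-step accuracy 0.80 -> 10-step chain success 0.107
```

A model that looks 95% reliable step-to-step finishes barely more than half of its chains, and real chains can fare worse, since one wrong premise poisons every downstream step.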

The Problem We Solve

Fluency-Reasoning Divergence (FRD)

A model can produce grammatically correct, professionally styled legal text while getting the underlying analysis substantively wrong. Standard benchmarks often reward fluency as a proxy for intelligence; Legal-10 isolates reasoning from style.
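
One way to operationalize that separation is to grade only a structured, machine-checkable conclusion and ignore the surrounding prose entirely. The sketch below is a minimal illustration of the idea; the response format and the `holding` field are our assumptions, not Legal-10's published grading spec.

```python
# Hypothetical grading sketch: fluency cannot earn credit, because only a
# structured verdict field is compared against ground truth. The JSON
# response format and the "holding" field name are assumptions for this sketch.
import json

def score_reasoning(model_output: str, ground_truth: str) -> int:
    """Return 1 only if the structured verdict matches; prose is ignored."""
    try:
        verdict = json.loads(model_output)["holding"]  # hypothetical field
    except (json.JSONDecodeError, KeyError):
        return 0  # unparseable output scores zero, however eloquent
    return int(verdict.strip().lower() == ground_truth.strip().lower())

# A beautifully written but wrong answer scores 0; a terse correct one scores 1.
wrong = '{"holding": "admissible", "analysis": "In light of the venerable..."}'
right = '{"holding": "inadmissible"}'
print(score_reasoning(wrong, "inadmissible"), score_reasoning(right, "inadmissible"))
# -> 0 1
```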

The Compression Gap

Legal AI products are frequently deployed under aggressive quantization to reduce operational costs. This compression often preserves surface fluency while silently degrading complex reasoning. We provide the stress test for these trade-offs.
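
As a sketch of what such a stress test looks like in shape (the variant labels, accuracies, and the `run_chain` stub below are placeholders, not our published harness), the key is to run identical 10-step chains against the full-precision reference and each quantized deployment, then compare per-step accuracy with full-chain accuracy:

```python
# Shape of a quantization stress test. The variant labels, per-step
# accuracies, and `run_chain` stub are placeholders standing in for calls
# to real deployed endpoints; only the evaluation arithmetic is the point.
import random

VARIANTS = ["fp16-reference", "int8-deployed", "int4-deployed"]  # hypothetical

def run_chain(variant: str, chain_id: int, steps: int = 10) -> list[bool]:
    """Stub returning per-step correctness for one dependent chain."""
    rng = random.Random(f"{variant}-{chain_id}")  # deterministic stand-in
    per_step = {"fp16-reference": 0.95, "int8-deployed": 0.92, "int4-deployed": 0.85}
    return [rng.random() < per_step[variant] for _ in range(steps)]

def evaluate(variant: str, n_chains: int = 500) -> tuple[float, float]:
    """Return (per-step accuracy, full-chain accuracy) over n_chains."""
    results = [run_chain(variant, i) for i in range(n_chains)]
    step_acc = sum(sum(r) for r in results) / sum(len(r) for r in results)
    chain_acc = sum(all(r) for r in results) / n_chains
    return step_acc, chain_acc

for variant in VARIANTS:
    step_acc, chain_acc = evaluate(variant)
    print(f"{variant:16s} step {step_acc:.2f} | full chain {chain_acc:.2f}")
```

Reporting both numbers matters: a few points of per-step degradation can look negligible in isolation while compounding into a much larger chain-level gap.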

What Makes Us Different

01. Agentic & Chained

The first US Common Law benchmark with 10-step dependent chains. We measure stateful reasoning, not stateless tasks.

02. Production-Grade

Built for enterprise observability. Every turn is traceable via Langfuse (see the sketch after this list) and grounded in real-world litigation schemas.

03. Grounded Standards

Aligned with the ABA MacCrate Report and AALL Principles. We test the skills that actually define legal competency.

04. Verified Integrity

Verification gates powered by the Supreme Court Database (SCDB) and Shepard's citation networks anchor every chain to ground truth, as sketched below.
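
As a minimal sketch of how tracing (02) and verification (04) meet in practice, the snippet below logs one SCDB citation gate to a Langfuse trace. It assumes the v2-style Langfuse Python SDK (`Langfuse().trace(...)` / `trace.span(...)`) and a local copy of the SCDB case-centered CSV with its documented `usCite` column; the file path is a placeholder, and the proprietary Shepard's check is omitted.

```python
# Sketch of a traced verification gate. Assumptions: v2-style Langfuse
# Python SDK, and a local SCDB case-centered CSV (path is a placeholder)
# whose documented `usCite` column holds U.S. Reports citations.
import csv
from langfuse import Langfuse

langfuse = Langfuse()  # v2-style client; reads LANGFUSE_* keys from the environment

def load_scdb_citations(path: str = "SCDB_case_centered.csv") -> set[str]:
    """Index the SCDB export by U.S. Reports citation, e.g. '384 U.S. 436'."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["usCite"] for row in csv.DictReader(f) if row["usCite"]}

def gate_citation(trace, cited: str, known: set[str]) -> bool:
    """One verification gate: the model's cited case must exist in SCDB."""
    span = trace.span(name="scdb-citation-gate", input={"cited": cited})
    verified = cited in known
    span.end(output={"verified": verified})
    return verified

known = load_scdb_citations()
trace = langfuse.trace(name="legal10-chain")
print(gate_citation(trace, "384 U.S. 436", known))  # Miranda v. Arizona -> True
```

Gates like this can run between steps, so a hallucinated citation is caught at the turn where it appears rather than at the end of the chain.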

Join the Pursuit

Independence is our greatest asset.