L7
L7 is an atomic benchmark: each skill is evaluated independently, with no chaining and no stateful carry-over, so per-skill scores isolate exactly where a model struggles, free of error propagation.
| Mode | Skills | Best for |
|---|---|---|
| Atomic (stateless) | 7 independent skills | Diagnosis + baseline skill profiling |
The 7 skills
- S1 (Known Authority Retrieval): Given a citation or case name, return canonical details.
- S2 (Unknown Authority Retrieval): Given a case, predict which later cases cite it.
- S3 (Validate Authority): Determine whether an authority remains good law.
- S4 (Fact Extraction): Extract the holding/disposition and key facts from an opinion.
- S5 (Distinguish): Compare cases and decide whether they meaningfully differ.
- S6 (IRAC Synthesis): Write a structured legal analysis under a rubric.
- S7 (Citation Integrity): Binary check; a single fabricated authority fails the skill (evaluated independently in L7).
Scoring summary
| Skill | Method | Signal |
|---|---|---|
| S1 | Exact match | 0/1 |
| S2 | Ranked retrieval | MRR / hit@k |
| S3 | Exact + partial credit | 0 / 0.5 / 1 |
| S4 | Weighted fields | 0–1 |
| S5 | Binary (two modalities) | 0/1 |
| S6 | Rubric-based | 0–1 |
| S7 | Binary integrity check | pass/fail |
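Two rows of the table can be illustrated with short scorers: the S2 ranked-retrieval signals (MRR, hit@k) and the S4 weighted-field score. The input shapes (ranked lists of case IDs, per-field weights) are assumptions for the sketch, not the benchmark's actual data model.

```python
def mrr(ranked: list[str], gold: set[str]) -> float:
    """S2-style signal: reciprocal rank of the first relevant hit (0 if none)."""
    for rank, case_id in enumerate(ranked, start=1):
        if case_id in gold:
            return 1.0 / rank
    return 0.0

def hit_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    """1 if any of the top-k predictions is relevant, else 0."""
    return 1.0 if any(c in gold for c in ranked[:k]) else 0.0

def weighted_fields(correct: dict[str, bool], weights: dict[str, float]) -> float:
    """S4-style signal: weighted share of correctly extracted fields, in [0, 1]."""
    total = sum(weights.values())
    earned = sum(w for field, w in weights.items() if correct.get(field, False))
    return earned / total if total else 0.0
```

MRR rewards placing a true citing case near the top of the list, while the weighted-field score lets important fields (e.g. the holding) count for more than minor ones.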
Modalities
S5:cb runs closed-book (metadata + extracted facts only); S5:rag runs RAG-enhanced (includes additional opinion text).
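The difference between the two S5 modalities amounts to what goes into the comparison payload. A sketch, assuming hypothetical field names (`metadata`, `facts`, `opinion_text`) for illustration:

```python
def s5_input(case: dict, modality: str) -> dict:
    """Build the per-case payload for an S5 comparison.

    'cb' (closed-book) keeps metadata and extracted facts only;
    'rag' additionally includes retrieved opinion text.
    """
    payload = {"metadata": case["metadata"], "facts": case["facts"]}
    if modality == "rag":
        payload["opinion_text"] = case.get("opinion_text", "")
    return payload
```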
When to use L7
- Isolate which skills a model struggles with (without chain-level confounds)
- Track improvements/regressions per capability over time
- Complement chained benchmarks (AG8/AG10) during iteration
Results
Leaderboard: pick an L7 run spec (when available) to compare models.
How it works
Methodology: atomic skills, scoring, and integrity policies.