Skill Reference

Complete specification for each of the 7 legal reasoning skills with input/output schemas, ground truth sources, scoring logic, and worked examples.

S1: Known Authority Retrieval

Given metadata about a case, verify it exists and retrieve its details.

Purpose

S1 establishes the anchor case for the chain. This skill tests the model's ability to correctly identify and describe a real Supreme Court case from the SCDB database.

Input Schema

{
  "citation": "347 U.S. 483",
  "case_name_hint": "Brown v. Board of Education",
  "term_hint": 1954
}

Output Schema

{
  "us_cite": "347 U.S. 483",
  "case_name": "Brown v. Board of Education of Topeka",
  "term": 1954
}

Ground Truth & Scoring

Source: scdb_sample.csv (usCite, caseName, term)
Method: Exact match on all fields (canonicalized)

Example

Input
Citation: 410 U.S. 113, Case: Roe v. Wade, Term: 1973
Expected Output
us_cite: "410 U.S. 113", case_name: "Roe v. Wade", term: 1973
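A minimal sketch of this exact-match scoring, assuming a simple canonicalization (lowercase, collapse whitespace, drop periods); the function names are illustrative, not the benchmark's actual API:

```python
import re

def canonicalize(value):
    """Lowercase, drop periods, and collapse whitespace so minor
    formatting differences (e.g. "U.S." vs "US") do not break the match."""
    return re.sub(r"\s+", " ", str(value).replace(".", "")).strip().lower()

def score_s1(prediction, ground_truth):
    """1.0 only if every field matches after canonicalization."""
    fields = ("us_cite", "case_name", "term")
    return 1.0 if all(
        canonicalize(prediction[f]) == canonicalize(ground_truth[f])
        for f in fields
    ) else 0.0
```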

S2: Unknown Authority Retrieval

Given a cited case, predict which subsequent cases cite it.

Purpose

S2 tests the model's knowledge of legal citation networks. Given a case, the model must identify which subsequent cases have cited it, demonstrating awareness of legal precedent relationships.

Input Schema

{
  "cited_case": {
    "us_cite": "347 U.S. 483",
    "case_name": "Brown v. Board of Education",
    "term": 1954
  }
}

Output Schema

{
  "citing_cases": [
    { "us_cite": "349 U.S. 294", "case_name": "Brown II" },
    { "us_cite": "358 U.S. 1", "case_name": "Cooper v. Aaron" }
  ]
}

Ground Truth & Scoring

Source: scotus_shepards_sample.csv (citing_case_us_cite)
Method: Ranked retrieval metrics

Metric                Definition               Storage
MRR                   Mean Reciprocal Rank     StepResult.score
hit@10                Ground truth in top 10   StepResult.correct
hit@1, hit@5, hit@20  Additional hit metrics   parsed.metrics

Example

Input: Miranda v. Arizona (384 U.S. 436)
Model returns: [Harris v. New York, Michigan v. Tucker, Oregon v. Mathiason]
Ground truth: 417 U.S. 433 (Michigan v. Tucker)
Result: rank=2, MRR=0.5, hit@10=True
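The ranked-retrieval metrics above can be sketched as follows; the helper names are illustrative, and a real scorer would likely canonicalize citation strings before comparing:

```python
def rank_of(ground_truth_cite, predicted_cites):
    """1-based rank of the ground-truth citation, or None if absent."""
    for i, cite in enumerate(predicted_cites, start=1):
        if cite == ground_truth_cite:
            return i
    return None

def retrieval_metrics(ground_truth_cite, predicted_cites):
    """MRR and hit@k for a single cited case."""
    rank = rank_of(ground_truth_cite, predicted_cites)
    mrr = 1.0 / rank if rank else 0.0
    hits = {f"hit@{k}": bool(rank and rank <= k) for k in (1, 5, 10, 20)}
    return {"rank": rank, "mrr": mrr, **hits}
```

For the Miranda example, the ground-truth citation sits at rank 2 in the model's list, giving MRR = 1/2 = 0.5 and hit@10 = True.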

S3: Validate Authority

Determine if a case has been overruled and identify the overruling case.

Purpose

S3 tests the model's knowledge of precedent validity. A competent legal AI must know when a case is no longer good law and be able to identify which case overruled it.

Input Schema

{
  "case": {
    "us_cite": "163 U.S. 537",
    "case_name": "Plessy v. Ferguson",
    "term": 1896
  }
}

Output Schema

{
  "is_overruled": true,
  "overruling_case": "Brown v. Board of Education",
  "year_overruled": 1954
}

Ground Truth & Scoring

Source: scotus_overruled_db.csv (288 overruled cases)

Condition                                       Score  Correct
Not overruled, model says not overruled         1.0    True
Overruled, model says overruled + correct year  1.0    True
Overruled, model says overruled + wrong year    0.5    False
Mismatch on is_overruled                        0.0    False

Example: Overruled

Input: Lochner v. New York (198 U.S. 45, 1905)
Expected: is_overruled: true, overruling_case: "West Coast Hotel Co. v. Parrish", year: 1937

Example: Not Overruled

Input: Marbury v. Madison (5 U.S. 137, 1803)
Expected: is_overruled: false, overruling_case: null, year: null
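The scoring table translates directly into a small function. A sketch, with field names taken from the schemas above (the function name is hypothetical):

```python
def score_s3(pred, truth):
    """Return (score, correct) per the S3 scoring table."""
    if pred["is_overruled"] != truth["is_overruled"]:
        return 0.0, False                 # mismatch on is_overruled
    if not truth["is_overruled"]:
        return 1.0, True                  # both say not overruled
    if pred["year_overruled"] == truth["year_overruled"]:
        return 1.0, True                  # overruled, correct year
    return 0.5, False                     # overruled, wrong year
```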

S4: Fact Extraction

Extract the disposition, prevailing party, and a holding summary from the majority opinion.

Purpose

S4 tests the model's ability to read and extract structured information from legal text. This is a core legal AI skill: understanding case outcomes from opinion text.

Output Schema

{
  "disposition": "reversed and remanded",
  "party_winning": "petitioner",
  "holding_summary": "The Court held that segregation in public schools violates the Equal Protection Clause."
}

Disposition Enum (Closed)

  • stay granted
  • affirmed
  • reversed
  • reversed and remanded
  • vacated and remanded
  • affirmed and reversed in part
  • affirmed and vacated in part
  • affirmed and reversed in part and remanded
  • vacated
  • petition denied
  • certification

Party Winning Enum

  • petitioner - SCDB code 1
  • respondent - SCDB code 0
  • unclear - SCDB code 2

Scoring

0.5 per field (disposition + party_winning)

Example

Input: "...The judgment of the Court of Appeals is reversed, and the case is remanded for further proceedings consistent with this opinion. It is so ordered."
Expected: disposition: "reversed and remanded", party_winning: "petitioner"
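A sketch of the per-field scoring (0.5 credit each for disposition and party_winning), assuming both fields have already been normalized to the enum strings above; the function name is illustrative:

```python
def score_s4(pred, truth):
    """0.5 credit per matched field; holding_summary is not scored here."""
    score = 0.0
    if pred["disposition"] == truth["disposition"]:
        score += 0.5
    if pred["party_winning"] == truth["party_winning"]:
        score += 0.5
    return score
```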

S5: Distinguish

Determine whether the citing case agrees with or distinguishes from the cited case.

Purpose

S5 is the core legal reasoning skill. It tests whether the model can determine the doctrinal relationship between two cases based on available information.

Two Variants

S5:cb (Closed-Book)
Uses: metadata + S4 extracted facts only
Runs on: all CHAIN_CORE instances
S5:rag (RAG-Enhanced)
Uses: all S5:cb inputs + citing opinion text
Runs on: CHAIN_RAG_SUBSET only

Output Schema

{
  "agrees": true,
  "reasoning": "The citing case follows the precedent because..."
}
Note: The output field is named "agrees", matching the ground-truth field edge.agree

Ground Truth

Source: scotus_shepards_sample.csv (agree field)
True (1): followed/parallel (agrees)
False (0): distinguished/criticized/overruled
Scoring: Binary exact match
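A sketch of how the agree label and the binary score might be computed; the treatment label strings follow the mapping above, and the function names are hypothetical:

```python
# Treatments that count as agreement, per the ground-truth mapping.
AGREE_TREATMENTS = {"followed", "parallel"}

def agree_label(treatment):
    """True (1) for followed/parallel; False (0) otherwise
    (distinguished, criticized, overruled)."""
    return treatment in AGREE_TREATMENTS

def score_s5(predicted_agrees, truth_agree):
    """Binary exact match on the agrees field."""
    return 1.0 if bool(predicted_agrees) == bool(truth_agree) else 0.0
```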

Example (S5:cb)

Cited: Brown v. Board of Education (347 U.S. 483, 1954)
Citing: Cooper v. Aaron (358 U.S. 1, 1958)
Expected: agrees: true
Reasoning: "Cooper v. Aaron reaffirmed Brown's holding that segregation is unconstitutional..."

S6: IRAC Synthesis

Synthesize all prior outputs into IRAC-structured legal analysis.

Purpose

S6 is the capstone skill. It tests whether the model can integrate information from prior steps into a coherent legal analysis using the Issue-Rule-Application-Conclusion (IRAC) framework.

Output Schema

{
  "issue": "Whether segregation in public schools violates the Equal Protection Clause...",
  "rule": "The Equal Protection Clause prohibits states from denying equal protection...",
  "application": "Applying this rule to the facts, segregation generates a feeling of inferiority...",
  "conclusion": "Therefore, the Court concludes that 'separate but equal' has no place."
}

Rubric-Based Scoring

Component    Weight  Criteria
Issue        20%     Clear, correctly framed legal question
Rule         25%     Accurate statement of legal rule from case
Application  35%     Logical application with citation support
Conclusion   20%     Consistent with analysis, cites outcome
Correct = score >= 0.5
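The weighted rubric can be sketched as follows, assuming each component is graded on [0, 1]; the dictionary keys mirror the IRAC output schema, and the function name is illustrative:

```python
# Weights mirror the rubric table above.
RUBRIC_WEIGHTS = {"issue": 0.20, "rule": 0.25, "application": 0.35, "conclusion": 0.20}

def score_s6(component_scores):
    """Weighted rubric score; correct when the total reaches 0.5."""
    score = sum(RUBRIC_WEIGHTS[c] * component_scores[c] for c in RUBRIC_WEIGHTS)
    return score, score >= 0.5
```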

S7 Gating (in L10 Agentic)

In L10 Agentic, S6 results can be voided by S7 if fabricated citations are detected: the voided flag is set, but the status remains "OK" because the step did execute and was only invalidated post hoc. In L10 Atomic, skills are evaluated independently, without gating.

S7: Citation Integrity

Verify all citations in S6 output are real cases (no hallucinations).

Professional Responsibility Gate

S7 is the hallucination gate. Fabricating citations is a serious professional ethics violation. A single fake citation results in failure and (in L10 Agentic) voids the entire S6 analysis.

Output Schema

{
  "citations_found": [
    { "cite": "347 U.S. 483", "exists": true },
    { "cite": "384 U.S. 436", "exists": true },
    { "cite": "999 U.S. 999", "exists": false }
  ],
  "all_valid": false
}

Ground Truth Sources

  • fake_cases.csv - Known fabricated citations
  • scdb_sample.csv - Known real citations
Citations are extracted using the eyecite library

Verification Logic

  1. Check if citation is in fake_cites set -> False
  2. Check if citation is in real_cites set -> True
  3. Unknown citation -> False (conservative)
Scoring: Binary - all citations must be valid
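The three-step verification logic can be sketched as a small function; the set and function names here are illustrative:

```python
def verify_citation(cite, fake_cites, real_cites):
    """Conservative existence check, per the three-step logic above:
    known-fake -> False, known-real -> True, unknown -> False."""
    if cite in fake_cites:
        return False
    if cite in real_cites:
        return True
    return False  # unknown citations are treated as invalid

def score_s7(citations, fake_cites, real_cites):
    """Build the S7 output: per-citation existence plus all_valid."""
    found = [{"cite": c, "exists": verify_citation(c, fake_cites, real_cites)}
             for c in citations]
    return {"citations_found": found,
            "all_valid": all(c["exists"] for c in found)}
```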

Metrics

Metric              Definition
Void Rate           Chains with voided=True / total chains
Hallucination Rate  Citations with exists=False / total citations
Clean Rate          Chains with all_valid=True / total chains

Example: Failed Verification

S6 Output: "As established in Brown v. Board of Education (347 U.S. 483)... Following Smith v. Jones (999 U.S. 999)..."
S7 Result:
  • 347 U.S. 483 -> exists: true
  • 999 U.S. 999 -> exists: false (fabricated!)
Outcome: all_valid: false, S6 voided