Skill Reference
Complete specification for each of the 7 legal reasoning skills, with input/output schemas, ground truth sources, scoring logic, and worked examples.
S1: Known Authority Retrieval
Given metadata about a case, verify it exists and retrieve its details.
Purpose
S1 establishes the anchor case for the chain. This skill tests the model's ability to correctly identify and describe a real Supreme Court case from the Supreme Court Database (SCDB).
Input Schema
{
  "citation": "347 U.S. 483",
  "case_name_hint": "Brown v. Board of Education",
  "term_hint": 1954
}
Output Schema
{
  "us_cite": "347 U.S. 483",
  "case_name": "Brown v. Board of Education of Topeka",
  "term": 1954
}
Ground Truth & Scoring
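A minimal scoring sketch for S1, assuming an exact-match check of the output fields against the corresponding SCDB record (the helper name and the exact-match policy are assumptions, not taken from the codebase):

```python
def score_s1(answer: dict, record: dict) -> float:
    """Full credit only when citation, case name, and term all match the SCDB record."""
    fields = ("us_cite", "case_name", "term")
    return 1.0 if all(answer.get(f) == record.get(f) for f in fields) else 0.0
```

A production scorer might allow fuzzy matching on case names (e.g., "Brown v. Board of Education" vs. the full reported title); the sketch above is strict.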
Example
S2: Unknown Authority Retrieval
Given a cited case, predict which subsequent cases cite it.
Purpose
S2 tests the model's knowledge of legal citation networks. Given a case, the model must identify which subsequent cases have cited it, demonstrating awareness of legal precedent relationships.
Input Schema
{
  "cited_case": {
    "us_cite": "347 U.S. 483",
    "case_name": "Brown v. Board of Education",
    "term": 1954
  }
}
Output Schema
{
  "citing_cases": [
    { "us_cite": "349 U.S. 294", "case_name": "Brown II" },
    { "us_cite": "358 U.S. 1", "case_name": "Cooper v. Aaron" }
  ]
}
Ground Truth & Scoring
| Metric | Definition | Storage |
|---|---|---|
| MRR | Mean Reciprocal Rank | StepResult.score |
| hit@10 | Ground truth in top 10 | StepResult.correct |
| hit@1, hit@5, hit@20 | Additional hit metrics | parsed.metrics |
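The metrics in the table can be computed from the model's ranked list of citing cases; a sketch (function names are illustrative, not from the codebase):

```python
def reciprocal_rank(truth_cite: str, ranked_cites: list) -> float:
    """1/rank of the ground-truth citation, or 0.0 if it never appears."""
    try:
        return 1.0 / (ranked_cites.index(truth_cite) + 1)
    except ValueError:
        return 0.0

def hit_at_k(truth_cite: str, ranked_cites: list, k: int) -> bool:
    """True when the ground-truth citation appears in the top k predictions."""
    return truth_cite in ranked_cites[:k]
```

MRR is then the mean of `reciprocal_rank` over all evaluation cases.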
Example
S3: Validate Authority
Determine if a case has been overruled and identify the overruling case.
Purpose
S3 tests the model's knowledge of precedent validity. A competent legal AI must know when a case is no longer good law and identify the case that overruled it.
Input Schema
{
  "case": {
    "us_cite": "163 U.S. 537",
    "case_name": "Plessy v. Ferguson",
    "term": 1896
  }
}
Output Schema
{
  "is_overruled": true,
  "overruling_case": "Brown v. Board of Education",
  "year_overruled": 1954
}
Ground Truth & Scoring
| Condition | Score | Correct |
|---|---|---|
| Not overruled, model says not overruled | 1.0 | True |
| Overruled, model says overruled + correct year | 1.0 | True |
| Overruled, model says overruled + wrong year | 0.5 | False |
| Mismatch on is_overruled | 0.0 | False |
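The table translates directly into a scoring function; a sketch assuming the field names from the schemas above:

```python
def score_s3(pred: dict, truth: dict):
    """Return (score, correct) per the S3 scoring table."""
    if pred["is_overruled"] != truth["is_overruled"]:
        return 0.0, False          # mismatch on is_overruled
    if not truth["is_overruled"]:
        return 1.0, True           # both agree: not overruled
    if pred.get("year_overruled") == truth["year_overruled"]:
        return 1.0, True           # overruled, correct year
    return 0.5, False              # overruled, wrong year
```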
Example: Overruled
Example: Not Overruled
S4: Fact Extraction
Extract the disposition, prevailing party, and holding summary from the majority opinion.
Purpose
S4 tests the model's ability to read and extract structured information from legal text. This is a core legal AI skill: understanding case outcomes from opinion text.
Output Schema
{
  "disposition": "reversed and remanded",
  "party_winning": "petitioner",
  "holding_summary": "The Court held that segregation in public schools violates the Equal Protection Clause."
}
Disposition Enum (Closed)
- stay granted
- affirmed
- reversed
- reversed and remanded
- vacated and remanded
- affirmed and reversed in part
- affirmed and vacated in part
- affirmed and reversed in part and remanded
- vacated
- petition denied
- certification
Party Winning Enum
- petitioner - SCDB code 1
- respondent - SCDB code 0
- unclear - SCDB code 2
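The SCDB codes above normalize to the enum labels with a simple lookup. A sketch restating the list (the dict name is illustrative, and mapping unknown codes to "unclear" is an assumption, not documented behavior):

```python
# SCDB partyWinning code -> party_winning enum label, per the list above.
PARTY_WINNING = {1: "petitioner", 0: "respondent", 2: "unclear"}

def party_label(code: int) -> str:
    """Translate an SCDB code to its label; unknown codes fall back to 'unclear'."""
    return PARTY_WINNING.get(code, "unclear")
```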
Scoring
Example
S5: Distinguish
Determine whether the citing case agrees with or distinguishes the cited case.
Purpose
S5 is the core legal reasoning skill. It tests whether the model can determine the doctrinal relationship between two cases based on available information.
Two Variants
Output Schema
{
  "agrees": true,
  "reasoning": "The citing case follows the precedent because..."
}
Ground Truth
Example (S5:cb)
S6: IRAC Synthesis
Synthesize all prior outputs into an IRAC-structured legal analysis.
Purpose
S6 is the capstone skill. It tests whether the model can integrate information from prior steps into a coherent legal analysis using the Issue-Rule-Application-Conclusion (IRAC) framework.
Output Schema
{
  "issue": "Whether segregation in public schools violates the Equal Protection Clause...",
  "rule": "The Equal Protection Clause prohibits states from denying equal protection...",
  "application": "Applying this rule to the facts, segregation generates a feeling of inferiority...",
  "conclusion": "Therefore, the Court concludes that 'separate but equal' has no place."
}
Rubric-Based Scoring
| Component | Weight | Criteria |
|---|---|---|
| Issue | 20% | Clear, correctly framed legal question |
| Rule | 25% | Accurate statement of legal rule from case |
| Application | 35% | Logical application with citation support |
| Conclusion | 20% | Consistent with analysis, cites outcome |
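Under the rubric, the overall S6 score is a weighted sum of the four component scores; a sketch assuming each component is graded in [0, 1] (the names are illustrative):

```python
# Weights per the rubric table; they sum to 1.0.
IRAC_WEIGHTS = {"issue": 0.20, "rule": 0.25, "application": 0.35, "conclusion": 0.20}

def irac_score(component_scores: dict) -> float:
    """Weighted sum of per-component rubric scores."""
    return sum(w * component_scores[part] for part, w in IRAC_WEIGHTS.items())
```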
S7 Gating (in L10 Agentic)
In L10 Agentic, S6 results can be voided by S7 if fabricated citations are detected. The voided flag is set, but status remains "OK": the step executed and was invalidated post hoc. In L10 Atomic, skills are evaluated independently, without gating.
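The gating rule can be sketched as follows. The `StepResult` fields here are assumptions inferred from the `StepResult.score` / `StepResult.correct` references in the S2 metric table, and the function name is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    status: str = "OK"     # execution status; gating never changes this
    voided: bool = False   # set post hoc when S7 detects fabrication

def apply_s7_gate(s6: StepResult, s7_all_valid: bool, mode: str) -> StepResult:
    """In L10 Agentic, a failed S7 check voids S6; status stays 'OK'."""
    if mode == "agentic" and not s7_all_valid:
        s6.voided = True   # invalidated post hoc; the step still executed
    return s6
```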
S7: Citation Integrity
Verify that all citations in the S6 output are real cases (no hallucinations).
Purpose
S7 is the hallucination gate. Fabricating citations is a serious professional ethics violation. A single fake citation results in failure and, in L10 Agentic, voids the entire S6 analysis.
Output Schema
{
  "citations_found": [
    { "cite": "347 U.S. 483", "exists": true },
    { "cite": "384 U.S. 436", "exists": true },
    { "cite": "999 U.S. 999", "exists": false }
  ],
  "all_valid": false
}
Ground Truth Sources
- fake_cases.csv - Known fabricated citations
- scdb_sample.csv - Known real citations
Verification Logic
- Check if citation is in fake_cites set -> False
- Check if citation is in real_cites set -> True
- Unknown citation -> False (conservative)
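The three rules above amount to a small lookup; a sketch, with the set arguments standing in for the contents of fake_cases.csv and scdb_sample.csv:

```python
def verify_citation(cite: str, real_cites: set, fake_cites: set) -> bool:
    """Known-fake -> False; known-real -> True; unknown -> False (conservative)."""
    if cite in fake_cites:
        return False
    return cite in real_cites
```

The fake-set check runs first so that a citation accidentally present in both sets is still rejected.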
Metrics
| Metric | Definition |
|---|---|
| Void Rate | Chains with voided=True / total chains |
| Hallucination Rate | Citations with exists=False / total citations |
| Clean Rate | Chains with all_valid=True / total chains |
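Given the output schema above, the hallucination rate over a list of citation checks is one line; a sketch (the function name is illustrative):

```python
def hallucination_rate(citations_found: list) -> float:
    """Fraction of extracted citations flagged exists=False (0.0 for empty input)."""
    if not citations_found:
        return 0.0
    return sum(not c["exists"] for c in citations_found) / len(citations_found)
```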
Example: Failed Verification
- 347 U.S. 483 -> exists: true
- 999 U.S. 999 -> exists: false (fabricated!)