CAP Corpus & Metadata

LegalChain integrates the Caselaw Access Project (CAP) to provide lower federal court context. It resolves citations in Supreme Court opinions to canonical records in the circuit and district court corpus.

Populations: 43,043 Resolved Cases
Scale: 1.37 GB JSONL Snapshots
Resolution: 98.2% Match Rate (S2)

The CAP Repository

A typical SCOTUS opinion cites circuit and district court rulings to establish procedural history. LegalChain maintains a subset of the CAP corpus specifically mapped to these upstream dependencies.

[ DATA TRANSFORMATION ]
STEP 1. NORMALIZE: Strip punctuation -> Canonical string
STEP 2. RESOLVE: Map to CAP_CASES_META in DuckDB
STEP 3. INGEST: Load casebody into S4 Research Pack

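
The NORMALIZE and RESOLVE steps can be sketched in Python. This is a minimal illustration under stated assumptions, not LegalChain's actual code: the function names `normalize_citation` and `resolve`, the exact punctuation rule, and the in-memory dictionary standing in for the CAP_CASES_META table in DuckDB are all hypothetical.

```python
import re
from typing import Dict, Optional


def normalize_citation(raw: str) -> str:
    """Strip punctuation and collapse whitespace to form a canonical key."""
    # Assumed rule: keep only letters, digits, and whitespace.
    no_punct = re.sub(r"[^0-9A-Za-z\s]", "", raw)
    # Collapse runs of whitespace and lowercase the result.
    return " ".join(no_punct.split()).lower()


# Hypothetical stand-in for CAP_CASES_META: canonical citation -> CAP case id.
CAP_INDEX: Dict[str, int] = {normalize_citation("202 F. 305"): 1403610}


def resolve(raw: str, index: Dict[str, int] = CAP_INDEX) -> Optional[int]:
    """Return the CAP case id for a citation string, or None if unmatched."""
    return index.get(normalize_citation(raw))


print(normalize_citation("202 F. 305"))  # canonical key
print(resolve("202 F.  305,"))           # extra space and comma still match
print(resolve("999 F.2d 1"))             # not in the index
```

One consequence of this simple rule is that spacing variants such as "F.2d" and "F. 2d" normalize to different keys, so a production normalizer would likely need reporter-specific handling.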
Case Bundle Manifest
Bundle Name                 Count     Size
cap_appellate_text.jsonl    36,552    1.1 GB
cap_trial_text.jsonl        6,491     262 MB
Corpus Total                43,043    1.37 GB
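
The manifest figures could be checked against the bundles on disk with something like the following sketch. It assumes each bundle holds one JSON object per line; `summarize_bundle` is a hypothetical helper name, not part of LegalChain.

```python
import json
import os
from typing import Tuple


def summarize_bundle(path: str) -> Tuple[int, int]:
    """Return (record_count, size_in_bytes) for a JSONL bundle."""
    count = 0
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():      # skip blank lines
                json.loads(line)  # raises ValueError on a malformed record
                count += 1
    return count, os.path.getsize(path)
```

If the manifest is accurate, `summarize_bundle("cap_trial_text.jsonl")` would report 6,491 records and `summarize_bundle("cap_appellate_text.jsonl")` would report 36,552, which together give the corpus total of 43,043.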
Schema Excerpt: cap_case
{
  "id": 1403610,
  "name": "U.S. v. Jackson",
  "citations": ["202 F. 305"],
  "casebody": {
    "opinions": [{ "type": "majority" }]
  },
  "pagerank": 0.0000123
}
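
A record in this shape can be read with the standard `json` module. The sketch below relies only on the fields shown in the excerpt; `opinions_of_type` is a hypothetical helper, and real records may carry additional fields that the excerpt omits.

```python
import json

# The excerpt above, embedded as a string for illustration.
RAW = """
{
  "id": 1403610,
  "name": "U.S. v. Jackson",
  "citations": ["202 F. 305"],
  "casebody": {
    "opinions": [{ "type": "majority" }]
  },
  "pagerank": 0.0000123
}
"""


def opinions_of_type(case: dict, opinion_type: str = "majority") -> list:
    """Return the opinions in a cap_case record whose type matches."""
    opinions = case.get("casebody", {}).get("opinions", [])
    return [op for op in opinions if op.get("type") == opinion_type]


record = json.loads(RAW)
print(record["name"], record["citations"][0])
print(len(opinions_of_type(record)))             # majority opinions
print(len(opinions_of_type(record, "dissent")))  # none in this excerpt
```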