Unique Authorities
Resolving 378,938 raw citation occurrences into 64,548 unique legal entities to prevent redundancy and ensure accurate relevance scoring.
The Distinction: Occurrence vs. Authority
A Supreme Court opinion might cite Miranda v. Arizona twelve times in a single decision. Each mention is a separate occurrence: a specific string at a specific character offset. There is, however, only one Miranda case, so all twelve mentions point to a single authority.
| Concept | Definition | Example |
|---|---|---|
| Occurrence | One citation string at one location | "384 U.S. 436" at offset 12,847 |
| Authority | One unique case entity | 1965-116 (Miranda v. Arizona) |
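The distinction in the table can be pictured as two small record types. This is only an illustrative sketch: the class names `Occurrence` and `Authority`, their fields, and the second sample offset are assumptions made for the example, not the pipeline's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Occurrence:
    raw: str     # citation string exactly as it appears in the opinion
    offset: int  # character offset of that string in the opinion text


@dataclass
class Authority:
    case_id: str    # canonical ID of the cited case, e.g. "1965-116"
    case_name: str
    occurrences: list[Occurrence] = field(default_factory=list)

    @property
    def frequency(self) -> int:
        # Frequency is derived from the occurrences rather than stored twice.
        return len(self.occurrences)


miranda = Authority("1965-116", "Miranda v. Arizona")
miranda.occurrences.append(Occurrence("384 U.S. 436", 12847))
# Hypothetical second mention, added only to show many-to-one grouping.
miranda.occurrences.append(Occurrence("384 U.S. at 440", 15002))
```

Many `Occurrence` values hang off one `Authority`, which is the many-to-one relationship the rest of this page relies on.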
How Citations Become Authorities
Resolution
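Stage 2 maps each citation string to a canonical case ID, taken from either the Supreme Court Database (SCDB) or the Caselaw Access Project (CAP).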
Grouping
All mentions of 1965-116 in a single opinion are collapsed into one entry.
Feature Aggregation
We count the mentions (12 in the record below) and record context snippets for the strongest signal.
{
  "cited_caseId": "1965-116",
  "cited_caseName": "Miranda v. Arizona",
  "frequency": 12,
  "first_offset": 4512,
  "best_signal": "discusses",
  "occurrences": [
    {"raw": "384 U.S. 436", "offset": 4512},
    {"raw": "id, at 440", "offset": 6201},
    ...
  ]
}
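The grouping and aggregation steps could be sketched roughly as follows. The input shape (one dict per already-resolved occurrence), the function name `aggregate`, the `SIGNAL_RANK` ordering, and the placeholder second case ID are all assumptions made for illustration; the real pipeline may rank signals differently.

```python
from collections import defaultdict

# Hypothetical ordering of signal strength (higher is stronger).
SIGNAL_RANK = {"mentions": 0, "cites": 1, "discusses": 2}


def aggregate(resolved):
    """Collapse the resolved occurrences of one opinion into one record per authority."""
    groups = defaultdict(list)
    for occ in resolved:
        groups[occ["case_id"]].append(occ)

    records = []
    for case_id, occs in groups.items():
        occs.sort(key=lambda o: o["offset"])  # earliest mention first
        best = max(occs, key=lambda o: SIGNAL_RANK[o["signal"]])
        records.append({
            "cited_caseId": case_id,
            "cited_caseName": occs[0]["case_name"],
            "frequency": len(occs),
            "first_offset": occs[0]["offset"],
            "best_signal": best["signal"],
            "occurrences": [{"raw": o["raw"], "offset": o["offset"]} for o in occs],
        })
    return records


resolved = [
    {"case_id": "1965-116", "case_name": "Miranda v. Arizona",
     "raw": "id, at 440", "offset": 6201, "signal": "discusses"},
    {"case_id": "1965-116", "case_name": "Miranda v. Arizona",
     "raw": "384 U.S. 436", "offset": 4512, "signal": "cites"},
    # Placeholder ID and offset: a second authority, to show it stays separate.
    {"case_id": "ROE-PLACEHOLDER", "case_name": "Roe v. Wade",
     "raw": "410 U.S. 113", "offset": 9000, "signal": "cites"},
]
records = aggregate(resolved)
```

Here the two Miranda mentions collapse into one record with `frequency` 2 and `first_offset` 4512, while the second case keeps its own record.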
Why "Unique" is Complex
True resolution requires handling edge cases that naive string matching misses:
- Pin Cites: Mapping "384 U.S. at 440" to the same parent case as the base citation.
- Parallel Cites: Resolving "93 S.Ct. 705" (Supreme Court Reporter) to the same case as "410 U.S. 113" (United States Reports, the official reporter).
- Name Variants: Treating "N.Y. Times v. Sullivan" and "New York Times Co. v. Sullivan" as identical by relying on citation IDs, not text.
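One way to picture how these edge cases reduce to a single ID is a lookup keyed on volume and reporter. The sketch below is an assumption-laden toy, not the actual resolver: the `FIRST_PAGES` table, the `ROE-PLACEHOLDER` ID, the regular expression, and the `resolve` function are all hypothetical, and the pin-cite logic assumes the table lists every case in a volume.

```python
import re
from bisect import bisect_right

# Hypothetical table: (volume, reporter) -> sorted (first_page, case_id) pairs.
# Parallel reporters carry the same case_id, so they resolve identically.
FIRST_PAGES = {
    ("384", "U.S."): [(436, "1965-116")],
    ("410", "U.S."): [(113, "ROE-PLACEHOLDER")],
    ("93", "S.Ct."): [(705, "ROE-PLACEHOLDER")],
}

CITE_RE = re.compile(
    r"(?P<vol>\d+)\s+(?P<rep>U\.S\.|S\.\s?Ct\.),?\s+(?P<at>at\s+)?(?P<page>\d+)"
)


def resolve(raw):
    """Return the case ID for a citation string, or None if it cannot be resolved."""
    m = CITE_RE.search(raw)
    if m is None:
        return None
    reporter = m["rep"].replace(" ", "")  # treat "S. Ct." and "S.Ct." alike
    cases = FIRST_PAGES.get((m["vol"], reporter), [])
    page = int(m["page"])
    if m["at"]:
        # Pin cite: pick the case with the greatest first page <= cited page.
        idx = bisect_right([first for first, _ in cases], page) - 1
        return cases[idx][1] if idx >= 0 else None
    # Base cite: the page must be the first page of a case.
    for first, case_id in cases:
        if first == page:
            return case_id
    return None
```

Under these assumptions, `resolve("384 U.S. at 440")` and `resolve("384 U.S. 436")` return the same ID, as do `resolve("93 S.Ct. 705")` and `resolve("410 U.S. 113")`. Case names never enter the lookup, which is why name variants stop mattering once a reporter citation is available.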