Unique Authorities

Resolving 378,938 raw citation occurrences into 64,548 unique legal entities to prevent redundancy and ensure accurate relevance scoring.

378,938 Raw Citation Strings
64,548 Unique Authorities
5.9:1 Compression
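
As a quick check of the arithmetic, the compression figure is just the ratio of the two counts above. The variable names here are illustrative only.

```python
# Counts taken from the figures above.
raw_occurrences = 378_938
unique_authorities = 64_548

# Average number of raw citation strings per unique authority.
ratio = raw_occurrences / unique_authorities
print(f"{ratio:.1f}:1")
```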

The Distinction: Occurrence vs. Authority

A Supreme Court opinion might cite Miranda v. Arizona twelve times in a single decision. Each mention is a separate occurrence: a specific string at a specific character offset. There is, however, only one Miranda case, so all twelve mentions point to a single authority.

Concept    | Definition                          | Example
Occurrence | One citation string at one location | "384 U.S. 436" at offset 12,847
Authority  | One unique case entity              | 1965-116 (Miranda v. Arizona)
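
One way to picture the distinction is as two separate types. The sketch below is illustrative only: the class and field names are assumptions, not the pipeline's actual schema, and the second offset is made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    raw: str     # citation string exactly as it appears in the text
    offset: int  # character offset within the citing opinion


@dataclass(frozen=True)
class Authority:
    case_id: str    # resolved case ID
    case_name: str


miranda = Authority("1965-116", "Miranda v. Arizona")

# Two occurrences at different offsets (the second offset is hypothetical),
# both pointing at the same authority.
mentions = {
    Occurrence("384 U.S. 436", 12847): miranda,
    Occurrence("384 U.S. 436", 15020): miranda,
}

distinct_authorities = set(mentions.values())
print(len(mentions), "occurrences,", len(distinct_authorities), "authority")
```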

How Citations Become Authorities

1. Resolution

Stage 2 maps each citation string to a case ID (SCDB or CAP).

2. Grouping

All mentions of the same ID (for example, 1965-116) within a single opinion are collapsed into one entry.

3. Feature Aggregation

We sum the frequency (12 mentions in the Miranda example) and record context snippets for the strongest signal.
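
The grouping and aggregation steps can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: it assumes resolution has already attached a `cited_caseId` to each occurrence, and the `signal` field, the `SIGNAL_RANK` ordering, the function name, and the placeholder second case are all hypothetical.

```python
from collections import defaultdict

# Hypothetical strength ordering; the real signal taxonomy may differ.
SIGNAL_RANK = {"cites": 0, "discusses": 1}


def build_authority_records(resolved):
    """Collapse resolved occurrences from ONE opinion into one record per authority."""
    grouped = defaultdict(list)
    for occ in resolved:
        grouped[occ["cited_caseId"]].append(occ)

    records = []
    for case_id, occs in grouped.items():
        occs.sort(key=lambda o: o["offset"])  # earliest mention first
        records.append({
            "cited_caseId": case_id,
            "cited_caseName": occs[0]["cited_caseName"],
            "frequency": len(occs),
            "first_offset": occs[0]["offset"],
            # Keep the strongest signal seen across all mentions.
            "best_signal": max((o["signal"] for o in occs),
                               key=SIGNAL_RANK.__getitem__),
            "occurrences": [{"raw": o["raw"], "offset": o["offset"]} for o in occs],
        })
    return records


# Toy input: two Miranda mentions plus one placeholder case (not real data).
resolved = [
    {"cited_caseId": "1965-116", "cited_caseName": "Miranda v. Arizona",
     "raw": "384 U.S. 436", "offset": 6201, "signal": "discusses"},
    {"cited_caseId": "1965-116", "cited_caseName": "Miranda v. Arizona",
     "raw": "384 U.S. 436", "offset": 4512, "signal": "cites"},
    {"cited_caseId": "EXAMPLE-1", "cited_caseName": "Placeholder v. Example",
     "raw": "1 Ex. 1", "offset": 5000, "signal": "cites"},
]
records = build_authority_records(resolved)
```

Sorting by offset before aggregating means `first_offset` and the order of `occurrences` fall out of the same pass.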

Authority Record
{
  "cited_caseId": "1965-116",
  "cited_caseName": "Miranda v. Arizona",
  "frequency": 12,
  "first_offset": 4512,
  "best_signal": "discusses",
  "occurrences": [
     {"raw": "384 U.S. 436", "offset": 4512},
     {"raw": "Id., at 440", "offset": 6201},
     ...
  ]
}

Why "Unique" is Complex

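True resolution requires handling edge cases that naive string matching misses. In the record above, for example, a full citation ("384 U.S. 436") and a short-form "id." reference share almost no text, yet both resolve to the same authority.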

Network Centrality Leaders

Marbury v. Madison 3,847 citations
Brown v. Board of Education 1,892 citations
Miranda v. Arizona 1,634 citations
Roe v. Wade 1,423 citations
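
A ranking like this can be derived from the per-opinion authority records. The sketch below assumes "citations" means the number of distinct citing opinions, which is an interpretation rather than something stated above; the function name and the toy input are hypothetical and do not reproduce the figures in the list.

```python
from collections import Counter


def citation_leaders(records_by_opinion, top_n=3):
    """Count, for each authority, how many opinions cite it at least once."""
    counts = Counter()
    for records in records_by_opinion.values():
        # A set ensures each opinion contributes at most one count per authority.
        counts.update({r["cited_caseName"] for r in records})
    return counts.most_common(top_n)


# Toy input keyed by a made-up citing-opinion label.
records_by_opinion = {
    "opinion-A": [{"cited_caseName": "Marbury v. Madison"},
                  {"cited_caseName": "Miranda v. Arizona"}],
    "opinion-B": [{"cited_caseName": "Marbury v. Madison"}],
    "opinion-C": [{"cited_caseName": "Marbury v. Madison"},
                  {"cited_caseName": "Miranda v. Arizona"},
                  {"cited_caseName": "Roe v. Wade"}],
}
leaders = citation_leaders(records_by_opinion)
```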