CAP Byte Index
The CAP byte index enables O(1) random access to 1.4 GB of JSONL data. Stage 3 cuts case extraction time from 500 ms to 1 ms by precomputing the exact byte offset and length of every authority.
Latency Gain: 500x (500 ms -> 1 ms)
Complexity: O(1), constant time
Overhead: 624 KB in-memory index
The Performance Problem
CAP text is stored in large JSONL bundles. Finding a specific case with a standard linear scan is O(n) in the bundle size and can mean reading gigabytes of text per lookup, which makes construction of the 27k Research Packs prohibitively slow (10+ hours).
[ EXTRACTION TRACE ]
01. INDEX LOOKUP
Locate `cap_id` in Parquet Index
02. FILE SEEK
f.seek(523_847_621)
03. EXACT READ
f.read(12_847)  # bytes
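The three steps in the trace can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the `read_record` helper, the in-memory dict standing in for the Parquet index, the `id` field name, and the two-record sample bundle are all assumptions made for the example.

```python
import json
import os
import tempfile


def read_record(path: str, offset: int, length: int) -> bytes:
    """Read exactly `length` bytes starting at byte `offset` of `path`."""
    with open(path, "rb") as f:  # binary mode: offsets count bytes, not characters
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise ValueError(f"expected {length} bytes at offset {offset}, got {len(data)}")
    return data


# Hypothetical two-record bundle; the "id" field name is assumed for illustration.
lines = [b'{"id": 1, "text": "first case"}\n', b'{"id": 2, "text": "second case"}\n']
with tempfile.NamedTemporaryFile(delete=False, suffix=".jsonl") as tmp:
    tmp.write(b"".join(lines))
    bundle_path = tmp.name

# Stand-in for the Parquet index: cap_id -> (byte offset, length without newline).
index = {1: (0, len(lines[0]) - 1), 2: (len(lines[0]), len(lines[1]) - 1)}

offset, length = index[2]                                      # 01. index lookup
record = json.loads(read_record(bundle_path, offset, length))  # 02. seek, 03. exact read
os.remove(bundle_path)
```

The cost of a lookup depends only on the record's length, not on its position in the bundle, which is what makes the access constant time with respect to file size.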
Index Mapping Example
| CAP ID | Byte Offset | Length |
|---|---|---|
| 1403610 | 523,847,621 | 12,847 |
| 1046253 | 892,104,502 | 9,112 |
| 1098412 | 1,102,543,892 | 15,403 |
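A mapping like the one above could be produced by a single sequential pass over a bundle. The sketch below assumes one JSON object per line with the case ID under an `id` key; both the field name and the `build_byte_index` function are hypothetical, and the sample records are made up. The real index is stored in Parquet, which this example replaces with a plain dict.

```python
import json
import os
import tempfile


def build_byte_index(path: str, id_field: str = "id") -> dict:
    """Scan a JSONL file once and map each record ID to (byte offset, length)."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for raw in f:  # a binary file yields one line at a time, newline included
            body = raw.rstrip(b"\r\n")
            if body:  # skip blank lines
                record = json.loads(body)
                index[record[id_field]] = (offset, len(body))
            offset += len(raw)  # advance by the full line, including its newline
    return index


# Made-up sample data; the non-ASCII character shows why offsets must count bytes.
sample = [{"id": 1403610, "text": "alpha"}, {"id": 1046253, "text": "beta é"}]
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".jsonl") as tmp:
    for rec in sample:
        tmp.write(json.dumps(rec, ensure_ascii=False).encode("utf-8") + b"\n")
    sample_path = tmp.name

byte_index = build_byte_index(sample_path)

# Round trip: seek to the stored offset and read exactly the stored length.
with open(sample_path, "rb") as f:
    off, n = byte_index[1046253]
    f.seek(off)
    roundtrip = json.loads(f.read(n))
os.remove(sample_path)
```

Offsets are accumulated from the raw byte length of each line rather than from decoded text, so multi-byte UTF-8 characters do not shift later entries.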
Build Manifest Excerpt
{
  "cap_sha256": "a1b2c3...",
  "index_sha256": "g7h8i9...",
  "strategy": "constant_time"
}
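The source does not say how the two digests are computed or checked. One plausible reading is that each is a whole-file SHA-256, recorded so that an index built against a different copy of the CAP data can be detected. Under that assumption, a manifest with the same three keys might be generated like this; `sha256_of_file`, `build_manifest`, and the temporary files are hypothetical names used only for the example.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(cap_path: str, index_path: str) -> dict:
    """Assemble a manifest with the keys shown in the excerpt above."""
    return {
        "cap_sha256": sha256_of_file(cap_path),
        "index_sha256": sha256_of_file(index_path),
        "strategy": "constant_time",
    }


# Tiny stand-in files so the sketch runs end to end.
paths = []
for content in (b"cap data", b"index data"):
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(content)
        paths.append(tmp.name)

manifest = build_manifest(paths[0], paths[1])
for p in paths:
    os.remove(p)
```

Reading in chunks keeps memory use flat even for a bundle of 1.4 GB.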
Build-Time Utility
Byte indexing is used only at build time (Stage 4A). Runners receive fully extracted Research Packs, so they perform no runtime lookups.