CAP Byte Index

Enabling O(1) random access to 1.4 GB of JSONL data. Stage 3 cuts per-case extraction time from 500 ms to 1 ms by precomputing exact byte offsets for every authority.

Latency Gain: 500x (500 ms -> 1 ms)
Complexity: O(1) (constant time)
Overhead: 624 KB (in-memory index)

The Performance Problem

CAP text is stored in massive JSONL bundles. Finding a specific case with a linear scan is O(n) in the size of the bundle and can require reading gigabytes of text, which makes constructing the 27k Research Packs prohibitively slow (10+ hours).
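For comparison, a minimal sketch of the O(n) baseline that the byte index replaces. It assumes each JSONL line holds one case object with an integer `id` field; that field name and the function name `linear_scan` are hypothetical, not taken from the CAP schema.

```python
import json
from typing import Optional


def linear_scan(path: str, cap_id: int) -> Optional[dict]:
    """O(n) baseline: parse lines one by one until the target case appears.

    Assumes one JSON case object per line with an integer "id" field
    (hypothetical schema). In the worst case the whole bundle is read.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():          # tolerate blank lines
                continue
            record = json.loads(line)     # every preceding case is fully parsed
            if record.get("id") == cap_id:
                return record
    return None                           # scanned the entire file without a match
```

The cost of each lookup grows with the position of the target in the bundle rather than with the size of the case itself.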

[ EXTRACTION TRACE ]
01. INDEX LOOKUP
Locate `cap_id` in the Parquet index
02. FILE SEEK
f.seek(523847621)
03. EXACT READ
f.read(12847)  # 12,847 bytes
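The three trace steps can be sketched as follows. This assumes the Parquet index has already been loaded into a plain dict mapping `cap_id` to `(byte_offset, length)`; the dict layout and the name `extract_case` are illustrative assumptions, and loading the Parquet file itself is omitted.

```python
import json
from typing import Dict, Tuple

# Hypothetical in-memory index: cap_id -> (byte_offset, length_in_bytes).
# A plain dict stands in for the Parquet-backed index described above.
ByteIndex = Dict[int, Tuple[int, int]]


def extract_case(path: str, index: ByteIndex, cap_id: int) -> dict:
    """Fetch one case with a single seek and a single read.

    Raises KeyError if cap_id is not in the index.
    """
    offset, length = index[cap_id]      # 01. index lookup (average O(1) dict access)
    with open(path, "rb") as f:         # binary mode: offsets are raw byte positions
        f.seek(offset)                  # 02. file seek
        raw = f.read(length)            # 03. exact read
    if len(raw) != length:
        raise ValueError(f"short read for {cap_id}: expected {length} bytes, got {len(raw)}")
    return json.loads(raw)              # json.loads accepts bytes directly
```

Binary mode matters here: in Python, text-mode streams only support seeking to positions previously returned by `tell()`, not arbitrary byte offsets. For thousands of lookups, reusing one open handle instead of reopening the file per case would avoid repeated open calls.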
Index Mapping Example
CAP ID     Byte Offset      Length (bytes)
1403610    523,847,621      12,847
1046253    892,104,502      9,112
1098412    1,102,543,892    15,403
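An index like this can be produced with a single O(n) pass over the bundle. A minimal sketch, assuming one case per line with an integer `id` field (hypothetical schema) and recording lengths that include the trailing newline; `build_byte_index` is a hypothetical name, and writing the result to Parquet is omitted.

```python
import json
from typing import Dict, Tuple


def build_byte_index(path: str, id_field: str = "id") -> Dict[int, Tuple[int, int]]:
    """One pass that records (byte_offset, length) for every case.

    Assumes one JSON object per line with an integer id under `id_field`
    (hypothetical schema). Lengths include the trailing newline, which
    json.loads tolerates as whitespace when the slice is read back.
    """
    index: Dict[int, Tuple[int, int]] = {}
    offset = 0
    with open(path, "rb") as f:          # bytes, so len() counts bytes, not characters
        for line in f:
            if line.strip():             # skip blank lines but still advance the offset
                record = json.loads(line)
                index[int(record[id_field])] = (offset, len(line))
            offset += len(line)
    return index
```

The bundle is still read in full once, but that cost is paid at build time rather than on every lookup.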
Build Manifest Excerpt
{
  "cap_sha256": "a1b2c3...",
  "index_sha256": "g7h8i9...",
  "strategy": "constant_time"
}
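Byte offsets are only valid against the exact bytes they were computed from, so the manifest plausibly serves to pin the index to one build of the CAP bundle. A minimal sketch, assuming both fields are SHA-256 hex digests of the whole files (the source shows only truncated placeholders) and that a mismatch means the index must not be used; `sha256_file`, `write_manifest`, and `verify_manifest` are hypothetical names.

```python
import hashlib
import json


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks instead of loading it whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(cap_path: str, index_path: str, manifest_path: str) -> dict:
    """Record the hashes of the bundle and its index at build time."""
    manifest = {
        "cap_sha256": sha256_file(cap_path),
        "index_sha256": sha256_file(index_path),
        "strategy": "constant_time",
    }
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
    return manifest


def verify_manifest(cap_path: str, index_path: str, manifest_path: str) -> bool:
    """True only if both files still match the hashes recorded at build time."""
    with open(manifest_path, "r", encoding="utf-8") as f:
        manifest = json.load(f)
    return (manifest["cap_sha256"] == sha256_file(cap_path)
            and manifest["index_sha256"] == sha256_file(index_path))
```

Even a one-byte change to the bundle shifts every later offset, which is why checking the bundle hash before seeking is cheaper than debugging corrupted extractions.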
Build-Time Utility

Byte indexing is used only in Stage 4A (build time). Runners receive fully extracted Research Packs and perform no runtime lookups.