Stateful Delivery Model
Eliminating information contamination through controlled, multi-stage context injection within a single continuous session.
The Contamination Problem
When evaluating an LLM's legal reasoning, it is difficult to distinguish between genuine understanding and mere retrieval of training data. If a model is asked to analyze Strickland v. Washington, it may produce a correct answer simply because it has memorized the case, effectively "peeking" at the answer sheet.
Architecture of State
LegalChain runs each evaluation as a single continuous session. Unlike traditional benchmarks that treat prompts as stateless snapshots, our ChainExecutor maintains context accumulation across the entire S1-S10 lifecycle.
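The accumulation described above can be sketched as an append-only session. The `ChainExecutor` name comes from the text; everything else (the method names, the tuple-based history, the stubbed model call) is a hypothetical illustration, not the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ChainExecutor:
    """Sketch of a single continuous session: context only ever grows.

    Deliveries (D1-D3) and stage prompts (S1-S10) are appended to one
    shared history; nothing is reset between stages.
    """
    history: list = field(default_factory=list)

    def inject(self, delivery_id: str, payload: str) -> None:
        # A delivery adds to the accumulated context; it never replaces it.
        self.history.append((delivery_id, payload))

    def run_stage(self, stage: str, prompt: str) -> str:
        self.history.append((stage, prompt))
        # Stand-in for a real model call over the full accumulated context.
        return f"{stage}: reasoning over {len(self.history)} context entries"
```

The essential property is that every stage sees everything injected before it, which is what makes the later D1-versus-D2 comparison meaningful.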
| Delivery | Timeline | Payload Contents | Purpose |
|---|---|---|---|
| D1 | Pre-S1 | Target citation, court, year, case name, legal issue. | Baseline |
| D2 | Post-S4 | Full ResearchPack: majority opinion, authorities, Shepard's data. | Synthesis |
| D3 | Post-S7 | Synthetic trap cases, modified holdings, canary strings. | Integrity |
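The schedule in the table can be encoded as plain data, so the executor fires each delivery at the right point in the lifecycle. This is a minimal sketch; the dictionary layout and the `due_deliveries` helper are illustrative assumptions, not the benchmark's actual config format:

```python
# Hypothetical encoding of the D1-D3 delivery schedule from the table.
# `after_stage` of None means the delivery precedes stage S1.
DELIVERY_SCHEDULE = {
    "D1": {"after_stage": None, "payload": ["target_citation", "court", "year",
                                            "case_name", "legal_issue"]},
    "D2": {"after_stage": "S4", "payload": ["majority_opinion", "authorities",
                                            "shepards_data"]},
    "D3": {"after_stage": "S7", "payload": ["synthetic_trap_cases",
                                            "modified_holdings", "canary_strings"]},
}

def due_deliveries(completed_stage):
    """Return the delivery IDs that fire once `completed_stage` finishes."""
    return [d for d, spec in DELIVERY_SCHEDULE.items()
            if spec["after_stage"] == completed_stage]
```

Keeping the schedule declarative makes the timeline auditable on its own, separate from the executor logic.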
Execution Flow: Temporal Audit Trail
By timing these deliveries, we create a temporal audit trail: the model's D1-phase reasoning (relying only on internal memory) can be compared directly against its refined analysis after D2 (grounded in external evidence).
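One simple way to read that audit trail is to compare the model's conclusion before and after the evidence drop. The classification labels below are hypothetical, a sketch of the idea rather than the benchmark's scoring code:

```python
def audit_shift(pre_d2_answer: str, post_d2_answer: str) -> str:
    """Classify whether external evidence (D2) moved the model's position.

    'stable'  -> memory-based and evidence-based answers agree
    'revised' -> the evidence changed the model's conclusion
    """
    if pre_d2_answer.strip() == post_d2_answer.strip():
        return "stable"
    return "revised"
```

A "stable" chain suggests the model either genuinely knew the case or retrieved it from training data; a "revised" chain shows the external evidence doing real work.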
Reasoning Quality (S1-S7)
Focuses on substantive accuracy and legal synthesis. Minor citation formatting errors are tolerated if the underlying logic is sound.
Integrity Gate (S8)
Binary validation. Citing any synthetic case, or failing to flag a planted canary, triggers an immediate 100% chain failure.
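The binary gate can be sketched as a pair of substring checks over the model's S8 output. The function name and the string-matching approach are illustrative assumptions; a production gate would presumably use structured citation parsing:

```python
def integrity_gate(response: str,
                   synthetic_citations: set,
                   canary_strings: set) -> bool:
    """Sketch of the S8 pass/fail gate. Returns True only if the chain survives.

    Fails (returns False) if the response cites any synthetic trap case
    or reproduces any canary string as if it were genuine authority.
    """
    cited_synthetic = any(c in response for c in synthetic_citations)
    leaked_canary = any(c in response for c in canary_strings)
    return not (cited_synthetic or leaked_canary)
```

Because the result is boolean, a single violation zeroes out the entire chain, regardless of how strong the S1-S7 reasoning was.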