No Hidden Variables
Why we freeze scoring constants and compute 75% of our metrics with code, not judges.
If the score changes, the model must have changed. That is the golden rule of benchmarking. But with non-deterministic aggregators and "vibe-based" judging, scores often drift due to randomness in the evaluation process itself. LegalChain maximizes metric stability by relying on deterministic, audit-trail-verified scoring logic for all structural and citation components.
The Formula is Fixed
When ranking precedents for a ResearchPack or verifying a citation, we use frozen code paths. No LLM "decides" whether a case is relevant; a formula does.
Frozen Constants
Every weight, every threshold, and every penalty is explicitly versioned. We do not tweak these "under the hood."
Signal Weights
| Signal | Weight |
| --- | --- |
| "followed" | 1.0 |
| "cites" | 0.7 |
| "explained" | 0.5 |
| "overruled" | 0.0 |
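As a minimal sketch of what "frozen" means in practice, the weights above could live in a versioned constant that deterministic scoring reads from. The function name and the fail-loud behavior on unknown signals are illustrative assumptions, not the actual LegalChain implementation:

```python
# Frozen, versioned signal weights (values mirror the table above).
# Hypothetical sketch; names are illustrative, not the real code path.
SIGNAL_WEIGHTS_V1 = {
    "followed": 1.0,
    "cites": 0.7,
    "explained": 0.5,
    "overruled": 0.0,
}

def precedent_score(signals: list[str]) -> float:
    """Sum the frozen weight of each treatment signal.

    Unknown signals raise instead of guessing, so introducing a new
    signal type forces an explicit constants-version bump.
    """
    total = 0.0
    for signal in signals:
        if signal not in SIGNAL_WEIGHTS_V1:
            raise KeyError(f"unversioned signal: {signal!r}")
        total += SIGNAL_WEIGHTS_V1[signal]
    return total
```

Because the weights are data rather than model output, the same inputs always produce the same score, and any change is visible in version control.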
Budget Limits
| Limit | Value |
| --- | --- |
| Max Anchor Chars | 80,000 |
| Max Pack Chars | 120,000 |
| Top-K Authorities | 12 |
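The budget limits can be enforced the same way: deterministic truncation against frozen constants, so pack assembly never depends on a model's judgment of what to cut. This is a hedged sketch under assumed names (`assemble_pack` and its inputs are illustrative), not the actual pipeline:

```python
# Frozen budget constants (values mirror the table above).
MAX_ANCHOR_CHARS = 80_000
MAX_PACK_CHARS = 120_000
TOP_K_AUTHORITIES = 12

def assemble_pack(anchor: str, authorities: list[str]) -> str:
    """Deterministically cap a pack at the frozen budgets.

    Hypothetical sketch: truncate the anchor, keep at most the top-K
    authority texts, and stop appending once the pack budget is spent.
    """
    pack = anchor[:MAX_ANCHOR_CHARS]
    for text in authorities[:TOP_K_AUTHORITIES]:
        remaining = MAX_PACK_CHARS - len(pack)
        if remaining <= 0:
            break
        pack += text[:remaining]
    return pack
```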
What IS Judged?
We reserve LLM judging only for qualities that cannot be measured via code:
- Quality of Reasoning: Is the argument logical and persuasive?
- Doctrinal Nuance: Does the analysis capture the subtlety of a holding?
- Completeness: Did the model address all relevant factors?
For everything else—Did it cite real cases? Did it follow IRAC structure? Did it find the right precedent?—we use code.
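Those code-based checks are simple string and lookup operations. A minimal sketch of two of them, with an in-memory set standing in for a real citation index and illustrative IRAC headings (all names here are assumptions, not LegalChain's actual checkers):

```python
import re

# Stand-in for a real citation index; entries are illustrative.
KNOWN_CITATIONS = {"410 U.S. 113", "347 U.S. 483"}
CITATION_RE = re.compile(r"\d+ U\.S\. \d+")
IRAC_HEADINGS = ("Issue", "Rule", "Application", "Conclusion")

def citations_real(answer: str) -> bool:
    """True if the answer cites at least one case and every cited
    reporter string resolves in the index (no hallucinated cases)."""
    found = CITATION_RE.findall(answer)
    return bool(found) and all(c in KNOWN_CITATIONS for c in found)

def follows_irac(answer: str) -> bool:
    """True if all four IRAC headings appear, in order."""
    pos = -1
    for heading in IRAC_HEADINGS:
        pos = answer.find(heading, pos + 1)
        if pos == -1:
            return False
    return True
```

Both checks are binary and reproducible: the same answer always passes or fails, with no judge variance.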