Calibrated Judgment

A balanced scoring architecture that blends deterministic rigor with expert LLM assessment.

LegalChain relies on a "Composite Score" that aggregates three distinct signals: structure, consistency, and reasoning quality. This approach prevents models from gaming the metric with well-formatted hallucinations or technically accurate but irrelevant answers.

The Three Phases

The Composite Score is a weighted average of three dimensions, each scored in its own phase.

Phases feeding the Composite Score

Phase                   Weight   Evaluator
Phase 1: Structure      10%      Deterministic
Phase 2: Consistency    40%      Deterministic
Phase 3: Quality        50%      LLM Judge
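
As a rough illustration, the weighted average described by the table can be sketched in Python. The names `WEIGHTS` and `composite_score`, and the assumption that each phase produces a score between 0 and 1, are illustrative and not taken from LegalChain itself.

```python
# Weights from the table above; the 0-to-1 scale for phase scores is an assumption.
WEIGHTS = {"structure": 0.10, "consistency": 0.40, "quality": 0.50}


def composite_score(phase_scores: dict[str, float]) -> float:
    """Return the weighted average of the three phase scores (hypothetical helper)."""
    missing = WEIGHTS.keys() - phase_scores.keys()
    if missing:
        raise ValueError(f"missing phase scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * phase_scores[name] for name in WEIGHTS)


# A well-formatted answer with weak grounding and weak reasoning still scores low:
# 0.10 * 1.0 + 0.40 * 0.2 + 0.50 * 0.3 = 0.33
well_formatted_but_weak = {"structure": 1.0, "consistency": 0.2, "quality": 0.3}
print(round(composite_score(well_formatted_but_weak), 2))  # prints 0.33
```

Under these assumed inputs, perfect formatting contributes only 0.10 to the total, which is why a polished but ungrounded answer cannot score well.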

Phase 1: Structure (10%)

"Did the model follow instructions?"
This deterministic check validates the presence of required IRAC sections (Issue, Rule, Application, Conclusion). It ensures basic format compliance.
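
A minimal sketch of such a presence check might look like the following. The heading format (section name on its own line, optionally prefixed with `#` or followed by a colon), the function name `structure_score`, and the choice to return a fraction rather than a pass/fail flag are all assumptions made for illustration.

```python
import re

REQUIRED_SECTIONS = ("Issue", "Rule", "Application", "Conclusion")


def structure_score(response: str) -> float:
    """Fraction of IRAC sections that have a heading line (hypothetical format)."""
    found = 0
    for section in REQUIRED_SECTIONS:
        # Assumes a heading is the section name alone on a line, optionally
        # prefixed by '#' characters and optionally followed by a colon.
        pattern = rf"^\s*#*\s*{section}\s*:?\s*$"
        if re.search(pattern, response, flags=re.IGNORECASE | re.MULTILINE):
            found += 1
    return found / len(REQUIRED_SECTIONS)


draft = "Issue\nWas there a duty?\nRule:\nSee the cited case.\n## Application\nThe facts show...\n"
print(structure_score(draft))  # prints 0.75 because the Conclusion heading is missing
```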

Phase 2: Consistency (40%)

"Did the model use the materials?"
This phase checks internal coherence. Does the Conclusion answer the Issue? Does the Rule section cite cases provided in the ResearchPack? It penalizes generic answers that ignore the specific evidence.
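
One part of this phase, the citation check, can be approximated with simple string matching. The sketch below assumes the ResearchPack can be reduced to a list of case names and that a case counts as cited when its name appears in the Rule section; both are simplifications, and the check that the Conclusion answers the Issue is not covered here.

```python
def citation_coverage(rule_section: str, research_pack_cases: list[str]) -> float:
    """Share of ResearchPack case names that appear in the Rule section.

    Hypothetical helper: plain substring matching stands in for whatever
    citation matching the real evaluator performs.
    """
    if not research_pack_cases:
        return 0.0
    text = rule_section.casefold()
    cited = [case for case in research_pack_cases if case.casefold() in text]
    return len(cited) / len(research_pack_cases)


# Illustrative case names only; they do not refer to real ResearchPack content.
pack = ["Smith v. Jones", "Doe v. Roe"]
rule = "Under Smith v. Jones, a duty of care arises when harm is foreseeable."
print(citation_coverage(rule, pack))  # prints 0.5
```

A generic answer that cites none of the provided cases would score 0.0 on this measure, which matches the penalty described above.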

Phase 3: Quality (50%)

"Is the reasoning sound?"
This is the only phase that uses an LLM judge. It evaluates the logic, nuance, and persuasiveness of the argument using a calibrated rubric.
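
The source does not describe the rubric or how the judge is called, so the following is only a sketch of how per-criterion ratings could be turned into a phase score. The 1-to-5 rating scale, the `judge` callable, and the stub `fake_judge` are assumptions; a real setup would replace the stub with a model call.

```python
from typing import Callable

# Criteria named in the text; the 1-to-5 scale is an assumption for illustration.
RUBRIC_CRITERIA = ("logic", "nuance", "persuasiveness")
MIN_RATING, MAX_RATING = 1, 5


def quality_score(response: str, judge: Callable[[str, str], int]) -> float:
    """Average the judge's per-criterion ratings and rescale to the 0-to-1 range."""
    ratings = []
    for criterion in RUBRIC_CRITERIA:
        rating = judge(response, criterion)  # stand-in for an LLM call
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating out of range for {criterion}: {rating}")
        ratings.append(rating)
    mean = sum(ratings) / len(ratings)
    return (mean - MIN_RATING) / (MAX_RATING - MIN_RATING)


def fake_judge(response: str, criterion: str) -> int:
    """Deterministic stub used only so the sketch runs without a model."""
    return {"logic": 4, "nuance": 3, "persuasiveness": 5}[criterion]


print(quality_score("...", fake_judge))  # prints 0.75
```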

Why These Weights?

"Quality matters most. Consistency matters significantly. Structure matters, but is a baseline."

We weight Quality (50%) highest because legal analysis is fundamentally about reasoning. A perfectly formatted document with bad logic is useless.

Consistency (40%) is weighted heavily to penalize "lazy" responses that fall back on generic principles instead of synthesizing the specific cases provided. In other words, it rewards adherence to the retrieved materials (RAG adherence).

Structure (10%) is kept low because satisfying format requirements is trivial for modern models and does not correlate strongly with legal intelligence.

Interaction with Gates

The Composite Score is computed only if the response passes all Integrity Gates. If any gate fails, for example because the response contains a fabricated citation (an S8 failure), the entire chain is voided and its score is set to 0.

IF Integrity_Gate_Failed THEN
    Score = 0    (regardless of Quality or Structure)
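
The gate rule above can be expressed as a small Python function. The name `final_score` and the representation of gate results as a dictionary of booleans are assumptions; only the S8 gate is named in the text.

```python
def final_score(composite: float, gates_passed: dict[str, bool]) -> float:
    """Return the composite score, or 0.0 if any Integrity Gate failed.

    Hypothetical helper: gate results are assumed to arrive as a mapping
    from gate name to a pass/fail flag.
    """
    if not all(gates_passed.values()):
        return 0.0
    return composite


# A strong response is still voided by a fabricated citation (S8 failure).
print(final_score(0.92, {"S8": False}))  # prints 0.0
print(final_score(0.92, {"S8": True}))   # prints 0.92
```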