Interactivity Demo
Interactive Research Visualization
This document showcases interactive components developed for the LegalChain benchmark. We blend data-rich Epoch-style visualizations with our scholarly "Document-Sized" design system.
Performance Trajectories
Powered by ECharts.
Granular Skill Surface
Hover over labels for technical definitions of each benchmark milestone.
- Citation Pass Rate: 87.3%
- Structural IRAC Score: 92.1%
- Stability Index: 0.94
Methodological Framework
Phase 1: Authority Identification
During the primary detection phase (Steps S1-S3), the runner evaluates a model's ability to resolve citations and identify governing law:
- S1: Exact Case Resolution from Metadata
- S2: Authority Retrieval from Fact Patterns
- S3: Authority Validation (Overrule Checks)
Phase 2: Reasoning & Application
The reasoning phase (Steps S4-S7) transforms raw retrieval into legal application:
- Delta Analysis: The benchmark isolates the performance gain when a model performs "Open-Book" synthesis (S7) compared to a "Closed-Book" reasoning task (S6).
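The open-book vs. closed-book delta described above reduces to a difference of per-step scores. A minimal sketch, assuming per-step accuracies are available as a plain dictionary (the scores and helper name here are illustrative, not SDK output):

```python
# Hypothetical per-step accuracy scores for a single model run
step_scores = {
    "S6": 0.71,  # Closed-Book reasoning accuracy
    "S7": 0.86,  # Open-Book synthesis accuracy
}

def open_book_delta(scores: dict[str, float]) -> float:
    """Performance gain from Open-Book synthesis (S7)
    over Closed-Book reasoning (S6)."""
    return scores["S7"] - scores["S6"]

delta = open_book_delta(step_scores)
print(f"Open-book delta: {delta:+.1%}")  # → Open-book delta: +15.0%
```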
Developer Interface
LegalChain can be integrated into existing pipelines using our Python SDK.
```python
# Initialize the agentic chain
from legal10 import BenchmarkRunner

runner = BenchmarkRunner(
    model="claude-3-5-sonnet",
    temperature=0.0,
    runs=1,
)

# Run end-to-end evaluation
results = runner.evaluate(
    dataset="supreme_court_2024",
    steps=["S1", "S4", "S8"],
)

# Capture integrity metrics
print(f"Citation Pass: {results.s8_success:.1%}")
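A comparison table like the one in the next section can be assembled by running the evaluation once per model and formatting the collected metrics. The sketch below uses hard-coded placeholder numbers (echoing the headline metrics above) rather than real benchmark results, and the dictionary shape is an assumption, not SDK output:

```python
# Placeholder metrics for illustration only -- not real benchmark results
model_results = {
    "model-a": {"aggregate": 0.84, "citation_s8": 0.873, "irac": 0.921},
    "model-b": {"aggregate": 0.79, "citation_s8": 0.812, "irac": 0.887},
}

def to_markdown_row(name: str, m: dict[str, float]) -> str:
    """Format one model's metrics as a markdown table row."""
    return (
        f"| {name} | {m['aggregate']:.1%} "
        f"| {m['citation_s8']:.1%} | {m['irac']:.1%} |"
    )

for name, metrics in model_results.items():
    print(to_markdown_row(name, metrics))
```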
Comparative Analysis
| Model Identification | Aggregate Score | Citation (S8) | IRAC Integrity |
|---|---|---|---|