Quantization
Under development. This benchmark is intended to measure how legal reasoning performance changes across deployment precision tiers (e.g., int8, int4) and quantization methods.
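To make "precision tier" concrete, here is a minimal sketch of per-tensor symmetric uniform quantization. This is one common way to map weights to signed N-bit integers. It is only an illustration: the benchmark does not specify which quantization methods it will evaluate, and the function names below are hypothetical. The sketch shows reconstruction error growing as the tier drops from int8 to int2.

```python
def quantize_symmetric(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Hypothetical helper: per-tensor symmetric quantization to signed `bits`-bit ints."""
    # Symmetric range [-qmax, qmax]: 127 for int8, 7 for int4, 1 for int2
    # (this scheme leaves the most negative code, e.g. -128, unused).
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0
    scale = max_abs / qmax  # one scale shared by the whole tensor
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate real values."""
    return [v * scale for v in q]


def max_abs_error(weights: list[float], bits: int) -> float:
    """Worst-case reconstruction error after a quantize/dequantize round trip."""
    q, scale = quantize_symmetric(weights, bits)
    return max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))


if __name__ == "__main__":
    weights = [0.91, -0.42, 0.07, -0.88, 0.33, 0.5]  # arbitrary example values
    for bits in (8, 4, 2):
        print(f"int{bits}: max abs error = {max_abs_error(weights, bits):.4f}")
```

Real deployments often use per-channel or per-group scales, or more elaborate methods, to reduce this error. A per-tensor scale is used here only because it is the simplest case.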
Related methodology
Status: Planned
Focus: Reasoning vs. fluency
Output: Precision-tier comparison
Planned evaluation shape
- Run the same task set across multiple precision tiers (example tiers: int8, int4, int2).
- Compare reasoning-oriented performance to fluency-oriented indicators to detect divergence.
- Record configuration metadata for each run so that every comparison can be attributed to a specific, documented configuration.
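The planned evaluation shape above can be sketched as a small comparison routine. This is a hypothetical illustration, and several parts of it are assumptions rather than part of the benchmark's specification: the record fields, the use of a higher-precision reference run as the baseline, the 0..1 score scales, and the 0.05 divergence threshold. A tier is flagged as diverging when its reasoning score drops by more than its fluency score, and the difference exceeds the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical per-run metadata so each comparison is attributable."""
    model_id: str
    precision_tier: str    # e.g. "int8", "int4", "int2"
    quant_method: str      # hypothetical free-form label for the method
    task_set_version: str  # identifies the shared task set


@dataclass(frozen=True)
class RunResult:
    config: RunConfig
    reasoning_score: float  # assumed: reasoning-oriented metric on a 0..1 scale
    fluency_score: float    # assumed: fluency-oriented indicator on a 0..1 scale


def compare_to_baseline(
    baseline: RunResult, runs: list[RunResult], threshold: float = 0.05
) -> list[dict]:
    """Compare each run to a reference run and flag reasoning/fluency divergence."""
    rows = []
    for run in runs:
        # Comparisons are only meaningful on the same task set.
        if run.config.task_set_version != baseline.config.task_set_version:
            raise ValueError("all runs must use the baseline's task set")
        reasoning_drop = baseline.reasoning_score - run.reasoning_score
        fluency_drop = baseline.fluency_score - run.fluency_score
        rows.append({
            "tier": run.config.precision_tier,
            "method": run.config.quant_method,
            "reasoning_drop": round(reasoning_drop, 4),
            "fluency_drop": round(fluency_drop, 4),
            # Divergence: reasoning degrades noticeably more than fluency does.
            "diverges": (reasoning_drop - fluency_drop) > threshold,
        })
    return rows
```

The routine rejects runs whose task-set version differs from the baseline's, which keeps every comparison on the same task set. Each result also carries its full `RunConfig`, so every output row can be traced back to the configuration that produced it.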