Quantization

Under development. This benchmark is intended to evaluate how legal reasoning performance changes across deployment precision tiers (e.g. int8, int4) and quantization methods.

Related methodology

Status: Planned

Focus: Reasoning vs fluency

Output: Precision-tier comparison

Planned evaluation shape

Run the same task set across multiple precision tiers (example tiers: int8, int4, int2).
Compare reasoning-oriented performance to fluency-oriented indicators to detect divergence.
Record configuration metadata for each run so comparisons are attributable to a defined regime.
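
The planned shape above could be sketched roughly as follows. This is an illustrative sketch only, not the benchmark's runner: the names `RunConfig`, `RunResult`, `compare_tiers`, and `to_record`, the metadata fields, and the choice of the first run as the reference tier are all assumptions made for the example. Scores are assumed to be "higher is better" values on a common scale.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical configuration metadata recorded for each run."""
    model_id: str
    precision_tier: str   # e.g. "int8", "int4", "int2"
    quant_method: str     # label for the quantization method used
    task_set: str         # identifier of the shared task set


@dataclass(frozen=True)
class RunResult:
    """One run of the task set under a single configuration."""
    config: RunConfig
    reasoning_score: float  # reasoning-oriented performance
    fluency_score: float    # fluency-oriented indicator


def compare_tiers(results):
    """Compare every run against the first one, treated as the reference tier.

    A positive "divergence" means the reasoning score dropped more than
    the fluency score relative to the reference run.
    """
    reference = results[0]
    rows = []
    for run in results[1:]:
        reasoning_drop = reference.reasoning_score - run.reasoning_score
        fluency_drop = reference.fluency_score - run.fluency_score
        rows.append({
            "tier": run.config.precision_tier,
            "reasoning_drop": reasoning_drop,
            "fluency_drop": fluency_drop,
            "divergence": reasoning_drop - fluency_drop,
        })
    return rows


def to_record(result):
    """Serialize a run, including its configuration, as one JSON line."""
    return json.dumps(asdict(result), sort_keys=True)
```

Storing the configuration alongside the scores in each record is what makes a later comparison attributable to a defined regime: two rows can be compared only when their `task_set` and `model_id` match and the precision tier or quantization method is the one thing that differs.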

Other benchmarks under development.

Runner semantics and artifacts.