Empirical Final Design (Long Report Test)

This page stress-tests report typography (tables, lists, code blocks, and quotes) and the right-side Contents panel (scrolling and active-section highlighting).

Summary

We suspect aggressive quantization can create a “silent defect zone” where fluency remains stable while multi-step legal reasoning degrades. This report format is designed to make those failures observable with consistent structure.

Key metrics

| Metric | Definition | Why it matters |
|---|---|---|
| Chain completion | Percent of instances that pass all steps and gates | Captures compounding error across steps |
| Citation integrity | Percent of responses without fabricated authorities | Hard constraint for legal work product |
| Surface fluency vs reasoning proxy | Drop in “validate + synthesize” under compression | Detects reasoning fragility vs surface fluency |
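
As a rough illustration of the first two metric definitions, the sketch below computes them from per-instance records. The `InstanceResult` schema, its field names, and the demo data are assumptions made for this example; the report does not specify a result format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceResult:
    """One benchmark instance (hypothetical schema, not taken from the report)."""
    step_passed: List[bool]      # one flag per chain step or gate, in order
    fabricated_citations: int    # number of authorities that could not be verified


def chain_completion(results: List[InstanceResult]) -> float:
    """Percent of instances in which every step and gate passed."""
    if not results:
        return 0.0
    completed = sum(1 for r in results if all(r.step_passed))
    return 100.0 * completed / len(results)


def citation_integrity(results: List[InstanceResult]) -> float:
    """Percent of responses with zero fabricated authorities."""
    if not results:
        return 0.0
    clean = sum(1 for r in results if r.fabricated_citations == 0)
    return 100.0 * clean / len(results)


# Illustrative data only; these are not measured results.
demo = [
    InstanceResult([True, True, True], 0),
    InstanceResult([True, False, True], 0),   # fails one step, so the chain fails
    InstanceResult([True, True, True], 1),    # completes, but cites a fabricated authority
    InstanceResult([True, True, True], 0),
]
print(chain_completion(demo))    # 75.0
print(citation_integrity(demo))  # 75.0
```

Note that the two metrics are scored independently here: an instance can complete the chain and still fail citation integrity.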

Background

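In production, vendors routinely trade precision for throughput. The user sees faster responses; the risk is a selective drop in reasoning capacity. To detect that drop, a benchmark must be chain-aware: it has to score whether an instance passes every step and gate, not only whether isolated answers look right.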
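
To see why chain-awareness matters, consider a simplified model in which each step passes independently with the same probability. That independence assumption, the function name, and the example rates are all hypothetical; real steps are likely correlated.

```python
def expected_chain_completion(per_step_pass_rate: float, num_steps: int) -> float:
    """Probability that all steps pass, assuming independent, identical steps."""
    if not 0.0 <= per_step_pass_rate <= 1.0:
        raise ValueError("per_step_pass_rate must be in [0, 1]")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_pass_rate ** num_steps


# Hypothetical per-step pass rates for a five-step chain.
for rate in (0.98, 0.95, 0.90):
    print(f"per-step {rate:.2f} -> chain {expected_chain_completion(rate, 5):.3f}")
```

Under these assumptions, lowering the per-step pass rate from 0.95 to 0.90 lowers five-step completion from about 0.77 to about 0.59, a drop that a per-step score alone would understate.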

Design principle

Don’t grade the plan; grade the memo. Planning is implicit in whether synthesis succeeds.

Method

The report structure below is intentionally repetitive: each section has the same density and spacing so that visual rhythm stays stable while you scroll, and the TOC remains useful.

Inputs

Outputs

Implementation notes

This block checks code styling in report mode: inline code and fenced blocks.

```python
run_id = "l10_demo_2025_01_01"
model = {"name": "example", "precision": "4-bit", "ptq": "awq"}
thresholds = {"citation_integrity": "strict"}
```
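
A configuration like the one above could be checked before a run starts. The following is a minimal sketch only: the allowed precision labels, the `"lenient"` mode, and the `validate_run_config` helper are assumptions for illustration, not part of the report.

```python
ALLOWED_PRECISIONS = {"16-bit", "8-bit", "4-bit"}   # assumed label set
ALLOWED_THRESHOLD_MODES = {"strict", "lenient"}     # "lenient" is hypothetical


def validate_run_config(run_id: str, model: dict, thresholds: dict) -> list:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    if not run_id:
        problems.append("run_id is empty")
    # Every model entry is expected to carry these three keys.
    for key in ("name", "precision", "ptq"):
        if key not in model:
            problems.append(f"model is missing '{key}'")
    precision = model.get("precision")
    if precision is not None and precision not in ALLOWED_PRECISIONS:
        problems.append(f"unknown precision '{precision}'")
    mode = thresholds.get("citation_integrity")
    if mode not in ALLOWED_THRESHOLD_MODES:
        problems.append(f"unknown citation_integrity mode '{mode}'")
    return problems


problems = validate_run_config(
    "l10_demo_2025_01_01",
    {"name": "example", "precision": "4-bit", "ptq": "awq"},
    {"citation_integrity": "strict"},
)
print(problems)  # []
```

Returning a list of problems rather than raising on the first one lets a report surface every configuration issue at once.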

Findings

The sections below are intentionally long to test scroll behavior and TOC highlighting over many headings.

F1: Surface fluency stability

We often observe stable grammar, formatting, and confident tone even when legal validation quality drops. This mismatch is exactly why a report layout needs strong structure and navigability.
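
One way to make this mismatch measurable is to compare the relative drop in a fluency score with the relative drop in a reasoning proxy between a baseline and a compressed model. The sketch below assumes both scores are positive numbers on a shared scale; the function names and tolerance values are hypothetical choices, not thresholds from the report.

```python
def relative_drop(baseline: float, compressed: float) -> float:
    """Fractional drop from baseline to compressed (0.10 means a 10% drop)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - compressed) / baseline


def silent_defect_flag(
    fluency_base: float,
    fluency_comp: float,
    reasoning_base: float,
    reasoning_comp: float,
    fluency_tolerance: float = 0.02,     # assumed: fluency may move by up to 2%
    reasoning_tolerance: float = 0.10,   # assumed: a 10% reasoning drop is notable
) -> bool:
    """True when fluency is roughly stable but the reasoning proxy drops noticeably."""
    fluency_stable = abs(relative_drop(fluency_base, fluency_comp)) <= fluency_tolerance
    reasoning_degraded = relative_drop(reasoning_base, reasoning_comp) >= reasoning_tolerance
    return fluency_stable and reasoning_degraded


# Illustrative scores only; these are not measured results.
print(silent_defect_flag(0.90, 0.89, 0.80, 0.60))  # True
print(silent_defect_flag(0.90, 0.70, 0.80, 0.60))  # False: fluency also dropped
```

The second call returns False because a visible fluency drop is no longer a silent defect; it would be caught by ordinary quality checks.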

F2: Reasoning fragility under compression

Under aggressive post-training quantization (PTQ), rare but critical weight directions can be clipped first. In legal tasks, those directions are often activated during validation, distinguishing, and synthesis, where multi-factor tests and exception handling matter.
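
The clipping effect can be illustrated with a toy example. The sketch below uses plain symmetric round-to-nearest quantization with a fixed clipping range; it is not AWQ or any specific PTQ method, and the weights and clipping value are made up for illustration.

```python
def quantize_symmetric(weights, num_bits=4, clip_value=None):
    """Uniform symmetric quantize-then-dequantize.

    Weights whose magnitude exceeds clip_value saturate at the largest level.
    """
    if clip_value is None:
        clip_value = max(abs(w) for w in weights)
    if clip_value <= 0:
        raise ValueError("clip_value must be positive")
    levels = 2 ** (num_bits - 1) - 1           # 7 levels per sign for 4-bit
    scale = clip_value / levels
    dequantized = []
    for w in weights:
        q = round(w / scale)
        q = max(-levels, min(levels, q))       # saturate to the representable range
        dequantized.append(q * scale)
    return dequantized


# Hypothetical weights: mostly small values plus one rare large one.
weights = [0.02, -0.05, 0.03, 0.01, -0.04, 0.90]

# The clipping range is chosen to fit the bulk of the weights, not the outlier.
dequantized = quantize_symmetric(weights, num_bits=4, clip_value=0.07)
errors = [abs(w - d) for w, d in zip(weights, dequantized)]

for w, d, e in zip(weights, dequantized, errors):
    print(f"{w:+.2f} -> {d:+.2f} (error {e:.2f})")
```

In this toy setup the small weights are reproduced almost exactly, while the single large weight saturates at the clipping bound and absorbs nearly all of the error. Whether this mechanism explains reasoning loss in a real model is the hypothesis under test, not something this sketch demonstrates.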

Section stress test A

Scroll rhythm test paragraph. Repeatable spacing helps you notice anomalies in charts/tables rather than in inconsistent typography.

A.1

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

A.2

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

Section stress test B

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

B.1

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

B.2

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

Section stress test C

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

C.1

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

C.2

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

Section stress test D

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

D.1

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

D.2

Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test. Dense paragraph test.

Appendix

Quick checklist for report UI:

  1. TOC stays visible and scrollable
  2. Active section highlight updates smoothly
  3. Tables remain readable in light and dark
  4. Code blocks don’t overflow horizontally