Sample Report

This is a placeholder report page showing the intended reading experience: larger serif text and a left-hand “On this page” table of contents (TOC).

Summary

Report summary goes here: run ID, model, dataset slice, and headline metrics (void rate, per-step accuracy, etc.).
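As a rough illustration of how the headline metrics might be derived, the sketch below computes a void rate and per-step accuracy from a list of evaluation records. The record schema (the keys `step`, `void`, and `correct`) and the function name `headline_metrics` are assumptions made for this example, as is the choice to exclude void records from the accuracy denominators; adapt them to whatever definitions the report actually uses.

```python
from collections import defaultdict


def headline_metrics(records):
    """Compute void rate and per-step accuracy.

    Assumed (hypothetical) schema: each record is a dict with
    'step' (str), 'void' (bool), and 'correct' (bool).
    """
    records = list(records)
    total = len(records)
    void_count = sum(1 for r in records if r["void"])

    hits = defaultdict(int)  # correct, non-void records per step
    seen = defaultdict(int)  # non-void records per step
    for r in records:
        if r["void"]:
            continue  # assumption: void records do not count toward accuracy
        seen[r["step"]] += 1
        hits[r["step"]] += int(r["correct"])

    return {
        "void_rate": void_count / total if total else 0.0,
        "per_step_accuracy": {s: hits[s] / seen[s] for s in sorted(seen)},
    }


if __name__ == "__main__":
    sample = [
        {"step": "S1", "void": False, "correct": True},
        {"step": "S1", "void": False, "correct": False},
        {"step": "S4", "void": True, "correct": False},
        {"step": "S4", "void": False, "correct": True},
    ]
    print(headline_metrics(sample))
```

Keeping the metric definitions in one small function like this makes it easier to apply the same calculation across reports.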

Method

Describe the evaluation method used for this report. Keep this consistent across reports so comparisons are easy.

Inputs

Outputs

Findings

Main narrative findings. Use short paragraphs and clear headings so the TOC remains useful.

Step performance

Discuss at which steps accuracy drops and why. Link to supporting tables or charts where needed.

Failure modes

Highlight common error clusters (e.g., S4 disposition confusion, S6 incoherent synthesis, citation integrity issues).
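One simple way to surface the most common error clusters is to tally a failure label per erroneous item and rank the labels by frequency. The sketch below assumes each error has already been tagged with a single label; the label strings and the function name `rank_failure_modes` are hypothetical and only mirror the examples named above.

```python
from collections import Counter


def rank_failure_modes(labels, top_n=3):
    """Return (label, count, share) tuples for the most frequent labels.

    `labels` is an iterable of failure-mode strings, one per error
    (hypothetical tagging scheme, assumed to exist upstream).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() sorts by count, highest first
    return [(label, n, n / total) for label, n in counts.most_common(top_n)]


if __name__ == "__main__":
    labels = (
        ["S4_disposition_confusion"] * 3
        + ["S6_incoherent_synthesis"] * 2
        + ["citation_integrity"]
    )
    for label, n, share in rank_failure_modes(labels):
        print(f"{label}: {n} ({share:.0%})")
```

Reporting the share alongside the raw count helps when comparing reports built from dataset slices of different sizes.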

Appendix

Include additional details (definitions, thresholds, or per-step rubric notes).