Closed-Book vs Open-Book

Step variants that measure parametric knowledge versus retrieval-augmented capability using the same ground truth.

CB Mode: CLOSED-BOOK (Parametric Only)
RAG Mode: OPEN-BOOK (+ Retrieved Evidence)

The Split

Some steps run in two variants: closed-book (CB) and open-book (RAG). CB tests what the model knows from pretraining alone. RAG tests whether the model can effectively use the evidence it is given. Both variants are scored against the same ground truth; only the input conditions differ.

[ VARIANT STEPS ]
S5:cb            Metadata + Shepard's only
S5:rag           + Full opinion text
S9:cb / S9:rag   Same pattern
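
As a rough illustration of the S5 split, the sketch below builds both inputs from one record. The `StepRecord` class, its field names, and `build_input` are hypothetical names chosen for this example, not part of any documented API. The only thing that changes between variants is whether the full opinion text is appended.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRecord:
    """Hypothetical container for one benchmark item; field names are assumptions."""
    metadata: str      # case metadata, shown in both variants
    shepards: str      # Shepard's data, shown in both variants
    opinion_text: str  # full opinion text, shown only in the RAG variant
    ground_truth: str  # shared by both variants, never shown to the model


def build_input(record: StepRecord, mode: str) -> str:
    """Assemble the model input for the 'cb' or 'rag' variant of a step."""
    if mode not in ("cb", "rag"):
        raise ValueError(f"unknown mode: {mode!r}")
    parts = [record.metadata, record.shepards]
    if mode == "rag":
        # Open-book: add the retrieved evidence on top of the CB input.
        parts.append(record.opinion_text)
    return "\n\n".join(parts)


# Placeholder values only; no real case data is implied.
record = StepRecord(
    metadata="Case: Example v. Example (placeholder)",
    shepards="Shepard's: placeholder signal",
    opinion_text="Full opinion text (placeholder)",
    ground_truth="placeholder label",
)
cb_input = build_input(record, "cb")
rag_input = build_input(record, "rag")
```

Because the RAG input is a strict extension of the CB input, any score difference between the two variants can be attributed to the added evidence rather than to a change in the question or the ground truth.
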
Diagnostic Value

The gap between CB and RAG scores is diagnostic. A high RAG score paired with a low CB score may indicate that the model is copying from the provided evidence rather than reasoning. Similar scores in both modes may suggest genuine understanding, or data contamination (which the Canary mechanism detects).
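
A minimal sketch of how the gap might be computed and read, assuming scores on a 0 to 1 scale. The `VariantScores` class, the `describe_gap` function, the label strings, and all three thresholds are assumptions made for illustration; the source does not specify cutoffs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantScores:
    """Scores for one step in both variants (hypothetical structure)."""
    step: str
    cb: float   # closed-book score, assumed to lie in [0, 1]
    rag: float  # open-book score, assumed to lie in [0, 1]

    @property
    def delta(self) -> float:
        """RAG score minus CB score."""
        return self.rag - self.cb


def describe_gap(scores: VariantScores, high: float = 0.8,
                 low: float = 0.5, close: float = 0.1) -> str:
    """Map a CB/RAG score pair to a coarse label.

    The thresholds are illustrative assumptions, not documented values.
    """
    if scores.rag >= high and scores.cb <= low:
        # High RAG, low CB: the model may be copying from the evidence.
        return "rag_dependent"
    if abs(scores.delta) <= close:
        # Similar scores: genuine understanding or possible contamination.
        return "similar"
    return "mixed"


s5 = VariantScores(step="S5", cb=0.4, rag=0.9)
s9 = VariantScores(step="S9", cb=0.85, rag=0.9)
labels = {s.step: describe_gap(s) for s in (s5, s9)}
```

A "similar" label cannot by itself separate understanding from contamination; that distinction relies on a separate check such as the Canary mechanism mentioned above.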

Key Insight

Measuring both modes reveals whether a model is reasoning from evidence or merely pattern-matching against pretraining. The delta is often more informative than either score alone.