Closed-Book vs Open-Book

Step variants that measure parametric knowledge versus retrieval-augmented capability using the same ground truth.

CB Mode (CLOSED-BOOK): Parametric Only

RAG Mode (OPEN-BOOK): + Retrieved Evidence

The Split

Some steps run in both a CB (closed-book) and a RAG (retrieval-augmented, open-book) variant. CB tests what the model knows from pretraining alone. RAG tests whether the model can effectively use provided evidence. Both variants are scored against the same ground truth; only the input conditions differ.

[ VARIANT STEPS ]

S5:cb    Metadata + Shepard's only

S5:rag   + Full opinion text

S9:cb / S9:rag   Same pattern
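
The variant split above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CaseRecord` fields and the `build_input` and `expand_variants` helpers are hypothetical names, not part of the benchmark's actual code. The sketch shows the one property the text does state: both variants share the same ground truth, and the RAG input is the CB input plus the full opinion text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical container for one evaluation item (field names are assumptions)."""
    metadata: str
    shepards: str
    opinion_text: str
    ground_truth: str


def build_input(record: CaseRecord, variant: str) -> str:
    """Assemble the model input for a 'cb' or 'rag' variant."""
    if variant not in ("cb", "rag"):
        raise ValueError(f"unknown variant: {variant!r}")
    # Both variants see the metadata and the Shepard's signal.
    parts = [record.metadata, record.shepards]
    # Only the RAG variant additionally sees the full opinion text.
    if variant == "rag":
        parts.append(record.opinion_text)
    return "\n\n".join(parts)


def expand_variants(step_id: str, record: CaseRecord) -> dict:
    """Map 'S5' to {'S5:cb': (input, truth), 'S5:rag': (input, truth)}."""
    return {
        f"{step_id}:{variant}": (build_input(record, variant), record.ground_truth)
        for variant in ("cb", "rag")
    }


if __name__ == "__main__":
    record = CaseRecord(
        metadata="Example v. Sample (placeholder metadata)",
        shepards="placeholder Shepard's signal",
        opinion_text="placeholder opinion text",
        ground_truth="placeholder label",
    )
    for name, (model_input, truth) in expand_variants("S5", record).items():
        print(name, len(model_input), truth)
```

Keeping the ground truth on the record, rather than on the variant, makes it hard for the two modes to be scored against different answers by accident.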

Diagnostic Value

The gap between CB and RAG scores is diagnostic. A high RAG score with a low CB score may indicate "copying" from the provided evidence rather than reasoning. Similar scores in both modes may suggest genuine understanding, or data contamination (which the Canary mechanism detects).
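
As a rough sketch of how this reading might be automated, the hypothetical helper below computes the RAG minus CB delta and attaches a tentative label. The function name `diagnose_gap`, the assumption that scores lie in [0, 1], and both thresholds are illustrative choices, not values taken from the benchmark. The labels are prompts for further inspection, not verdicts.

```python
def diagnose_gap(cb_score: float, rag_score: float,
                 gap_threshold: float = 0.2,
                 high_threshold: float = 0.7) -> tuple:
    """Return (delta, tentative_label) for one step's CB and RAG scores.

    Scores are assumed to lie in [0, 1]; both thresholds are arbitrary
    illustrative defaults and would need tuning for a real benchmark.
    """
    delta = rag_score - cb_score
    if delta >= gap_threshold and rag_score >= high_threshold:
        # High RAG, low CB: the model may be copying from the evidence.
        label = "evidence-dependent (possible copying)"
    elif abs(delta) < gap_threshold and cb_score >= high_threshold:
        # Similar high scores: understanding or contamination; a separate
        # contamination check (such as the Canary mechanism) must decide.
        label = "similar (understanding or contamination)"
    else:
        label = "inconclusive"
    return delta, label


if __name__ == "__main__":
    for cb, rag in [(0.4, 0.9), (0.85, 0.88), (0.3, 0.35)]:
        delta, label = diagnose_gap(cb, rag)
        print(f"cb={cb:.2f} rag={rag:.2f} delta={delta:+.2f} -> {label}")
```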

Key Insight

Measuring both modes reveals whether a model is reasoning from evidence or merely pattern-matching against pretraining. The delta is often more informative than either score alone.