Concept 04 · Context & caching
Working memoryone fixed size
your ruleshow Claude should work8%
your 6 profile filesloaded every sessionheavy34%
the taskwhat you asked for6%
↑ older chat · summarized
the chat so fargrows every turn
0%
full
01 the desk

A desk with a fixed amount of space.

~200Ktokens

Every conversation gets one working memory of a set size. Everything Claude can see at once — and reason over — has to fit on this desk.

02 what fills it

Everything you bring shares it.

your rules8%
6 profile files34%
the task6%
the chat so fargrows

Instructions, files, the task, and the running conversation all compete for the same desk.

03 when it overflows

Full desk? The oldest gets cleared.

Once it fills, the oldest material is summarized or dropped to make room. And loading all six profile files every session burns a third of the desk before you've even started — that's the part worth trimming.

04 the hidden cost

Re-reading, every single turn.

1turn
cost & waitrising…
re-reads the whole stack

Without help, every turn re-reads your rules, files, and task from scratch — so each reply gets slower and more expensive.

05 caching

Hold the unchanged parts ready.

Caching keeps the stable blocks — your rules, files, and the task — ready to reuse. Claude skips re-reading them and jumps straight to the new turn.

rules · files · task — held ready
06 the payoff

Same work, far less cost.

full
no cache
re-reads every turn
cached
reuses the stack
↓ up to ~90% less cost & wait on repeat turns
07 the catch

But the cache expires.

5:00
~5 minutes of idle

Keep the session moving and you stay in the cache. Pause too long and it clears — the next turn pays full price again.