Context window
The maximum amount of text a model can handle in one request. Measured in tokens.
In one sentence
The context window is the maximum amount of text (measured in tokens, roughly word-fragments) that a language model can process in a single request — input prompt and output reply combined.
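Because the prompt and the reply share one window, a request only fits if the input tokens plus the tokens reserved for the reply stay within the limit. A minimal sketch of that check, assuming token counts have already been measured with the model's own tokenizer (the function names and the 200k window in the usage line are hypothetical):

```python
def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved reply budget fits in the window."""
    # Input and output share the same window, so both count against the limit.
    return input_tokens + max_output_tokens <= context_window


def remaining_input_budget(max_output_tokens: int, context_window: int) -> int:
    """Tokens left for the prompt (instructions + retrieved chunks) after reserving the reply."""
    # Never report a negative budget if the reply reservation alone exceeds the window.
    return max(context_window - max_output_tokens, 0)
```

For example, `fits_in_context(195_000, 8_000, 200_000)` returns `False`: 203,000 tokens exceed a 200,000-token window, so either the prompt or the reply budget has to shrink.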
When it matters
When sizing the retrieval budget: send too much context and you pay in latency and cost; send too little and you lose recall, because the passage that answers the question may never reach the model.
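One common way to enforce a retrieval budget is greedy packing: take chunks in relevance order and keep each one that still fits. The sketch below assumes each chunk already carries a token count and a relevance score; `Chunk`, `pack_chunks`, and their fields are hypothetical names, not part of any helpcode API.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    tokens: int   # token count, assumed precomputed with the model's tokenizer
    score: float  # retrieval relevance score (higher = more relevant)


def pack_chunks(chunks: list[Chunk], budget_tokens: int, max_chunks: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that fit within the token budget."""
    selected: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if len(selected) == max_chunks:
            break  # hit the top-k cap
        if used + chunk.tokens > budget_tokens:
            continue  # this chunk would overflow; a smaller, lower-ranked one may still fit
        selected.append(chunk)
        used += chunk.tokens
    return selected
```

With a 6,000-token budget and chunks scored 0.9, 0.8, 0.7 and 0.6 costing 3,000, 4,000, 2,500 and 500 tokens, the second chunk is skipped and the other three fill the budget exactly. Skipping rather than stopping lets a smaller, lower-ranked chunk use leftover room; stopping at the first overflow is the simpler alternative when strict rank order matters more.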
A real-world example
Claude Sonnet 4.6 has a 1M-token window; helpcode KB retrieves the top 12 chunks (~6k tokens) by default to stay fast.
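The figures in this example imply roughly 500 tokens per chunk and a retrieval payload that uses well under 1% of the window. A back-of-envelope check using only the numbers quoted above (the variable names are illustrative):

```python
chunks_retrieved = 12
retrieved_tokens = 6_000     # "~6k tokens" for the default top 12 chunks
context_window = 1_000_000   # 1M-token window

# Average chunk size implied by the example.
avg_tokens_per_chunk = retrieved_tokens / chunks_retrieved
# Fraction of the window the retrieved chunks occupy.
share_of_window = retrieved_tokens / context_window

print(f"{avg_tokens_per_chunk:.0f} tokens per chunk, {share_of_window:.1%} of the window")
```

So the default leaves almost the entire window unused; the limit being respected here is the latency and cost budget, not the context window itself.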
Curated by helpcode research team · Last reviewed 2026-05-22