
Context window

The maximum amount of text a model can read at once, measured in tokens.

In one sentence

The context window is the maximum amount of text (measured in tokens, roughly word-fragments) that a language model can process in a single request — input prompt and output reply combined.

When it matters

When setting the retrieval budget: send too much context and you pay for it in latency and cost; send too little and you lose recall.
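One way to picture a retrieval budget is as a greedy cut-off over ranked chunks. The sketch below is illustrative only: `estimate_tokens` and `select_chunks` are hypothetical helpers, and the four-characters-per-token rule is a rough assumption standing in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def select_chunks(ranked_chunks: list[str], budget_tokens: int) -> list[str]:
    """Keep the highest-ranked chunks, in order, until the budget is spent."""
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += cost
    return selected
```

Raising `budget_tokens` admits more chunks (better recall, higher latency and cost); lowering it does the opposite. A production system would count tokens with the model's own tokenizer rather than a character heuristic.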

A real-world example

Claude Sonnet 4.6 has a 1M-token window; helpcode KB retrieves the top 12 chunks (~6k tokens) by default to stay fast.
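The figures in this example imply a fairly small chunk size and a tiny share of the window. A quick check, using only the approximate numbers quoted above (the function name `retrieval_stats` is made up for illustration):

```python
def retrieval_stats(chunks: int, retrieved_tokens: int, window_tokens: int) -> tuple[float, float]:
    """Return (average tokens per chunk, fraction of the context window used)."""
    return retrieved_tokens / chunks, retrieved_tokens / window_tokens


# Approximate values taken from the example: 12 chunks, ~6k tokens, 1M-token window.
per_chunk, share = retrieval_stats(chunks=12, retrieved_tokens=6_000, window_tokens=1_000_000)
print(f"about {per_chunk:.0f} tokens per chunk, {share:.1%} of the window")
```

So the default retrieval uses well under 1% of the available window, with each chunk averaging roughly 500 tokens.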


Curated by helpcode research team · Last reviewed 2026-05-22