Context Window

Simon Budziak, CTO
The Context Window (also called "context length" or "context limit") is the maximum amount of text—measured in tokens—that a language model can process and remember in a single interaction. It represents the model's "working memory" and is one of the most important architectural constraints in LLM applications.
The context window includes everything the model sees: your system prompt, the conversation history, any documents you've provided, and the model's own responses. For example, if a model has a 128,000 token context window and you provide a 100,000 token document, you have only 28,000 tokens left for conversation, instructions, and the model's response.
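The budget arithmetic above can be sketched in a few lines. This is a minimal helper, not any provider's API; the function name and the idea of reserving tokens for the response are illustrative assumptions:

```python
def remaining_budget(context_window: int, used_tokens: int, reserved_output: int = 1024) -> int:
    """Tokens left for new input, given what is already in context and a
    slice reserved for the model's own response."""
    left = context_window - used_tokens - reserved_output
    return max(left, 0)  # never report a negative budget

# The article's example: a 128,000-token window with a 100,000-token document loaded
# and nothing reserved for output.
print(remaining_budget(128_000, 100_000, reserved_output=0))  # 28000
```

In practice you would measure `used_tokens` with your provider's tokenizer rather than estimating, since token counts differ between models.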
Modern models have dramatically expanded context windows:
- Early Models (2020-2022): 2K-4K tokens (roughly 1,500-3,000 words) - enough for short conversations.
- Mid-Generation (2023): 32K-100K tokens - enabling analysis of entire research papers or codebases.
- Latest Models (2024+): 200K+ tokens (Claude 4.5), 1M+ tokens (Gemini 3 Pro) - processing entire books or complex multi-file projects.
However, larger context windows come with tradeoffs:
- Cost: More context means more computation, which means higher API costs.
- Latency: Processing massive contexts takes longer.
- Attention Dilution: Models may struggle to utilize information from the middle of very long contexts (the "lost in the middle" problem).
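One common way to live within these limits is to trim the oldest conversation turns until the history fits the budget. The sketch below assumes a crude characters-per-token heuristic (a stand-in for a real tokenizer) and plain strings for messages:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text. This is an
    # assumption for illustration; use your model provider's tokenizer in practice.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the conversation fits the token budget,
    preserving the most recent context the model is likely to need."""
    kept: list[str] = []
    total = 0
    # Walk from newest to oldest, keeping whatever still fits.
    for msg in reversed(messages):
        cost = approx_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

Trimming from the front keeps recent turns intact, which also sidesteps some of the "lost in the middle" effect by keeping the surviving context short and recent. More sophisticated strategies summarize the dropped turns instead of discarding them.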