4. Key tools, technologies, and terms

Reasoning and context windows

Reasoning is the process of drawing conclusions from the information available. Your weather app tells you that it’s raining and you can see water splattering on your windows, so you reason your way to the conclusion that you should wear a raincoat. In AI, reasoning refers to a large language model’s (LLM’s) ability to reach logical conclusions based on the data or knowledge it has access to. The data you provide when asking your question becomes part of the model’s context window, its working memory for that session. The model can use this alongside its training data to reason its way to better responses.

A context window is the fixed amount of content an LLM can take as input when understanding and generating language. Context windows are measured in tokens, which can be whole words, fragments of words, individual characters, special characters, or punctuation marks. In essence, a context window is an LLM’s working memory: all the information, beyond what it learned in training, that it can draw on when formulating responses to your prompts.
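The Python sketch below illustrates a fixed-size window measured in tokens. Its `tokenize` function is a hypothetical stand-in that splits text into runs of word characters and individual punctuation marks. Real LLMs use learned subword tokenizers, so their token counts will differ. `fit_to_window`, also hypothetical, keeps only the most recent tokens that fit. This is a simple truncation policy assumed here for illustration; applications handle overflow in different ways.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical toy tokenizer: word-character runs and single punctuation
    # marks become separate tokens. Real LLM tokenizers use learned subword
    # vocabularies, so they will split text differently.
    return re.findall(r"\w+|[^\w\s]", text)


def fit_to_window(tokens: list[str], window_size: int) -> list[str]:
    # Assumed policy: keep only the most recent tokens that fit in the
    # window; anything earlier falls out of the model's working memory.
    if window_size <= 0:
        return []
    return tokens[-window_size:]


tokens = tokenize("It's raining. Should I wear a raincoat?")
print(len(tokens))                # 11
print(fit_to_window(tokens, 5))   # ['I', 'wear', 'a', 'raincoat', '?']
```

With a five-token window, everything before “I wear a raincoat?” is no longer visible to the model, including the fact that it’s raining.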

Context windows for leading models are getting longer

Context window length is a crucial consideration for AI applications that need to understand long texts in depth or generate extensive content. A longer context window can enable more nuanced, intelligible responses because the model can consider more information before answering. In May 2024, Google doubled the context window of its Gemini 1.5 Pro model from one million tokens to two million. That same month, OpenAI released its new multimodal model, GPT-4o, which can reason across audio, visual, and text content at twice the speed and half the cost of the previous-generation model, GPT-4 Turbo.

According to AI expert Andrew Ng, the growing length of context windows has changed how developers interact with LLMs, for instance by leading them to write much more detailed “mega-prompts.” “The reasoning capability of GPT-4 and other advanced models makes them quite good at interpreting complex prompts with detailed instructions,” he writes. “Many people are used to dashing off a quick, 1- to 2-sentence query to an LLM. In contrast, when building applications, I see sophisticated teams frequently writing prompts that might be 1 to 2 pages long (my teams call them ‘mega-prompts’) that provide complex instructions to specify in detail how we’d like an LLM to perform a task.”

Bigger isn’t always better

Longer context windows come with inevitable tradeoffs. They require more computing power, memory, and storage, which raises operational costs. They also don’t necessarily translate into a better-performing model or more accurate answers. In fact, a longer context window gives the model more opportunities to hallucinate. “Beyond hardware and performance, larger context windows can affect the data processing pipeline, model fine-tuning, and even the design of applications that utilize these AI models,” cautions AI researcher Matt White.
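To make the computing-power point concrete, here is a back-of-the-envelope sketch in Python. It assumes standard dense self-attention, in which every token in the window is scored against every token in the window, so this part of the cost grows with the square of the window length. Production models often use optimized or sparse attention variants whose details are not public. Treat this as an illustration of scaling, not an estimate of any model’s actual cost. The function name is hypothetical.

```python
def attention_score_entries(context_tokens: int) -> int:
    # Assumes standard dense self-attention: each token is scored against
    # every token in the window, giving a context_tokens x context_tokens
    # score matrix per attention head, per layer.
    return context_tokens * context_tokens


one_million = attention_score_entries(1_000_000)
two_million = attention_score_entries(2_000_000)

# Doubling the window quadruples the size of this part of the computation.
print(two_million // one_million)  # 4
```

Other costs, such as storing per-token state, grow more slowly. Under this assumption, however, the attention term alone explains why doubling a window can cost far more than twice as much.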

Retrieval-augmented generation (RAG) is one way to improve AI output without lengthening context windows and taking on the extra costs that entails. RAG narrows the information the LLM draws on when forming its answer. A retrieval step searches an external data source for passages relevant to the question and adds only those passages to the prompt. As a result, even a model working within a small context window can draw on relevant information stored outside it. RAG grounds LLM responses in specific data and can back them up with sources. This can substantially improve the model’s ability to generate accurate, comprehensive, and context-rich answers.
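A minimal RAG flow can be sketched in Python as follows. Real systems typically embed documents with a learned model and search a vector index. To keep this example dependency-free, it swaps in bag-of-words cosine similarity, which is much cruder. All function names are hypothetical and the sample documents are made up for illustration. The key idea is that only the top-ranked passages enter the prompt, numbered so the answer can cite them.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Stand-in for a learned embedding: lowercase word counts.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank every document by similarity to the query and keep the top k.
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    # Only the retrieved passages go into the context window.
    passages = retrieve(query, documents, k)
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


docs = [
    "Gemini 1.5 Pro supports a context window of up to two million tokens.",
    "Raincoats keep you dry when it is raining.",
    "GPT-4o can reason across audio, visual, and text content.",
]
print(build_prompt("How many tokens fit in the Gemini 1.5 Pro context window?", docs, k=1))
```

Here the Gemini passage shares the most words with the question, so it is the only source placed in the prompt. The other two documents never consume any of the context window.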

Prompt engineering, or “carefully crafting prompts to include only the most relevant information,” as White puts it, is another way developers can get the most out of smaller context windows. After all, White writes, “The goal is not merely to process a larger swath of text but to enhance the relevance and coherence of what’s generated.”

Recent reports suggest that OpenAI researchers believe they are on the verge of creating AI models that can reason at a human level. As models’ reasoning abilities advance, they could do more with less, producing equally accurate and insightful responses from shorter context windows.