Context Engineering
RAG, embeddings, function calling, and MCP
Understanding context is essential for building AI applications that actually work. This chapter covers the core techniques for giving AI the right information at the right time.
AI models are stateless: they don't remember past conversations. Every time you send a message, you must include everything the AI needs to know. Deciding what goes into that message - and how - is called "context engineering."
What is Context?
Context is all the information you give to AI alongside your question. Think of it like this:
No Context
What's the status?
With Context
You are a project manager assistant. The user is working on Project Alpha, which is due Friday. The last update was: 'Backend complete, frontend 80% done.' User: What's the status?
Without context, the AI has no idea what "status" you're asking about. With context, it can give a useful answer.
The Context Window
Remember from earlier chapters: AI has a limited "context window" - the maximum amount of text it can see at once. This includes:
- The system prompt: instructions that define AI behavior
- Conversation history: previous messages in this chat
- Retrieved context: documents, data, or knowledge fetched for this query
- The user's actual question
- The AI's answer (it also counts toward the limit!)
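To see how fast these pieces add up, you can count tokens yourself. A minimal sketch using the tiktoken library (one common tokenizer; the model you use may count slightly differently):

```python
# Estimate how much of the context window each piece consumes.
# Assumes the tiktoken library (pip install tiktoken); other models
# may use different tokenizers, so treat these counts as estimates.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

pieces = {
    "system prompt": "You are a project manager assistant.",
    "history": "User: What's the status?\nAssistant: Backend complete, frontend 80% done.",
    "question": "User: When is the deadline?",
}

for name, text in pieces.items():
    print(f"{name}: {len(encoding.encode(text))} tokens")
```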
AI is Stateless
AI doesn't remember anything between conversations. Every API call starts fresh. If you want the AI to "remember" something, YOU have to include it in the context every time.
This is why chatbots send your entire conversation history with each message. It's not that the AI remembers - it's that the app re-sends everything.
Pretend this is a new conversation with no history. What did I just ask you about?
The AI will say it doesn't know because it truly doesn't have access to any previous context.
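This re-sending pattern looks like the sketch below, shown here with the OpenAI Python SDK as one example (the model name is illustrative; the same pattern applies to any provider):

```python
# A chat app must re-send the whole history on every call, because the
# model itself keeps no state between requests.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def send(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # any chat model works here
        messages=messages,     # the FULL history, every single time
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply
```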
RAG: Retrieval-Augmented Generation
RAG is a technique for giving AI access to knowledge it wasn't trained on. Instead of trying to fit everything into the AI's training, you:
1. Store your documents in a searchable database
2. Search for relevant documents when a user asks a question
3. Retrieve the most relevant pieces
4. Augment your prompt with those pieces
5. Generate an answer using that context
How RAG Works:
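A minimal sketch of those five steps, where `search_documents` and `llm` are hypothetical stand-ins for your vector database and model client:

```python
# The five RAG steps in code. `search_documents` is a hypothetical helper
# standing in for whatever vector database you use; `llm` stands in for
# a call to your chat model.
def answer_with_rag(question: str, llm, search_documents) -> str:
    # Steps 1-3: search the store and retrieve the most relevant pieces
    chunks = search_documents(question, top_k=3)

    # Step 4: augment the prompt with those pieces
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using ONLY the context below. Cite which chunk you used.\n\n"
        f"CONTEXT:\n{context}\n\nQUESTION: {question}"
    )

    # Step 5: generate an answer grounded in that context
    return llm(prompt)
```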
Why RAG?
RAG Advantages
- Uses your actual, current data
- Reduces hallucinations
- Can cite sources
- Easy to update (just update documents)
- No expensive fine-tuning needed
When to Use RAG
- Customer support bots
- Documentation search
- Internal knowledge bases
- Any domain-specific Q&A
- When accuracy matters
Embeddings: How Search Works
How does RAG know which documents are "relevant"? It uses embeddings - a way to turn text into numbers that capture meaning.
What Are Embeddings?
An embedding is a list of numbers (a "vector") that represents the meaning of text. Similar meanings = similar numbers.
Embeddings Visualization
Click a word to see its vector and its similarity to other words.
Words with similar meanings (like "happy" and "joyful") have similar vectors, resulting in high similarity scores.
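The "similarity score" is typically cosine similarity. A minimal sketch with made-up three-number vectors (real embeddings have hundreds or thousands of dimensions):

```python
# Cosine similarity: the standard way to compare two embedding vectors.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

happy  = [0.8, 0.6, 0.1]   # made-up vectors for illustration
joyful = [0.7, 0.7, 0.2]
table  = [0.1, 0.2, 0.9]

print(cosine_similarity(happy, joyful))  # ~0.98: similar meaning
print(cosine_similarity(happy, table))   # ~0.31: unrelated meaning
```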
Semantic Search
With embeddings, you can search by meaning, not just keywords:
Keyword Search
Query: 'return policy'
Finds: documents containing 'return' and 'policy'
Misses: 'How to get a refund'
Semantic Search
Query: 'return policy'
Finds: all related documents, including:
- 'Refund guidelines'
- 'How to send items back'
- 'Money-back guarantee'
This is why RAG is so powerful - it finds relevant information even when the exact words don't match.
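A sketch of semantic search end to end, using the OpenAI embeddings API as one example (any embedding model works; the document list is illustrative):

```python
# Semantic search: embed documents once, embed the query, rank by similarity.
from openai import OpenAI
import math

client = OpenAI()

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

docs = ["Refund guidelines", "How to send items back",
        "Money-back guarantee", "Shipping rates by region"]
doc_vectors = [(d, embed(d)) for d in docs]   # embed once, store for reuse

query_vec = embed("return policy")
ranked = sorted(doc_vectors, key=lambda dv: cosine(query_vec, dv[1]), reverse=True)
print([d for d, _ in ranked[:3]])  # the refund/returns docs rank highest
```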
Function Calling / Tool Use
Function calling lets AI use external tools - like searching the web, checking a database, or calling an API.
Different AI providers call this different things: "function calling" (OpenAI), "tool use" (Anthropic/Claude), or "tools" (general term). They all mean the same thing.
How It Works
1. You tell the AI what tools are available
2. The AI decides if it needs a tool to answer
3. The AI outputs a structured request for the tool
4. Your code runs the tool and returns the results
5. The AI uses the results to form its answer
This prompt shows how AI decides to use a tool:
You have access to these tools:
1. get_weather(city: string) - Get current weather for a city
2. search_web(query: string) - Search the internet
3. calculate(expression: string) - Do math calculations

User: What's the weather like in Tokyo right now?

Think step by step: Do you need a tool? Which one? What parameters?
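In code, steps 3-5 look roughly like this provider-agnostic sketch; the `get_weather` implementation and the JSON request format are illustrative assumptions (each real provider has its own wire format):

```python
# The function-calling loop: the model returns a structured request
# (tool name + arguments) instead of a plain answer; YOUR code runs it.
import json

def get_weather(city: str) -> str:
    # In a real app this would call a weather API.
    return f"22°C and sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Step 3: the model outputs something like this instead of an answer
model_request = '{"tool": "get_weather", "arguments": {"city": "Tokyo"}}'

# Step 4: your code parses the request, runs the tool, returns the result
call = json.loads(model_request)
result = TOOLS[call["tool"]](**call["arguments"])

# Step 5: send `result` back to the model so it can form its final answer
print(result)  # 22°C and sunny in Tokyo
```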
Summarization: Managing Long Conversations
As conversations get longer, you'll hit the context window limit. Because the app re-sends the full history with every message (remember: the AI itself is stateless), long conversations eventually overflow the window. The solution? Summarization.
The Problem
Without Summarization
Message 1 (500 tokens)
Message 2 (800 tokens)
Message 3 (600 tokens)
... 50 more messages ...
────────────────────
= 40,000+ tokens = OVER THE LIMIT!
With Summarization
[Summary]: 200 tokens
Recent messages: 2,000 tokens
Current query: 100 tokens
────────────────────
= 2,300 tokens = Fits perfectly!
Summarization Strategies
Different approaches work for different use cases. The most common is the rolling summary: summarize the oldest messages, keep recent ones intact. Here is how it processes a sample conversation:

Original conversation: 116 tokens
After rolling summary:
- Summary (20 tokens): "User learning Python for data analysis. Covered: variables, numbers, lists basics."
- Kept messages: 42 tokens
- Total: 62 tokens, a 47% saving
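A minimal sketch of a rolling summary, where `summarize` is a hypothetical stand-in for an LLM call that condenses the old messages:

```python
# Rolling summary: compress the oldest messages into one summary message,
# keep the most recent ones intact.
def apply_rolling_summary(messages: list[dict], summarize, keep_recent: int = 4) -> list[dict]:
    if len(messages) <= keep_recent:
        return messages                   # nothing old enough to compress
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary_text = summarize(old)         # e.g. "User learning Python..."
    summary_msg = {"role": "system",
                   "content": f"Summary of earlier conversation: {summary_text}"}
    return [summary_msg] + recent
```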
What to Capture in Summaries
A good conversation summary preserves what matters:
- The user's goal and background
- Topics covered so far
- The user's learning style and preferences
- Decisions made and what to cover next
Try It: Create a Summary
Practice creating a context-preserving summary from this conversation:
Summarize this conversation for context management. The summary will replace the full conversation in the AI's memory.

CONVERSATION:
User: Hi, I'm learning Python for data analysis
Assistant: Welcome! Python is great for data analysis. What's your current experience level?
User: I know basic Excel. Complete beginner with programming.
Assistant: Perfect starting point! Let's begin with variables - they're like Excel cells that store data.
User: Can you explain variables?
Assistant: Variables are containers for storing data. In Python: name = "Alice" or age = 25
User: What about lists? I need to handle multiple values.
Assistant: Lists are like Excel columns! Create one like: prices = [10, 20, 30]. Access items with prices[0].
User: Can I do calculations on lists?
Assistant: Yes! Use sum(prices), len(prices), or max(prices). For complex analysis, we'll use pandas later.
User: What's pandas?
Assistant: Pandas is a library for data analysis - think "Excel on steroids". It has DataFrames (like spreadsheets).

CREATE A SUMMARY that captures:
1. User's goal and background (1 sentence)
2. Topics covered so far (1 sentence)
3. User's learning style/preferences (1 sentence)
4. What to cover next (1 sentence)
When to Summarize
You are managing a conversation's context window. Given these conditions, decide when to trigger summarization:

CONTEXT WINDOW: 8,000 tokens max
CURRENT USAGE:
- System prompt: 500 tokens
- Conversation history: 6,200 tokens
- Buffer for response: 1,500 tokens

RULES:
- Summarize when history exceeds 70% of available space
- Keep last 5 messages intact
- Preserve all user preferences and decisions

Should you summarize now? If yes, what messages should be summarized vs kept intact?
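The trigger rule from this exercise, expressed as a quick sketch:

```python
# Summarize when history exceeds 70% of the space left after the
# system prompt and the response buffer (numbers from the exercise above).
CONTEXT_WINDOW = 8_000
SYSTEM_TOKENS = 500
RESPONSE_BUFFER = 1_500

def should_summarize(history_tokens: int) -> bool:
    available = CONTEXT_WINDOW - SYSTEM_TOKENS - RESPONSE_BUFFER  # 6,000
    return history_tokens > 0.7 * available                      # > 4,200

print(should_summarize(6_200))  # True: 6,200 > 4,200, so summarize now,
                                # keeping the last 5 messages intact
```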
MCP: Model Context Protocol
MCP (Model Context Protocol) is a standard way to connect AI to external data and tools. Instead of building custom integrations for each AI provider, MCP provides a universal interface.
Why MCP?
Without MCP: Build separate integrations for ChatGPT, Claude, Gemini... Maintain multiple codebases. Break when APIs change.
With MCP: Build once, works everywhere. Standard protocol. AI can discover and use your tools automatically.
MCP Provides
- Resources: Data the AI can read (files, database records, API responses)
- Tools: Actions the AI can take (search, create, update, delete)
- Prompts: Pre-built prompt templates
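To get a feel for the shape of an MCP server, here is a minimal sketch using the official Python SDK's FastMCP helper (pip install mcp); exact API details may vary across SDK versions, and the store data is made up:

```python
# A minimal MCP server exposing one tool and one resource.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("TechStore")

@mcp.tool()
def search_orders(customer_email: str) -> str:
    """Look up a customer's recent orders."""
    return "Order #12345: Wireless Mouse, shipped yesterday"  # stand-in data

@mcp.resource("docs://return-policy")
def return_policy() -> str:
    """The store's return policy document."""
    return "Returns accepted within 30 days in original packaging."

if __name__ == "__main__":
    mcp.run()  # MCP clients (e.g. Claude Desktop) can now discover these
```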
This platform has an MCP server! You can connect it to Claude Desktop or other MCP-compatible clients to search and use prompts directly from your AI assistant.
Building Context: The Complete Picture
Context Playground
Toggle context blocks on/off to see how they combine. Watch the token count!
--- SYSTEM PROMPT ---
You are a helpful customer support agent for TechStore. Be friendly and concise.

--- RETRIEVED DOCUMENTS (RAG) ---
From knowledge base:
- Return policy: 30 days, original packaging required
- Shipping: Free over $50
- Warranty: 1 year on electronics

--- CONVERSATION HISTORY ---
[Summary] User asked about order #12345. Product: Wireless Mouse. Status: Shipped yesterday.
User: When will it arrive?
Assistant: Based on standard shipping, it should arrive in 3-5 business days.

--- USER QUERY ---
Can I return it if I don't like it?
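A sketch of assembling these blocks programmatically, with a rough size check before sending (the 4-characters-per-token figure is only a heuristic; use a real tokenizer for accurate counts):

```python
# Build the final prompt from labeled context blocks and sanity-check
# its size against the context window before sending.
MAX_TOKENS = 8_000

blocks = {
    "SYSTEM PROMPT": "You are a helpful customer support agent for TechStore.",
    "RETRIEVED DOCUMENTS (RAG)": "Return policy: 30 days, original packaging required.",
    "CONVERSATION HISTORY": "[Summary] User asked about order #12345 (Wireless Mouse).",
    "USER QUERY": "Can I return it if I don't like it?",
}

prompt = "\n\n".join(f"--- {name} ---\n{text}" for name, text in blocks.items())
estimated_tokens = len(prompt) // 4   # crude estimate: ~4 characters/token
assert estimated_tokens < MAX_TOKENS, "Context too large; summarize or trim"
print(prompt)
```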
Best Practices
- Include everything the AI needs - it remembers nothing between calls
- Retrieve only the most relevant documents instead of dumping everything into the prompt
- Summarize long conversations before they hit the context window limit
- Leave room in the token budget for the AI's response
Summary
Context engineering is about giving AI the right information:
- AI is stateless - include everything it needs every time
- RAG retrieves relevant documents to augment prompts
- Embeddings enable semantic search (meaning, not just keywords)
- Function calling lets AI use external tools
- Summarization manages long conversations
- MCP standardizes how AI connects to data and tools
The quality of AI output depends on the quality of context you provide. Better context = better answers.