Context Engineering

RAG, embeddings, function calling, and MCP

Understanding context is essential for building AI applications that actually work. This chapter covers everything you need to know about giving AI the right information at the right time.

Why Context Matters

AI models are stateless. They don't remember past conversations. Every time you send a message, you must include everything the AI needs to know. Deciding what to include - and how to structure it - is called "context engineering."

What is Context?

Context is all the information you give to AI alongside your question. Think of it like this:

No Context

What's the status?

With Context

You are a project manager assistant. The user is working on Project Alpha, which is due Friday. The last update was: 'Backend complete, frontend 80% done.'

User: What's the status?

Without context, the AI has no idea what "status" you're asking about. With context, it can give a useful answer.
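
In code, context typically travels as a list of messages sent with every request. A minimal sketch using the OpenAI Python client (the model name is illustrative; other providers follow the same pattern):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            # The context: who the assistant is and what it needs to know
            {
                "role": "system",
                "content": "You are a project manager assistant. The user is "
                           "working on Project Alpha, which is due Friday. "
                           "Last update: 'Backend complete, frontend 80% done.'",
            },
            # The actual question
            {"role": "user", "content": "What's the status?"},
        ],
    )
    print(response.choices[0].message.content)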

The Context Window

Remember from earlier chapters: AI has a limited "context window" - the maximum amount of text it can see at once. This includes:

  • System Prompt: Instructions that define AI behavior
  • Conversation History: Previous messages in this chat
  • Retrieved Information: Documents, data, or knowledge fetched for this query
  • Current Query: The user's actual question
  • AI Response: The answer (also counts toward the limit!)

AI is Stateless

Important Concept

AI doesn't remember anything between conversations. Every API call starts fresh. If you want the AI to "remember" something, YOU have to include it in the context every time.

This is why chatbots send your entire conversation history with each message. It's not that the AI remembers - it's that the app re-sends everything.

Try it yourself with this prompt:

Pretend this is a new conversation with no history.

What did I just ask you about?

The AI will say it doesn't know because it truly doesn't have access to any previous context.
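
This is exactly what a chat app does behind the scenes: it stores the history itself and re-sends all of it on every turn. A sketch, with call_model standing in for any real chat API:

    # The app, not the model, is responsible for "memory".
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    def call_model(messages):
        """Placeholder for a real chat API call."""
        return f"(reply to: {messages[-1]['content']})"

    def send(user_message):
        history.append({"role": "user", "content": user_message})
        reply = call_model(history)  # the ENTIRE history goes out every time
        history.append({"role": "assistant", "content": reply})
        return reply

    send("My name is Alice.")
    print(send("What's my name?"))  # "remembers" only because we re-sent it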

RAG: Retrieval-Augmented Generation

RAG is a technique for giving AI access to knowledge it wasn't trained on. Instead of trying to fit everything into the AI's training, you:

  1. Store your documents in a searchable database
  2. Search for relevant documents when a user asks a question
  3. Retrieve the most relevant pieces
  4. Augment your prompt with those pieces
  5. Generate an answer using that context

How RAG Works:

  1. User asks: "What's our refund policy?"
  2. System searches your documents for "refund policy"
  3. Finds relevant section from your policy document
  4. Sends to AI: "Based on this policy: [text], answer: What's our refund policy?"
  5. AI generates accurate answer using your actual policy
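
The whole pipeline fits in a few lines. A minimal sketch, where search_documents and call_model are placeholders for your vector store and chat API:

    def search_documents(query, top_k=3):
        """Placeholder: a real system queries a vector database here."""
        return ["Return policy: full refund within 30 days, original packaging required."]

    def call_model(prompt):
        """Placeholder for a chat completion API call."""
        return f"(answer grounded in: {prompt[:50]}...)"

    def answer_with_rag(question):
        docs = search_documents(question)            # steps 2-3: search and retrieve
        context = "\n".join(docs)
        prompt = (f"Based on these documents:\n{context}\n\n"  # step 4: augment
                  f"Answer the question: {question}")
        return call_model(prompt)                    # step 5: generate

    print(answer_with_rag("What's our refund policy?"))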

Why RAG?

RAG Advantages

  • Uses your actual, current data
  • Reduces hallucinations
  • Can cite sources
  • Easy to update (just update documents)
  • No expensive fine-tuning needed

When to Use RAG

  • Customer support bots
  • Documentation search
  • Internal knowledge bases
  • Any domain-specific Q&A
  • When accuracy matters

Embeddings: How Search Works

How does RAG know which documents are "relevant"? It uses embeddings - a way to turn text into numbers that capture meaning.

What Are Embeddings?

An embedding is a list of numbers (a "vector") that represents the meaning of text. Similar meanings = similar numbers.

For example, "happy" might map to a 4-dimensional vector like [0.82, 0.75, 0.15, 0.91] (real embeddings have hundreds or thousands of dimensions). Comparing that vector against other words' vectors gives similarity scores:

  • happy: 100%
  • joyful: 100%
  • delighted: 100%
  • angry: 66%
  • furious: 62%
  • unhappy: 45%
  • sad: 42%

Words with similar meanings (like "happy" and "joyful") have similar vectors, resulting in high similarity scores.
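
Similarity between vectors is usually measured with cosine similarity. A self-contained sketch using toy 4-dimensional vectors (real embeddings have far more dimensions):

    import math

    def cosine_similarity(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    happy  = [0.82, 0.75, 0.15, 0.91]
    joyful = [0.80, 0.77, 0.12, 0.89]  # similar meaning, similar numbers
    angry  = [0.10, 0.85, 0.90, 0.05]  # different meaning, different numbers

    print(round(cosine_similarity(happy, joyful), 2))  # ~1.0
    print(round(cosine_similarity(happy, angry), 2))   # ~0.5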

Semantic Search

With embeddings, you can search by meaning, not just keywords:

Keyword Search

Query: 'return policy'
Finds: Documents containing 'return' and 'policy'
Misses: 'How to get a refund'

Semantic Search

Query: 'return policy'
Finds: All related documents including:
- 'Refund guidelines'
- 'How to send items back'
- 'Money-back guarantee'

This is why RAG is so powerful - it finds relevant information even when the exact words don't match.
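
Semantic search, then, is just "embed everything and rank by similarity." A sketch with hand-picked toy vectors; in practice embed() would call an embedding model:

    import math

    def cosine_similarity(a, b):  # same function as in the previous sketch
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))

    # Toy embeddings chosen by hand; a real embed() calls a model API.
    TOY_VECTORS = {
        "return policy":          [0.90, 0.10, 0.80],
        "Refund guidelines":      [0.88, 0.15, 0.75],
        "How to send items back": [0.85, 0.20, 0.70],
        "Office parking rules":   [0.10, 0.90, 0.20],
    }

    def embed(text):
        return TOY_VECTORS[text]

    def semantic_search(query, documents, top_k=2):
        q = embed(query)
        ranked = sorted(documents, key=lambda d: cosine_similarity(q, embed(d)),
                        reverse=True)
        return ranked[:top_k]

    docs = ["Refund guidelines", "How to send items back", "Office parking rules"]
    print(semantic_search("return policy", docs))
    # -> ['Refund guidelines', 'How to send items back'], despite zero shared keywords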

Function Calling / Tool Use

Function calling lets AI use external tools - like searching the web, checking a database, or calling an API.

Also Called

Different AI providers call this different things: "function calling" (OpenAI), "tool use" (Anthropic/Claude), or "tools" (general term). They all mean the same thing.

How It Works

  1. You tell the AI what tools are available
  2. AI decides if it needs a tool to answer
  3. AI outputs a structured request for the tool
  4. Your code runs the tool and returns results
  5. AI uses the results to form its answer

Function Calling Example

This prompt shows how AI decides to use a tool:

You have access to these tools:

1. get_weather(city: string) - Get current weather for a city
2. search_web(query: string) - Search the internet
3. calculate(expression: string) - Do math calculations

User: What's the weather like in Tokyo right now?

Think step by step: Do you need a tool? Which one? What parameters?
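
On the application side, the same loop looks roughly like this. A sketch with a hand-rolled dispatcher; in a real app the model emits the tool request as structured JSON and your code executes it:

    import json

    # Step 1: tools you advertise to the model
    def get_weather(city: str) -> str:
        """Placeholder: call a real weather API here."""
        return f"22°C and clear in {city}"

    TOOLS = {"get_weather": get_weather}

    # Steps 2-3: the model decides it needs a tool and emits a structured request
    tool_request = json.loads('{"tool": "get_weather", "arguments": {"city": "Tokyo"}}')

    # Step 4: your code runs the tool and captures the result
    result = TOOLS[tool_request["tool"]](**tool_request["arguments"])

    # Step 5: send the result back so the model can form its final answer
    print(result)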

Summarization: Managing Long Conversations

As conversations get longer, you'll hit the context window limit. Since AI is stateless, the app re-sends the full history with every message, so long conversations eventually overflow the window. The solution? Summarization.

The Problem

Without Summarization

Message 1 (500 tokens)
Message 2 (800 tokens)
Message 3 (600 tokens)
... 50 more messages ...
────────────────────
= 40,000+ tokens
= OVER THE LIMIT!

With Summarization

[Summary]: 200 tokens
Recent messages: 2,000 tokens
Current query: 100 tokens
────────────────────
= 2,300 tokens
= Fits perfectly!

Summarization Strategies

Different approaches work for different use cases. A common one is the rolling summary: summarize the oldest messages and keep the most recent ones intact. Here is how it processes a sample conversation:

Original Conversation

U: Hi, I want to learn Python (8t)
A: Great choice! What's your goal? (10t)
U: Data analysis for my job (7t)
A: Perfect. Let's start with variables. (12t)
U: What are variables? (5t)
A: Variables store data like name = 'Alice' (14t)
U: Can I store numbers? (6t)
A: Yes! age = 25 or price = 19.99 (12t)
U: What about lists? (5t)
A: Lists hold multiple values: [1, 2, 3] (14t)
U: How do I loop through them? (7t)
A: Use for loops: for x in list: print(x) (16t)

After Rolling Summary

Summary (20t)

User learning Python for data analysis. Covered: variables, numbers, lists basics.

Kept Messages (42t)

U: What about lists?
A: Lists hold multiple values: [1, 2, 3]
U: How do I loop through them?
A: Use for loops: for x in list: print(x)

Saved 47% (116t → 62t)
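
A rolling summary takes only a few lines to implement. A sketch, where summarize() would itself be an AI call in a real app ("Summarize this conversation..."):

    KEEP_RECENT = 4  # how many messages stay intact

    def summarize(messages):
        """Placeholder: in practice, ask the model for a summary."""
        return "User learning Python for data analysis. Covered: variables, numbers, lists."

    def roll_up(history):
        if len(history) <= KEEP_RECENT:
            return history
        old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
        summary = {"role": "system", "content": "[Summary] " + summarize(old)}
        return [summary] + recent

Run roll_up whenever the history nears the limit: the oldest messages collapse into a single summary message while the latest exchange stays verbatim.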

What to Capture in Summaries

A good conversation summary preserves what matters: the user's goal and background, the topics covered so far, decisions made and preferences expressed, open questions, and what to cover next.

Try It: Create a Summary

Practice creating a context-preserving summary from this conversation:

Summarize this conversation for context management. The summary will replace the full conversation in the AI's memory.

CONVERSATION:
User: Hi, I'm learning Python for data analysis
Assistant: Welcome! Python is great for data analysis. What's your current experience level?
User: I know basic Excel. Complete beginner with programming.
Assistant: Perfect starting point! Let's begin with variables - they're like Excel cells that store data.
User: Can you explain variables?
Assistant: Variables are containers for storing data. In Python: name = "Alice" or age = 25
User: What about lists? I need to handle multiple values.
Assistant: Lists are like Excel columns! Create one like: prices = [10, 20, 30]. Access items with prices[0].
User: Can I do calculations on lists?
Assistant: Yes! Use sum(prices), len(prices), or max(prices). For complex analysis, we'll use pandas later.
User: What's pandas?
Assistant: Pandas is a library for data analysis - think "Excel on steroids". It has DataFrames (like spreadsheets).

CREATE A SUMMARY that captures:
1. User's goal and background (1 sentence)
2. Topics covered so far (1 sentence)  
3. User's learning style/preferences (1 sentence)
4. What to cover next (1 sentence)

When to Summarize

You are managing a conversation's context window. Given these conditions, decide when to trigger summarization:

CONTEXT WINDOW: 8,000 tokens max
CURRENT USAGE:
- System prompt: 500 tokens
- Conversation history: 6,200 tokens  
- Buffer for response: 1,500 tokens

RULES:
- Summarize when history exceeds 70% of available space
- Keep last 5 messages intact
- Preserve all user preferences and decisions

Should you summarize now? If yes, what messages should be summarized vs kept intact?
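
Working through the numbers (a sketch of the trigger check; the figures and rules come straight from the exercise):

    CONTEXT_WINDOW  = 8_000
    SYSTEM_PROMPT   = 500
    RESPONSE_BUFFER = 1_500
    HISTORY         = 6_200

    available = CONTEXT_WINDOW - SYSTEM_PROMPT - RESPONSE_BUFFER  # 6,000 tokens
    threshold = 0.70 * available                                  # 4,200 tokens

    if HISTORY > threshold:
        # 6,200 > 4,200, so summarize now: keep the last 5 messages intact,
        # fold everything older into a summary that preserves all user
        # preferences and decisions.
        print("Summarize now")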

MCP: Model Context Protocol

MCP (Model Context Protocol) is a standard way to connect AI to external data and tools. Instead of building custom integrations for each AI provider, MCP provides a universal interface.

Why MCP?

Without MCP

Build separate integrations for ChatGPT, Claude, Gemini... Maintain multiple codebases. Watch them break when APIs change.

With MCP

Build once, works everywhere. Standard protocol. AI can discover and use your tools automatically.

MCP Provides

  • Resources: Data the AI can read (files, database records, API responses)
  • Tools: Actions the AI can take (search, create, update, delete)
  • Prompts: Pre-built prompt templates
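
To make this concrete, here is a minimal MCP server sketched with the official Python SDK's FastMCP helper (install with pip install mcp; treat the names and details as indicative of the SDK, not a complete server):

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("demo-server")

    @mcp.tool()
    def search_prompts(query: str) -> str:
        """An action the AI can take - replace with a real search."""
        return f"Results for: {query}"

    @mcp.resource("docs://readme")
    def readme() -> str:
        """Data the AI can read."""
        return "Welcome to the demo server."

    if __name__ == "__main__":
        mcp.run()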

prompts.chat Uses MCP

This platform has an MCP server! You can connect it to Claude Desktop or other MCP-compatible clients to search and use prompts directly from your AI assistant.

Building Context: The Complete Picture

Here is a complete context for one request, combining every block type - about 137 tokens in total, with each block counting toward the limit:

--- SYSTEM PROMPT ---
You are a helpful customer support agent for TechStore. Be friendly and concise.

--- RETRIEVED DOCUMENTS (RAG) ---
From knowledge base:
- Return policy: 30 days, original packaging required
- Shipping: Free over $50
- Warranty: 1 year on electronics

--- CONVERSATION HISTORY ---
[Summary] User asked about order #12345. Product: Wireless Mouse. Status: Shipped yesterday.

User: When will it arrive?
Assistant: Based on standard shipping, it should arrive in 3-5 business days.

--- USER QUERY ---
Can I return it if I don't like it?
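
Assembling these blocks in code is plain composition plus token accounting. A sketch (count_tokens is a crude stand-in; real apps use the provider's tokenizer):

    def count_tokens(text):
        """Crude approximation; use the provider's tokenizer in practice."""
        return len(text.split())

    blocks = {
        "SYSTEM PROMPT": "You are a helpful customer support agent for TechStore.",
        "RETRIEVED DOCUMENTS (RAG)": "Return policy: 30 days, original packaging required.",
        "CONVERSATION HISTORY": "[Summary] User asked about order #12345. Status: shipped.",
        "USER QUERY": "Can I return it if I don't like it?",
    }

    context = "\n\n".join(f"--- {name} ---\n{text}" for name, text in blocks.items())
    print(context)
    print(f"~{count_tokens(context)} tokens")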

Best Practices

  • Include everything the AI needs on every call - it remembers nothing
  • Put behavior instructions in the system prompt
  • Retrieve only the most relevant documents, not everything you have
  • Summarize old messages before the history overflows the window
  • Keep recent messages intact; preserve user preferences and decisions
  • Leave room in the token budget for the response
  • Cite sources when answering from retrieved documents

Summary

Context engineering is about giving AI the right information:

  • AI is stateless - include everything it needs every time
  • RAG retrieves relevant documents to augment prompts
  • Embeddings enable semantic search (meaning, not just keywords)
  • Function calling lets AI use external tools
  • Summarization manages long conversations
  • MCP standardizes how AI connects to data and tools

Remember

The quality of AI output depends on the quality of context you provide. Better context = better answers.