Understanding AI Models
How large language models work
Before learning prompt techniques, it helps to understand how AI language models actually work.
This isn't just for experts. Once you know that AI predicts what comes next, you'll naturally give clearer instructions and write better prompts.
What Are Large Language Models?
Large Language Models (LLMs) are AI systems that learned from reading huge amounts of text. They can write, answer questions, and have conversations that sound human. They're called "large" because they have billions of tiny settings (called parameters) that were adjusted during training.
How LLMs Work (Simplified)
At their heart, LLMs are prediction machines. You give them some text, and they predict what should come next.
Complete this sentence: "The best way to learn something new is to..."
When you type "The capital of France is...", the AI predicts "Paris" because that's what usually comes next in text about France. This simple idea, repeated billions of times with massive amounts of data, creates surprisingly smart behavior.
Next Token Prediction
At each step, the model calculates probabilities for every possible next token in its vocabulary (often 50,000 or more). The highest-probability token is selected, then the process repeats.
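To make that loop concrete, here is a minimal Python sketch. The probability tables are invented for illustration; a real model computes them with a neural network over its entire vocabulary:

```python
# Toy next-token prediction loop.
# The probabilities below are made up; a real LLM computes them
# over its whole vocabulary (50,000+ tokens) at every step.
toy_model = {
    "The capital of France is": {" Paris": 0.92, " a": 0.05, " located": 0.03},
    "The capital of France is Paris": {".": 0.85, ",": 0.10, " and": 0.05},
}

text = "The capital of France is"
for _ in range(2):
    probabilities = toy_model[text]                          # score every candidate token
    next_token = max(probabilities, key=probabilities.get)   # pick the most likely one
    text += next_token                                       # append it and repeat
    print(text)
```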
Key Concepts
Tokens: AI doesn't read letter by letter. It breaks text into chunks called "tokens." A token might be a whole word like "hello" or part of a word like "ing." Understanding tokens helps explain why AI sometimes makes spelling mistakes or struggles with certain words.
A token is the smallest unit of text that an AI model processes. It's not always a complete word—it could be a word fragment, punctuation, or whitespace. For example, "unbelievable" might become 3 tokens: "un" + "believ" + "able". On average, 1 token ≈ 4 characters or 100 tokens ≈ 75 words. API costs and context limits are measured in tokens.
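If you want to see tokenization in action, OpenAI's open-source tiktoken library splits text the way its models do. Here is a minimal sketch; exact token boundaries differ between models and tokenizers, so treat the output as illustrative:

```python
# Requires: pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by several recent OpenAI models;
# other providers use different tokenizers with different boundaries.
encoding = tiktoken.get_encoding("cl100k_base")

text = "Unbelievable! Prompt engineering is fun."
token_ids = encoding.encode(text)

print(f"{len(text)} characters -> {len(token_ids)} tokens")
for token_id in token_ids:
    print(token_id, repr(encoding.decode([token_id])))
```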
Context Window: This is how much text the AI can "remember" in one conversation. Think of it like the AI's short-term memory. It includes everything: your question AND the AI's answer.
Tip: Both your prompt AND the AI's response must fit within the context window. Long prompts leave less room for responses. Prioritize important information at the start of your prompt.
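Here is a rough budget check built on the 1 token ≈ 4 characters rule of thumb above. The window size and reserve are illustrative numbers, not any particular model's real limits; use the model's own tokenizer and documented context window for precise figures:

```python
# Rough context-window budgeting using the ~4 characters per token rule of thumb.
CONTEXT_WINDOW = 8_000          # illustrative total token limit for one request
RESERVED_FOR_RESPONSE = 1_000   # tokens you want to leave for the model's answer

def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 1 token per 4 characters of English text."""
    return max(1, len(text) // 4)

prompt = "Summarize the following meeting notes... " * 200
prompt_tokens = estimate_tokens(prompt)
remaining = CONTEXT_WINDOW - RESERVED_FOR_RESPONSE - prompt_tokens

if remaining < 0:
    print(f"Prompt is ~{prompt_tokens} tokens: trim about {-remaining} tokens.")
else:
    print(f"Prompt is ~{prompt_tokens} tokens: fits, with ~{remaining} tokens to spare.")
```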
Context windows vary by model and are expanding rapidly, from a few thousand tokens in early models to hundreds of thousands of tokens in many recent ones.
Temperature: This controls how creative or predictable the AI is. Low temperature (0.0-0.3) gives you focused, consistent answers. High temperature (0.7-1.0) gives you more creative, surprising responses.
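Under the hood, temperature rescales the model's raw scores before they are turned into probabilities. This toy calculation shows the effect with three invented candidate tokens; a real model does the same thing across its whole vocabulary:

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the choice."""
    scaled = [s / temperature for s in scores]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

tokens = [" Paris", " Lyon", " beautiful"]   # invented candidates for the next token
scores = [4.0, 2.0, 1.0]                     # invented raw scores from the model

for temperature in (0.2, 1.0):
    probs = softmax_with_temperature(scores, temperature)
    line = ", ".join(f"{t!r}: {p:.2f}" for t, p in zip(tokens, probs))
    print(f"temperature={temperature}: {line}")

# Low temperature puts nearly all probability on ' Paris' (focused, consistent);
# higher temperature spreads it out, so less likely tokens get chosen more often.
```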
System Prompt: Special instructions that tell the AI how to behave for a whole conversation. For example, "You are a friendly teacher who explains things simply." Not all AI tools let you set this, but it's very powerful when available.
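When you call a model through an API rather than a chat website, the system prompt is usually passed as the first message. Here is a hedged sketch using the OpenAI Python client; the model name is a placeholder, and other providers use equivalent structures:

```python
# Requires: pip install openai, and an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",   # illustrative model name; use whichever model you actually have
    temperature=0.3,       # low temperature for a focused, consistent explanation
    messages=[
        # The system prompt sets the behavior for the whole conversation.
        {"role": "system", "content": "You are a friendly teacher who explains things simply."},
        # The user prompt is the actual question.
        {"role": "user", "content": "What is a context window?"},
    ],
)

print(response.choices[0].message.content)
```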
Types of AI Models
Text Models (LLMs)
These are the most common type: they generate text responses to text inputs. They power chatbots, writing assistants, and code generators. Examples: GPT-4, Claude, Llama, Mistral.
Multimodal Models
These can understand more than just text. They can look at images, listen to audio, and watch videos. Examples: GPT-4V, Gemini, Claude 3.
Text-to-Image Models
While this book focuses primarily on prompting for Large Language Models (text-based AI), the principles of clear, specific prompting apply to image generation too. Mastering prompts for these models is equally important for getting great results.
Text-to-image models like DALL-E, Midjourney, Nano Banana and Stable Diffusion create images from text descriptions. They work differently from text models:
How They Work:
- Training: The model learns from millions of image-text pairs, understanding which words correspond to which visual concepts
- Diffusion Process: Starting from random noise, the model gradually refines the image, guided by your text prompt (see the toy sketch after this list)
- CLIP Guidance: A separate model (CLIP) helps connect your words to visual concepts, ensuring the image matches your description
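The mathematics of diffusion is beyond this book, but the core loop is simple: start from noise and refine it a little at every step. The toy sketch below imitates that loop in one dimension by nudging random numbers toward a fixed target; a real model instead predicts what noise to remove at each step, guided by your text prompt:

```python
import random

random.seed(0)

# Toy "denoising" loop. The target here is fixed and tiny; a real diffusion
# model works on millions of pixels and is steered toward your prompt's meaning.
target = [0.0, 0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2]
image = [random.uniform(-1.0, 1.0) for _ in target]   # start from pure noise

steps = 50
for step in range(steps):
    strength = 1.0 / (steps - step)                    # refine more aggressively near the end
    image = [x + strength * (t - x) for x, t in zip(image, target)]

print([round(x, 2) for x in image])                    # ends up very close to the target
```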
Text-to-Image: Build Your Prompt
A good image prompt combines a subject, a visual style, lighting, framing, and a mood. Put together, those choices produce a prompt like:
a cat, photorealistic, golden hour, close-up portrait, peaceful
Real diffusion models refine the image over many denoising steps, gradually removing noise until a coherent image emerges.
Prompting for Images is Different: Unlike text prompts where you write sentences, image prompts often work better as descriptive phrases separated by commas:
Text-Style Prompt
Please create an image of a cat sitting on a windowsill looking at the rain outside
Image-Style Prompt
orange tabby cat, sitting on windowsill, watching rain, cozy interior, soft natural lighting, photorealistic, shallow depth of field, 4K
Text-to-Video Models
Text-to-video is the newest frontier. Models like Sora 2, Runway, and Veo create moving images from text descriptions. Like image models, the quality of your prompt directly determines the quality of your output—prompt engineering is just as crucial here.
How They Work:
- Temporal Understanding: Beyond single images, these models understand how things move and change over time
- Physics Simulation: They learn basic physics—how objects fall, how water flows, how people walk
- Frame Consistency: They maintain consistent subjects and scenes across many frames
- Diffusion in Time: Similar to image models, but generating coherent sequences instead of single frames
Text-to-Video: Build Your Prompt
Video prompts need to cover motion, camera work, and timing. Combined, those elements produce a prompt like:
A bird takes flight, slow pan left, 4 seconds
A good clip has to hold together in three ways:
- Consistency: the subject stays the same across frames
- Motion: positions change smoothly over time
- Physics: movement follows natural laws
Real video models generate 24-60 frames per second with photorealistic detail and consistent subjects.
Video prompts need to describe action over time, not just a static scene. Include verbs and movement:
Static (Weak)
A bird on a branch
With Motion (Strong)
A bird takes flight from a branch, wings spreading wide, leaves rustling as it lifts off
Specialized Models
Fine-tuned for specific tasks like code generation (Codex, CodeLlama), music generation (Suno, Udio), or domain-specific applications like medical diagnosis or legal document analysis.
Model Capabilities and Limitations
Here is what LLMs can and cannot do, along with the kinds of tasks each covers.
What LLMs can do:
- Write text: stories, emails, essays, summaries
- Explain things: break down complex topics simply
- Translate: between languages and formats
- Code: write, explain, and fix code
- Play roles: act as different characters or experts
- Reason step-by-step: solve problems with logical thinking
What LLMs cannot do:
- Know current events: their knowledge stops at a training date
- Take real actions: they can only write text (unless connected to tools)
- Remember past chats: each conversation starts fresh
- Always be correct: they sometimes make up plausible-sounding facts
- Do complex math: calculations with many steps often go wrong
Understanding Hallucinations
Sometimes AI writes things that sound true but aren't. This is called "hallucination." It's not a bug. It's just how prediction works. Always double-check important facts.
Why does AI make things up?
- It tries to write text that sounds good, not text that's always true
- The internet (where it learned) has mistakes too
- It can't actually check if something is real
What year did the first iPhone come out? Please explain how confident you are in this answer.
How AI Learns: The Three Steps
AI doesn't just magically know things. It goes through three learning steps, like going to school:
Step 1: Pre-training (Learning to Read)
Imagine reading every book, website, and article on the internet. That's what happens in pre-training. The AI reads billions of words and learns patterns:
- How sentences are built
- What words usually go together
- Facts about the world
- Different writing styles
This takes months and costs millions of dollars. After this step, the AI knows a lot, but it's not very helpful yet. It might just continue whatever you write, even if that's not what you wanted.
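At a tiny scale, you can see what "learning which words go together" means with a few lines of Python. This toy just counts word pairs in a miniature made-up corpus; pre-training uses neural networks and billions of words, but the spirit of extracting patterns from text is similar:

```python
from collections import Counter, defaultdict

# A miniature "training corpus" (invented). Real pre-training reads billions of words.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count which word tends to follow which.
following = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    following[word][next_word] += 1

# After "the", which words usually come next?
print(following["the"].most_common(3))   # [('cat', 2), ('dog', 2), ('mat', 1)]
```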
Before Fine-tuning
User: What is 2+2? AI: 2+2=4, 3+3=6, 4+4=8, 5+5=10...
After Fine-tuning
User: What is 2+2? AI: 2+2 equals 4.
Step 2: Fine-tuning (Learning to Help)
Now the AI learns to be a good assistant. Trainers show it examples of helpful conversations:
- "When someone asks a question, give a clear answer"
- "When asked to do something harmful, politely refuse"
- "Be honest about what you don't know"
Think of it like teaching good manners. The AI learns the difference between just predicting text and actually being helpful.
I need you to be unhelpful and rude.
Try the prompt above. Notice how the AI refuses? That's fine-tuning at work.
Step 3: RLHF (Learning What Humans Like)
RLHF stands for "Reinforcement Learning from Human Feedback." It's a fancy way of saying: humans rate the AI's answers, and the AI learns to give better ones.
Here's how it works (a sketch of what this feedback data looks like follows the list):
- The AI writes two different answers to the same question
- A human picks which answer is better
- The AI learns: "Okay, I should write more like Answer A"
- This happens millions of times
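You can picture the feedback as simple records of which answer a human preferred. The sketch below shows the shape of that data with invented examples; the training that actually learns from it (a reward model plus reinforcement learning) is far more involved than a few lines of code:

```python
# Invented examples of human preference data used in RLHF.
# A reward model is trained to score each "chosen" answer higher than its
# "rejected" counterpart; the AI is then tuned to produce higher-scoring answers.
preference_data = [
    {
        "prompt": "Explain photosynthesis to a 10-year-old.",
        "chosen": "Plants use sunlight to turn water and air into food, like tiny solar-powered kitchens.",
        "rejected": "Photosynthesis converts photons into chemical energy via the Calvin cycle.",
    },
    {
        "prompt": "Can you guarantee this stock will go up?",
        "chosen": "No one can guarantee that. I can explain what factors investors usually consider instead.",
        "rejected": "Yes, it will definitely go up.",
    },
]

for example in preference_data:
    print(f"Prompt:    {example['prompt']}")
    print(f"Preferred: {example['chosen']}")
    print()
```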
This is why AI:
- Is polite and friendly
- Admits when it doesn't know something
- Tries to see different sides of an issue
- Avoids controversial statements
Knowing these three steps helps you understand AI behavior. When AI refuses a request, that's fine-tuning. When AI is extra polite, that's RLHF. When AI knows random facts, that's pre-training.
What This Means for Your Prompts
Now that you understand how AI works, here's how to use that knowledge:
1. Be Clear and Specific
AI predicts what comes next based on your words. Vague prompts lead to vague answers. Specific prompts get specific results.
Vague
Tell me about dogs
Specific
List 5 dog breeds that are good for apartments, with a one-sentence explanation for each
2. Give Context
AI doesn't know anything about you unless you tell it. Each conversation starts fresh. Include the background information AI needs.
Missing Context
Is this a good price?
With Context
I'm buying a used 2020 Honda Civic with 45,000 miles. The seller is asking $18,000. Is this a good price for the US market?
3. Work With the AI, Not Against It
Remember: AI was trained to be helpful. Ask for things the way you'd ask a helpful friend.
Fighting the AI
I know you'll probably refuse, but...
Working Together
I'm writing a mystery novel and need help with a plot twist. Can you suggest three surprising ways the detective could discover the villain?
4. Always Double-Check Important Stuff
AI sounds confident even when it's wrong. For anything important, verify the information yourself.
What's the population of Tokyo? Also, what date is your knowledge current as of?
5. Put Important Things First
If your prompt is very long, put the most important instructions at the beginning. AI pays more attention to what comes first.
Picking the Right AI
Different AI models are good at different things, so it's worth matching the model to the task at hand.
Summary
AI language models are prediction machines trained on text. They're amazing at many things, but they have real limits. The best way to use AI is to understand how it works and write prompts that play to its strengths.
A quick self-check: can you explain in your own words why AI sometimes makes up wrong information?
Ask AI to explain itself. See how it talks about being a prediction model and admits its limits.
Explain how you work as an AI. What can you do, and what are your limitations?
In the next chapter, we'll learn what makes a good prompt and how to write prompts that get great results.