Best Practices

Prompt Optimization

Testing and improving prompts

A good prompt gets the job done. An optimized prompt gets the job done efficiently—faster, cheaper, and more consistently. This chapter teaches you how to systematically improve prompts across multiple dimensions.

Try the Prompt Enhancer

Want to optimize your prompts automatically? Use our Prompt Enhancer tool. It analyzes your prompt, applies optimization techniques, and shows you similar community prompts for inspiration.

The Optimization Trade-offs

Every optimization involves trade-offs. Understanding these helps you make intentional choices:

Quality vs. Cost

Higher quality often requires more tokens or better models

Adding examples improves accuracy but increases token count

Speed vs. Quality

Faster models may sacrifice some capability

GPT-4 is smarter but slower than GPT-4o-mini

Consistency vs. Creativity

Lower temperature = more predictable but less creative

Temperature 0.2 for facts, 0.8 for brainstorming

Simplicity vs. Robustness

Edge case handling adds complexity

Simple prompts fail on unusual inputs

Measuring What Matters

Before optimizing, define success. What does "better" mean for your use case?

Accuracy

How often is the output correct?

90% of code suggestions compile without errors

Relevance

Does it address what was actually asked?

Response directly answers the question vs. tangents

Completeness

Are all requirements covered?

All 5 requested sections included in output

Latency

How long until the response arrives?

p50 < 2s, p95 < 5s for chat applications

Token Efficiency

How many tokens for the same result?

500 tokens vs. 1500 tokens for equivalent output

Consistency

How similar are outputs for similar inputs?

Same question gets structurally similar answers

What Do p50 and p95 Mean?

Percentile metrics show response time distribution. p50 (median) means 50% of requests are faster than this value. p95 means 95% are faster—it catches slow outliers. If your p50 is 1s but p95 is 10s, most users are happy but 5% experience frustrating delays.
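If you want to compute these yourself, a few lines of Python will do it. This is a minimal sketch: the latency samples are made up, and the nearest-rank method is only one of several ways to calculate percentiles.

```python
def percentile(samples_ms, pct):
    """Nearest-rank percentile: the value that pct% of samples fall at or below."""
    ordered = sorted(samples_ms)
    return ordered[round((pct / 100) * (len(ordered) - 1))]

# Hypothetical latency measurements in milliseconds
latencies = [820, 950, 1100, 1200, 1300, 1450, 1600, 1900, 2400, 9800]

print(f"p50: {percentile(latencies, 50)} ms")  # the median experience
print(f"p95: {percentile(latencies, 95)} ms")  # catches the slow outliers
```

Run against real request logs, the gap between the two numbers tells you whether slow outliers are worth optimizing for.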

Define Your Success Metrics

Use this template to clarify what you're optimizing for before making changes.

Help me define success metrics for my prompt optimization.

**My use case**: ${useCase}
**Current pain points**: ${painPoints}

For this use case, help me define:

1. **Primary metric**: What single metric matters most?
2. **Secondary metrics**: What else should I track?
3. **Acceptable trade-offs**: What can I sacrifice for the primary metric?
4. **Red lines**: What quality level is unacceptable?
5. **How to measure**: Practical ways to evaluate each metric

Token Optimization

Tokens cost money and add latency. Here's how to say the same thing with fewer tokens.

The Compression Principle

Verbose (67 tokens)

I would like you to please help me with the following task. I need you to take the text that I'm going to provide below and create a summary of it. The summary should capture the main points and be concise. Please make sure to include all the important information. Here is the text:

[text]

Concise (12 tokens)

Summarize this text, capturing main points concisely:

[text]

Same result, 82% fewer tokens.
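You can measure savings like this yourself by counting tokens. Here is a minimal sketch using the tiktoken library (an assumption: install it with pip install tiktoken); exact counts depend on which encoding your model uses, so treat the numbers as approximate.

```python
import tiktoken

# cl100k_base is the encoding used by many recent OpenAI models;
# pick the encoding that matches your model for exact counts.
enc = tiktoken.get_encoding("cl100k_base")

verbose = (
    "I would like you to please help me with the following task. I need you to "
    "take the text that I'm going to provide below and create a summary of it. "
    "The summary should capture the main points and be concise. Please make sure "
    "to include all the important information. Here is the text:"
)
concise = "Summarize this text, capturing main points concisely:"

v, c = len(enc.encode(verbose)), len(enc.encode(concise))
print(f"verbose: {v} tokens, concise: {c} tokens, saved: {1 - c / v:.0%}")
```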

Token-Saving Techniques

Cut Pleasantries

"Please" and "Thank you" add tokens without improving output

"Please summarize" → "Summarize"

Eliminate Redundancy

Don't repeat yourself or state the obvious

"Write a summary that summarizes" → "Summarize"

Use Abbreviations

Where meaning is clear, abbreviate

"for example" → "e.g."

Reference by Position

Point to content instead of repeating it

"the text above" instead of re-quoting

Prompt Compressor

Paste a verbose prompt to get a token-optimized version.

Compress this prompt while preserving its meaning and effectiveness:

Original prompt:
"${verbosePrompt}"

Instructions:
1. Remove unnecessary pleasantries and filler words
2. Eliminate redundancy
3. Use concise phrasing
4. Keep all essential instructions and constraints
5. Maintain clarity—don't sacrifice understanding for brevity

Provide:
- **Compressed version**: The optimized prompt
- **Token reduction**: Estimated percentage saved
- **What was cut**: Brief explanation of what was removed and why it was safe to remove

Quality Optimization

Sometimes you need better outputs, not cheaper ones. Here's how to improve quality.

Accuracy Boosters

Add Verification

Ask the model to check its own work

"...then verify your answer is correct"

Request Confidence

Make uncertainty explicit

"Rate your confidence 1-10 and explain any uncertainty"

Multiple Approaches

Get different perspectives, then choose

"Provide 3 approaches and recommend the best one"

Explicit Reasoning

Force step-by-step thinking

"Think step by step and show your reasoning"

Consistency Boosters

Detailed Format Specs

Show exactly what output should look like

Include a template or schema

Few-Shot Examples

Provide 2-3 examples of ideal output

"Here's what good looks like: [examples]"

Lower Temperature

Reduce randomness for more predictable output

Temperature 0.3-0.5 for consistent results

Output Validation

Add a validation step for critical fields

"Verify all required fields are present"

Quality Enhancer

Add quality-improving elements to your prompt.

Enhance this prompt for higher quality outputs:

Original prompt:
"${originalPrompt}"

**What quality issue I'm seeing**: ${qualityIssue}

Add appropriate quality boosters:
1. If accuracy is the issue → add verification steps
2. If consistency is the issue → add format specifications or examples
3. If relevance is the issue → add context and constraints
4. If completeness is the issue → add explicit requirements

Provide the enhanced prompt with explanations for each addition.

Latency Optimization

When speed matters, every millisecond counts.

Model Selection by Speed Need

Real-time (< 500ms)

Use smallest effective model + aggressive caching

GPT-4o-mini, Claude Haiku, cached responses

Interactive (< 2s)

Fast models, streaming enabled

GPT-4o-mini with streaming

Tolerant (< 10s)

Mid-tier models, balance quality/speed

GPT-4o, Claude Sonnet

Async/Batch

Use best model, process in background

GPT-4, Claude Opus for offline processing

Speed Techniques

Shorter Prompts

Fewer input tokens = faster processing

Compress prompts, remove unnecessary context

Limit Output

Set max_tokens to prevent runaway responses

max_tokens: 500 for summaries

Use Streaming

Get first tokens faster, better UX

Stream for any response > 100 tokens

Cache Aggressively

Don't recompute identical queries

Cache common questions, template outputs
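Streaming, output limits, and caching combine naturally in one wrapper. A minimal sketch with the OpenAI Python SDK; the in-memory dict stands in for whatever cache you actually run (Redis, a CDN layer, etc.).

```python
from openai import OpenAI

client = OpenAI()
cache: dict[str, str] = {}  # stand-in for a real cache such as Redis

def answer(question: str) -> str:
    if question in cache:            # don't recompute identical queries
        return cache[question]

    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        max_tokens=500,              # cap output to prevent runaway responses
        stream=True,                 # first tokens arrive sooner
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)  # show tokens as they arrive
            parts.append(delta)
    print()
    cache[question] = "".join(parts)
    return cache[question]
```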

Cost Optimization

At scale, small savings multiply into significant budget impact.

Understanding Costs

Cost per request is simple arithmetic: (input tokens × input price) + (output tokens × output price). As a worked example, take 500 input tokens and 200 output tokens per request, at rates of $0.15 per 1M input tokens and $0.60 per 1M output tokens:

Per request: (500 × $0.15/1M) + (200 × $0.60/1M) ≈ $0.0002
Daily (at roughly 1,000 requests): ≈ $0.195
Monthly (30 days): ≈ $5.85
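The same arithmetic as a small helper makes it easy to compare scenarios. Plain Python, no dependencies; the prices and the 1,000-requests-per-day volume are just the example figures above.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one request, given prices in dollars per 1M tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000

per_request = request_cost(500, 200, 0.15, 0.60)
daily = per_request * 1_000   # assumed volume: 1,000 requests/day
monthly = daily * 30

print(f"per request: ${per_request:.6f}")   # ≈ $0.000195
print(f"daily:       ${daily:.3f}")         # ≈ $0.195
print(f"monthly:     ${monthly:.2f}")       # ≈ $5.85
```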

Cost Reduction Strategies

Model Routing

Use expensive models only when needed

Simple questions → GPT-4o-mini, Complex → GPT-4

Prompt Efficiency

Shorter prompts = lower cost per request

Cut 50% of tokens = 50% input cost savings

Output Control

Limit response length when full detail isn't needed

"Answer in 2-3 sentences" vs. unlimited

Batching

Combine related queries into single requests

Analyze 10 items in one prompt vs. 10 separate calls

Pre-filtering

Don't send requests that don't need AI

Keyword matching before expensive classification
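Pre-filtering and model routing are usually a thin layer in front of the API call. A minimal sketch with the OpenAI Python SDK; the FAQ lookup and the word-count routing rule are deliberately naive stand-ins for your own logic.

```python
from openai import OpenAI

client = OpenAI()

FAQ = {"what are your hours": "We're open 9am-5pm, Monday to Friday."}

def handle(query: str) -> str:
    # Pre-filter: answer known questions without calling the API at all.
    canned = FAQ.get(query.strip().lower().rstrip("?"))
    if canned:
        return canned

    # Route: short, simple queries go to the cheap model, the rest to the strong one.
    model = "gpt-4o-mini" if len(query.split()) < 30 else "gpt-4"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return resp.choices[0].message.content

print(handle("What are your hours?"))
```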

The Optimization Loop

Optimization is iterative. Here's a systematic process:

Step 1: Establish Baseline

You can't improve what you don't measure. Before changing anything, document your starting point rigorously.

Prompt Documentation

Save the exact prompt text, including system prompts and any templates

Version control your prompts like code

Test Set

Create 20-50 representative inputs that cover common cases and edge cases

Include easy, medium, and hard examples

Quality Metrics

Score each output against your success criteria

Accuracy %, relevance score, format compliance

Performance Metrics

Measure tokens and timing for each test case

Avg input: 450 tokens, Avg output: 200 tokens, p50 latency: 1.2s
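A baseline stays honest when a script produces it. Here is a minimal harness sketch using the OpenAI Python SDK; the test cases, template, and output file name are placeholders to replace with your own.

```python
import json, statistics, time
from openai import OpenAI

client = OpenAI()
PROMPT_VERSION = "v1"   # version-control your prompts like code
TEMPLATE = "Summarize this text, capturing main points concisely:\n\n{text}"
TEST_CASES = ["<easy case>", "<medium case>", "<edge case>"]  # use 20-50 real inputs

records = []
for case in TEST_CASES:
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": TEMPLATE.format(text=case)}],
    )
    records.append({
        "input": case,
        "output": resp.choices[0].message.content,
        "input_tokens": resp.usage.prompt_tokens,
        "output_tokens": resp.usage.completion_tokens,
        "latency_s": round(time.perf_counter() - start, 2),
    })

print("median latency:", statistics.median(r["latency_s"] for r in records), "s")
with open(f"baseline_{PROMPT_VERSION}.json", "w") as f:
    json.dump(records, f, indent=2)  # score outputs against your criteria afterwards
```

Rerun the same script after each change so every comparison uses the same inputs and the same metrics.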

Baseline Documentation Template

Use this to create a comprehensive baseline before optimizing.

Create a baseline documentation for my prompt optimization project.

**Current prompt**:
"${currentPrompt}"

**What the prompt does**: ${promptPurpose}

**Current issues I'm seeing**: ${currentIssues}

Generate a baseline documentation template with:

1. **Prompt Snapshot**: The exact prompt text (for version control)

2. **Test Cases**: Suggest 10 representative test inputs I should use, covering:
 - 3 typical/easy cases
 - 4 medium complexity cases  
 - 3 edge cases or difficult inputs

3. **Metrics to Track**:
 - Quality metrics specific to this use case
 - Efficiency metrics (tokens, latency)
 - How to score each metric

4. **Baseline Hypothesis**: What do I expect the current performance to be?

5. **Success Criteria**: What numbers would make me satisfied with optimization?

Step 2: Form a Hypothesis

Vague goal

I want to make my prompt better.

Testable hypothesis

If I add 2 few-shot examples, accuracy will improve from 75% to 85% because the model will learn the expected pattern.

Step 3: Test One Change

Change one thing at a time. Run both versions on the same test inputs. Measure the metrics that matter.

Step 4: Analyze and Decide

Did it work? Keep the change. Did it hurt? Revert. Was it neutral? Revert (simpler is better).

Step 5: Repeat

Generate new hypotheses based on what you learned. Keep iterating until you hit your targets or reach diminishing returns.

Optimization Checklist

Before Deploying an Optimized Prompt

You have a prompt that works well but costs too much at scale. What's the FIRST thing you should do?