A Common-Sense Guide to AI Engineering
Build Production-Ready LLM Applications
by Jay Wengrow
Want to build an LLM-powered app but don’t know where to begin? Can’t
get past a proof-of-concept? With this step-by-step guide, you can
master the underlying principles of AI engineering by building an
LLM-powered app from the ground up. Tame unpredictable models with
prompt and context engineering. Use evals to keep them on track. Give
chatbots the knowledge to answer anything a user wants to know. Equip
agents with the tools and smarts to actually get the job done. By the
end, you’ll have the intuition and the confidence to build on top of
LLMs in the real world.
Fragmented documentation, obsolete tutorials, and frameworks that
deliver a prototype but flop in production can make AI engineering feel
overwhelming. But it doesn’t have to be that way. With real-world code
and step-by-step instructions as your guide, you can learn to build
robust LLM-powered apps from the ground up while mastering both the
how and why of the most crucial underlying concepts.
Harness context engineering and retrieval systems to create AI
assistants that understand your proprietary data. Create chatbots that
answer organization-specific questions and help solve users’ issues.
Design agents that conduct research, make decisions, and take action in
the real world. Level up your prompt engineering and get an LLM to do
your bidding, not its own. Use automated evals to keep constant tabs
on your app’s quality while setting up guardrails to protect your users
and organization. And implement observability systems that make it easy
to debug your app when things do go wrong.
With a systematic approach grounded in the core principles of building
AI apps for real users, you’ll easily evolve and adapt even as the hype
and tools come and go.
What You Need
To explore and execute the book’s code, you’ll need your favorite text
editor and an environment that can run Python 3. You’ll also set up a
paid account with OpenAI (just a few dollars will cover it).
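To give you a feel for that setup, here’s a minimal sketch of the kind of first call the book builds toward. It isn’t code from the book; it assumes the official openai Python package (pip install openai), an OPENAI_API_KEY environment variable, and a placeholder model name:

    # first_app.py -- a minimal sketch, not the book's code
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=[{"role": "user", "content": "HeLLMo, world!"}],
        temperature=0.7,  # the "temperature" knob the book tweaks
    )
    print(response.choices[0].message.content)

With a small model, a short request like this typically costs a fraction of a cent, which is why a few dollars of credit goes a long way.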
Note: Contents and extracts of beta books will change as the book is developed.
Table of Contents
- Foundations
- HeLLMo, World!
- Signing Up for an LLM-as-a-Service
- Creating Our First App
- Tweaking the Model and Temperature
- Checking API Usage
- Wrapping Up
- Understanding How LLMs Work
- Diving Deeper into LLMs
- Diving into Tokens
- Diving into Embeddings
- Diving into Fine-Tuning Behavior
- Wrapping Up
- Selecting an LLM
- Getting Your Hands on an LLM
- Comparing Different LLMs
- Deciding on an LLM
- Wrapping Up
- Chatbots
- Building a Chatbot
- Getting User Input
- Augmenting the Prompt
- Adding Multi-Turn Dialogue
- Managing State with Memory Systems
- Adding a Developer Message
- Treating the Prompt as an Array
- Wrapping Up
- Augmenting a Prompt with Knowledge
- Building a Chatbot
- Augmenting with Knowledge
- Avoiding Context Window Limitations
- Preparing the Data
- Implementing the Knowledge Chatbot
- Running into PACKing Problems
- Wrapping Up
- Efficiently Adding Knowledge with RAG
- Augmenting with Documentation Chunks
- Getting into Search Engines, Retrieval, and RAG
- Searching with Meaning: Keywords Versus Semantics
- Using Embedding-Similarity Search
- Building a Starter Search Engine
- Implementing a RAG Chatbot
- Choosing the Right K
- Wrapping Up
- Measuring Quality with Evals
- Introducing Evals
- Setting Up Our App
- Conducting Error Analysis
- Open Coding
- Axial Coding
- Creating an Eval Test Framework
- Running Human Evals
- Wrapping Up
- Prompt Engineering
- Eliminating Ambiguity
- Utilizing the System Prompt
- Rewriting History
- Using Delimiters and Bullet Points
- Reordering Prompt Components
- Wrapping Up
- Reducing Hallucinations
- Understanding Why Our App Hallucinates
- Instructing the LLM to Be Faithful
- Pleading and Threatening
- Upgrading the Model
- Citing Sources and Few-Shot Prompting
- Iterate, Iterate, Iterate
- Reviewing Our Current Chatbot Implementation
- Final Prompt Engineering Thoughts
- Checking On Our Evals
- Wrapping Up
- Evaluating and Optimizing RAG
- Discovering a RAG Failure
- Evaluating RAG
- Expanding the Query
- Metadata-Based Filtering
- Evaluating RAG Subcomponents
- Dreaming Up an Agentic RAG Wish List
- Wrapping Up
- Agents
- Equipping an LLM with Tools
- Understanding an LLM’s Limitations
- Triggering a Function
- Defining “Agents”
- Feeding Tool Results Back to the LLM
- Building a Website Reader Tool
- Deciding to Use a Tool
- Using the Tools API
- Wrapping Up
- Running the Agent Loop
- Solving a Complex Problem
- Constructing an Agent Loop
- Building a News Podcast Agent
- Exploring Agent Failure Modes and Evals
- Giving the Agent a Plan
- Asking the Agent to Create a Plan
- Wrapping Up
- Architecting Agentic Workflows
- Designing an LLM Assembly Line
- Implementing an LLM Assembly Line
- Weighing Agentic Workflows Against Classic Agent Loops
- Workflow Routing
- Performing Tasks in Parallel
- Wrapping Up
- Enhancing Retrieval with Agentic RAG
- Architecting an Agentic RAG Plan
- Implementing a RAG Agent
- Avoiding Unnecessary RAG
- Generating Structured Outputs
- Researching as an Agent
- Conducting Multi-Hop Research
- Wrapping Up
- Building System-Integrated Agents
- Production
- Observing AI Systems
- Adding Guardrails
- Exception Handling
- Using an LLM-as-Judge
Author
Jay Wengrow is an experienced educator and software engineer. He is the
founder of Actualize, a software and AI engineering education company,
and specializes in making advanced technical topics approachable for
professionals across industries. He is also the author of the popular
Common-Sense Guide to Data Structures and Algorithms book series.