High Performance PostgreSQL for Rails
Reliable, Scalable, Maintainable Database Applications
by Andrew Atkinson
Build faster, more reliable Rails apps by taking the best advanced
PostgreSQL and Active Record capabilities, and using them to solve your
application scale and growth challenges. Gain the skills needed to
comfortably work with multi-terabyte databases, and with complex Active
Record, SQL, and specialized indexes. Develop your skills with
PostgreSQL on your laptop, then take them into production, while keeping
everything in sync. Make slow queries fast, perform any schema or data
migration without errors, and use scaling techniques like read/write
splitting, partitioning, and sharding to meet demanding workload
requirements, from Internet-scale consumer apps to enterprise SaaS.
Deepen your firsthand knowledge of high-scale PostgreSQL databases and
Ruby on Rails applications with dozens of practical and hands-on
exercises. Unlock the mysteries surrounding complex Active Record. Make
any schema or data migration change confidently, without downtime. Grow
your experience with modern and exclusive PostgreSQL features like SQL
MERGE, RETURNING, and exclusion constraints. Put advanced capabilities
like full-text search and publish-subscribe mechanisms built into
PostgreSQL to work in your Rails apps. Improve the quality of the data
in your database, using the advanced and extensible system of types and
constraints to reduce and eliminate application bugs. Tackle complex
topics like how to improve query performance using specialized indexes.
Discover how to effectively use built-in database functions and write
your own, administer replication, and make the most of partitioning and
foreign data wrappers. Use more than 40 well-supported open source tools
to extend and enhance PostgreSQL and Ruby on Rails. Gain invaluable
insights into database administration by conducting advanced
optimizations—including high-impact database maintenance—all while
solving real-world operational challenges.
Take your new skills into production today and then take your PostgreSQL
and Rails applications to a whole new level of reliability and
performance.
What You Need
PostgreSQL version 16, Ruby version 3.2, Ruby on Rails 7.1, Docker, and
a text editor.
- A computer running macOS, Linux, or Windows with WSL
- PostgreSQL version 16, installed by package manager, compiled, or
running with Docker
- An Internet connection
Resources
Releases:
- P1.0 2024/06/24
- B8.0 2024/05/24
- B7.0 2024/01/31
- B6.0 2023/12/22
Preface
- Getting Started
- An App to Get You Started
- What Is Rideshare?
- Active Record Schema Management Refresher
- Exploring Rideshare Dependencies
- Installing Application Dependencies
- Installing PostgreSQL on macOS
- Installing Rideshare
- Configuring PostgreSQL for Rideshare
- Configuring Database Access
- Learning PostgreSQL Terminology
- Learning SQL Terminology
- Ruby on Rails Terminology
- Conventions Used in This Book
- SQL Formatting Conventions
- Ruby and Rails Formatting Conventions
- You’re Ready
- Design and Build
- Administration Basics
- Touring psql Features
- Modifying Your PostgreSQL Config File
- Getting Started with Observability
- Glancing at Current Lock Behavior
- Generating Fake Data for Experiments
- Creating Indexes Using SQL
- Rolling Back Schema Modifications
- Exploring and Experimenting Safely in Production
- Building a Performance-Testing Database
- Generating Bigger Data
- Replacement Values That Are Statistically Similar
- Tracking Columns with Sensitive Information
- Comparing Direct Updates and Clone and Replace
- Starting an Email Scrubber Function
- Implementing the Scrub Email Function
- Understanding Clone and Replace Trade-Offs
- Speeding Up Inserts for Clone and Replace
- Using Direct Updates for Text Replacement
- Performing Database Maintenance
- Performing Updates in Batches
- What’s Next for Your Performance Database
- Data Correctness and Consistency
- Multiple Column Uniqueness
- Fixing Constraint Violations
- Enforcing Relationships with Foreign Keys
- The Versatile Check Constraint
- Deferring Constraint Checks
- Preventing Overlaps with an Exclusion Constraint
- Creating Active Record Custom Validators
- Significant Casing and Unique Constraints
- Storing Transformations in Generated Columns
- Constraining Values with Database Enums
- Sharing Domains Between Tables
- Automating Consistency Checks in Development
- Operate and Grow
- Modifying Busy Databases Without Downtime
- Identifying Dangerous Migrations
- Learning from Unsafe Migrations
- Learning to Use CONCURRENTLY by Default
- Adopting a Migration Safety Check Process
- Exploring Strong Migrations Features
- Locking, Blocking, and Concurrency Refresher
- Prevent Excessive Queueing with a Lock Timeout
- Exploring Lock Type Queues
- Setting a statement_timeout
- Avoiding Schema Cache Errors
- Backfilling Large Tables Without Downtime
- Backfilling and Double Writing
- Separating Reads and Writes for Backfills
- Specialized Tables for Backfills
- Practicing Backfilling Techniques
- Wrapping Up
- Optimizing Active Record
- Preferring Active Record over SQL
- Query Logs to Connect SQL to App Code
- Common Active Record Problems
- Tooling to Find Problematic Query Patterns
- Use Eager Loading to Reduce Queries
- Eager Loading with .includes()
- Prefer Strict Loading over Lazy Loading
- Optimizing Active Record Queries
- Backgrounding Queries Using load_async
- Save a SELECT by Using RETURNING
- Restricting Queries Using a LIMIT
- Advanced Query Support in Active Record
- Using Common Table Expressions (CTE)
- Introducing Database Views for Rideshare
- Creating the Search Result Model with Scenic
- Improving Performance with Materialized Views
- Reducing Queries with Active Record Caches
- Prepared Statements with Active Record
- Replacing Slow Counts with Counter Caches
- Performing Aggregations in the Database
- Object Allocations in Active Record
- Wrapping Up
- Improving Query Performance
- Active Support Instrumentation for Queries
- Capture Query Statistics in Your Database
- Using Query Statistics
- Introducing PgHero as a Performance Dashboard
- EXPLAIN Basics
- Reading Query Execution Plans
- Finding Missing Indexes
- Logging Slow Queries
- Automatically Gathering Execution Plans
- Perform Maintenance First
- What Are Index Scans?
- Tricks for Fast COUNT() Queries
- Query Plan Hints
- Using Code and SQL Analysis Tools
- Wrapping Up
- Optimized Indexes for Fast Retrieval
- Generating Data for Experiments
- Single Column and Multiple Column Indexes
- Understanding Index Column Ordering
- Indexing Boolean Columns
- Filtering Rows with Partial Indexes
- Transform Values with an Expression Index
- Using GIN Indexes with JSON
- Maintaining Unstructured JSON Data
- Using BRIN Indexes
- Hash Indexes over B-Tree?
- Using Indexes for Sorting
- Using Covering Indexes
- Wrapping Up
- High-Impact Database Maintenance
- Basics of Autovacuum
- Tuning Autovacuum Parameters
- Rebuilding Indexes Without Downtime
- Running Manual Vacuums
- Simulating Bloat and Understanding Impact
- Removing Unused Indexes
- Pruning Duplicate and Overlapping Indexes
- Removing Indexes on Insert-Only Tables
- Scheduling Jobs Using pg_cron
- Conducting Maintenance Tune-Ups
- Reaching Greater Concurrency
- Monitoring Database Connections
- Exploring Current Activity
- Managing Idle Connections
- Setting Active Record Pool Size
- Running Out of Connections
- Working with PgBouncer
- Choosing Your PgBouncer Pooling Mode
- Identifying Connection Errors and Problems
- More Lock Monitoring with pg_locks
- Monitoring Row Locks
- Finding Lock Conflicts
- Using PgBadger for Lock Analysis
- Active Record Optimistic Locking
- Using Advisory Locks
- Lock Up on Your Way Out
- Optimize and Scale
- Scalability of Common Features
- Analyzing Schema Designs from Gems
- Understanding Queries from Tagging Gem
- LIMIT and OFFSET Pagination
- Database CURSOR Pagination
- Improved Performance with Keyset Pagination
- Wrapping Up
- Working with Bulk Data
- Creating a Bulk Data Generator Rake Task
- Batching with Active Record
- Handling Upsert Violations in Active Record
- Handling Conflicts with ON CONFLICT
- Beyond Active Record with activerecord-import
- Performing SQL Multirow Operations
- Upserts with SQL MERGE
- Working with pg_dump and pg_restore
- Populating Table Data with \COPY
- Creating a File Foreign Data Wrapper (FDW)
- Wrapping Up
- Scaling with Replication and Sharding
- Categorizing Query Workloads
- Enabling Physical Replication
- Creating a Replication User on the Primary
- Allowing Access for the Replication User
- Configuring the Replica Instance
- Creating the Replication Slot
- Active Record Multiple Databases Background
- Configuring Active Record Multiple Databases
- Multiple Roles with Active Record Models
- Using Automatic Role Switching
- Replication Slots and the Write Ahead Log (WAL)
- Sharding at the Application Level
- Migrating Multiple Database Schemas
- Using Horizontal Sharding for Multitenancy
- Using Subdomain-Based Routing
- Switching Shards Automatically
- Simulating Joins Across Databases
- Creating a Replica Using Logical Replication
- Customizing Replication Database Parameters
- Wrapping Up
- Boosting Performance with Partitioning
- Structure of Partitioned Tables
- Ruby on Rails Partitioning Support
- Choose Declarative Partitioning
- Deciding When to Partition
- Estimating Growth of Time-Oriented Data
- Use Partitioning to Help with Archiving
- Choosing Your Partition Column
- Range Partitioning with pgslice
- Data Migration Preparation for Rideshare
- Online Data Migration
- Row Copying Operational Tips
- Partitioning Gotcha: Primary Key Definition
- Partitioning Gotcha: Logical Replication Replica Identity
- Automate Partition Creation and Monitoring
- Retiring Unneeded Partitions
- Use LIST Partitioning for Known Divisions
- Use HASH Partitioning for a Fixed Amount of Buckets
- Performance Benefits from Partitioning
- Let’s Split
- Advanced Usages
- Advanced Uses and What’s Next
- Why You Shouldn’t Operate a Database Zoo
- Why You Should Just Use PostgreSQL
- Basic Analytics with PostgreSQL
- Pattern Match Searching
- Implementing Full-Text Search (FTS)
- Fuzzy Searching with tsvector
- Expanding FTS with Extensions
- Optimizing FTS with Specialized Indexes
- Using Trigrams with FTS
- Expanding FTS with Mixed Accents and Collations
- Storing and Searching Vector Embeddings
- Session Persistence and Rails Cache Without Redis
- Background Jobs Without Sidekiq
- Using Change Data Capture (CDC) and wal2json
- Zero Downtime Cutovers and Upgrades
- Closing Remarks
Author
Andrew Atkinson has worked as a software engineer with Ruby on Rails
and PostgreSQL for more than a decade. At Microsoft, Groupon, and
various startups, he’s built, mentored, and influenced teams, and
operated and scaled systems, improving their quality and reliability.
He’s presented on PostgreSQL and Ruby on Rails at conferences, appeared
on podcasts, and has written for the official Ruby on Rails weblog, with
the goal of helping developers solve their challenges using these
powerful open source technologies.