High Performance PostgreSQL for Rails
Reliable, Scalable, Maintainable Database Applications
by Andrew Atkinson
Build faster, more reliable Rails apps by taking the best advanced
PostgreSQL and Active Record capabilities, and using them to solve your
application scale and growth challenges. Gain the skills needed to
comfortably work with multi-terabyte databases, and with complex Active
Record, SQL, and specialized Indexes. Develop your skills with
PostgreSQL on your laptop, then take them into production, while keeping
everything in sync. Make slow queries fast, perform any schema or data
migration without errors, use scaling techniques like read/write
splitting, partitioning, and sharding, to meet demanding workload
requirements from Internet scale consumer apps to enterprise SaaS.
Deepen your firsthand knowledge of high-scale PostgreSQL databases and
Ruby on Rails applications with dozens of practical and hands-on
exercises. Unlock the mysteries surrounding complex Active Record. Make
any schema or data migration change confidently, without downtime. Grow
your experience with modern and exclusive PostgreSQL features like SQL
Merge, Returning, and Exclusion constraints. Put advanced capabilities
like Full Text Search and Publish Subscribe mechanisms built into
PostgreSQL to work in your Rails apps. Improve the quality of the data
in your database, using the advanced and extensible system of types and
constraints to reduce and eliminate application bugs. Tackle complex
topics like how to improve query performance using specialized Indexes.
Discover how to effectively use built-in database functions and write
your own, administer replication, and make the most of Partitioning and
Foreign Data Wrappers. Use more than 40 well-supported open source tools
to extend and enhance PostgreSQL and Ruby on Rails. Gain invaluable
insights into database administration by conducting advanced
optimizations—including high-impact database maintenance—all while
solving real-world operational challenges.
Take your new skills into production today and then take your PostgreSQL
and Rails applications to a whole new level of reliability and
performance.
What You Need
Modern versions of the PostgreSQL database, the Ruby programming
language, and the Ruby on Rails framework. Docker. A text editor.
Exercises are tested on the latest versions available.
- A computer running Mac OS, Linux, or Windows and WSL2
- Latest version of PostgreSQL, installed by package manager,
compiled, or running with Docker
- An Internet connection to clone repositories, download data dumps,
and install gems and extensions
- Latest versions of Ruby on Rails and Ruby
Resources
Releases:
- B3.0 2023/09/26
- B2.0 2023/09/13
- B1.0 2023/08/29
Note: Contents and extracts of beta books will change as the book is developed.
Preface
- Getting Started
- An App to Get You Started
- What Is Rideshare?
- Active Record Schema Management Refresher
- Installing Rideshare
- Working With PostgreSQL Locally
- Learning PostgreSQL Terminology
- Learning SQL Terminology
- Ruby on Rails Terminology
- Conventions Used In This Book
- You’re Ready
- Design and Build
- Administration Basics
- Learning Meta Commands In psql With Rideshare
- Modifying Your PostgreSQL Config File
- Getting Started With Query Observability
- Glancing At Current Lock Behavior
- Generating Fake Data For Experiments
- Rolling Back Schema Modifications
- Exploring and Experimenting Safely In Production
- Building a Performance Testing Database
- Downloading Production Data Dumps
- Replacement Values That Are Statistically Similar
- Tracking Columns With Sensitive Information
- Comparing Direct Updates and Table Copying Strategies
- Starting An Email Scrubber Function
- Implementing the Scrub Email Function
- Understanding Table Copying Trade-Offs
- Speeding Up Inserts For Table Copying
- Using Direct Updates For Text Replacement
- Performing Database Maintenance
- Performing Modifications In Batches
- What’s Next For Your Performance Database
- Data Correctness and Consistency
- Multiple Column Uniqueness
- Fixing Constraint Violations
- Enforcing Relationships With Foreign Keys
- The Versatile Check Constraint
- Deferring Constraint Checks
- Preventing Overlaps With An Exclusion Constraint
- Creating Active Record Custom Validators
- Significant Casing And Unique Constraints
- Storing Transformations In Generated Columns
- Constraining Values With Database Enums
- Sharing Domains Between Tables
- Automating Consistency Checks In Development
- Operate and Grow
- Modifying Busy Databases Without Downtime
excerpt
- Identifying Dangerous Migrations
- Learning From Unsafe Migrations
- Learning To Use CONCURRENTLY By Default
- Adopting A Migration Safety Check Process
- Exploring Strong Migrations Features
- Locking, Blocking, and Concurrency Refresher
- Preventing Excessive Queueing With A Lock Timeout
- Exploring Lock Type Queues
- Setting Statement Timeout
- Avoiding Schema Cache Errors
- Backfilling Large Tables Without Downtime
- Backfilling And Double Writing
- Separating Reads and Writes for Backfills
- Specialized Tables For Backfills
- Practicing Backfilling Techniques
- Wrapping Up
- Optimizing Active Record
- Preferring Active Record Over SQL
- Query Logs To Connect SQL to App Code
- Common Active Record Problems
- Use Eager Loading To Reduce Queries
- Eager Loading With .includes()
- Prefer Strict Loading Over Lazy Loading
- Optimizing Individual Active Record Queries
- Save A SELECT With A RETURNING
- Bounding Query Results Using LIMIT
- Advanced Query Support In Active Record
- Using Common Table Expressions (CTE)
- Introducing Database Views for Rideshare
- Creating the Search Result Model With Scenic
- Improving Performance With Materialized Views
- Reducing Queries With Active Record Caches
- Prepared Statements With Active Record
- Eliminating Slow Count Queries With Counter Caches
- Performing Aggregations In the Database
- Reduced Object Allocations With SQL In Active Record
- Wrapping Up
- Improving Query Performance
excerpt
- Logging Slow Queries With Active Support Notifications
- Capture Query Statistics In Your Database
- Rideshare Query Statistics
- Introducing PgHero As a Performance Dashboard
- Analyzing Query Execution Plans
- Finding Missing Indexes
- Logging Slow Queries
- Automatically Gathering Execution Plans
- Performing Maintenance First
- Interpreting Index Scan Execution Plan Info
- EXPLAIN Scan Nodes and Bitmap Scans
- Adding Query Boundaries With Filtering and LIMIT
- Performing Fast COUNT() Queries
- Using Code and SQL Analysis Tools
- Wrapping Up
- Optimized Indexes For Fast Retrieval
- Generating Data for Experiments
- Transforming Ruby to SQL
- Why Are My Indexes Not Being Used?
- Using Single Column And Multiple Column Indexes
- Understanding Index Column Ordering
- Indexing Boolean Columns
- Transform Values with an Expression Index
- Using GIN Indexes with JSON
- Using GIN Indexes for Full Text Search
- Using Partial Indexes
- Using BRIN Indexes
- Using Indexes In Less Common Ways
- Using Covering Indexes
- High Impact Database Maintenance
- Basics of Autovacuum
- Tuning Autovacuum Parameters
- Rebuilding Indexes Without Downtime
excerpt

- Running Manual Vacuums
- Simulating Bloat and Understanding Impact
- Removing Unused Indexes
- Removing Duplicate And Overlapping Indexes
- Removing Indexes On Insert Only Tables
- Scheduling Jobs Using pg_cron
- Monitoring pg_cron Scheduled Jobs
- Conducting Maintenance Tune-Ups
- Handling Errors From Increased Concurrency
- Monitoring Database Connections
- Managing Idle Connections
- Setting Active Record Pool Size
- Using the Active Record Connection Pool
- Running Out Of Connections
- Working With PgBouncer
- Choosing A PgBouncer Pooling Mode
- Identifying Connection Errors and Problems
- More Lock Monitoring With pg_locks
- Monitoring Row Locks
- Finding Lock Conflicts
- Using PgBadger For Lock Analysis
- Active Record Optimistic Locking
- Using Advisory Locks
- Wrapping Up
- Optimize and Scale
- Scalability of Common Features
- Analyzing Schema Designs from Gems
- Understanding Queries From Tagging Gem
- LIMIT and OFFSET Pagination
- Database CURSOR Pagination
- Keyset Pagination
- Working With Bulk Data
- Creating A Bulk Data Generator Rake Task
- Batching With Active Record
- Performing an Upsert in Active Record
- Handling Conflicts With ON CONFLICT
- Beyond Active Record With activerecord-import
- Performing SQL Multi-Row Operations
- Upserts With SQL MERGE
- Working With pg_dump And pg_restore
- Populating Table Data With COPY
- Creating A File Foreign Data Wrapper (FDW)
- Wrapping Up
- Scaling With Replication And Sharding
- Categorizing Query Workloads
- Configuring a Primary With Physical Replication
- Configuring a Replica With Physical Replication
- Configuring Multiple Databases with Ruby on Rails
- Using Automatic Role Switching
- Using Replication Slots and the Write-Ahead Log (WAL)
- Sharding At The Application Level
- Migrating Multiple Database Schemas
- Using Horizontal Sharding for Multi Tenancy
- Using Subdomain Based Routing
- Switching Shards Manually and Automatically
- Simulating Joins Across Databases
- Creating A Replica Using Logical Replication
- Customizing Replication Database Parameters
- Wrapping Up
- Boosting Performance With Partitioning
- Ruby on Rails Partitioning Support
- Use Declarative Partitioning
- Deciding When To Partition
- Time Based Data and the RANGE Type
- Use Partitioning To Help With Archiving
- Partitioning the Trip Positions Table
- Choosing Your Partition Column
- Range Partitioning with pgslice
- Data Migration Preparation for Rideshare
- Online Data Migration
- Row Copying Operational Tips
- Transition Gotchas
- Automating Partition Creation With pgslice
- Retiring Unneeded Partitions
- Use LIST Partitioning For Known Divisions
- Performance Benefits From Partitioning
- Advanced Uses
- Advanced Usage and What’s Next
- Basic Analytics On PostgreSQL
- Implementing Full Text Search (FTS)
- Fuzzy Searching With Vectors
- Expanding FTS With Extensions
- Optimizing FTS With Specialized Indexes
- Using Trigrams With FTS
- Expanding FTS With Mixed Accents and Collations
- Storing and Searching Vector Embeddings
- Session Persistence and Rails Cache Without Redis
- Background Jobs Without Sidekiq
- Using Change Data Capture (CDC) and wal2json
- Closing Remarks
Author
Andrew Atkinson has worked as a Software Engineer with Ruby on Rails
and PostgreSQL for more than a decade. At Microsoft, Groupon, and
various startups, he’s built, mentored, and influenced teams, and
operated and scaled systems, improving their quality and reliability.
He’s presented on PostgreSQL and Ruby on Rails at conferences, appeared
on podcasts, and has written for the official Ruby on Rails weblog, with
the goal of helping developers solve their challenges using these
powerful open source technologies.