Data is getting bigger and more complex by the day, and so are your choices in handling it. Explore some of the most cutting-edge databases available, from a traditional relational database to newer NoSQL approaches, and make informed decisions about challenging data storage problems. This is the only comprehensive guide to the world of NoSQL databases, with in-depth practical and conceptual introductions to seven different technologies: Redis, Neo4j, CouchDB, MongoDB, HBase, Postgres, and DynamoDB. This second edition includes a new chapter on DynamoDB and updated content for each chapter.
Seven Databases in Seven Weeks, Second Edition: A Guide to Modern Databases and the NoSQL Movement
by Luc Perkins, Jim Wilson, Eric Redmond
Join Luc Perkins in this episode of the Test & Code podcast at https://testandcode.com/53.
Choosing a database is perhaps one of the most important architectural decisions a developer can make. Seven Databases in Seven Weeks provides a fantastic tour of different technologies and makes it easy to add each to your engineering toolbox.
- Dave Parfitt
Senior Site Reliability Engineer, Mozilla
By comparing each database technology to a tool you’d find in any workshop, the authors of Seven Databases in Seven Weeks provide a practical and well-balanced survey of a very diverse and highly varied datastore landscape. Anyone looking to get a handle on the database options available to them as a data platform should read this book and consider the trade-offs presented for each option.
- Matthew Oldham
Director of Data Architecture, Graphium Health
Reading this book felt like some of my best pair-programming experiences. It showed me how to get started, kept me engaged, and encouraged me to experiment on my own.
- Jesse Hallett
Open Source Mentor
This book will really give you an overview of what’s out there so you can choose the best tool for the job.
- Jesse Anderson
Managing Director, Big Data Institute
About this Title
Release: P1.0 (2018-04-03)
While relational databases such as MySQL remain as relevant as ever, the alternative NoSQL paradigm has opened up new horizons in performance and scalability and changed the way we approach data-centric problems. This book presents the essential concepts behind each database alongside hands-on examples that make each technology come alive.
With each database, tackle a real-world problem that highlights the concepts and features that make it shine. Along the way, explore five database models (relational, key/value, columnar, document, and graph) from the perspective of challenges faced by real applications. Learn how MongoDB and CouchDB are strikingly different, make your applications faster with Redis and more connected with Neo4j, build a cluster of HBase servers using cloud services such as Amazon’s Elastic MapReduce, and more. This new edition brings a brand-new chapter on DynamoDB, updated code samples and exercises, and a more up-to-date account of each database’s feature set.
Whether you’re a programmer building the next big thing, a data scientist seeking solutions to thorny problems, or a technology enthusiast venturing into new territory, you will find something to inspire you in this book.
Q&A with Author Luc Perkins
Q: Why did you choose to work on the second edition of Seven Databases in Seven Weeks?
A: Well, I first became intimately acquainted with the NoSQL space about five years ago, when I took on the role of technical writer at Basho Technologies, the company behind the NoSQL database Riak. From the very beginning I found the space endlessly interesting: full of promise and inspiring technology, yet also very tricky to navigate.
Relational databases are quite interesting to me as well, but they tend to be very structurally similar. NoSQL databases, on the other hand, tend to be much more individualistic, you could say. Each has its own special strengths, weaknesses, and quirks and presents you with a set of trade-offs you’ve probably never encountered in another database. So the book was an opportunity to take my more localized knowledge of the space and really stretch my knowledge and my thinking outward.
Q: What was the hardest part about working on the book?
A: In general, I’d say bringing the book up to date. Unsurprisingly, a ton has changed since the original edition. The NoSQL space is notoriously fast-moving, and it’s hard enough to keep up with one database, let alone seven. That means that I had to check every single code snippet, CLI command, claim, and diagram in the book to make sure that it still worked, presented accurate information, etc. Then I had to make sure that newer features are mentioned or showcased when necessary. For a book that’s really seven books in one, this was quite a task, though an extremely rewarding one.
Q: What are some of the main differences between the first and second edition?
A: First, and most importantly, everything in the book works now. We’re all used to bit rot in code but it happens in books, too. Database systems change a lot over time. Many of the CLI commands and code snippets from the first edition eventually started throwing cryptic errors or flat-out not working at all.
But there are some other, more specific changes. The chapter on Riak was removed and replaced with a chapter on Amazon’s DynamoDB. Riak is a fascinating database but its future is very uncertain. DynamoDB is also a fascinating database but it feels like a living, breathing project. Furthermore, the querying language for Neo4j was updated to Cypher (instead of the original and now largely defunct Gremlin).
Q: What’s your favorite database in the book?
A: Oh gosh, that’s very tricky, because I have a special fondness and a place in my heart reserved for each of them. But if I had to pick I’d say Redis. It has a pretty small surface area for such a widely used system and a very well-defined domain of problems that it seeks to address. If I had to build a new application that used all seven databases in the book, the Redis portion of the application would be the one I’d be most eager to work on.
Q: Do you have any general advice for readers? Databases are complex and it may not be readily apparent how even an extremely technically savvy reader should proceed.
A: I’d say take it nice and slow. The content is spread across “days” for a reason. You don’t have to follow the schema we present, of course, but this is not single-sitting material. Take a minute to really absorb the diagrams and technical definitions. Try to understand each database’s “worldview,” so to speak, and use that as a thinking cap for each chapter’s material. Try to imagine times when each database would be indispensable. And if you and a database just aren’t getting along, skip to the next one and come back later. You may come back with fresh insight and a new slate of questions.
What You Need
You’ll need a *nix shell (Mac OS or Linux preferred; Windows users will need Cygwin), Java 6 (or greater), and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.
Contents & Extracts
- Why a NoSQL Book
- Why Seven Databases
- What’s in This Book
- What This Book Is Not
- Code Examples and Conventions
- Online Resources
- It Starts with a Question
- The Genres
- Onward and Upward
- That’s Post-greS-Q-L
- Day 1: Relations, CRUD, and Joins
- Day 2: Advanced Queries, Code, and Rules
- Day 3: Full Text and Multidimensions
- Introducing HBase
- Day 1: CRUD and Table Administration
- Day 2: Working with Big Data
- Day 3: Taking It to the Cloud
- Day 1: CRUD and Nesting
- Day 2: Indexing, Aggregating, Mapreduce
- Day 3: Replica Sets, Sharding, GeoSpatial, and GridFS
- Relaxing on the Couch
- Day 1: CRUD, Fauxton, and cURL Redux
- Day 2: Creating and Querying Views
- Day 3: Advanced Views, Changes API, and Replicating Data
- Neo4j Is Whiteboard Friendly
- Day 1: Graphs, Cypher, and CRUD
- Day 2: REST, Indexes, and Algorithms
- Day 3: Distributed High Availability
- DynamoDB: The “Big Easy” of NoSQL
- Day 1: Let’s Go Shopping!
- Day 2: Building a Streaming Data Pipeline
- Day 3: Building an “Internet of Things” System Around DynamoDB
- Data Structure Server Store
- Day 1: CRUD and Datatypes
- Day 2: Advanced Usage, Distribution
- Day 3: Playing with Other Databases
- Wrapping Up
- Genres Redux
- Making a Choice
- Where Do We Go from Here?
- Database Overview Tables
- The CAP Theorem
- Eventual Consistency
- CAP in the Wild
- The Latency Trade-Off
Luc Perkins is a customer success engineer at Reflect Technologies, a data reporting and visualization startup in Portland, OR. In the past, he has worked as a technical writer for companies such as Twitter and Basho, and is actively involved in the Write the Docs community of technical writers.
Eric Redmond has been in the software industry for more than 20 years, working with Fortune 500 companies, governments, and many startups. He is a coder, illustrator, international speaker, and active organizer of several technology groups.
Jim R. Wilson is a software engineer at Google creating machine learning visualizations on the Big Picture team. He’s contributed to TensorFlow’s visualization suite, TensorBoard, and other open source projects.