Data is getting bigger and more complex by the day, and so are your choices in handling it. From traditional RDBMS to newer NoSQL approaches, Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate’s Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core of each technology.
About this Book
- 354 pages
- Release: P2.0 (2013-01-28)
- ISBN: 978-1-93435-692-0
Redis, Neo4j, CouchDB, MongoDB, HBase, Riak, and Postgres: with each database, you’ll tackle a real-world data problem that highlights the concepts and features that make it shine. You’ll explore the five data models employed by these databases: relational, key/value, columnar, document, and graph. See which kinds of problems are best suited to each, and when to use them.
You’ll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4j. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon’s Elastic Compute Cloud (EC2).
Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that’s more than the sum of its parts, or find one that meets all your needs at once.
Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.
What You Need:
You’ll need a *nix shell (Mac OS X or Linux preferred; Windows users will need Cygwin), Java 6 (or greater), and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.
Q&A with authors Eric Redmond and Jim Wilson:
1. How did you pick the seven databases?
We did have some criteria, even if they weren’t explicit. The databases had to be open source; we didn’t want to cover any databases that would tie readers to a company. We wanted at least one implementation for each of the five database genres (Relational, Key-Value, Columnar, Document, Graph). Then we chose databases that exemplified some general concepts we wanted to cover, like the CAP theorem or MapReduce. Finally, we chose databases that were good counterpoints to each other: MongoDB and CouchDB (different ways of implementing document stores), and Riak as a Dynamo (Amazon’s database) implementation to compare with HBase as a BigTable (Google’s database) implementation.
Our goal with the book was principally to introduce readers to the field of choices they now have. Our selections were largely in service of that goal. Even so, it was a pretty long and iterative process. We knew that no matter which ones we picked there’d be people asking why we did or didn’t include their favorite. It came down to choosing the genres we wanted to discuss and then picking databases that had the right combination of (A) representing their genre and (B) relative popularity.
For example, we picked PostgreSQL since it sticks very closely to the SQL standard and is less well known than other OSS competitors like MySQL. Similarly, even though both HBase and Cassandra are column-oriented databases, we went with HBase because Cassandra is more of a hybrid that incorporates elements from both the BigTable paper and the Dynamo paper.
2. Databases are rapidly changing. What do you wish you’d included now?
There are hundreds of database options, but I’m glad to see that our choices are still going strong a year later. However, if I had to do it over again, I would like to have added a Triplestore (like Mulgara), since the semantic web is slowly popularizing this method of data storage. I also would have liked to spend more time on Neo4j’s Cypher language, or have covered Hadoop in a bit of detail, since analytics is a huge part of data storage.
Yes, databases are rapidly changing, in two senses. First, the field of available data storage technology has seen an explosion in recent years. More and more kinds of databases are cropping up to fill various niche needs. Second, the databases themselves are rapidly evolving. Even between minor version releases, modern NoSQL databases incorporate more and more features in order to claim more of the market and remain competitive. In that regard, there’s a bit of convergence happening, which makes choosing one even harder, since more and more of them can meet your needs.
I still think the five genres and seven databases we chose satisfy the criteria that we set out to achieve. But there are others I’d like to write about as well. These include some old favorites like SQLite and some databases you might not think of as such, like OpenLDAP and Solr (an inverted index/search engine).
3. Why did you decide to write this book?
Jim and I discussed writing a book like this for quite some time. About a year and a half ago he sent me an email with no body—the subject was “Seven Databases in Seven Weeks?” The title sold me. We both loved Bruce’s “Seven Languages” book, and this seemed the perfect format to explore this emerging field.
As early as March of 2010, Eric and I brainstormed about writing a NoSQL book of some kind. At the time there was a lot of buzz around the term, but also a lot of confusion. We thought we could bring some structure to the discussion and educate people who might not be up to speed yet on all the latest developments.
After reading Bruce A. Tate’s Seven Languages in Seven Weeks I thought, “What about Seven Databases?” Eric submitted a proposal and a few weeks later we were off to the races.
4. What do you see as up and coming databases?
I’ve become a big fan of Neo4j. It’s one we covered in the book, but in all honesty we chose it because we wanted to explore an open source graph database. But over the past year it’s really come into its own. I really do believe this is the year we’ll see wider adoption of graph databases.
As for ones we did not cover, I think ElasticSearch is clearly gaining traction. OrientDB is also interesting, as it can act as a relational, key-value, document, or graph database. I think you’ll see more of this multi-genre behavior in the future. And as I hinted at before, triplestores are gaining a bit of traction, too, though their problem set greatly overlaps with that of general graph databases.
There are many, of course, but there are at least two that I personally look forward to exploring in more detail: ElasticSearch and doozer.
ElasticSearch is a distributed, peer-based, REST/JSON powered document search engine. Using a distributed Lucene index at its core, ElasticSearch allows REST clients to find documents based on fuzzy criteria. Everyone needs a search engine, and ElasticSearch makes it easy.
doozer is a fast, headless consensus engine. It’s written in the Go programming language by the smart folks at Heroku. doozer is great for storing small blobs of important information that absolutely must be consistent (like cluster configuration metadata), but without a single point of failure.
Comments and Reviews
Help Net Security said:
This book gives great and structured overview of modern databases, and doesn’t delve too deep. Nor should it, as it currently gives all the knowledge you need to choose one database to suit your needs.
If you have any reason to use or consider using anything other than a more traditional relational database, and aren’t sure which one to try out of the exploding number of new options, this book will help you make sense of the field and better evaluate your options against your current needs. I recommend it.
Reading this book was like going on “Mr. Toad’s Wild Ride” at Disneyland. There are turns and twists, you never know what’s around the next corner, but it is a lot of fun.
—Ian Dees Coauthor, "Using JRuby"
The flow is perfect. On Friday, you’ll be up and running with a new database. On Saturday, you’ll see what it’s like under daily use. By Sunday, you’ll have learned a few tricks that might even surprise the experts! And next week, you’ll vault to another database and have fun all over again.
—Sean Copenhaver Lead Code Commodore backgroundchecks.com
Provides a great overview of several key databases that will multiply your data modeling options and skills. Read if you want database envy seven times in a row.
—Loren Sands-Ramshaw Software Engineer U.S. Department of Defense
This is by far the best substantive overview of modern databases. Unlike the host of tutorials, blog posts, and documentation I have read, this book taught me why I would want to use each type of database and the ways in which I can use them in a way that made me easily understand and retain the information. It was a pleasure to read.
—Jan Lehnardt Apache CouchDB Developer and Author
This is one of the best CouchDB introductions I have seen.
—Dr Nic Williams VP of Technology Engine Yard
In an ideal world, the book cover would have been big enough to call this book “Everything you never thought you wanted to know about databases that you can’t possibly live without.” To be fair, “Seven Databases in Seven Weeks” will probably sell better.
—Jerry Sievert Director of Engineering Daily Insight Group
‘Seven Databases in Seven Weeks’ is an excellent introduction to all aspects of modern database design and implementation. Even spending a day in each chapter will broaden understanding at all skill levels, from novice to expert— there’s something there for everyone.