
Processing Big Data with MapReduce



MapReduce is a programming paradigm that uses multiple machines to process large data sets. Apache Hadoop is the most popular MapReduce framework and this series takes you from zero MapReduce knowledge all the way to writing and running Hadoop programs.

In these screencasts, Jesse teaches MapReduce with his own novel method that makes it easy to understand. After you learn the basics, Jesse teaches you Hadoop using Java, Ruby, Python, and Perl code. No matter which technology stack you choose, you’ll have the understanding and tools you need to use Hadoop on your next project.

Together we’ll write code in Java, Ruby, Python and Perl.

Source code for the first and third episodes is available at

Source code for the second episode is available at


Customer Reviews

A brilliant distillation of the MapReduce concepts. This screencast gives the MapReduce concepts without getting bogged down. Jesse makes learning MapReduce so easy even cats could learn it.

- Eric Sammer

Author of "Hadoop Operations"

I love hands-on exercises for introducing new data scientists to MapReduce, and the playing card method is the most hands-on method I’ve seen.

- Josh Wills

Founder and VP, Apache Crunch

Jesse provides a great introduction to get started with Hadoop MapReduce. The use of physical analogies like playing cards provides a visual and tactile understanding of the fundamental architectural concepts. Programmatic examples driven in the Eclipse Java development environment ground this theory in practical terms for the developer.

- Aaron Kimball

CTO, WibiData

Jesse’s use of playing cards to illustrate how MapReduce achieves parallelism is ingenious, I highly recommend anyone looking at MapReduce for first time to watch this video.

- Amr Awadallah

Founder, CTO, Cloudera, Inc.



All the episodes in this series have been released.

  • Screencasts are DRM free.

About this Title

Available in: DRM-free iPod/iPhone 3 Video, QuickTime Video, and Ogg Theora
Download and watch when and where you want

Every industry is dealing with more data every day. The data comes from more and more devices and we need to both store and process the data efficiently. We’ll see how Apache Hadoop MapReduce works and scales to process these vast quantities of data.

Working with software libraries saves time and effort, and this is especially true with distributed computing systems like Hadoop. However, learning the underlying concepts and API takes time, and that often holds teams back. Jesse’s novel approach to MapReduce uses playing cards to illustrate the workflow in a simple, understandable way. This allows you to move physical objects while learning the concepts behind MapReduce. Then we’ll move from conceptual to practical and write code to do the same thing using Hadoop. We’ll work through several examples in several programming languages to ensure you have the knowledge you need to use MapReduce on your next project.
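To make the workflow concrete, here is a minimal in-process sketch of the map, shuffle, and reduce steps applied to playing cards. It runs on one machine and does not use Hadoop. The sample cards, the function names, and the task itself (summing the numeric ranks in each suit) are assumptions chosen for illustration, not necessarily the exercise used in the screencasts.

```python
from collections import defaultdict

# Hypothetical input: (rank, suit) pairs. Both the data and the task
# (summing ranks per suit) are illustrative assumptions.
CARDS = [(2, "hearts"), (7, "spades"), (5, "hearts"), (10, "clubs"), (3, "spades")]


def map_card(card):
    """Map step: emit one (key, value) pair per input record."""
    rank, suit = card
    yield suit, rank


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_suit(suit, ranks):
    """Reduce step: collapse one key's values into a single result."""
    return suit, sum(ranks)


def run_job(cards):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    mapped = [pair for card in cards for pair in map_card(card)]
    grouped = shuffle(mapped)
    return dict(reduce_suit(suit, ranks) for suit, ranks in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(CARDS))
```

In a real cluster, the map and reduce functions would run on many machines at once and the framework would handle the grouping step; this sketch only shows the order in which data moves through the three stages.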

This series of screencasts is a focused look at how MapReduce works and the APIs behind it. Although Hadoop is written in Java, we’ll see how to use it with any language. Jesse teaches using examples in Java, Ruby, Python and Perl.
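One common way to use Hadoop from a non-Java language is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below is a word count written in that style; it is an assumed example, not code from the screencasts. To keep it runnable without a cluster, the two steps are plain functions and a local sort stands in for the framework's shuffle.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local stand-in for the framework: map, sort by key, then reduce.
    sample = ["the cat sat", "the cat ran"]
    for line in reducer(sorted(mapper(sample))):
        print(line)
```

Under Hadoop Streaming, each function would typically live in its own script that iterates over standard input and prints each result line, and the framework would perform the sort between the two steps.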


Jesse Anderson is a Creative Engineer in Reno with many years of experience in creating products and helping companies improve their software engineering. He works at Cloudera on the Educational Services team as a Curriculum Developer and Instructor. Jesse works on both professional and personal projects. A recent personal project, the Million Monkeys Shakespeare project, went viral and gained international notoriety. Jesse has been interviewed by national media outlets, including the Wall Street Journal and Fox News. He volunteers his time as the President of the Northern Nevada Software Developers Group and sits on the Technology Advisory Committee at Morrison University.

His blog and website is and his GitHub account contains other example MapReduce programs.