
Data Crunching: Solve Everyday Problems using Java, Python, and More




Pages 208
Release P1.0 (2005-06-13)
ISBN 978-0-9745-1407-9

Learn how to approach real-world legacy data conversion problems and see which programming languages are better at data-handling tasks. Design, build, and test programs for searching log files, converting data sources, configuring other programs, and more!

About This Title

Every day, all around the world, programmers have to recycle legacy data, translate from one vendor’s proprietary format into another’s, check that configuration files are internally consistent, and search through web logs to see how many people have downloaded the latest release of their product.

This kind of data crunching may not be glamorous, but knowing how to do it efficiently is essential to being a good programmer. This book describes the most useful data crunching techniques, explains when you should use them, and shows how they will make your life easier. Along the way, it will introduce you to some handy, but under-used, features of Java, Python, and other languages. It will also show you how to test data crunching programs, and how data crunching fits into the larger software development picture.

Data Crunching covers the areas of most interest to working programmers:

  • Using plain text files
  • Learning regular expression syntax
  • Parsing XML using SAX, DOM, and XSLT
  • Encoding data in binary files
  • Handling relational databases using SQL
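As a rough illustration of the first two topics, here is a minimal sketch (not taken from the book) of the web-log task mentioned above: counting how many times a release was downloaded. The log lines, the file path, and the `count_downloads` helper are all hypothetical, and the pattern assumes entries in Common Log Format.

```python
import re

# Hypothetical sample data: made-up log lines in Common Log Format.
LOG_LINES = [
    '192.0.2.1 - - [13/Jun/2005:10:00:00 +0000] "GET /downloads/product-1.0.tar.gz HTTP/1.0" 200 51234',
    '192.0.2.2 - - [13/Jun/2005:10:01:00 +0000] "GET /index.html HTTP/1.0" 200 1024',
    '192.0.2.3 - - [13/Jun/2005:10:02:00 +0000] "GET /downloads/product-1.0.tar.gz HTTP/1.0" 404 0',
    '192.0.2.1 - - [13/Jun/2005:10:03:00 +0000] "GET /downloads/product-1.0.tar.gz HTTP/1.0" 200 51234',
]

# Capture the requested path and the three-digit status code of a GET request.
REQUEST = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def count_downloads(lines, path):
    """Count successful (status 200) GET requests for the given path."""
    count = 0
    for line in lines:
        match = REQUEST.search(line)
        if match and match.group("path") == path and match.group("status") == "200":
            count += 1
    return count


if __name__ == "__main__":
    print(count_downloads(LOG_LINES, "/downloads/product-1.0.tar.gz"))
```

Only requests that returned status 200 are counted here, so the failed request in the sample data is ignored. Whether to also deduplicate by client address depends on what "downloaded" should mean for a given project.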




Brought to You By

Greg Wilson holds a Ph.D. in Computer Science from the University of Edinburgh, and has worked on high-performance scientific computing, data visualization, and computer security. He is the author of Practical Parallel Programming (MIT Press, 1995), a contributing editor at Dr. Dobb’s Journal, and an adjunct professor in Computer Science at the University of Toronto.