Errata for Data Crunching

 

The latest version of the book is P1.0, released almost 11 years ago. If you've bought a PDF of the book and would like to upgrade it to this version (for free), visit your home page.

By default this page displays the errata for the latest version of the book.

  • Reported in: P1.0 (13-Jun-09)
#39437
PDF page: 10

The PDB URL has suffered from link rot.--Andrew Grimm

  • Reported in: P1.0 (09-Nov-07)
#29860
PDF page: 10

"And do the CMPND and AUTHOR lines.."
CMPND should be COMPND--Cristian Vat

  • Reported in: P1.0 (30-Aug-12)
#49772
PDF page: 11
Trying to read the .mobi version using the Kindle for PC reader, it seems that all pages consisting of only source code show black boxes instead of wo...
  • Reported in: P1.0 (29-Jan-06)
#23997
Paper page: 23
Near the middle it says: ... keys didn't actually have to be unique within sections. Should be: ... keys only have to be unique within eac...
  • Reported in: P1.0 (04-Jun-12)
#49406
Paper page: 33
The shell script at the top and the text at the bottom mention using cut -f 5 to extract the fifth field, but as field numbers start from 1, the sixth field contains ...
  • Reported in: P1.0 (29-Jan-06)
#23998
Paper page: 35
Near the bottom it says: It will read from all the files the user specifies (...), unless the -o argument is used, ... Should be: It will read f...
  • Reported in: P1.0 (20-Aug-05)
#23190
Paper page: 49
The listing of the regular expression search pattern contains an error, at least according to running the program on page 48 in python22 on Windows....
  • Reported in: P1.0 (24-Oct-05)
#23457
Paper page: 53-59
To the best of my knowledge, the domain component of an email address has to contain two segments, e.g. a@b.c is OK but a@b isn't. Your RE matches a@b...
  • Reported in: P1.0 (14-Jun-05)
#22791
Paper page: 54

The horizontal bracket showing the extent of group 1 is too wide: it should not include the final "(.*)" of the regular expression.--Matthew Wilson

  • Reported in: P1.0 (14-Jun-05)
#22792
Paper page: 54
The horizontal bracket showing the extent of group 2 is too wide: it should not include the trailing "*)", which is actually the end of group 1.--Matt...
  • Reported in: P1.0 (14-Jun-05)
#22793
Paper page: 66

"if val > 127" in code fragment should be "if val > 255"--Mathias Meyer

  • Reported in: P1.0 (14-Jun-05)
#22794
Paper page: 67-68

The last "\d{3}" in each RE should be "\d{4}" (to capture the last four digits in each phone number).--Matthew Wilson
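
As a rough illustration of this report, here is a minimal sketch. The pattern and the sample number are hypothetical (the book's actual expressions are not reproduced on this page); it assumes a number written as three digits, three digits, and four digits separated by hyphens.

```python
import re

# Hypothetical sample and patterns, not the book's own listing.
sample = "555-123-4567"

# With \d{3} as the last group, only three of the final four digits are captured.
short = re.match(r"(\d{3})-(\d{3})-(\d{3})", sample)

# With \d{4}, the last group captures all four digits.
full = re.match(r"(\d{3})-(\d{3})-(\d{4})", sample)

print(short.groups())  # ('555', '123', '456')
print(full.groups())   # ('555', '123', '4567')
```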

  • Reported in: P1.0 (21-Aug-05)
#23202
Paper page: 67
The second paragraph ends with: "And once again, the only way to solve the problem is case by case (Figure 3.3)." There are other options, e.g. the patter...
  • Reported in: P1.0 (21-Aug-05)
#23201
Paper page: 70

First paragraph reads:
"Bytes in the range 0-255 (hex 0000-007f)"
Should read:
"Bytes in the range 0-127 (hex 0000-007f)"
--Per Holst
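
The arithmetic behind this report can be checked quickly. The sketch below only confirms that hex 7f is decimal 127 and, as an aside that assumes the passage concerns ASCII-range bytes, that 127 is the highest byte value the ASCII codec accepts.

```python
# Hex 7f is 127, so "0-127" is the decimal range that matches hex 00-7f.
upper = 0x7F
print(upper)       # 127
print(hex(255))    # 0xff, which is outside the 00-7f range

# Every byte from 0 to 127 decodes as ASCII; 128 does not.
all_ascii = bytes(range(128)).decode("ascii")
try:
    bytes([128]).decode("ascii")
    decodes_128 = True
except UnicodeDecodeError:
    decodes_128 = False
print(len(all_ascii), decodes_128)  # 128 False
```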

  • Reported in: P1.0 (09-Sep-05)
#23238
Paper page: 81
In "Joe Asks..." you mix up the counting in the last paragraph. Where you wrote "The third is mostly..", you probably meant "The fourth is mostly..". An expla...
  • Reported in: P1.0 (28-Jul-06)
#25582
Paper page: 104
In the first paragraph that starts on the page, "/project/*/author" will match only authors of tickets. It will NOT match authors of comments. To ma...
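
The first claim in this report can be reproduced with a small sketch. The document below is hypothetical: the `ticket` and `comment` element names and their nesting are assumed for illustration, and Python's `xml.etree.ElementTree` stands in for whatever XPath tool the book uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only "project" and "author" come from the report.
doc = """
<project>
  <ticket>
    <author>alice</author>
    <comment>
      <author>bob</author>
    </comment>
  </ticket>
</project>
"""
root = ET.fromstring(doc)  # root is the <project> element

# Relative to <project>, "*/author" corresponds to /project/*/author:
# any child of project, then an author directly beneath that child.
grandchildren = [a.text for a in root.findall("*/author")]

# For comparison, walk every author element at any depth.
all_authors = [a.text for a in root.iter("author")]

print(grandchildren)  # ['alice']
print(all_authors)    # ['alice', 'bob']
```

The comment's author sits one level deeper than the pattern reaches, so it is not matched.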
  • Reported in: P1.0 (25-Aug-05)
#23217
Paper page: 125

In the "Joe Asks" block it says:
"15 (base 10) (negative acknowledge)"
Which should be:
"15 (base 16) (negative acknowledge)"

--Per Holst

  • Reported in: P1.0 (25-Aug-05)
#23218
Paper page: 125
Near the bottom it says: "32-bit integer 1027(base 10) (which is 0x0407 (base 16))" Should be: "32-bit integer 1031(base 10) (which is 0x0407 (ba...
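
Both page 125 reports come down to base conversion, which a few lines of Python can confirm. This is only a check of the numbers quoted in the reports, not a reproduction of the book's code.

```python
# "15" read as base 16 is 21 in decimal; 0x15 is the ASCII code for
# NAK (negative acknowledge).
nak = int("15", 16)
print(nak)          # 21

# 0x0407 is 4 * 256 + 7 = 1031, not 1027.
value = 0x0407
print(value)        # 1031
print(hex(1027))    # 0x403
```
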
  • Reported in: P1.0 (18-Jun-05)
#22867
Paper page: 128-129
The methods for storing integers as binary data are named packVec() and unpackVec(), but the text refers to them as packIntVec() and unpackIntVec() ...
  • Reported in: P1.0 (14-Jun-05)
#22795
Paper page: 132
All implementations of Java in widespread use today will optimize repeated string concatenation, so that "abc"+"def"+"ghi" doesn't allocate an unneces...
  • Reported in: P1.0 (26-Oct-09)
#41107
Paper page: 138
Under section "6.1 Simple Queries". Original Text: "[...] To get them, we need to know which table to examine, and which *columns* to get from it. Th...
  • Reported in: P1.0 (27-Oct-05)
#23489
Paper page: 139
You have included a CustId column in the Assigned table. I don't think it is ever used from there, and it really shouldn't be there because it is not ...
  • Reported in: P1.0 (26-Aug-05)
#23221
Paper page: 150
The query "Find people in 904 or 905, but not both." is incorrect. SELECT Person.FirstName, Person.LastName FROM Person, Assigned WHERE (Person...
  • Reported in: P1.0 (27-Oct-05)
#23490
Paper page: 154
According to SQL standards, as I understand them, the statement near the bottom of the page that the condition CustId <> 1027 will match rows where Cu...
  • Reported in: P1.0 (18-Jun-05)
#22866
Paper page: 156

The description for the SQL type DECIMAL should be "...one that has a fixed number of digits..."--Mathias Meyer

  • Reported in: P1.0 (18-Jun-05)
#22868
Paper page: 166

The first paragraph of section 6.6 should end with "...who didn't understand that data is for grepping".--Mathias Meyer

  • Reported in: P1.0 (07-Mar-07)
#27367
Paper page: 178
Base-64 Encoding: The order of the characters for base64 is not arbitrary. The author wonders why the list of characters doesn't start with digits ...