
Historical errata for Data Science Essentials in Python

PDF Pg  Paper Pg  Type  Description  Fixed on  Comments
27  13  ERROR

Unit 5, Choosing the Right Data Structure, first page of that section

The following code uses a non-existing variable `big`:

bigList = [str(i) for i in range(10000000)]
"abc" in big # Takes 0.2 sec
bigSet = set(bigList)
"abc" in bigSet # Takes 15–30 μsec—10000 times faster!

The second line of code above checks if “abc” is in the collection `big`. But it should be `bigList`.
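
With the variable name corrected, the comparison can be reproduced roughly as follows. This is only a sketch: the list is ten times smaller than the book's so that it runs quickly, the `timeit` harness and the names `list_time` and `set_time` are additions for illustration, and the actual timings depend on the machine.

```python
import timeit

# One million items instead of the book's ten million, to keep the run short.
bigList = [str(i) for i in range(1000000)]
bigSet = set(bigList)

# "abc" is absent, so the list search has to scan every element,
# while the set lookup is a single hash probe.
list_time = timeit.timeit(lambda: "abc" in bigList, number=5) / 5
set_time = timeit.timeit(lambda: "abc" in bigSet, number=5) / 5

print(f"list lookup: {list_time:.6f} s per search")
print(f"set lookup:  {set_time:.9f} s per search")
```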

2016-05-27
19  TYPO

I suspect you mean to open the URL “www.networksciencelab.com” in the example, not “www.networkscience.com”. The latter times out and doesn’t connect.

2016-05-27
13  TYPO

bigList = [str(i) for i in range(10000000)]
"abc" in big # Takes 0.2 sec >>>>> should be "in bigList"
bigSet = set(bigList)
"abc" in bigSet # Takes 15–30 μsec—10000 times faster!

2016-05-27
62  ERROR

Just like with range, the value of stop can be larger than start
Should be
Just like with range, the value of stop can be smaller than start
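
The page 62 passage itself is not quoted here, so the following sketch only illustrates the comparison with the built-in `range`, where a stop smaller than start is legal. The variable names are chosen for illustration.

```python
# With the default step of 1, a stop smaller than start yields nothing.
empty = list(range(5, 0))
print(empty)        # []

# With a negative step, the sequence counts down and excludes stop.
countdown = list(range(5, 0, -1))
print(countdown)    # [5, 4, 3, 2, 1]
```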

2016-05-27
21  TYPO

The word should be “pattern”, not “patter”, in the second sentence of the second paragraph.

2016-05-27
19  TYPO

The code example uses urlopen("xxxxxx…networkscience…"). networkscience… doesn’t resolve. I think it should be urlopen("xxxxxxx….networksciencelab….")

2016-06-02
29  14  TYPO

Chapter 2, Unit 5
In the second line of the top paragraph, the variable name vseq is misspelled as vsec.

13  ERROR

The search time for Python sets and dicts is O(1) on average, not O(log(N)) as stated in Unit 5. The underlying data structure is a hash table.
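
One way to see the difference without relying on timings is to count equality comparisons. The sketch below is illustrative only: `Counted` and `comparisons_for` are hypothetical helpers, and the exact counts reflect CPython's list and set implementations rather than any language guarantee.

```python
class Counted:
    """Wrap a value and count every equality comparison it takes part in."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.comparisons += 1
        return isinstance(other, Counted) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def comparisons_for(container, probe):
    """Return (found, number of __eq__ calls) for `probe in container`."""
    Counted.comparisons = 0
    found = probe in container
    return found, Counted.comparisons


n = 1000
items = [Counted(i) for i in range(n)]
probe = Counted(n)  # equal to no element, so the search must fail

list_found, list_count = comparisons_for(items, probe)
set_found, set_count = comparisons_for(set(items), probe)

print("list comparisons:", list_count)  # grows linearly with n
print("set comparisons: ", set_count)   # stays near zero regardless of n
```

The list has to compare the probe against every element, whereas the set jumps to the slot selected by the hash and compares only against entries with a matching hash.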

15  SUGGEST

At the bottom of the page, you mention that a comprehension enclosed in parentheses evaluates to a “list generator object”. This is non-standard terminology.

My suggestions:
1) tell readers the standard name for that construct (generator expression) and explain that it evaluates to a generator object (not a “list generator object”, as the text says, because it’s not designed just to generate lists)
2) explain what is the advantage of a generator object: producing items lazily, on-demand
3) give an example; an excellent example is to modify the nested comprehension that was just shown to use a nested generator expression, thus avoiding the cost of building a list just to provide input for the outer listcomp, like this:

[line for line in (l.strip() for l in infile) if line]
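
A runnable version of that one-liner might look like the sketch below. The `io.StringIO` object is a stand-in for the book's `infile` and its contents are made up for illustration.

```python
import io

# Stand-in for an open text file; the contents are invented for this sketch.
infile = io.StringIO("  first line  \n\n   \nsecond line\n")

# The parenthesized form is a generator expression: no list is built here,
# and no line is read until something iterates over it.
stripped = (l.strip() for l in infile)
print(type(stripped).__name__)  # generator

# The outer list comprehension pulls stripped lines one at a time
# and keeps only the non-empty ones.
lines = [line for line in stripped if line]
print(lines)  # ['first line', 'second line']
```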

35  ERROR

In the code on this page, the “try” section passes the parameter file=sys.err. It doesn’t work on Anaconda 4.1; I used file=sys.stderr and it worked.
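
The book's code is not reproduced in this report, so the following is only a sketch of the pattern with the corrected stream name. The function `safe_divide` is a hypothetical example.

```python
import sys


def safe_divide(a, b):
    """Divide a by b, reporting failures on standard error."""
    try:
        return a / b
    except ZeroDivisionError as exc:
        # sys.stderr is the standard error stream; the sys module
        # has no attribute called "err".
        print("Cannot divide:", exc, file=sys.stderr)
        return None


result = safe_divide(1, 0)
print(hasattr(sys, "err"))  # False
```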

58  ERROR

In the code at the end of the book, creating the stemmer raised an error on Python 3.5. I just used:

ls = LancasterStemmer()

127  TYPO

In the paragraph where the “qcut()” function is introduced (second paragraph), the qcut() and cut() functions are misspelled as “qcuts()” and “cuts()”.
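
For reference, the correctly spelled functions can be called as in the sketch below. It assumes pandas is installed, and the sample values are invented for illustration.

```python
import pandas as pd

# Invented sample with one outlier, to make the two binning strategies differ.
values = pd.Series([1, 2, 3, 4, 5, 100])

# cut() splits the value range into bins of equal width.
equal_width = pd.cut(values, 2)
# qcut() splits at quantiles, so the bins hold about the same number of items.
equal_count = pd.qcut(values, 2)

width_counts = equal_width.value_counts().sort_index().tolist()
quantile_counts = equal_count.value_counts().sort_index().tolist()
print(width_counts)     # [5, 1]
print(quantile_counts)  # [3, 3]
```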

150  SUGGEST

The plotting example is nice, but in my opinion it’s a little complex for beginners reading this book. I suggest introducing this section with a simpler plot, such as a scatter plot or a bar chart.

172  TYPO

There is a typo in the first paragraph, in this phrase: “You cannot just claim that your data mode predicts something…” I guess it should read “… your data model predicts…”

43  TYPO

I believe the code:
words = [ls.stem(w) for w in text if w not in
should be:
words = [ls.stem(w) for w in words if w not in

Or am I mistaken?

104  105  ERROR

Chapter 6. Working with Data Series and Frames, page 104

Unit 34. Combining Data

Deleting Duplicates

The duplicated([subset]) function returns a Boolean series denoting if each row in
all or subset (a list of column names) columns is duplicated. The optional
parameter keep controls whether the “first”, “last”, or each (True) duplicate is marked.
The drop_duplicates() function returns a copy of a frame or series with duplicates
from all or subset (a list of column names) columns removed. The optional
parameter keep controls whether the “first”, “last”, or each (True) duplicate is
removed. You can use the optional parameter inplace=True to remove duplicates
from the original object.

=> The default value of keep is “first”, which the text does not mention. The value that marks every duplicated row as True is False, not True; setting keep=True raises an error.
The same is true for drop_duplicates().
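
The behaviour of the keep parameter can be checked with a small frame. This is a sketch assuming pandas is installed; the frame contents are invented, and the exact exception type raised for keep=True may vary between pandas versions.

```python
import pandas as pd

# Rows 0, 2 and 3 are identical; row 1 is unique.
df = pd.DataFrame({"name": ["a", "b", "a", "a"], "value": [1, 2, 1, 1]})

first = df.duplicated().tolist()             # default keep="first"
last = df.duplicated(keep="last").tolist()
every = df.duplicated(keep=False).tolist()   # mark every duplicated row
print(first)  # [False, False, True, True]
print(last)   # [True, False, True, False]
print(every)  # [True, False, True, True]

# keep=False drops all rows that have a duplicate, leaving only row 1.
unique_only = df.drop_duplicates(keep=False)
print(unique_only["name"].tolist())  # ['b']

# keep=True is not an accepted value.
try:
    df.duplicated(keep=True)
    keep_true_rejected = False
except (ValueError, TypeError):
    keep_true_rejected = True
```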

59  ERROR

In the code example on this page there is no variable ‘text’; it should be ‘words’.

# Eliminate stop words and stem the rest of the words
words = [ls.stem(w) for w in text if w not in stopwords.words("english") and w.isalnum()]

should be

# Eliminate stop words and stem the rest of the words
words = [ls.stem(w) for w in words if w not in stopwords.words("english") and w.isalnum()]

Additionally, ‘nltk’ is never imported, so this code sample doesn’t run without adding “import nltk” at the top of the code sample on PDF page 58.

32  TYPO

Book says: “The value of the name object of the title’s parent element is
soup.title.parent.name.string”, which gives me an error when trying it in the Python shell.

For me, the statement that worked is “soup.title.parent.name”, then the corrected text could be like this:

“The value of the name object of the title’s parent element is
soup.title.parent.name”

Thank you!
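
The attribute chain from this report can be checked with a small document. This sketch assumes the `bs4` package is installed; the HTML string is invented for illustration.

```python
from bs4 import BeautifulSoup

# Invented markup; the title's parent element is <head>.
html = "<html><head><title>Sample page</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

# .name is already a plain string, so it has no .string attribute.
parent_name = soup.title.parent.name
print(parent_name)                     # head
print(hasattr(parent_name, "string"))  # False
```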

100  ERROR

The code raises an error:
File “~/code/borders.py”, line 16, in
countryA = list(row[0].strings)[1]

IndexError: list index out of range

171  SUGGEST

I found it very annoying that the solutions folder does not contain all the data needed to run the solutions.
For example, the code on page 171 (paper book) begins by loading two files called ‘alco2QQ9.pickle’ and ‘states.csv’, but these files do not appear in the solutions folder.
I tried to find other code examples where these files are saved, but I failed.
It would be a big help if ALL the solution code could be run independently.
