Historical errata for Data Science Essentials in Python

PDF Pg	Paper Pg	Type	Description	Fixed on
27	13	ERROR	Unit 5, Choosing the Right Data Structure, first page of that section: the following code uses a nonexistent variable `big`: bigList = [str(i) for i in range(10000000)] “abc” in big # Takes 0.2 sec bigSet = set(bigList) “abc” in bigSet # Takes 15–30 μsec—10000 times faster! The second line of the code above checks whether “abc” is in the collection `big`, but it should be `bigList`.	2016-05-27
19		TYPO	I suspect you meant to open the URL “www.networksciencelab.com” in the example, not “www.networkscience.com”. The latter times out and doesn’t connect.	2016-05-27
13		TYPO	In the following code, “abc” in big should be “abc” in bigList: bigList = [str(i) for i in range(10000000)] “abc” in big # Takes 0.2 sec bigSet = set(bigList) “abc” in bigSet # Takes 15–30 μsec—10000 times faster!	2016-05-27
62		ERROR	“Just like with range, the value of stop can be larger than start” should be “Just like with range, the value of stop can be smaller than start”.	2016-05-27
	21	TYPO	The word should be “pattern”, not “patter”, in the second paragraph, second sentence.	2016-05-27
19		TYPO	The code example uses urlopen(“xxxxxx…networkscience…”), but networkscience… doesn’t resolve. I think it should be urlopen(“xxxxxxx….networksciencelab….”).	2016-06-02
29	14	TYPO	Chapter 2, Unit 5: in the second line of the top paragraph, the variable name vseq is misspelled as vsec.
13		ERROR	The search time for Python sets and dicts is O(1) on average, not O(log(N)) as stated in Unit 5. The underlying data structure is a hash table.
15		SUGGEST	At the bottom of the page, you mention that a comprehension enclosed in parentheses evaluates to a “list generator object”. This is nonstandard terminology. My suggestions: 1) tell readers the standard name for that construct (generator expression) and explain that it evaluates to a generator object (not a “list generator object”, as the text says, because it is not designed just to generate lists); 2) explain the advantage of a generator object: it produces items lazily, on demand; 3) give an example. An excellent example is to modify the nested comprehension that was just shown to use a nested generator expression, which avoids the cost of building a list just to provide input for the outer list comprehension, like this: [line for line in (l.strip() for l in infile) if line]
35		ERROR	The code on this page passes the parameter file=sys.err in the “try” section. This doesn’t work on Anaconda 4.1, because the sys module has no err attribute; I used file=sys.stderr and it worked.
58		ERROR	In the code at the end of the book, creating the stemmer raised an error on Python 3.5. I just used: ls = LancasterStemmer()
127		TYPO	In the paragraph where the qcut() function is introduced (second paragraph), the qcut() and cut() functions are misspelled as “qcuts()” and “cuts()”.
150		SUGGEST	The plotting example is nice; however, in my opinion it is a little too complex for beginners reading this book. I suggest introducing this section with a simpler plot, such as a scatter plot or a bar chart.
172		TYPO	There is a typo in this phrase in the first paragraph: “You cannot just claim that your data mode predicts something…” I guess the last part of the sentence should be “… your data model predicts…”
	43	TYPO	I believe the code “words = [ls.stem(w) for w in text if w not in” should be “words = [ls.stem(w) for w in words if w not in”. Or am I mistaken?
104	105	ERROR	Chapter 6, Working with Data Series and Frames, page 104, Unit 34, Combining Data, Deleting Duplicates. The text reads: The duplicated([subset]) function returns a Boolean series denoting if each row in all or subset (a list of column names) columns is duplicated. The optional parameter keep controls whether the “first”, “last”, or each (True) duplicate is marked. The drop_duplicates() function returns a copy of a frame or series with duplicates from all or subset (a list of column names) columns removed. The optional parameter keep controls whether the “first”, “last”, or each (True) duplicate is removed. You can use the optional parameter inplace=True to remove duplicates from the original object. => The default value of keep is “first”. The value that marks every duplicated row (the “each” case) is False, not True; setting keep=True raises an error. The book’s text also seems to leave this out. The same applies to drop_duplicates().
59		ERROR	In the code example on this page there is no variable ‘text’; it should be ‘words’. # Eliminate stop words and stem the rest of the words words = [ls.stem(w) for w in text if w not in stopwords.words(“english”) and w.isalnum()] should be # Eliminate stop words and stem the rest of the words words = [ls.stem(w) for w in words if w not in stopwords.words(“english”) and w.isalnum()]. Additionally, ‘nltk’ is not imported at the top of this page, so this code sample doesn’t run without adding “import nltk” at the top of the code sample on PDF page 58.
32		TYPO	The book says: “The value of the name object of the title’s parent element is soup.title.parent.name.string”, which gives me an error when I try it in the Python shell. For me, the statement that worked is “soup.title.parent.name”, so the corrected text could be: “The value of the name object of the title’s parent element is soup.title.parent.name”. Thank you!
100		ERROR	The code raises an error: File “~/code/borders.py”, line 16: countryA = list(row[0].strings)[1] IndexError: list index out of range
	171	SUGGEST	I found it very annoying that the solutions folder does not contain all the data needed to run the solutions. For example, the code on page 171 (paper book) loads two files at the very beginning, ‘alco2QQ9.pickle’ and ‘states.csv’, but these files do not appear in the solutions folder. I tried to find other code examples where these files are saved, but I failed. It would be a big help if ALL the solution code could be run independently.
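For the Unit 5 entries above (PDF page 27, paper page 13), a minimal corrected sketch of the list-versus-set comparison might look like this. It assumes a smaller collection (1,000,000 items instead of the book's 10,000,000) so it runs quickly, and it uses timeit for measurement; the timings it prints depend on the machine and are not the book's figures.

```python
import timeit

# Assumption: a smaller collection than the book's 10,000,000 items,
# so the sketch runs quickly; the relative difference is what matters.
N = 1_000_000

bigList = [str(i) for i in range(N)]
bigSet = set(bigList)

# Corrected membership tests: the list test uses bigList, not `big`.
in_list = "abc" in bigList  # linear scan over every element
in_set = "abc" in bigSet    # hash-table lookup, O(1) on average

# Average each test over a few runs; absolute numbers vary by machine.
list_time = timeit.timeit(lambda: "abc" in bigList, number=5) / 5
set_time = timeit.timeit(lambda: "abc" in bigSet, number=5) / 5
print(f"list: {list_time:.6f} s, set: {set_time:.9f} s")
```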
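The page 62 correction rests on how range() treats a stop smaller than start: with a negative step it counts down, and with the default step of 1 it produces an empty sequence. A quick check:

```python
# A negative step lets range() count down, so stop can be smaller than start.
countdown = list(range(10, 0, -3))
print(countdown)  # [10, 7, 4, 1]

# With the default step of 1, a stop smaller than start yields nothing.
empty = list(range(10, 0))
print(empty)  # []
```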
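The generator-expression suggestion for page 15 can be tried with a small sketch. Here io.StringIO stands in for a real input file (an assumption for illustration); the inner generator expression strips lines lazily, and the outer list comprehension keeps only the non-empty ones.

```python
import io

# Hypothetical input: io.StringIO stands in for an open text file.
infile = io.StringIO("alpha\n\n  beta  \n\t\ngamma\n")

# Generator expression: each line is stripped only when the outer
# comprehension asks for it; no intermediate list is built.
gen = (l.strip() for l in infile)
print(type(gen).__name__)  # generator

# Outer list comprehension keeps only the non-empty stripped lines.
# Inlined, this is [line for line in (l.strip() for l in infile) if line].
lines = [line for line in gen if line]
print(lines)  # ['alpha', 'beta', 'gamma']
```

Once consumed, a generator is exhausted, so it suits single-pass pipelines like this one.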
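For the page 35 entry, a minimal sketch of a try/except block that reports failures on standard error. The parse_count() helper is hypothetical and not the book's code; the point is only that the stream is sys.stderr, since the sys module has no err attribute.

```python
import sys


def parse_count(text):
    """Hypothetical helper: convert text to an int, reporting failures on stderr."""
    try:
        return int(text)
    except ValueError as err:
        # The standard error stream is sys.stderr; sys.err does not exist.
        print("Cannot parse count:", err, file=sys.stderr)
        return None


print(parse_count("42"))         # 42
print(parse_count("forty-two"))  # None (the error message goes to stderr)
```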
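For the page 104 entry on deleting duplicates, a small sketch of the keep parameter in pandas. The DataFrame holds made-up data. As far as I know, recent pandas versions behave as shown: the default is "first", keep=False marks every duplicate, and keep=True is rejected with a ValueError (the exact message may differ between versions).

```python
import pandas as pd

# Made-up data: rows 0 and 1 are identical; row 3 repeats only column "a".
df = pd.DataFrame({"a": [1, 1, 2, 1], "b": ["x", "x", "y", "z"]})

# Default keep="first": every full-row repeat after the first is marked.
first = df.duplicated()  # [False, True, False, False]

# keep="last" on column "a": every occurrence before the last is marked.
last = df.duplicated(subset=["a"], keep="last")  # [True, True, False, False]

# keep=False (not True): every duplicated row is marked.
every = df.duplicated(subset=["a"], keep=False)  # [True, True, False, True]

# drop_duplicates() accepts the same keep values; keep=False drops all copies.
unique_a = df.drop_duplicates(subset=["a"], keep=False)  # only row 2 remains

# keep=True is not a valid option.
try:
    df.duplicated(keep=True)
except ValueError as err:
    print("keep=True rejected:", err)
```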
