Data Science Essentials in Python: Collect → Organize → Explore → Predict → Value



Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyze data using statistical and network-theoretical methods; and see actual examples of data analysis at work. This one-stop solution covers the essential data science you need in Python.

Customer Reviews

This book does a fantastic job at summarizing the various activities when wrangling
data with Python. Each exercise serves an interesting challenge that is fun to
pursue. This book should no doubt be on the reading list of every aspiring data
scientist.

- Peter Hampton, Ulster University

Data Science Essentials in Python gets you up to speed with the most common
tasks and tools in the data science field. It’s a quick introduction to many different
techniques for fetching, cleaning, analyzing, and storing your data. This book
helps you stay productive so you can spend less time on technology research and
more on your intended research.

- Jason Montojo

Coauthor of "Practical Programming: An Introduction to Computer Science Using Python 3"

For those who are highly curious and passionate about problem solving and
making data discoveries, Data Science Essentials in Python provides deep insights
and the right set of tools and techniques to start with. Well-drafted examples and
exercises make it practical and highly readable.

- Lokesh Kumar Makani

CASB expert, Skyhigh Networks

What You Need

You need a decent distribution of Python 3.3 or above that includes at least NLTK, Pandas, NumPy, Matplotlib, Networkx, SciKit-Learn, and BeautifulSoup. A great distribution that meets the requirements is Anaconda, which is available for free. If you plan to set up your own database servers, you also need MySQL and MongoDB. Both packages are free and run on Windows, Linux, and Mac OS.
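
One quick way to see whether an existing Python installation meets these requirements is a short script such as the sketch below. It is not taken from the book. The import names (`nltk`, `pandas`, `numpy`, `matplotlib`, `networkx`, `sklearn`, `bs4`) are the commonly used ones and are assumed here, and the helper names `missing_modules` and `python_is_recent_enough` are hypothetical.

```python
import sys
from importlib.util import find_spec

# Display name -> import name. The import names are an assumption:
# the requirements above list only the package names.
REQUIRED = {
    "NLTK": "nltk",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "Networkx": "networkx",
    "SciKit-Learn": "sklearn",
    "BeautifulSoup": "bs4",
}


def missing_modules(required):
    """Return the display names whose import name cannot be located."""
    return [name for name, module in required.items()
            if find_spec(module) is None]


def python_is_recent_enough(minimum=(3, 3)):
    """Check the running interpreter against a (major, minor) minimum."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python 3.3 or above is required.")
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

The script only checks that each module can be located. It does not import the modules or verify their versions, and it does not check for MySQL or MongoDB servers.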

Contents & Extracts

  • Acknowledgments
  • Preface
    • About This Book
    • About the Audience
    • About the Software
    • Notes on Quotes
    • The Book Forum
    • Your Turn
  • What Is Data Science
    • Data Analysis Sequence
    • Data Acquisition Pipeline
    • Report Structure
    • Your Turn
  • Core Python for Data Science
    • Understanding Basic String Functions
    • Choosing the Right Data Structure
    • Comprehending Lists through List Comprehension
    • Counting with Counters
    • Working with Files
    • Reaching the Web
    • Pattern Matching with Regular Expressions
    • Globbing File Names and Other Strings
    • Pickling and Unpickling Data
    • Your Turn
  • Working with Text Data
    • Processing HTML Files
    • Handling CSV Files
    • Reading JSON Files
    • Processing Texts in Natural Languages
    • Your Turn
  • Working with Databases
    • Setting Up a MySQL Database
    • Using a MySQL Database: Command Line
    • Using a MySQL Database: PyMySQL
    • Taming Document Stores: MongoDB
    • Your Turn
  • Working with Tabular Numeric Data
    • Creating Arrays
    • Transposing and Reshaping
    • Indexing and Slicing
    • Broadcasting
    • Demystifying Universal Functions
    • Understanding Conditional Functions
    • Aggregating and Ordering Arrays
    • Treating Arrays as Sets
    • Saving and Reading Arrays
    • Generating a Synthetic Sine Wave
    • Your Turn
  • Working with Data Series and Frames
    • Getting Used to Pandas Data Structures
    • Reshaping Data
    • Handling Missing Data
    • Combining Data
    • Ordering and Describing Data
    • Transforming Data
    • Taming Pandas File I/O
    • Your Turn
  • Working with Network Data
    • Dissecting Graphs
    • Network Analysis Sequence
    • Harnessing Networkx
    • Your Turn
  • Plotting
    • Basic Plotting with PyPlot
    • Getting to Know Other Plot Types
    • Mastering Embellishments
    • Plotting with Pandas
    • Your Turn
  • Probability and Statistics
    • Reviewing Probability Distributions
    • Recollecting Statistical Measures
    • Doing Stats the Python Way
    • Your Turn
  • Machine Learning
    • Designing a Predictive Experiment
    • Fitting a Linear Regression
    • Grouping Data with k-Means Clustering
    • Surviving in Random Decision Forests
    • Your Turn
  • Further Reading
  • Solutions to Single-Star Projects


Dmitry Zinoviev has an MS in Physics from Moscow State University and a PhD in Computer Science from Stony Brook University. His research interests include computer simulation and modeling, network science, social network analysis, and digital humanities. He has been teaching at Suffolk University in Boston, MA since 2001.