19 Books Every Aspiring Data Scientist Should Read

Whether you’re an aspiring data scientist or a seasoned professional, these are 19 books that will help you improve your skills in machine learning, data analysis, visualization, statistics, and more. I’ve read and can highly recommend them all.

I’ve split this data science reading list into a few categories, but some books naturally belong in several groups (R for Data Science, for example, is great for learning both data visualization and analysis). And just so you know, I may collect a commission from Amazon for any books you purchase using these links.

Machine Learning (Theoretical)

1. Introduction to Statistical Learning – James, Witten, Hastie, and Tibshirani - $21.72

Introduction to Statistical Learning book cover

Source: Amazon

A good first book on machine learning. It shows how the most popular machine learning algorithms work, and also teaches a proper workflow for training and evaluating models (e.g. train/test splits, cross-validation, picking a loss function). It gives the reader an intuitive grasp of what is going on inside the “black box”, but it sits a little too far on the qualitative side if you hope to gain a full understanding. For a deeper dive, see the advanced version, Elements of Statistical Learning.
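To make that workflow a bit more concrete, here is a minimal sketch in plain Python. It is not taken from the book: the function names (`mse`, `fit_line`, `k_fold_mse`) and the synthetic data are assumptions made up purely for illustration. It fits a straight line on the training folds and scores it on the held-out fold using mean squared error as the loss.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error, one common choice of loss function."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE across k folds (hypothetical helper)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)          # shuffle before splitting
    folds = [idx[i::k] for i in range(k)]     # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit only on the training folds...
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # ...and evaluate only on the fold the model has not seen.
        preds = [a + b * xs[i] for i in held_out]
        scores.append(mse([ys[i] for i in held_out], preds))
    return sum(scores) / k


# Synthetic data, made up for illustration: y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]
cv_error = k_fold_mse(xs, ys)
```

The point of the exercise is that `cv_error` estimates how the model performs on data it was not fit to, which is the number you actually care about when comparing models. In practice you would likely reach for a library rather than hand-rolling the folds.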

Get it on Amazon for $21.72

2. Elements of Statistical Learning – Hastie, Tibshirani, and Friedman [HIGHLY RECOMMEND] - $30.60

Elements of statistical learning book cover

Source: Amazon

Similar to Introduction to Statistical Learning, but much more mathematically dense. This is my favorite reference book for machine learning theory. Topics include generalized linear models, additive models, bagging, boosting, tree-fitting algorithms, random forests, gradient boosting, and much more. It’s worth reading several times.

Get it on Amazon for $30.60

3. Learning from Data – Abu-Mostafa, Magdon-Ismail, and Lin - $45.00

Learning from Data book cover

Source: Amazon

Another machine learning book that focuses on theory. It won’t show you how to train your own models, but it will help you understand why models work and what guarantees we’re able to make about them. It is less focused on specific ML algorithms and more focused on the properties of learning and generalization.

Get it on Amazon for $45.00

4. Deep Learning – Goodfellow, Bengio and Courville [HIGHLY RECOMMEND] - $25.01

Deep Learning book cover

Source: Amazon

A balance of intuition, applicability, and theory that this field has been lacking. Begins with the nuts and bolts of feedforward networks, and then goes into depth about the state of the art in model regularization, optimization, and various model classes and architectures. Filled with useful tips and tricks for implementing models. This is the best book out there right now for learning how deep learning works.

Get it on Amazon for $25.01

Machine Learning (Practical / Applied)

5. Deep Learning with Python – Francois Chollet - $24.48

Deep Learning with Python book cover

Source: Amazon

A handy reference for Keras. This book is helpful for bridging the gap between beginner deep learning tutorials and more advanced / state-of-the-art methods. It’s not the best for learning theory, but will help you to implement what you read in papers.

Get it on Amazon for $24.48

Working with Data (Python and R Programming)

6. R for Data Science – Wickham and Grolemund [HIGHLY RECOMMEND] - $18.17

R for Data Science book cover

Source: Amazon

An absurdly useful book for learning how to manipulate data with R and the Tidyverse (dplyr, ggplot2, forcats, etc.). I read this once when I was first learning R and again after a few years of experience, and learned new things each time. This book will make anyone better at data analysis, visualization, manipulation, and cleaning.

Get it on Amazon for $18.17

7. Advanced R – Hadley Wickham - $53.73

Advanced R book cover

Source: Amazon

This book will teach you how the R language works on a much lower, more technical level. It’s a useful book for helping advanced users write more performant code. It’s also useful for people learning R whose background is primarily in other languages, as it will help to draw parallels between R and other languages.

Get it on Amazon for $53.73

8. Analyzing Baseball Data with R – Albert and Marchi - $49.66

Analyzing Baseball Data with R book cover

Source: Amazon

This book focuses on baseball data specifically, but is packed with data analysis and visualization examples in R. It’s full of real-world examples of munging data to answer questions and understand rich data sets (in this case, baseball data).

Get it on Amazon for $49.66

9. Python for Data Analysis – Wes McKinney - $23.09

Python for Data Analysis book cover

Source: Amazon

This is a great book for learning the ins and outs of Pandas. It will teach you to clean, aggregate, transform, and visualize data in Python using Pandas dataframes. It’s Python’s closest equivalent to R for Data Science.

Get it on Amazon for $23.09

Data Visualization

10. The Visual Display of Quantitative Information – Edward Tufte - $32.95

Visual Display of Quantitative Information book cover

Source: Amazon

A classic on communicating what your data have to say with clarity, precision, and efficiency. Its pretty graphics also make it a great coffee table book.

Get it on Amazon for $32.95

Econometrics / Applied Statistics

11. Mostly Harmless Econometrics: An Empiricist’s Companion – Angrist and Pischke [HIGHLY RECOMMEND] - $26.28

Mostly Harmless Econometrics book cover

Source: Amazon

A handbook on advanced econometrics. Useful for brushing up on linear models (simple and multiple linear regression) and experiment design (instrumental variables, difference-in-differences models, answering causal questions).

Get it on Amazon for $26.28

12. Mastering Metrics: The Path from Cause to Effect – Angrist and Pischke - $27.67

Mastering Metrics book cover

Source: Amazon

This book is very similar to Mostly Harmless Econometrics, but more beginner-friendly. Get this one instead if you’re learning econometrics for the first time.

Get it on Amazon for $27.67

Experiment Design

13. Bit by Bit: Social Research in the Digital Age – Matthew Salganik - $27.73

Bit By Bit book cover

Source: Amazon

This book felt like a greatest hits compilation of all the most useful and exciting things I learned about experiment design as an undergrad. It’s the best book I’ve found to date for marrying the strengths of old-school statisticians and newer-school data scientists.

Get it on Amazon for $27.73

Miscellaneous (lighter reads)

14. The Signal and the Noise — Nate Silver - $11.28

The Signal and the Noise book cover

Source: Amazon

What is data science? How can it be used to solve real-world problems?

Get it on Amazon for $11.28

15. The Book of Why – Judea Pearl - $21.23

Book of Why book cover

Source: Amazon

Learn the basics of causality.

Get it on Amazon for $21.23

16. The Information – James Gleick - $13.98

The Information book cover

Source: Amazon

A short, digestible history of and introduction to information theory. It won’t make you an expert, but you’ll get the main ideas.

Get it on Amazon for $13.98

17. The Book: Playing the Percentages in Baseball - Tango, Lichtman and Dolphin - $19.95

The Book: Playing the Percentages in Baseball book cover

Source: Amazon

Learn how data science has revolutionized the game of baseball, with plenty of applied examples using real data.

Get it on Amazon for $19.95

18. Superforecasting: The Art and Science of Prediction – Philip Tetlock - $13.11

Superforecasting book cover

Source: Amazon

What makes a good forecast? See how a mix of quantitative and qualitative techniques can be combined to see the future.

Get it on Amazon for $13.11

19. Chasing Perfection - Andy Glockner - $12.99

Chasing Perfection book cover

Source: Amazon

A mostly-qualitative run through the current state of basketball analytics, detailing recent phenomena such as the decline of the mid-range jumper, tanking for draft picks, and the specialized medical analyses being used to ensure player longevity.

Get it on Amazon for $12.99

