Whether you’re an aspiring data scientist or a seasoned professional, these are 19 books that will help you improve your skills in machine learning, data analysis, visualization, statistics, and more. I’ve read and can highly recommend them all.
I’ve split this data science reading list into a few categories, but some books naturally belong in several groups (R for Data Science, for example, is great for learning both data visualization and analysis). And just so you know, I may collect a commission from Amazon for any books you purchase using these links.
Machine Learning (Theoretical)
A good first book on machine learning. Shows how the most popular machine learning algorithms work, and also teaches a proper workflow for training and evaluating models (e.g. train/test splits, cross-validation, picking a loss function). Allows a reader to get an intuitive grasp of what is going on inside the “black box”, but is a little too far on the qualitative side if one hopes to gain a full understanding. For a deeper dive, see the more advanced Elements of Statistical Learning.
Get it on Amazon for $21.72
2. Elements of Statistical Learning – Hastie, Tibshirani, and Friedman [HIGHLY RECOMMEND] - $30.60
Similar to Introduction to Statistical Learning, but much more mathematically dense. This is my favorite reference book for machine learning theory. Topics include generalized linear models, additive models, bagging, boosting, tree-fitting algorithms, random forests, gradient boosting, and much more. It’s worth reading several times.
Get it on Amazon for $30.60
Another machine learning book that focuses on theory. It won’t show you how to train your own models, but it will help you understand why models work and what guarantees we’re able to make about learning and generalization. It is less concerned with specific ML algorithms than with those underlying properties.
Get it on Amazon for $45.00
4. Deep Learning – Goodfellow, Bengio and Courville [HIGHLY RECOMMEND] - $25.01
A balance of intuition, applicability, and theory that this field has been lacking. Begins with the nuts and bolts of feedforward networks, and then goes into depth about the state of the art in model regularization, optimization, and various model classes and architectures. Filled with useful tips and tricks for implementing models. This is the best book out there right now for learning how deep learning works.
Get it on Amazon for $25.01
Machine Learning (Practical / Applied)
5. Deep Learning with Python – Francois Chollet - $24.48
A handy reference for Keras. This book is helpful for bridging the gap between beginner deep learning tutorials and more advanced / state-of-the-art methods. It’s not the best for learning theory, but will help you to implement what you read in papers.
Get it on Amazon for $24.48
Working with Data (Python and R Programming)
6. R for Data Science – Wickham and Grolemund [HIGHLY RECOMMEND] - $18.17
An absurdly useful book for learning how to manipulate data with R and the Tidyverse (dplyr, ggplot2, forcats, etc.). I read this once when I was first learning R and again after a few years of experience, and learned new things each time. This book will make anyone better at data analysis, visualization, manipulation, and cleaning.
Get it on Amazon for $18.17
7. Advanced R – Hadley Wickham - $53.73
This book will teach you how the R language works on a much lower, more technical level. It’s a useful book for helping advanced users write more performant code. It’s also useful for people learning R whose background is primarily in other languages, as it will help to draw parallels between R and other languages.
Get it on Amazon for $53.73
This book focuses on baseball data specifically, but it is packed with data analysis and visualization examples in R. It walks through real-world cases of munging data to answer questions and understand rich data sets.
Get it on Amazon for $49.66
9. Python for Data Analysis – Wes McKinney - $23.09
This is a great book for learning the ins and outs of Pandas. It will teach you to clean, aggregate, transform, and visualize data in Python using Pandas dataframes. It’s Python’s closest equivalent to R for Data Science.
Get it on Amazon for $23.09
A guide to communicating what your data have to say with clarity, precision, and efficiency. Its pretty graphics also make it a great coffee table book.
Get it on Amazon for $32.95
Econometrics / Applied Statistics
11. Mostly Harmless Econometrics: An Empiricist’s Companion – Angrist and Pischke [HIGHLY RECOMMEND] - $26.28
A handbook on advanced econometrics. Useful for brushing up on linear models (simple and multiple linear regression) and experiment design (instrumental variables, difference-in-differences models, answering causal questions).
Get it on Amazon for $26.28
This book is very similar to Mostly Harmless Econometrics, but more beginner-friendly. Get this one instead if you’re learning econometrics for the first time.
Get it on Amazon for $27.67
This book felt like a greatest hits compilation of all the most useful and exciting things I learned about experiment design as an undergrad. It’s the best book I’ve found to date for marrying the strengths of old-school statisticians and newer-school data scientists.
Get it on Amazon for $27.73
Miscellaneous (Lighter Reads)
14. The Signal and the Noise — Nate Silver - $11.28
What is data science? How can it be used to solve real-world problems?
Get it on Amazon for $11.28
15. The Book of Why – Judea Pearl - $21.23
Learn the basics of causality.
Get it on Amazon for $21.23
16. The Information – James Gleick - $13.98
A short, digestible history of and introduction to information theory. It won’t make you an expert, but you’ll get the main ideas.
Get it on Amazon for $13.98
Learn how data science has revolutionized the game of baseball, with plenty of applied examples using real data.
Get it on Amazon for $19.95
What makes a good forecast? See how a mix of quantitative and qualitative techniques can be combined to see the future.
Get it on Amazon for $13.11
19. Chasing Perfection – Andy Glockner - $12.99
A mostly-qualitative run through the current state of basketball analytics, detailing recent phenomena such as the decline of the mid-range jumper, tanking for draft picks, and the specialized medical analyses being used to ensure player longevity.
Get it on Amazon for $12.99