Think Stats

If you know how to program, you have the skills to turn data into knowledge using the tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python. You'll work with a case study throughout the book to help you learn the entire data analysis process―from collecting data and generating statistics to identifying patterns and testing hypotheses. Along the way, you'll become familiar with distributions, the rules of probability, visualization, and many other tools and concepts.

136 pages, Paperback

First published January 1, 2011

About the author

Allen B. Downey

31 books, 213 followers
Allen Downey is a professor of Computer Science at Olin College and the author of a series of open-source textbooks related to software and data science, including Think Python, Think Bayes, and Think Complexity, which are also published by O’Reilly Media. His blog, Probably Overthinking It, features articles on Bayesian probability and statistics. He holds a Ph.D. in computer science from U.C. Berkeley, and M.S. and B.S. degrees from MIT. He lives near Boston, MA with his wife and two daughters.

Ratings & Reviews

Community Reviews

5 stars: 86 (18%)
4 stars: 183 (39%)
3 stars: 131 (28%)
2 stars: 45 (9%)
1 star: 13 (2%)
Displaying 1 - 30 of 51 reviews
Jean-Luc
273 reviews, 33 followers
August 20, 2012
Most books about statistics teach the subject with pen and paper, and don't take advantage of the powerful CPUs sitting on most students' desks. Books about computing statistics assume the reader already knows the mathematical theory. This book tries to strike a happy medium: teaching students to understand data by writing programs to flesh out the computations for you. It's an ambitious book, but it doesn't entirely work.

For starters, it doesn't actually list what a student should know ahead of time, beyond Python (the programming language used). There is a surprising amount of calculus referenced. A student who attempts to read this book without a decent amount of math and programming experience will quickly be discouraged.

I lost track of the number of times a term is discussed and used in an exercise before it's defined. I can't help but think this is what happens when you don't have an editor.

I would murder a dog for Cthulhu if I thought it would get me a solution manual. I might post my own if I can remove all the profanity.

The book frequently refers the reader to Wikipedia articles for more information, but math Wikipedia entries are opaque to the non-expert. The editors are too busy trying to out-formalize and out-complicate each other, with no regard for the muggles. The footnotes should have been better chosen.

I'm going to stop here because I'm just too frustrated by how much time I spent on this. The author uses this book as a textbook for his own class. It's easy to watch his Subversion repository and see that he posts new updates at least once a week. I'm confident that after a few more semesters, this will be a good book. At the moment, it isn't.
Nathan Brodsky
15 reviews, 11 followers
Read
October 22, 2019
The book is full of valuable insights and good, elaborate explanations. Well worth the read.
Ali Izadi
36 reviews, 4 followers
April 9, 2020
Practical stats with a computational approach. A good book for starting to use statistics in your data analysis problems.
briz
Author of 6 books, 72 followers
Shelved as 'did-not-finish'
July 23, 2016
It's a textbook. A good one. I didn't finish it. Wiping the slate clean! I saw Allen Downey give a talk on Bayesian stats, and it was fun and informative. I think he's great.

One annoyance. I think I'm maybe the perfect audience for this book: someone who took stats long ago, has worked with data ever since in some capacity, but has moved further and further away from the first principles/fundamentals. Someone who speaks Python and wants to port all of her Stata skillz onto pandas (the Python library, not the Chinese bear - okay, also the Chinese bear*). So, in a way, this book was perfect for that. MY ONE COMPLAINT is that Allen provides many helper functions and .py files pre-written for you to play around with. I would have preferred less hand-holding, and more: Now build a function that will give you the cumulative distribution function!

But then: who am I to complain. I didn't finish it (for now). And it can be hard, sometimes, to find the perfect puzzle piece for your current skillset/desires/time constraint on the Great Learning Journey that is life.

* images of pandas feverishly computing z-scores, while I cackle above them, "Work harder!!! Why are you so slow!!!" images of furry paws clutching pens and notebooks, scribbling
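The exercise briz wishes for, writing your own cumulative distribution function, could look something like the following minimal sketch. It computes an empirical CDF (the fraction of values less than or equal to x) using the standard-library `bisect` module; the name `make_cdf` is hypothetical and is not the helper that ships with the book.

```python
from bisect import bisect_right


def make_cdf(values):
    """Return a function that evaluates the empirical CDF of `values`.

    CDF(x) is the fraction of values less than or equal to x.
    (Hypothetical helper for illustration, not the book's API.)
    """
    if not values:
        raise ValueError("need at least one value")
    sorted_vals = sorted(values)
    n = len(sorted_vals)

    def cdf(x):
        # bisect_right returns how many sorted values are <= x
        return bisect_right(sorted_vals, x) / n

    return cdf


cdf = make_cdf([1, 2, 2, 3, 5])
print(cdf(0), cdf(2), cdf(5))  # 0.0 0.6 1.0
```

Sorting once and using binary search keeps each evaluation at O(log n), which matters if the CDF is evaluated many times, for example when plotting it.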
Sergey Shishkin
158 reviews, 46 followers
June 25, 2016
A very comprehensible introduction to computational statistics. Minus one star for the code examples: wrapping numpy, pandas and scikit into a class-oriented API made the examples rather harder to understand. I'd prefer the examples to re-implement library methods in plain Python first and then point to the library functions.
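As a rough illustration of the ordering suggested in this review, here is a minimal sketch that implements the mean and population variance in plain Python and then checks them against the standard-library `statistics` functions. The helper names `mean` and `variance` and the sample data are illustrative assumptions, not taken from the book.

```python
import math
import statistics


def mean(xs):
    # Arithmetic mean: total divided by count
    return sum(xs) / len(xs)


def variance(xs):
    # Population variance: average squared deviation from the mean
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), variance(data))  # 5.0 4.0

# Then point to the library versions and confirm they agree
assert math.isclose(mean(data), statistics.fmean(data))
assert math.isclose(variance(data), statistics.pvariance(data))
```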
Nancy
72 reviews, 20 followers
May 1, 2015
I'm only halfway through, but so far this book teaches neither statistics nor tips and tricks with Python libraries. The GitHub source code that accompanies the book is probably more useful as a reference than the book itself. I'd recommend a book that focuses on one or the other. This one is interesting to flip through.
Danielle
374 reviews, 4 followers
September 22, 2020
3.5. Interesting computational approach to statistics; even as a Python user, I would have preferred a more language-agnostic approach to the methods discussed in the book (but I guess that wasn’t the point).
Franta
117 reviews, 113 followers
October 20, 2016
This is a computing book that teaches basic statistics concepts. Downey has a very peculiar way of explaining math and science concepts: it is purely example- and experiment-driven.

If you like this style of learning and like to solve interesting problems with some math and lots of coding experiments, I highly recommend Peter Norvig's Jupyter Notebooks:
http://norvig.com/ipython/README.html .

Utsav Parashar
40 reviews, 7 followers
June 11, 2019
A good book to start with for stats.
Basic knowledge of Python will be useful.
Micyukcha
106 reviews, 3 followers
March 8, 2021
Computational introduction to Stats through Python. For a multi-disciplinary subject such as data science/stats/comp sci, there will be multiple approaches for beginners. For the programmer/coder, this method may be easier to follow than a math/statistical approach.

It does require some follow-up reading, offline searching, and line-by-line interpretation of the author's code, but as a statistical introduction that makes you get into the code and teaches you through trial and error, it works.
Alexander L. Hayes
70 reviews, 1 follower
October 18, 2021
This book had me arguing with myself. Between chapters I went between feeling like this was great to feeling like this was terrible. After some mental digesting, I've summarized my qualms as:

1. Believing this needs a third edition (it does)
2. Being annoyed with parts of the presentation (it seems minor, but the inconsistent bit.ly and Wikipedia links would have been better as actual references), and
3. Feeling like this should either be an introductory book for pandas, numpy, or statsmodels (but not all three). Or that it should minimize library use to focus on implementation concepts like a 'Little' book.

I liked how Downey carried a few examples through the entire text and motivated how each choice leads to a different interpretation. Furthermore: there was good focus on how assumptions turn into results and where the methods were weak. Both invite the reader into having some skepticism during interpretation, and suggest that they should be on the lookout for what is (or is not) stated when reading statistical arguments elsewhere.

So who should read this, and when should they read this? (1) If you're a Python expert who wants an easy introduction to statistical modeling. (2) If you're comfortable in R but want to pick up Python. (3) If you're comfortable with statistics, want to pick up some programming experience, and don't mind trial and error.

In summary: this is flexible enough to use in multiple circumstances, but this should be used as supplementary material and not a primary source for any of the topics it mentions.



Other books I've read by Allen B. Downey:

- Think Julia: How to Think Like a Computer Scientist
Yahia El gamal
47 reviews, 4 followers
July 8, 2018
Very nice book. It's different from what you usually get in that area. I would describe it as a modern introductory stats book: modern because it focuses on computational methods (e.g. it starts with bootstrapping to calculate confidence intervals of the mean instead of analytical methods). It doesn't go very deep, but it covers a lot of ground.

The nice thing about it is that you work through the same problems and datasets from one chapter to the next, and you build on top of what you learned in a very coherent manner.

The Python code is very clean as well.
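For readers unfamiliar with the bootstrapping approach mentioned in this review, a minimal percentile-bootstrap sketch for a confidence interval of the mean follows. It uses only the standard library; the function name, default parameters, and toy data are assumptions for illustration, and the book's own procedure may differ in its details.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=10_000, confidence=0.90, seed=None):
    """Percentile-bootstrap confidence interval for the mean of `sample`.

    (Hypothetical helper for illustration, not the book's API.)
    """
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record each resample's mean
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Keep the central `confidence` fraction of the bootstrap distribution
    tail = (1 - confidence) / 2
    lo = means[int(tail * n_resamples)]
    hi = means[int((1 - tail) * n_resamples) - 1]
    return lo, hi


# Toy data, for illustration only
sample = [4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 5.1, 4.9]
lo, hi = bootstrap_mean_ci(sample, seed=1)
print(f"sample mean {statistics.fmean(sample):.2f}, 90% CI ({lo:.2f}, {hi:.2f})")
```

Increasing `n_resamples` makes the interval endpoints more stable from run to run, and passing a fixed `seed` makes them reproducible.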
André Hagenbruch
9 reviews, 5 followers
December 28, 2011
Although this is just a slim volume you will profit most from it if you have the time to do the exercises and follow the many pointers (often from Wikipedia) to the full explanations. After that you should have a pretty good grasp of topics like distributions, probabilities, and hypothesis testing...
Maged M.
79 reviews, 3 followers
January 6, 2018
Thinking like a statistician. I like the book's structure, especially how Allen introduces several statistical concepts through a single problem.
Romain
801 reviews, 47 followers
April 20, 2020
This book offers a good overview of how statistics are used in a data science context, but it is far from successful. First of all, I did not buy into the structure the author follows. It may suit a course (the book grew out of the author's own classes), but not reading. Second, it mixes mathematics and programming, and that is exactly where it falls short. The two disciplines are closely linked, and it is indeed unthinkable to do statistics with pencil and paper, but explaining how you coded your own functions in Python when libraries such as pandas, statsmodels, scipy, seaborn, etc. exist is something I don't understand, apart, once again, from its teaching value. And by focusing so much on coding, you lose the method along the way, the why: what to do and in what order, the how being almost incidental given what exists today. In my view, a good modern statistics book should simply explain the approach, why to use one technique or measure rather than another, but not how to implement them. It reminds me a bit of the classes where we were asked to compute matrices or integrals by hand; it's much the same approach, and I still find it just as pointless.

This is the second book by Allen Downey that I have read, and I am still not convinced. The upside, because there is one, is that it is freely available online.

https://www.aubonroman.com/2019/05/th...
897 reviews, 19 followers
May 4, 2019
My quest for a really helpful stats book goes on. Because this isn't it.

Now, that's a more severe judgment than I intend because there were parts of this book that were helpful and deepened my understanding.

In most stats books, I find it difficult to separate the material that explains the stats concepts from the material (if any, since this is always under-represented) that explains how to do stats (i.e. how to analyse a dataset or how to analyse the results of an experiment). This book is no exception and is perhaps slightly worse for several reasons: (a) it uses Python programs for both purposes, (b) it often uses the same datasets for both purposes, and (c) analytic methods are postponed to a final chapter; the main chapters rely more on simulations, which (given points (a) and (b)) seems to blur the distinction further.

This is a shame because I don't see why, with better 'sign-posting', a book couldn't: introduce a concept analytically; give intuitions using simulation; and then quite separately give case studies where concepts and simulations are used for analysis and prediction.
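The sequence this reviewer proposes, an analytic result first and a simulation for intuition second, can be sketched for the standard error of the mean. The analytic formula is sigma / sqrt(n); the simulation draws many normal samples and measures how much their means vary. The function names and parameters are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def analytic_standard_error(sigma, n):
    # Analytic result: the standard error of the sample mean is sigma / sqrt(n)
    return sigma / math.sqrt(n)


def simulated_standard_error(mu, sigma, n, iters=5_000, seed=None):
    # Simulation: draw many normal samples of size n, then measure
    # the standard deviation of their means
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(iters)
    ]
    return statistics.stdev(means)


print(analytic_standard_error(sigma=10, n=25))  # 2.0
print(simulated_standard_error(mu=50, sigma=10, n=25, seed=0))  # close to 2.0
```

Keeping the two functions separate mirrors the "sign-posting" the review asks for: the formula states the concept, and the simulation only serves to make it believable.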
Len MacRae
12 reviews, 1 follower
May 25, 2018
This book seems to have a very narrow use case. It's designed as a textbook for "an introduction to the practical tools of exploratory data analysis." Do not expect anything more. This is not the book for someone trying to learn statistics or trying to learn Python. I can see it having value within a course or as a supplement to other material but limited value elsewhere.
Much of my frustration with this book can be summed by an example glossary entry: "chi-squared test: A test that uses the chi-squared statistic as the test statistic." There is a lot of material here which the author assumes you already know, and therefore many explanations are lacking. The book could be much improved by stating in the Preface more clearly what is expected of the reader before they begin this book.
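For readers who run into the same circular glossary entry: Pearson's chi-squared statistic sums (observed - expected)^2 / expected over the categories. The minimal sketch below computes it and, in the computational spirit of the book, estimates a p-value by simulating counts under the null hypothesis instead of consulting the chi-squared distribution. The function names and the die-roll counts are hypothetical toy inputs, not examples from the book.

```python
import random
from collections import Counter


def chi_squared_statistic(observed, expected):
    # Pearson's statistic: sum over categories of (O - E)^2 / E
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulated_p_value(observed, probs, iters=10_000, seed=None):
    # Under the null hypothesis, generate counts from `probs` and estimate how
    # often the statistic is at least as large as the observed one
    rng = random.Random(seed)
    n = sum(observed)
    k = len(probs)
    expected = [p * n for p in probs]
    observed_stat = chi_squared_statistic(observed, expected)
    hits = 0
    for _ in range(iters):
        counts = Counter(rng.choices(range(k), weights=probs, k=n))
        simulated = [counts[i] for i in range(k)]
        if chi_squared_statistic(simulated, expected) >= observed_stat:
            hits += 1
    return hits / iters


# Toy example: 60 rolls of a die we suspect is unfair
rolls = [8, 9, 19, 5, 8, 11]
fair = [1 / 6] * 6
print(round(chi_squared_statistic(rolls, [10] * 6), 2))  # 11.6
print(simulated_p_value(rolls, fair, seed=0))
```

For an analytic counterpart, SciPy provides `scipy.stats.chisquare`, which compares the same statistic against the chi-squared distribution.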
340 reviews, 1 follower
March 20, 2018
This was a good look at some different prediction / modeling methods through simulation and re-sampling, but leaves many useful analytic methods of determining the same information for the last chapter. It would've been nice to have that presented alongside the initial information with simulations and re-sampling guiding an understanding of the analytic methods.

A lot of the actual Python code has been abstracted by the author and put in classes and functions, making the examples easy to replicate quickly. However, unless you want to copy the author's code exactly, implementing these methods into your own projects will be more difficult than the book may lead you to believe.
m
19 reviews
June 1, 2019
Overall a clear, easy to follow intro to a variety of introductory topics in statistics with code snippets provided in Python.

My primary gripe is that the code snippets frequently use functions that are unexplained before they are used, or IMO unnecessarily introduce the use of OOP, which only makes following along more difficult.

Formatting-wise, I think the book would also benefit from adding syntax highlighting (unless that was just SafariBooks), PEP8 compliant function naming, and the flavor of code snippet line references employed by Fluent Python by Ramalho.
Mohit Aneja
26 reviews, 1 follower
December 17, 2019
Disclaimer: I didn't finish the book.

Although it is a good beginner-level book on practical statistics, the author relies on his own "thinkplot" library again and again to explain the concepts. Since I have worked with the pandas, NumPy and Matplotlib libraries before, this made it a lot harder to relate the examples to real-life implementations of those functions. It would have been better if the examples used the kind of raw Python code found in actual data science applications.
Kenta Suzuki
25 reviews, 3 followers
June 7, 2017
A good book for a programmer. This book teaches you stats through application, not the theory, mathematical equations, or proofs that most textbooks present. If this book showed how to do stats with numpy rather than with functions pre-defined by the author, it would be a five-star book.
19 reviews
June 15, 2020
Interesting. I was reminded of several things, learned several new things, and didn't understand a few things (I didn't do the exercises either; I think doing them would make it possible to understand). I found it quite interesting. I liked his candor on some points. I liked that it isn't heavily based on other books; it reads a lot like one very long blog post of his.
Ferhat Culfaz
243 reviews, 13 followers
December 21, 2018
Not much detail. Good simple explanations, but overall too simplistic and lacks depth. Plus a lot of the functions the author uses he wrote himself. It’s perhaps better to stick to the established libraries such as pandas and statsmodels to do similar work.

So overall, a bit too basic.
13 reviews
May 22, 2019
Why write a book that, in every chapter, sends the reader off to read about the topic on Wikipedia? A good book for professional statisticians who want to review the basics, but it's not an appropriate structure if the main goal is to give beginners an introduction.
Pritesh Shrivastava
80 reviews, 5 followers
July 15, 2019
Had to skip some portions of the book.

One major disadvantage I found was that instead of using standard Python packages like SciPy, the examples include a lot of custom-built functions and packages, which makes them less generalizable.
34 reviews, 2 followers
April 26, 2018
I was looking for answers to some important questions I had, and this book was not the one: those answers were not covered the way I had expected them to be.