Developed from celebrated Harvard statistics lectures, Introduction to Probability provides essential language and tools for understanding statistics, randomness, and uncertainty. The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo (MCMC). Additional application areas explored include genetics, medicine, computer science, and information theory. The print book version includes a code that provides free access to an eBook version. The authors present the material in an accessible style and motivate concepts using real-world examples. Throughout, they use stories to uncover connections between the fundamental distributions in statistics and conditioning to reduce complicated problems to manageable pieces.
The book includes many intuitive explanations, diagrams, and practice problems. Each chapter ends with a section showing how to perform relevant simulations and calculations in R, a free statistical software environment.
Some time ago I got interested in machine learning. Since machine learning involves a good amount of statistics, I started looking for books or resources on probability. After abandoning two widely recommended books, I chanced upon Blitzstein's lectures, and I felt that his book would have even more to offer than the lectures. I found it immensely interesting: the authors provide lucid explanations for every concept, and there is a lot of emphasis on building intuition about statistical concepts. To drive these concepts home, the book has a good collection of exercises; for me, these were the most exciting part. The problems are carefully crafted and will make you think; merely plugging in the formula won't work. I would heartily recommend this book as a first course in probability.
This book starts from the basic concepts of probability theory and develops them up to the law of large numbers and the central limit theorem (and more). I really like the approach this book takes, with a more intuitive explanation of the concepts rather than just focusing on the mathematics, which often (for me at least) doesn’t provide any idea of *why* these probabilistic results should be true.
The book uses the concept of a “story” to define distributions, which is basically learning by generic example. I find this far more useful than merely stating the density functions, as it becomes a lot easier to recognise when the distributions occur in practice.
The book only relies on basic real analysis, and a tiny bit of linear algebra. As the authors also emphasise: probability isn’t hard mathematics, but what makes it hard is connecting it with the real world and knowing how to apply the vast array of tools available.
"Introduction to Probability" may initially seem appealing, especially for those venturing into the realms of mathematics and artificial intelligence from a software development background. However, the book quickly escalates beyond basic concepts, becoming challenging as early as Chapter 3. Its explanations might not cater to beginners, as they lack intuitive clarity for those without a strong mathematical foundation. Moreover, the book delves into calculus unexpectedly, which could catch readers off guard. Despite offering a wealth of problem sets for practice, its suitability for absolute novices is questionable. Now, I'm exploring "Probability for Enthusiastic Beginners" in search of a more accessible entry point into probability.
5 stars for the effort of giving the intuitive stories for each probability distribution concept.
It does not just introduce the concepts but provides many examples of applying probability in real-life situations. Each chapter also includes R code connected with the introduced concepts for readers to play around with.
Overall, it is one of the best probability books for self-studying, and reading this book while watching the course (https://projects.iq.harvard.edu/stat1...) is highly recommended.
Amazing overview of probability. Great breakdowns and story examples. I'm surprised how much calculus and core mathematics come up, but it's very well explained.
Notes/quotes:
- Probability is the root language of statistics, since "Mathematics is the logic of certainty; probability is the logic of uncertainty."
- First chapter: probability and counting, i.e., counting all possible outcomes and getting a probability from there.
- "To generalize the notion of probability, we'll use the best part about math, which is that you get to make up your own definitions."
- "The frequentist view of probability is that it represents the long-run frequency over a large number of repetitions of an experiment." ... "The Bayesian view of probability is that it represents a degree of belief about the event in question, so we can assign probabilities like... 'the defendant is guilty' even if it isn't possible to repeat... the same crime over and over again."
- Second chapter: "Conditional probability is the concept that addresses this fundamental question: how should we update our beliefs in light of the evidence we observe?" "Conditioning is the soul of statistics", i.e., "the means by which we update beliefs to reflect evidence and as a problem-solving strategy." "When we calculate conditional probabilities, we are considering what information observing one event provides about another event, not whether one event causes another."
- "Whenever we make a probability statement, there is always some background information that we are conditioning on." ... "conditional probabilities are probabilities, and all probabilities are conditional."
- "One of the most important skills to develop when studying probability and statistics is the ability to go back and forth between abstract ideas and concrete examples. Relatedly, it is important to work on recognizing the essential pattern or structure of a problem and how it connects to problems you have studied previously."
- "Many common mistakes in probability can be traced to confusing two of the following fundamental objects with each other: distributions, random variables, events, and numbers. Such mistakes are examples of category errors. In general, a category error is a mistake that doesn't just happen to be wrong, but in fact is necessarily wrong since it is based on the category of object. For example, answering the question 'how many people live in Boston?' with '-42' or 'pi' or 'pink elephants' would be a category error... always think about what category an answer should have."
- "An especially common category error is to confuse a random variable with its distribution. We call this error sympathetic magic; this term comes from anthropology, where it is used for the belief that one can influence an object by manipulating a representation of that object." ... "'The word is not the thing; the map is not the territory' - Alfred Korzybski" ... "We can think of the distribution of a random variable as a map or blueprint describing the rv. Just as different houses can share the same blueprint, different rvs can have the same distribution, even if the experiments they summarize, and the sample spaces they map from, are not the same."
- "A random variable is a function assigning a real number to every possible outcome of an experiment. The distribution of an rv X is a specification of the probabilities for the events associated with X."
- "If we're waiting for the first Heads in a sequence of fair coin tosses, and in a streak of bad luck we happen to get ten Tails in a row, this has no impact on how many additional tosses we'll need: the coin is memoryless. The Geometric is the only memoryless discrete distribution (with support 0, 1, ...), and the Exponential is the only memoryless continuous distribution (with support 0 to infinity)." (A quick simulation check of memorylessness is sketched after these notes.)
- "A distribution can have multiple medians and multiple modes."
- "Using strategies such as conditioning on what we wish we knew and first-step analysis, we can often decompose complicated expectation problems into simpler pieces."
- "In many settings, the expected value of the item that they bid on given that they won the bid is less than the unconditional expected value they originally had in mind."
- "'What should I do if I can't calculate a probability or expectation exactly?' ... : simulate it, bound it, or approximate it."
- "Monte Carlo just means that the simulations use random numbers (the term originated from the Monte Carlo casino in Monaco)."
- "Every time we use the proportion of times that something happened as an approximation to its probability, we are implicitly appealing to the LLN (law of large numbers)."
- "...the law of large numbers states that the proportion of Heads converges to 1/2, but this does not imply that after a long string of Heads, the coin is 'due' for a Tails to balance things out. Rather, the convergence takes place through swamping: past tosses are swamped by the infinitely many tosses that are yet to come."
- "The Chi-Square distribution is important in statistics because it is related to the distribution of the sample variance, which can be used to estimate the true variance of a distribution."
- "The PDF of the Student-t distribution with n degrees of freedom looks similar to that of a standard Normal, except with heavier tails (much heavier if n is small, and not much heavier if n is large)... the t_1 distribution is the same as the Cauchy distribution... As n -> inf, the t_n distribution approaches the standard Normal distribution."
- "Inequalities and limit theorems are two different ways to handle expectations and probabilities that we don't wish to calculate exactly. Inequalities allow us to obtain lower and/or upper bounds on the unknown value: Cauchy-Schwarz and Jensen give us bounds on expectations, while Markov, Chebyshev, and Chernoff give us bounds on tail probabilities... the law of large numbers says that as n -> inf, the sample mean converges to the true mean mu with probability 1. The central limit theorem says that the distribution of the sample mean, after standardization, converges to the standard Normal distribution."
- "...for real-world phenomena, independence can be an excessively restrictive assumption; it means that the Xn provide absolutely no information about each other... Markov chains are a happy medium between complete independence and complete dependence... The Markov property (for a time-homogeneous Markov chain)... says that given the entire past history X0, X1, X2, ..., Xn, only the most recent term, Xn, matters for predicting Xn+1... The Markov property greatly simplifies the computation of conditional probabilities: instead of having to condition on the entire past, we only need to condition on the most recent value... To describe the dynamics of a Markov chain we need to know the probabilities of moving from any state to any other state... this information can be encoded in a matrix, called the transition matrix, whose (i, j) entry is the probability of going from state i to state j in one step of the chain." (A small transition-matrix simulation is sketched after these notes.)
- "[Little's law] The long-run average number of customers in a stable system is the long-term average arrival rate multiplied by the average time a customer spends in the system." - "A set is a Many that allows itself to be thought of as a One" Georg Cantor - "Much of math and statistics is really about pattern recognition: seeing the essential structure of a problem, recognizing when one problem is essentially the same as another problem (just in a different guise), noticing symmetry, and so on." - "It is very easy to make mistakes in probability, so checking answers is especially important. Some useful strategies for checking answers are (a) seeing whether the answer makes sense intuitively (though...probability has many results that seem counterintuitive at first), (b) making sure the answer isn't a categorical error or a case of a biohazard, (c) trying out simple examples, (d) trying out extreme examples, and (e) looking for alternative methods to solve the problem."
I discovered that if I wanted to develop a solid foundation in machine learning, then a familiarity with the basics of probability theory and statistical inference would be useful.
This book teaches basic probability and several of the more common statistical distributions in an academic but easy to read style. It's a good prep before learning statistical inference.
The only prerequisite is first-year university-level mathematics.
I technically read his sequel to this book (for stat111 - message me if anyone reading this wants a copy), but it isn't published/on goodreads and I want to keep tracking what I'm reading!
Prof Blitz is the best, and the book I read is for intro to statistical inference; if you ever have the chance, you've got to take this class! The best intro to inference I've ever seen - it helped me finally clear up a bunch of topics I was always shaky on (which is why I went back to slog through so much review).
A few things I learned:
1. The Cramér-Rao bound. I never understood where this comes from, but it just follows from the fact that the absolute value of a correlation is at most 1, so that bounds the variance of an estimator by looking at the correlation of the estimator and its score. Such a simple proof that I was reading too quickly to understand in the past! (The argument is written out in a few lines after this list.)
2. I knew this, but it's just so good to remember: confidence intervals are random, parameters are not (at least for the frequentist). So the confidence interval is the thing that moves around, and if we have 100 confidence intervals at the 95% level, on average 95 of them will cover the true parameter. (Simulated in the sketch below.)
3. When in doubt, if one is struggling with any sort of statistics problem, there's a solid chance that adding and subtracting some value (thereby adding zero) and then refactoring and using the law of iterated expectations will probably solve everything lol. ESPECIALLY if you know that it's a known decomposition, e.g., Var(Y) = Var(E[Y|X]) + E[Var(Y|X)]. (Checked numerically in the sketch below.)
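For item 1, here is the correlation argument written out as I understand it (my own compression, under the usual regularity conditions, with T an unbiased estimator of theta and S the score):

```latex
% Score, its mean, and its variance (the Fisher information):
\[
S = \frac{\partial}{\partial\theta}\log f(X;\theta), \qquad
E[S] = 0, \qquad \operatorname{Var}(S) = I(\theta).
\]
% For unbiased T, differentiating E[T] = theta under the integral sign:
\[
\operatorname{Cov}(T,S) = E[TS] = \frac{\partial}{\partial\theta} E[T] = 1.
\]
% |Corr(T,S)| <= 1 is then exactly the Cramer-Rao bound:
\[
1 = \operatorname{Cov}(T,S)^2 \le \operatorname{Var}(T)\operatorname{Var}(S)
\quad\Longrightarrow\quad
\operatorname{Var}(T) \ge \frac{1}{I(\theta)}.
\]
```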
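And a minimal R sketch for items 2 and 3 (a toy setup of my own, not from the course): simulate many 95% intervals for a Normal mean with known sd and count coverage, then check the variance decomposition numerically with X ~ Unif(0,1) and Y | X ~ N(X, 1):

```r
# Item 2: each interval is random, the parameter mu is fixed.
set.seed(110)
mu <- 3; sigma <- 2; n <- 50
covered <- replicate(10^4, {
  xs <- rnorm(n, mean = mu, sd = sigma)
  half <- qnorm(0.975) * sigma / sqrt(n)  # half-width of the 95% interval
  (mean(xs) - half <= mu) && (mu <= mean(xs) + half)
})
mean(covered)  # long-run coverage, should be close to 0.95

# Item 3: Var(Y) = Var(E[Y|X]) + E[Var(Y|X)].
# Here E[Y|X] = X and Var(Y|X) = 1, so the right side is Var(X) + 1.
x <- runif(10^6)
y <- rnorm(10^6, mean = x, sd = 1)
var(y)      # left-hand side
var(x) + 1  # right-hand side: 1/12 + 1
```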
I mostly self-studied this book, along with watching a few of the Stat 110 videos. The difficulty level of different chapters varies and I have a much hazier understanding of some of the topics covered (e.g. moment-generating functions) than others, but I feel like I learned a lot regardless.
An Anki deck I made from material in the book is available here.
Probably one of the best introductory level mathematics books I have ever read. If I were to teach an intro to probability theory course, this would be my go-to book as of right now.
I would suggest that anyone interested in statistics or probability theory give it a read. Even if you are somewhat knowledgeable in the field, some of the examples are really well thought out and demonstrate elementary concepts very well.
Introduction to Probability is a modern take on Probability and Statistics. Authors Joseph K. Blitzstein and Jessica Hwang do a phenomenal job explaining the different aspects of the field.
The book is a textbook, so it contains workable problems, and it is recent enough to include some aspects of popular culture.
I enjoyed the book. Thanks for reading my review, and see you next time.
I read this book and did the associated Harvard course to learn probability. The examples and teaching between this and the online version helped me understand probability, random variables, multiple types of distributions, etc. I read up until Chapter 10 and thought the contents were good.
Great book. Pair it with STAT110 and you have a huge amount of material for self-study. Book and video lectures are both suited for a refresher and for a first contact with the subject.
The best intuition builder in probability theory by far. Covers much more than a typical introductory-level text and explains with a clarity that nothing else matches.
Without much exaggeration, this is the single most important book I have read in my life. It introduced me to the possibility of extending logic to the realm of unpredictability and chaos. The abundant amount of concrete and vivid examples helped me to appreciate many of the core concepts of probability and statistics. I sincerely recommend this book to anyone who wants a lucid, fascinating, and at times challenging read.