Statistical Models: Theory and Practice

This lively and engaging textbook explains the things you have to know in order to read empirical papers in the social and health sciences, as well as the techniques you need to build statistical models of your own. The author, David A. Freedman, explains the basic ideas of association and regression, and takes you through the current models that link these ideas to causality. The focus is on applications of linear models, including generalized least squares and two-stage least squares, with probits and logits for binary variables. The bootstrap is developed as a technique for estimating bias and computing standard errors. Careful attention is paid to the principles of statistical inference. There is background material on study design, bivariate regression, and matrix algebra. To develop technique, there are computer labs with sample computer programs. The book is rich in exercises, most with answers. Target audiences include advanced undergraduates and beginning graduate students in statistics, as well as students and professionals in the social and health sciences. The discussion in the book is organized around published studies, as are many of the exercises. Relevant journal articles are reprinted at the back of the book. Freedman makes a thorough appraisal of the statistical methods in these papers and in a variety of other examples. He illustrates the principles of modeling, and the pitfalls. The discussion shows you how to think about the critical issues including the connection (or lack of it) between the statistical models and the real phenomena. 
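The blurb mentions the bootstrap as a technique for estimating bias and computing standard errors. As an illustration only (not taken from the book, and using hypothetical data), here is a minimal sketch of the basic idea: resample the data with replacement, recompute the statistic on each resample, and take the standard deviation of the replicates.

```python
# Minimal bootstrap sketch (illustrative, not from the book).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: 200 draws from an exponential distribution.
x = rng.exponential(scale=2.0, size=200)

def bootstrap_se(data, stat, n_boot=2000, seed=1):
    """Estimate the standard error of stat(data) by resampling with replacement."""
    r = np.random.default_rng(seed)
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        sample = r.choice(data, size=n, replace=True)
        reps[b] = stat(sample)
    return reps.std(ddof=1)

se_mean = bootstrap_se(x, np.mean)
print(se_mean)
```

For the sample mean the bootstrap answer should land close to the plug-in formula s / sqrt(n), which is a useful sanity check before trusting the method on statistics that have no closed-form standard error.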
Features of the book:
*) authoritative guidance from a well-known author with wide experience in teaching, research, and consulting
*) careful analysis of statistical issues in substantive applications
*) no-nonsense, direct style
*) versatile structure, enabling the text to be used in a course or read on its own
*) text that has been thoroughly class-tested at Berkeley
*) background material on regression and matrix algebra
*) plenty of exercises, most with solutions
*) extra material for instructors, including data sets and code for lab projects (available from Cambridge University Press)
*) many new exercises and examples
*) reorganized, restructured, and revised chapters to aid teaching and understanding

458 pages, Paperback

First published January 1, 2005


About the author

David Freedman

39 books · 16 followers

Ratings & Reviews

Community Reviews

5 stars: 33 (47%)
4 stars: 24 (34%)
3 stars: 9 (12%)
2 stars: 3 (4%)
1 star: 1 (1%)
Displaying 1 - 4 of 4 reviews
G.
15 reviews · 2 followers
January 19, 2019
I took one course from David Freedman at UC Berkeley not long before he died in 2008, and this book was the textbook that we used in that class. The class was Statistics 215A, a course on the practical application of statistical models designed for PhD-level students of Statistics or Biostatistics. Professor Freedman was a man I both highly respected and, at times, privately despised. He was one of the most intelligent and knowledgeable statisticians whom I have ever come across, but his teaching methods were occasionally cruel and hard-nosed. He was not a man who would suffer fools easily, or quietly, for that matter. It was routine in class that he would pose questions, often ones that appeared deceptively simple, that his students would hesitate to answer. They would hesitate because they knew that speaking would bring his attention to them, and with his attention came a laser-like critical focus that could burn right through the misconceptions, errors, and fallacies that the student would unknowingly and inevitably hold. Professor Freedman was simply brilliant, but the majority of his students, even doctoral candidates at one of the strongest graduate programs in Statistics in the United States and the world, were not as brilliant as he was. I do not clearly remember many of the actual questions he asked, but I clearly remember the fear, the hesitation, and the awkwardness with which the class would respond to them. One moment in particular that stands out involved a PhD student in Statistics (of Russian origin) who answered one of Freedman's questions with an answer that prompted perhaps the kindest response from Freedman that I heard: "that's the least stupid thing I've heard you say all day." Several times his responses reduced students to tears. Once the class became wise to his method of asking questions, we would sit silently, each unwilling to expose our ignorance to the burning light of truth that Freedman wielded.
His response was to begin chiding us to at least take a stand, as the worst and most pathetic thing we could do would be to cower in fear of being wrong. He was usually able to goad some brave PhD student into answering, and his response would always make the misconceptions clear to us all. His method was a painful one, but it did teach us statistics.

One of the more amusing things Professor Freedman would say is that he was one of the last remaining "non-wizard" statisticians. By this, he seemed to imply that many current statisticians were busy applying methods in causal inference that leveraged mathematical assumptions to enable claims from data beyond what traditional statistical methodology was capable of supporting. I always personally thought that he may have been referring to statisticians such as Peter Bickel, Mark van der Laan, and many of those from the Stanford school. This statement from Professor Freedman was amusing, but also utterly serious, and completely reflective of his view of how Statistics should be practiced and applied. In Statistics 215A, this point was made very clearly: when applying statistical methods, one must be absolutely certain of the assumptions one employs in their application. His professional judgment was that simply creating fanciful assumptions to justify powerful mathematical approaches was wizardry, and by implication often as unrealistic and useless as magic is in the real world. Perhaps the single most important lesson I learned in the class I took from Professor Freedman was to mentally separate the mathematical model of a situation from the reality of the data-generating process that the model describes. While this may sound like an elementary observation, one that would be learned in the first few statistics classes one takes, it is absolutely fundamental, and it is all too frequently forgotten in the heat of data analysis. Freedman taught us to be careful to clearly understand the probabilistic model, and to realize that along with certain seemingly reasonable assumptions sometimes come unexpected and generally completely unjustifiable ones.
He would remind us that, logically, if we could not be sure that each of these assumptions was met, then any and all analyses we made using an approach that assumed them were possibly false, meaningless, and a waste of time and effort.

Freedman's book, Statistical Models: Theory and Practice, is a kind of upper-division and early graduate level sequel to his well-known introductory Statistics book written with Roger Purves (whom I also knew while at UC Berkeley) and Robert Pisani, simply named Statistics. Those familiar with the book Statistics will recognize the style of exposition and problem sets when reading Statistical Models. They have the same concept-oriented focus as that of Statistics. In fact, the first four chapters of Statistical Models serve as a quick review of the basic approach of multiple linear regression, largely in the context of causal inference (in which one wishes to make arguments regarding causal connection rather than simple statistical association). This is because the rest of the book centers on the use of the multiple linear regression model in applications, primarily in the social sciences and economics. As a part of Statistics 215A we selected current social science journal articles from among those pre-selected by Freedman himself, and we characterized and criticized their use of statistical methods, including their tendency to either ignore or completely misunderstand the mathematical or probabilistic assumptions implicit in their use.

The overall theme of Statistical Models is to present practical modeling approaches that are actually used in applications of Statistics, and to make it crystal clear that in the vast majority of cases when they are used, the interpretations of the results they provide may be totally unjustified or incorrect. This book seems to have been designed by Professor Freedman to train statisticians to reflexively remember to check and verify assumptions, and to be painfully aware that using statistical methods when one cannot or has not done such things is equivalent to making the mystical pronouncements of a shaman or witch doctor. Freedman wanted to ground the statisticians he taught in a hard-headed, critical approach to making claims or statements of how things really are from data analyzed using statistical methods. He wanted statisticians to both fear making "magical" statements and to abhor doing so. This attitude of his endeared him to me, even as his rough teaching style drove me to fear being left alone with him (which was a truly ego-shattering experience for me).

Statistical Models has chapters on maximum likelihood and the bootstrap in regression that are both simple and practical. None of the book goes too deeply into mathematical details apart from careful discussions of modeling assumptions. This is certainly not a book that presents results in a mathematical style of definition, theorem, proof. That is not the purpose of this book, which is to serve as a practical guide to the use and interpretation of statistical models, valuable to any practicing or applied statistician. The chapters on path models and simultaneous equations seem primarily useful to those performing work in econometrics or the other social sciences. Also included in the book are reprints of selected papers by social scientists that Freedman used as case studies in the pitfalls of applying and interpreting statistical methods.

While much of Statistical Models is most directly applicable to those in economics or the social sciences, its clarity of approach (with intermittent dry, Freedman-style humor) and its concern with fundamental understanding and careful interpretation mean this book could be read for profit by any user of statistics. I value it as a kind of wake-up call, one that reminds me not to sleepwalk while performing data analysis, making unchecked or unjustifiable assumptions by haphazardly applying statistical methods to data that may be particularly unsuited for them. It is not a perfect book. I would not choose it, nor recommend it, as an introduction to multiple linear regression. However, I would recommend it to one who already has a fairly solid grasp of the mechanics of multiple linear regression and who would like to gain a more nuanced understanding of the right way of thinking about statistical models when analyzing real data. One thing to understand about David Freedman is that he was just about as mathematically adept as any statistician, and possibly more so than the large majority of them, but the exposition in Statistical Models may not always seem to reflect that, owing to his casual, conversational style of writing. Don't be fooled by this into thinking that the mathematical aspects of the subject he treats were not first and foremost in his mind. He is definitely not anything like Richard Feynman, but he does share with him the ability to take a complex mathematical subject and explain it in such a way that it is both complete and understandable to readers of less sophistication (and in this group it is safe to include many graduate students of fields that employ Statistics, and not the general public). This is a book for statisticians, and among them, particularly for applied statisticians.

As it has been quite a few years since I read the book from cover to cover, I won't go into particular details or aspects of the text, but I do occasionally revisit sections of interest, even if it is just to remind myself of Freedman's style and clarity of thinking. While this is not the best possible book of its kind, it is a good one, and should not be overlooked by those wanting to expose themselves to practical issues in the application of statistical methods.
2,681 reviews · 37 followers
February 8, 2016
While in many ways this is a book of the mathematics used in the construction of statistical models, there are some gems at the end. The first chapter is very educational, as it contains explanations of three of the best studies ever conducted, some of which were natural experiments. In a natural experiment, the researcher does not assign subjects to treatment or control; instead, circumstances outside the researcher's control produce an assignment that approximates randomization. The data are then analyzed and processed in order to better understand the phenomenon or to fit an explanatory mathematical model.
The first is the Health Insurance Plan (HIP) study regarding the efficacy of screening for breast cancer. The second is the famous data analysis of the spread of cholera conducted by John Snow in 1855, decades before the emergence of the germ theory of disease. The last is a description of the model of poverty developed by G. U. Yule in the last year of the nineteenth century. Using census data, he developed a model of the causes of poverty. These three examples serve as a primer on how valuable statistical models can be and how they are derived from data.
The titles of chapters 2 through 8 explain the mathematical contents fairly well. They are:
*) The Regression Line
*) Matrix Algebra
*) Multiple Regression
*) Path Models
*) Maximum Likelihood
*) The Bootstrap
*) Simultaneous Equations
The math is all soundly developed so that the reader will understand how it is used to create the models.
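As an illustration of the matrix-algebra view of multiple regression that these chapters develop (the sketch below and its data are hypothetical, not taken from the book), the OLS coefficients are obtained by solving the normal equations X'Xb = X'y:

```python
# Illustrative sketch of multiple regression via matrix algebra (not from the book).
import numpy as np

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 + noise.
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])      # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solve the normal equations

print(beta_hat)  # should land close to [1, 2, -3]
```

Whether beta_hat deserves a causal interpretation is exactly the question the book presses on: the arithmetic always succeeds, but the assumptions behind it may not hold.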
However, I found the reprints in the appendix to be by far the most interesting content. There are four of them; the first is a paper by James L. Gibson in which he examines the sources of political repression during the McCarthy era. Gibson investigates whether the primary source of repression was the political elite or the mass public.
The second reprint is a paper by William N. Evans and Robert M. Schwab examining the relative effectiveness of public and Catholic high schools at getting students to finish high school and start college. The third is a paper by Ronald R. Rindfuss, Larry Bumpass, and Craig St. John examining the relationship between a woman's education and her rate of bearing children. The last is a paper by Mark Schneider, Paul Teske, Melissa Marschall, Michael Mintrom, and Christine Roch. It examines whether giving parents the opportunity to select the public schools their children attend leads to their being more involved in school programs such as the PTA.
Reading these papers gives the reader an appreciation for the breadth of problems to which mathematical and statistical models can be applied. In a world where people cannot be randomly assigned or manipulated, only the power of statistical modeling can be used to evaluate and explain the consequences of public policy.

This book was made available for free for review purposes
140 reviews · 2 followers
April 4, 2016
Nice textbook on the use of statistical models in the social and health sciences.
