
Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations

Avoid data blunders and create truly useful visualizations.

Avoiding Data Pitfalls is a reputation-saving handbook for those who work with data, designed to help you avoid the all-too-common blunders that occur in data analysis, visualization, and presentation. Plenty of data tools exist, along with plenty of books that tell you how to use them--but unless you truly understand how to work with data, each of these tools can ultimately mislead and cause costly mistakes. This book walks you step by step through the full data visualization process, from calculation and analysis through accurate, useful presentation. Common blunders are explored in depth to show you how they arise, how they have become so common, and how you can avoid them from the outset. Then and only then can you take advantage of the wealth of tools that are out there--in the hands of someone who knows what they're doing, the right tools can cut down on the time, labor, and myriad decisions that go into each and every data presentation.

Workers in almost every industry are now commonly expected to effectively analyze and present data, even with little or no formal training. There are many pitfalls--some might say chasms--in the process, and no one wants to be the source of a data error that costs money or even lives. This book provides a full walk-through of the process to help you ensure a truly useful result.


Delve into the "data-reality gap" that grows with our dependence on data
Learn how the right tools can streamline the visualization process
Avoid common mistakes in data analysis, visualization, and presentation
Create and present clear, accurate, effective data visualizations
To err is human, but in today's data-driven world, the stakes can be high and the mistakes costly. Don't rely on "catching" mistakes; avoid them from the outset with the expert instruction in Avoiding Data Pitfalls.

272 pages, Paperback

First published August 20, 2019


About the author

Ben Jones

6 books · 10 followers

Ratings & Reviews



Community Reviews

5 stars: 26 (38%)
4 stars: 30 (44%)
3 stars: 7 (10%)
2 stars: 4 (5%)
1 star: 0 (0%)
Displaying 1 - 4 of 4 reviews
October 7, 2020
I love how the author stresses the importance of ethics and integrity for anyone wielding data:
Q:
And if you've been working with data for some time, you'll read a section here or there, and you'll nod knowingly, glancing down at a scar or two that you earned by falling headfirst into the pit with me. And your brow may furrow when you read about other pitfalls, a sinking feeling coming over you that you may have made that mistake without recognizing it. If so, know that I feel your pain.
It's really important, though, that we learn to pick ourselves up and dust off our jeans, clean off those scuff marks, ice any bruises we may have suffered, and carry on, a bit wiser for the experience.
Equal in importance is that we show others the same grace. (c)
Q:
If you've worked with data before, you know the feeling. You're giving an important presentation, your data is insightful beyond belief, your charts and graphs are impeccable and Tufte-compliant, the build to your grand conclusion is unassailable and awe-inspiring. And then that one guy in the back of the room – the guy with folded arms and furrowed brow – waits until the very end to ask you if you're aware that the database you're working with is fundamentally flawed, pulling the rug right out from underneath you, and plunging you to the bottom of yet another data pitfall. It's enough to make a poor data geek sweat bullets. (c)

Takeaways.
Error types:
Pitfall 1: Epistemic Errors: How We Think About Data
Pitfall 2: Technical Traps: How We Process Data
Pitfall 3: Mathematical Miscues: How We Calculate Data
Pitfall 4: Statistical Slipups: How We Compare Data
Pitfall 5: Analytical Aberrations: How We Analyze Data
Pitfall 6: Graphical Gaffes: How We Visualize Data
Pitfall 7: Design Dangers: How We Dress up Data

Exploring the contours of your data.

A lot of great examples, including the one I hate so much: the non-zero axes thing - it's so darn irritating and there actually are some people who think it's an example of great design.
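The reviewer's complaint about non-zero axes can be put in numbers. Here is a back-of-the-envelope sketch (toy values, not from the book) of how a truncated y-axis exaggerates a small difference:

```python
# Two toy values that differ by roughly 5%.
a, b = 95.0, 100.0

def visual_ratio(x, y, baseline):
    """How many times taller the second bar appears, given the axis baseline."""
    return (y - baseline) / (x - baseline)

honest = visual_ratio(a, b, baseline=0)      # bar heights 95 and 100: b looks ~5% taller
truncated = visual_ratio(a, b, baseline=90)  # bar heights 5 and 10: b looks twice as tall
```

With the axis starting at 90, a 5% difference in the data is rendered as a 100% difference in ink, which is exactly the distortion the reviewer finds irritating.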

Q:
“Simple random samples” can be anything but simple to get right, and just ask a data guru to explain what a “p-value” means in layman's terms sometime. (c)
Q:
It's not crime, it's reported crime.
It's not the outer diameter of a mechanical part, it's the measured outer diameter.
It's not how the public feels about a topic, it's how people who responded to the survey are willing to say they feel.(c)

The inner-outer join / string variations issue:
Q:
This has been a successful story, so far. Allison has avoided the common pitfall of bringing together two data sets and doing calculations and analysis of the two merged tables without first considering the areas of overlap and lack thereof.
A technicality? Sure, but that's exactly why it's called a technical trap. (c)
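The overlap check the quote describes can be sketched in a few lines of pandas. The toy tables and column names below are hypothetical, but the pattern (an outer join with `indicator=True` to surface unmatched rows before any calculation) is a standard way to avoid this pitfall:

```python
import pandas as pd

# Hypothetical toy tables with a shared key that doesn't fully overlap.
orders = pd.DataFrame({"customer": ["Acme Inc", "Beta LLC", "Gamma Co"],
                       "amount": [100, 250, 75]})
accounts = pd.DataFrame({"customer": ["Acme Inc", "Beta LLC", "Delta SA"],
                         "region": ["West", "East", "South"]})

# An outer join with indicator=True labels each row as "both", "left_only",
# or "right_only", exposing the overlap (and lack thereof) up front.
merged = orders.merge(accounts, on="customer", how="outer", indicator=True)

# Rows present in only one table deserve a look before analysis proceeds;
# an unnoticed inner join would silently drop them instead.
unmatched = merged[merged["_merge"] != "both"]
```

Inspecting `unmatched` first, as Allison does in the story, is what separates a deliberate merge from a silent loss of rows.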

Q:
Why the sudden decrease in reported strikes not seen for a full decade? Was there some effective new technology implemented at airports all over the country? A mass migration of birds and animals south? A strike by the FAA employees responsible for managing the data? (c)

Q:
There are a number of pitfalls in play here:
Simply computing the difference in mean between the different groups and assuming any difference you see is statistically significant, ignoring the statistical probabilities altogether. We'll call this the “p-what? pitfall.”
Getting a p-value that's low by sheer chance and therefore rejecting the null hypothesis when it's actually true is the “Type 1 pitfall.” In other words, you assume there's a statistically significant difference between the groups when they're basically cut from the same cloth.
Getting a p-value that's high means you can fall into the “Type 2 pitfall” by failing to reject the null hypothesis when it's actually false.
Misunderstanding the concept of statistical significance altogether, you get a low p-value in an experiment and then you run around the building waving a piece of paper and claiming to everyone who will listen that you have definitive proof that the null hypothesis is dead wrong, because MATH! Let's call this pitfall “p is for proof, right?”
Running a test in which you collect data on many, many different variables, you blindly compute p-values for dozens upon dozens of comparisons, and lo and behold, you find a couple of low p-values in the mix. You don't bother confirming it or asking others to replicate your results. You just sit back and breathe a sigh of relief that you found a low p-value and thank the stats gods that now you'll have something to write about. We'll call this pitfall “p is for publish, right?”
You confuse the notion of practical significance and statistical significance, and you conduct a huge clinical study with thousands and thousands of patients, some taking an experimental drug and others a placebo. You get a p-value of <0.0001 for your key factor – lifespan – but you forget to look at the size of the difference between the means. The difference is vanishingly small, and test subjects can expect to live 2 days longer in total. Of course, this pitfall is called the “p is for practical, right?”

These are just a handful of pitfalls that null hypothesis testing can cause us to fall into, which is at least part of the reason why a number of scientists, researchers, and statisticians are ditching the procedure altogether in favor of Bayesian methods such as the Bayesian information criterion.(c)
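The "Type 1 pitfall" and "p is for publish" pitfalls above are easy to reproduce in simulation. This is a stdlib-only sketch (a large-sample z-test, my own simplification rather than anything from the book): run many comparisons where the null hypothesis is true by construction, and low p-values still turn up by sheer chance:

```python
import math
import random

random.seed(42)

def two_sample_p(a, b):
    """Two-sided p-value from a large-sample z-test (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 200 comparisons where the null is TRUE: both groups drawn from N(0, 1).
false_positives = 0
for _ in range(200):
    group_a = [random.gauss(0, 1) for _ in range(50)]
    group_b = [random.gauss(0, 1) for _ in range(50)]
    if two_sample_p(group_a, group_b) < 0.05:
        false_positives += 1  # the "Type 1 pitfall": low p by sheer chance
```

At a 0.05 threshold you should expect roughly 5% of these null comparisons, around 10 of the 200, to come up "significant", which is exactly why fishing through dozens of uncorrected comparisons and reporting the low p-values is a pitfall.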
David (205 reviews, 5 followers)
February 10, 2020
Good examples of the different types of pitfalls to be aware of when working with and visualizing data. The checklist at the end is very helpful. A bit more focused on the data-processing side than other, similar books like How Charts Lie, but there's a lot of overlap too, presented slightly differently.
Gabriel Le Gall (22 reviews, 2 followers)
January 17, 2022
Insightful book, maybe too basic at some points (e.g. the pitfall about summing ratios with different denominators...) but I learned a lot. I was extremely curious when the author talked about OpenRefine. I did not know this tool, and I think it has so many business applications.
March 15, 2020
Really great information. The overly conversational writing style turned me off a bit. Definitely a more accessible data/viz text than others I've read.
