Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Foreword by Steven Pinker

Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveal about ourselves and our world, provided we ask the right questions.

By the end of an average day in the early twenty-first century, human beings searching the internet will amass eight trillion gigabytes of data. This staggering amount of information, unprecedented in history, can tell us a great deal about who we are: the fears, desires, and behaviors that drive us, and the conscious and unconscious decisions we make. From the profound to the mundane, we can gain astonishing knowledge about the human psyche that, less than twenty years ago, seemed unfathomable.

Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school affect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?

Investigating these questions and a host of others, Seth Stephens-Davidowitz offers revelations that can help us understand ourselves and our lives better. Drawing on studies and experiments on how we really live and think, he demonstrates in fascinating and often funny ways the extent to which all the world is indeed a lab. With conclusions ranging from strange-but-true to thought-provoking to disturbing, he explores the power of this digital truth serum and its deeper potential: revealing biases deeply embedded within us, information we can use to change our culture, and the questions we’re afraid to ask that might be essential to our health, both emotional and physical. All of us are touched by big data every day, and its influence is multiplying. Everybody Lies challenges us to think differently about how we see it and the world.

338 pages, Hardcover

First published May 9, 2017

About the author

Seth Stephens-Davidowitz

12 books, 810 followers

Ratings & Reviews

Community Reviews

5 stars: 12,320 (30%)
4 stars: 16,187 (40%)
3 stars: 8,560 (21%)
2 stars: 2,235 (5%)
1 star: 1,017 (2%)
Displaying 1 - 30 of 3,668 reviews
Jessi
21 reviews, 31 followers
April 27, 2017
This book tries too hard to be Freakonomics. The first two parts are full of random examples of interesting but mostly pointless things that can be learned via Google search trends. However, a whole lot of assumptions are made off these bits of data that don't seem to have much basis in factual scientific methods of research. Unprofessional jokes are thrown in randomly. If you need a footnote to explain why a joke was not homophobic, maybe you should have just skipped the joke. And any book of less than 300 pages of text should not need to use the same example three times, especially when it's about how the author can't believe women are concerned about the smell of their vagina.

The last section of the book explains the limitations big data holds and is really the most grounded section, the rest being almost hagiography. It would have done a lot to work the third section into the examples of the first two sections. It would have balanced out the praise and also would have done much to explain the flaws present in some of the examples included.

Some cool facts buried in a lot of murky oddness.

Disclaimer: I was given this book in a Goodreads giveaway.
Will Byrnes
1,327 reviews, 121k followers
May 6, 2021
…people’s search for information is, in itself, information. When and where they search for facts, quotes, jokes, places, persons, things, or help, it turns out, can tell us a lot more about what they really think, really desire, really fear, and really do than anyone might have guessed. This is especially true since people sometimes don’t so much query Google as confide in it: “I hate my boss.” “I am drunk.” “My dad hit me.”
There’s lies, damned lies and then there are statistics. One must wonder. Do the lies get bigger as the datasets grow? Seth Stephens-Davidowitz posits that the availability of vast sums of new data not only allows researchers to make better predictions, but offers them never-before-available tools that can offer insight that direct questioning never could.

We have seen steps up of this type before. Malcolm Gladwell has made a career of such, with Blink, Outliers, and The Tipping Point. Freakonomics is the one I would expect most folks would know. Nate Silver put his data expertise into The Signal and the Noise. All these looks at data and how we interpret it rely on the analyst, regardless, pretty much, of the data. While the same might be true of Stephens-Davidowitz’s approach, he focuses on the availability of materials that have not been there in the past. The smarts that must be applied to get the most interesting results can now be applied to new oceans of data. It is more possible than it has ever been to draw inferences and actually test them out.

In addition to the volume of data that is now available, there is the sort. The author looks at Google and FB data for evidence of underlying realities. Surveys can sometimes offer inaccurate outcomes, when the people being queried do not provide honest answers. Are you a racist? Yes/No. But one can look at what people enter into Google to get a sense of possible racism by geographic area. The everyday act of typing a word or phrase into a compact rectangular white box leaves a small trace of truth that, when multiplied by millions, eventually reveals profound realities. Looking for queries on jokes involving the N-Word, for example, turns out to yield a telling portrait of anti-black sentiment, which also correlates with lower black life expectancy. (And with pro-Trump vote totals.)

We are treated to looks into a variety of research subjects, from picking the ponies, to seeing what really interests/concerns people sexually, looking for patterns of child abuse, selecting the best wine, using the texts of a vast number of books and movie scripts to come up with six simple plot structures.

I thought the most interesting piece was on the use of associations, and provoking curiosity, rather than relying on overt statements to influence how people feel about a different group of people. Another was on using a data comparison of one’s (anonymous) medical information to others who share many characteristics to improve medical diagnoses.

There are some areas in which it was not entirely persuasive that the methodology in question was tracking what was claimed. SS-D sees in searches of Pornhub, for example, what people really want and really do, not what they say they want and say they do. Really? I expect that what people check out on-line does not necessarily track with what might be of interest in real life. It would be like someone with an interest in mysteries being thought to have homicidal tendencies after searching for a variety of homicide related titles. Should a writer doing research into a dark subject like child pornography, human trafficking or cannibalism expect the heavy knock of the police on his/her door? Where is the line between an academic or titillation search and one made for planning?

SS-D makes a point about there being a significant difference between searches that offer projections for groups or areas, and their inapplicability for predicting individual behavior, although that will not necessarily remain the case. In baseball, for example, the explosion of available information may very well be applied to specific players to diagnose and even correct flaws in technique, or recognize patterns that might expose underlying medical issues, or predict their arrival. The Big Data related here is much more macro, looking at group proclivities. Useful for spotting trends, measuring public sentiment, but in more detail than has been heretofore possible.

And of course there is the impact of dark players. Those with the resources and motivation could manipulate the Big Data produced by Google and Facebook. Such players would not necessarily be limited to Russian cyber-spies and pranksters, but corporate and ideological players as well, like Robert Mercer. There could have been a bit more in here on those concerns.

The book offers plenty of anecdotal bits that could have been lifted from any of the other data books noted at the top of this review. What one needs, ultimately is smart, insightful analysis. Having all the data in the world (that means you, NSA) is merely a burden unless there is someone insightful enough to figure out the right questions to ask, and how to ask them.

SS-D notes several Google (Trends, Ngrams, Correlate) services that might be familiar to folks doing actual research, but which were news to me. It might be useful to check out some of these, maybe even come up with meaningful queries to shed light on pressing, or even completely frivolous questions.

Not all problems can be solved, or even examined by the addition of ever more data. Sometimes, many times, the information that is available is perfectly sufficient to the task, but other factors prevent the joining together of its various pieces to create a meaningful whole. The now classic example is from 9/11, when an absence of coordination between the CIA and FBI resulted in suicide bombers who could have been foiled succeeding in their mission. Politics and the culture of nations and organizations figure into how data is used.

So if everybody lies, is Seth Stephens-Davidowitz telling us the truth? I am sure there is a query one could construct that would look at diverse data sources, pull them all together and give us a fuller picture, but for now, we will have to make do with reading his book and articles, checking out his videos, applying the analytical tools already incorporated into our brains, and seeing if there is enough information there with which to come to a well-grounded conclusion. And that’s no lie.


Review first posted – May 5, 2017

Publication date – May 9, 2017

=============================EXTRA STUFF

Links to the author’s personal, Twitter, and FB pages

VIDEOS – SS-D speaking
----- Stanford Seminar - Insights with New Data: Using Google Search Data
----- Google Sex with Seth Stephens-Davidowitz - Arts & Ideas at the JCCSF
----- Big Data and the Social Sciences - The Julis-Rabinowitz Center for Public Policy and Finance

The June 2017 National Geographic cover story has particular relevance to the treatment of actual truth in today's political environment. It is illuminating, if not exactly uplifting. - Why We Lie: The Science Behind Our Deceptive Ways - By Yudhijit Bhattacharjee

July 12, 2017 - Washington Post - one of the very serious applications of big data - The investigation goes digital: Did someone point Russia to specific online targets? - by Philip Bump

July 15, 2017 - One of the ways big data gets compromised is via automated dishonesty - Please Prove You’re Not a Robot by Tim Wu - Thanks to Henry B for letting us know about the article
Lori
308 reviews, 99 followers
February 3, 2018
When sociologists ask people if they waste food, people give the only correct answer. It's wrong to waste food.

When sociologists survey the contents of the same people's garbage, they get a more accurate answer.

Just imagine how much more information is available trolling through internet searches.
Richard Derus
3,168 reviews, 2,094 followers
June 18, 2023
2020 EXHORTATION Wednesday, 29 July 2020, the four horse-manuremen of the datapocalypse will testify before Congress about their insane, untrammeled greed and its deleterious effect on Society. (I am presupposing the end result of the hearing here because I am under no obligation to hide my own opinion of these nauseating monopolists.)

2019 EXHORTATION We're entering the 2020 election cycle for real at this moment. Please, all US citizens, PLEASE read books! Especially books about data, how it's acquired and analyzed, how it's massaged and manipulated—the more you know about the topic, the harder it will be for agenda-having politicians to lie to you with numbers.

I have nothing unique to add to the conversation about this book. I think those most in need of reading it won't, and that's frustrating.

If you've ever seen a number adduced to explain a trend, read this book. If you've ever asserted that a certain percentage of something was something/something else, read this book. If you've ever seen a politician quote a study and your innate bullshit filter clogged up, read this book.

Really simple, high-level terms: READ. THIS. BOOK.
David Rubenstein
821 reviews, 2,665 followers
January 22, 2018
This is an engaging book about how big data can be used to improve our understanding of human behavior, thinking, emotions, and preference. The basic idea is that if you ask people about their behavior or their preferences in surveys, even anonymous surveys, they will often lie. People do not like to admit to low-brow preferences; racists do not want to admit to their prejudices, most people who watch pornography do not want to admit to it, and even voting is often misrepresented; some people who voted for Trump would not admit to it.

But, by analyzing immense datasets from Google, public archives, social media, and the like, Seth Stephens-Davidowitz has been able to unearth a lot of fascinating answers to puzzling questions. For example, he is able to predict, through Google searches for various symptoms, who is likely to have early stages of pancreatic cancer. He can predict epidemic breakouts of some contagious diseases well before they are announced by the CDC (Centers for Disease Control and Prevention). He shows that the single factor that correlates with voting for Trump is racism.

Then there are the fun factoids, about the sorts of things that people search for most often on Google. Most commonly, the search "Is my son ..." is followed by "gifted", while the search "Is my daughter ..." is followed by "overweight". That tells us something about stereotypes for the way people think about their children. Interestingly, the release of a new violent movie in a city is correlated with a decrease in violent crime in that city. Perhaps the reason is that violent people who are watching the movie are not out on the streets, committing crimes.

And here we get to the main problem with this sort of analysis. Undoubtedly, the research and analysis of big datasets is done correctly. However, once a surprising result is found, understanding the motivations behind the online activity is often subjective and open to interpretation. While this book is very careful about its underlying assumptions, it is a slippery road to getting the correct interpretations and explanations.

This is an easy, well-paced book that should appeal to anybody who enjoys books like Freakonomics: A Rogue Economist Explores the Hidden Side of Everything.
Rana Heshmati
568 reviews, 845 followers
December 16, 2019
In my opinion this was an outstanding book. I had no detailed knowledge of big data and knew only what most of us know, but this book gave me a much more interesting perspective. I am now a little more afraid of the internet and of the information I leave behind, and I pay a bit more attention (as if a third eye were constantly watching you. Not you personally, but you as an anonymous subject. Which is even scarier.), and my antennae have become more sensitive to the things I see.
It had a witty and at the same time sad tone, which made it easy and fluent to read even though it is nonfiction, and it managed to explain people's difficult studies in a way that any reader could understand, which impressed me a great deal.
The translation was good as well. However, one subsection (the first part of chapter four: The truth about sex) has been removed. I read it in the English edition, and it too was very interesting and informative. I wish it had been published.
But in the end I fell in love with the last paragraph of the conclusion, which said:
"But it doesn't matter how hard I have worked on polishing my prose, because most people will read at most the first fifty pages, pick up a few points, and then get on with their lives.
So I end the book in the most fitting way possible: by following the data and paying attention to what people do, not what they say. I am going to go have a drink with a few of my friends and stop working on this damned conclusion.
Big data says that only very few of you are still reading."

Interestingly, when I had read the introduction and was describing it to one of my friends, they said: well, why bother reading the rest? That's all there is to it. You've understood what's going on...
But hey. I read it to the end. I enjoyed it a lot. I was very impressed by all the work and effort you put in, and I wished I could have helped you do it. Thank you.
Maziyar Yf
599 reviews, 353 followers
December 22, 2022
Seth Isaac Stephens-Davidowitz, a young American writer and scientist, believes that people fundamentally hide part of their opinions and behavior, or put on a front. (In our culture, proverbs such as "if you don't want to be disgraced, blend in with the crowd" or "keeping your face rosy with a slap" stress the same idea: the distinction between appearance and inner self, or the opposition between inside and outside. It is the inside that has great value, and the outside, although it shields the sensitive and delicate inner world, is ruled by artifice and caution.) Surveys are therefore superficial tools that cannot reach the depths of human behavior. But with the ever-growing spread of the internet and its use by a large share of people, we can now find out what people look for in their daily searches, what they like, and even which candidate they will vote for. This is the summary and the basis of the book Everybody Lies by Seth Stephens.
The author considers methods such as designing questionnaires or asking people directly to be ineffective and obsolete. Instead he proposes using big data, whose volume grows every moment. In this way social scientists have gained access to an ocean of information that is practically endless. Stephens lists four main advantages of big data: access at any time to the newest information, providing honest data, the ability to zoom in on small subsets of people, and making it possible to test causal relationships.
The author's work rests on the principle that all people tend to hide their shameful thoughts from others, for example about racism, gender discrimination, or their unusual sexual inclinations, but that when they use Google or the site Pornhub, which protect a person's identity, they honestly confess what is on their minds or what they would like to do. Our outward appearance, then, is the face we show in society or on social networks where our identity is visible, such as Instagram, Facebook, and even Goodreads, and our inner self is what we look for on Google or Pornhub. On those social sites we show our cultured, happy, elite face, not our real self. Stephens shows that words, clicks, links, and above all searches are what make up the data. With the help of these new data we can see the hidden reality behind people's lies. The author calls these data the digital truth serum. The truth serum shows that people have a very strong tendency to judge others by their appearance, that a great many men and women have homosexual inclinations or sexual fantasies, that there is widespread hostility toward African Americans, and that hidden child abuse and violent Islamophobia are also very common.
The digital truth serum shows that we are not alone in our thoughts and, more importantly, it may be able to help people find ways to reduce their hateful behavior.
Although Seth Stephens regards big data as something like a great revolution, he warns that it is not such a decisive tool either. Big data does not free us from the other methods that humans have discovered and refined since ancient times for understanding the world. The two kinds of tools complement each other.
In the end, the author presents big data as the twenty-first-century version of this proverb: never compare your insides with other people's outsides. He has, of course, coined a new version of the saying: never compare your Google searches with other people's social media posts.
Eli ad
2 reviews, 9 followers
December 31, 2019
Such an interesting book; it broadened my views. I'm looking forward to reading more books by the author.
Monica
659 reviews, 661 followers
June 16, 2019
Everybody Lies has all the makings of the kind of book I get suckered into buying during an amazon kindle sale. A pop culture polemic that has a very short half-life of relevancy. After reading it, my first blush was to say that I was spot on. But as I thought about it, I realized it had more depth. That's likely because Seth Stephens-Davidowitz is an actual scientist trying to educate people about what they are actually revealing with everything that they say and do.

The late 20th Century has heralded access to vast quantities of information on every one of us: our buying habits, our browsing habits, which news sources we use in a very proliferated world of news access. We are telling about ourselves every time we go online on our tablets, phones and computers. Every text, every phone call, every e-mail is adding data to our digital makeup. Like it or not, data is collected and available for each and every one of us on some very personal things. There is a field of science dedicated to analyzing the data and interpreting its meaning. The byproducts of this new field are used for good and evil. Corporations can use the information to target people who may buy their items (how did Goodreads know that I was thinking about buying a mattress?). Some data mining results could be used to determine how much people need certain types of government services. Some internet searches combined with buying habits, forum discussions, book reviews, blog posts etc. have led to medical discoveries. The amount of data is staggering and the ability to compile and analyze the data to reveal useful information is a new science that goes way beyond statistics. It requires knowledge of math, and sociology and psychology and engineering and biological science and an understanding of human nature etc. to attempt to mine useful information. What Stephens-Davidowitz has discovered is that everybody lies about…well everything. His primary discussion is that people rarely tell the truth in polls and surveys etc. They also lie in their online data habits. Oftentimes to themselves. That little fact obviously complicates the mining of data, for example in Red States w/ their stated evangelical postures that consume the most porn and have the highest rates of internet searches for access to abortions etc. People lie in their own searches as they seek to reinforce their own positions and don't necessarily search for answers. Those kinds of actions are not surprising but they complicate analysis (understatement).

This book was a very interesting data primer. There is so much more to the amount of data most of us generate every day and Stephens-Davidowitz does a great job of explaining the basics. Some of his examples and his approach are a bit superficial, juvenile, pop culture. I don't find myself curious about the users of pornhub, or average penis size or baseball stats. Some of that was silly and salacious, betraying his youth and blatantly catering to what his data mining perceived would be an audience of young males. Bah. Also, he quoted Malcolm Gladwell as a resource which in my view should never be used if you hope to build a foundation based upon experience in the field and credibility on the subject…of anything. Nonetheless, I enjoyed the book and I think Stephens-Davidowitz has a very compelling and prosperous future as both a scientist and a writer.

4 Stars

Read on kindle
Ali Karimnejad
314 reviews, 198 followers
February 4, 2023
3.5

The book is full of interesting statistics. By examining porn-site traffic and Google searches, it shows, far more precisely than many surveys and sociological studies, that people have something quite different in mind from what they say.

The whole message of the book can be summed up like this: for thousands of different reasons, people will probably lie in surveys, or at least not tell the whole truth. The only place where people readily confess, or unknowingly reveal their true selves without a veil, is in front of search engines. Most of the book deals with various examples of this and with its potential commercial or sociological uses, which, in fairness, was interesting.

Even so, in my own opinion, although the author plays up how revolutionary this method is, we still need to keep in mind that in many cases the results can be interpreted in different ways, and that search-engine statistics do not hand us the truth as neat and tidy as the author would like to show. Let me give an example:

The fact that there is no porn video among the most-viewed videos does not necessarily show anything in particular. When you watch the "Gangnam Style" video, you naturally have plenty of reasons to share it with your friends or family, but when you watch a porn video you will never do that. That alone has a large effect on view counts.

In fact, it sometimes seemed to me that too much emphasis on numbers and statistics led the author to draw hasty conclusions. That put me off a bit. But the material on the A/B tests that search engines and Facebook run on their users was very interesting to me.

All in all it is definitely worth reading, but you should know that the book never goes beyond the level of general knowledge and will not teach you anything.
Carole
537 reviews, 129 followers
October 26, 2019
Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are by Seth Stephens-Davidowitz takes us into the world of social sciences via the internet. I might have found the book version a bit on the boring side but I enjoyed the audiobook. Big Data can answer any and all of our questions. But will the answer be what we want to hear? And do we need to know about all aspects of the world we live in? The book is similar to some of Malcolm Gladwell's work but it is not Malcolm Gladwell. However, you will be informed, you will learn about our world and you will sometimes be amused.
Amos
16 reviews
July 23, 2017
No practicing analyst or social scientist will find anything of value in this book. It verges on being dangerously deceptive, filled with logical fallacies and half-baked reasoning for its conclusions. The book claims to be finding truth in an uncertain world, but actually is just adding to the noise.
Rachel
132 reviews, 8 followers
November 27, 2017
I wanted to like this book. It's an interesting topic. But I found the methodology extremely sloppy. Or maybe the author just omitted some key facts. He was clearly determined to prove that racism caused the election of Donald Trump. But it's disconcerting to read the conclusion BEFORE the data analysis itself. On one hand, he says that Obama easily won two terms, DESPITE racism. Then he quickly says that Trump won the 2016 election BECAUSE of racism. So which is it? Is racism so widespread that it caused both candidates to win, the black man despite it, the white man because of it? It makes no sense. Nor was I terribly convinced that Google searches for the word n-gger are actually clearcut reflections of a person who would never vote for a black president but always vote for Trump. That's a pretty big leap of logic. As is his notion that black people would spell it "n-gga" therefore all these searches are by white racists. Also, is it a real absolute that racists would never vote for a black president? After all, you could vote for Obama because you feel he's the best of two options, yet still be a flaming racist. Likewise, if you search Google for racist jokes, does that actually prove that you are treating minorities unfairly? It may sound like a reasonable conjecture but this is data science, not an op-ed column. There should be a more decisive connection before making a grand sweeping statement that Trump won due to racists but Obama won despite racists.

The author is even sloppier in the section on searches of a pornographic nature. He refers to a data set from a porn site called PornHub. He has to assume that anyone who registers on that site and states "I am male" or "I am female" is absolutely telling the truth. But how do we know that? Are we sure that men never pretend to be women to chat with others, exchange messages, or share videos on porn sites? I'm not convinced.

As was widely reported, 25% of searches by (alleged) women on porn sites are for rather violent porn. I don't mean a little spanking, but hardcore search terms including words like "brutal" and "crying" and so forth. 20% of the (alleged) women's searches are for lesbian porn. But the author is quick to point out: this is sexual fantasy! It's not real life! Those women aren't actual lesbians, nor do they want to engage in violent sex.

But when it comes to men's searches, he regards those as literal fact. If men search for gay porn, it's because they're gay, maybe closeted, but definitely gay. Why does he insist this is true for men, but not for women?

The same sloppy reasoning is applied to various other search terms. The fact that "boyfriend won't have sex" is far more common than "girlfriend won't have sex" is the foundation for his notion that men are more likely to refuse sex to their partners than vice versa. But how do we know that's actually true? What about the notion that women are more likely to SEARCH for a solution to this problem online?

The fact is that we know absolutely nothing about the people performing these searches - whether they are male or female, racist or fair-minded, gay or straight. So making assumptions about their motivations based solely on search terms is just poor data science. Maybe there's some essential research that the author omitted. But it looks like pure speculation based on search terms, which is not what I would expect of an author who claims to be a data scientist.

Stick with Freakonomics if this topic interests you.
aPriL does feral sometimes
1,987 reviews, 457 followers
October 18, 2021
I was annoyed by the author’s writing style in ‘Everybody Lies’. I have no doubts author Seth Stephens-Davidowitz was trying to write to a large general audience, including that assumed class of American non-science reader who hates math and binge watches ‘Keeping Up with the Kardashians’. Good for him, and maybe you, right? But I became more and more annoyed as I read. Ah, well. It is an interesting and informative read, in spite of trying too hard to be fun, imho.

What is the book about? I am glad to report it has genuine information about the science of statistics and ‘big data’ collecting, and how the erroneous selection of study parameters or assumptions about what is relevant data to study affects conclusions (as far as I know - I am a dunce at scientific math, despite that I passed a statistics class). The author used what seemed to me genuinely interesting new methods to formulate statistical studies, primarily using Google’s forensic tools, along with other sources.

I was shocked by what people type into Google Search (which Google compiles into anonymous data). For example, President Obama’s race appears to have truly ignited racists into coming out of their closets. Comparing survey interviews with people who state they are racist (a low percentage) with the percentage of those who Googled “n***** jokes” state by state turns out to show some truly hidden pockets of unexpected racism - and the total percentage of racist searches on Google was WAY higher than the racism that typical surveys show. In addition, those places who adore Trump also searched most for “n***** jokes”. Correlation? Idk, no one does know for the record, but I think yes.

Also of interest to me (please don’t bust my balls because of my prurient interests - and maybe there is a pun in this sentence, hehheh - read on) men really truly do Google a lot about penis sizes. Come on, fellas, give it a rest! (Yes, I am trying to be snarky since the too much ‘at rest’ position is part of what men appear to be most anxious about!) Men prowl porn sites in humongous numbers - shocking, right? - which is good for statisticians looking for Truth about sexuality for their inputs into their mathematical equations. Based on Google porn searches, the author estimates 5% of the population is gay. (Btw, conservatives mostly use the word ‘homosexual’ while liberals use the phrase ‘same-sex’, statistically, in Google searches.)

Not to neglect what Google says about what the ladies’ biggest sexual worry is, all I can say is, Oh. My. God. Vagina odor. Really? Really!!

All statisticians should take note - interrogative surveys often show different results from those statistics revealed in Google searches about the percentages of who is thinking/feeling what where and when, especially in those morally-weighted or personally embarrassing areas of society. Of course, interpretation is always fraught with possible erroneous judgements whatever the source of sampling.

I have always trusted those insurance actuarial tables FAR more than political or media spins or even university data studies - so now I am adding Google statistics to my ‘trusted info’ list. Of course, gentle reader, I know any compilations of data can be erroneously or purposely manipulated or massaged. ‘Garbage in, garbage out’ still applies...which is the case ‘Everybody Lies’ makes as well. The book seemed on top of the science, as far as I know. I am not a science-brain, but an amateur wannabe.

My one irritation with this book is all about the manner in which the information is explained. Gentle reader, my complaint is subjective as hell. Honestly, I can’t put my finger on it, though. The writer seemed to be trying to fill out his actual 200-page book to 300 pages by having personal emotional filler similar to the gaspy asides many shows use to increase the viewers’ emotional high about what is being discussed. Are you familiar with those TV shows that, after each commercial break, recap the entire show in the preceding minutes before the commercial break in a breathless montage manner? And they often had a shocked-gasp teaser of what will be shown before the commercial break? Anyway, I felt there was a lot of that style of emotional manipulation (and extending of the material) going on in this book, somehow. I simply did not appreciate the personal ‘fun’ filler so much. Maybe there wasn’t enough snark. I prefer snarky humor, if there is humor. Bite me. Maybe a more tightly edited book would have worked better for me to enjoy reading it. Anyway, I realize I am floundering about here. None of this may be true at all for you.

Ultimately, this is a book worthy of reading for the general reader (for the record, I definitely have a lit/history brain, so yes, I am a general science reader!) and the explanatory information about how statistical studies are done (the only math-involved college class which engaged me) and what people are really feeling and thinking (if Google searches are to be believed, and I think they are).

Included are extensive Notes and Index sections.
Profile Image for Gypsy.
426 reviews580 followers
February 4, 2021

I have a bit of a grudge against this book, and I don't even know myself whether I like it or not.

It was a useful book. There is no doubt about that. But honestly, the author hasn't done anything special. He picked a topic that has been extremely hot in recent years (who doesn't want to know about data analysis, economics, and behavioral analysis?) and offered the most readily available evidence in support of his claims. I have no problem with the evidence being readily available; in fact, it gives a jolt to general readers and to people who are very optimistic. My problem is with the way the author analyzes it. I even wanted to email the author and talk to him. I went and read his website, watched his videos and talks, and even wrote a few lines, but it seemed pointless and I gave up, because a graduate of an apparently hotshot American PhD program surely couldn't care less about being criticized by me, a small, miserable Iranian with a bachelor's degree. :)

The book's title is its best part, because it is true of the author too.

The author lies. Many times in parts of the book, while I was getting excited, I also grew a bit skeptical: hey, look, don't we already know this? Aren't people really racist deep down? Don't men and women exaggerate about their sex lives? Come on, we know these things. You don't need to prove them with your special method. Early on I would scold myself: wait, he is starting with simple, obvious things in order to get to deeper ones. He can't say it all from the very beginning, especially with a subject this new that may be hard for many people to grasp at all. (I myself have only a slight understanding of economics and big data.) But I felt the author was straining to make his work look big and scientific, while in each chapter he only scratches the surface of his topics. He can't defend his hypotheses well. He can't develop his ideas. He just gives you a series of clear, obvious facts to convince you that what he says is right. (And well, it is right! I have no argument with that.) But apart from stating and describing his research method (if one can call it that), the author does nothing special. The places where he digs into the details of the research were where I got more excited, but then I would suddenly see him pass over them very quickly or go off on a tangent. He just bombards you with information so that you won't notice that the author himself can't add anything more or newer to it. Maybe the effort wasn't even deliberate; maybe it came from his own enthusiasm or his youth, what do I know. The work resembles a doctoral dissertation rather than a well-organized theoretical book.

I was in this inner struggle about the author's intent and approach, and I still am, and I think I will remain so. Everybody lies, even the author of Everybody Lies.
Profile Image for Mostafa Galal.
177 reviews210 followers
October 7, 2018
A useful book that offers a number of good insights, but it could have been cut roughly in half without affecting the content.
Profile Image for Jim.
Author 7 books2,050 followers
April 17, 2018
"I am now convinced that Google searches are the most important data set ever collected on the human psyche," writes the author early on & he shows why. (Google Trends is available to all here: https://trends.google.com/trends/) He also checked other big data sets including Wikipedia, Facebook, Pornhub, & even Stormfront, the largest racist site. What he found was really interesting & it will help harden the soft, social sciences. It's a new frontier.

He points out problems with traditional reporting. In the section about child abuse & abortions, Google searches suggest that child abuse does increase during economic downturns while gov't figures incorrectly show little change. Closing abortion clinics doesn't stop them, it simply leads to more self-induced abortions. Both happen off the books, but there is now convincing supporting data to show us what we need to address & make more informed decisions with resources.

Big data has an advantage over every other type of survey because few realize it is being collected, so we don't lie to make ourselves look better. It's also anonymous & aggregate, so caution needs to be used when forming conclusions. For instance, based on Pornhub searches, the author concludes that about 5% of men are gay because they searched for gay porn. That seemed a reasonable conclusion until he pointed out that 15% of women search for rape porn. Does that mean they want to be raped? The author says of course not & makes a big deal out of the difference between fantasy & reality. That makes me question his first conclusion, although it seems about right.

Gut reactions are often wrong & he provides several examples where they're wrong due to cognitive biases. He also points out "The Curse of Dimensionality". Given large enough sets of data, there will be correlations just through chance. For instance, there are graphs that show how closely autism diagnoses track with organic food sales or Jenny McCarthy's popularity. Separating these out is a whole other problem.
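The point about chance correlations can be checked with a small simulation. This is only a sketch under invented assumptions: every series below is pure random noise, and the function names and sizes are mine, not anything from the book. It shows that the more unrelated series you compare against a target, the stronger the best "match" looks.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def strongest_chance_correlation(n_candidates, n_points=10, seed=0):
    """Largest |correlation| between one noise series and many others.

    Hypothetical setup: the target and every candidate are independent
    Gaussian noise, so any correlation found is pure coincidence.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


few = strongest_chance_correlation(5)
many = strongest_chance_correlation(5000)
print(f"best match among 5 noise series:    {few:.2f}")
print(f"best match among 5000 noise series: {many:.2f}")
```

With only ten data points per series, searching thousands of candidates will nearly always turn up one that tracks the target closely, which is why a striking graph alone proves little.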

Big Data only gives us trends that we need to examine. We can't use it on the individual level. While 1000 people searched for how to kill their girlfriend, only 1 girl was killed in his example. That's horrific & might have been stopped if someone had looked at his search history, but do we give up everyone's privacy for a 1 in 1000 chance that we might prevent a murder? Some might be willing, but I'm not, so we also have new questions to address.

The audio book was well narrated & I didn't miss the graphs too much. They're provided in the extra material, but weren't handy when I was listening & the book took that into account for the most part. Highly recommended in either format.
Profile Image for Raya راية.
802 reviews1,491 followers
October 20, 2018
"Some online sources get people to admit things they would not admit anywhere else. They serve as a digital truth serum. Take Google searches, and remember the conditions that make people more honest: online, alone, with no person administering a survey."

Amid the accelerating technological revolution, searching for information has become easier than ever before. We can now find out how many people are online each day, which sites are most visited, and other such data. And we can certainly learn more about human nature from the searches people run on the most famous site in the world, Google. We will then learn that people lie and hide behind many masks that conceal desires, fantasies, and thoughts that may seem shocking to us. How many times have you sat alone with your personal computer, putting to Google the many questions and thoughts on your mind and running searches that no one on earth knows about? Countless times, of course!

In this book, Seth Stephens-Davidowitz shows us the importance of big data, and of Google data in particular, in revealing many different aspects of people's lives, and how the results from Google data differ from those of surveys and questionnaires, because people tend to lie! This opens the door to a new kind of information extraction, namely "big data".

"Many people do not tell surveys about embarrassing thoughts and behaviors. They want to look good, even though most surveys are anonymous. This is called social desirability bias."

A truly enjoyable book, and an idea entirely new to me.

"People lie about how many drinks they had on the way home, and they lie about how often they go to the gym, about how much those new shoes cost, and about whether they read that book. They call in sick when they are healthy, they say they will stay in touch when they won't, they say it's not about you when it is, they say they love you when they don't, they say they are happy while they are miserable, and they say they like women when they really like men. People lie to their friends, they lie to their bosses, they lie to their children, they lie to their parents, they lie to their husbands, they lie to their wives, and they lie to themselves. And they most certainly lie to surveys."

As for the translation, my husband, partner, and friend Ahmad Hussein Shaheen translated it because of the value and fresh thinking he found in it, which deserve to be made available to the Arab reader. This is his first experience in translation, and we pray to God that he continues and develops it by translating other works that enrich the Arabic library.

And here is the link to the book, available electronically and free of charge to anyone who wishes to read it:
https://drive.google.com/file/d/1qQwm...

And God is the granter of success.

...
Profile Image for Matt Ward.
214 reviews14 followers
June 5, 2017
This book could have used a good editor. It tries to be a Gladwell-type of book without fully succeeding. Issue 1 is that the anecdotal stories are not fleshed out enough to really draw you in like Gladwell does. This causes much of the book to come across as a list of facts, and it gets pretty old by the midway point.

The other issue is a growing trend among people writing data books. They want to write in a colloquial style to make it seem informal and easy to read. They don't want to scare off people with talk of algorithms and things like that.

Unfortunately, using tons of sentence fragments and colloquial phrases only makes a book like this harder to read. It's precision and clarity that make books easy to understand. Introducing ambiguity in order to sound like a friendly conversation is exactly the wrong approach.

Overall, there are a bunch of interesting facts in here. I think Seth gets a bunch wrong, though, in not understanding fully why certain search terms are used.
Profile Image for Trish.
1,373 reviews2,616 followers
November 7, 2017
Maybe everyone does lie. But they don’t lie all the time. Stephens-Davidowitz makes the good point that asking people directly doesn’t always, in fact may not often, yield true answers. People have their own reasons for answering pollsters untruthfully, but it is clear that this is a documented fact. People sometimes lie to pollsters.

Stephens-Davidowitz was told by mentors and advisors not to consider Google searches worthwhile data, but the more he looked at it, the more he was convinced that Google searches contained the best data for determining what people are concerned about. He has uncovered some interesting trends that are not apparent through direct questioning because people are sometimes ashamed of their fears, feelings, prejudices, and predilections.

I didn’t really like this book. Partly the reason is because I listened to it, and Stephens-Davidowitz gives charts, graphs, data points that obviously cannot be represented in the audio version. These usually help me to grasp things easily and maybe bypass pages of material that is not as interesting to me. It wasn’t that his material was hard, it was that I oftentimes did not like what he was talking about. He had a tendency to focus on deviant behavior, e.g., sexual predators, abuse, porn, etc. One might make the argument that these behaviors are important to understand and therefore worth looking at. Possibly. However, if ‘everybody lies,’ one might make the argument that we do not have to look at deviance to find untruthfulness.

What we discover is that to test Stephens-Davidowitz’s thesis that ‘everybody lies,’ we have to spend quite a lot of time with statistics and creating studies, or as he is wont to do, studying big data. Big data probably irons out discrepancies in the reasons for our Google searches, e.g., that it is not me that is interested in the herpes virus, it is my brother, because in the end it doesn’t matter why we did the search; what matters is that we did the search. Besides, maybe I’m lying about my brother having the virus, but my interest in the topic is not a lie.

Stephens-Davidowitz has made a career so far out of the study of big data, showing us ways to slice and dice it so that it is useful to our view of the world. Only thing is, I am not as interested in what big data tells us as he is. He’d trained as an economist, and towards the end of the book he hit a couple of areas I did find more interesting, like the notion of regression discontinuity, a term used to describe a statistical tool created to measure the outcomes of people very close to some arbitrary cut-off.** S-D talks about using this tool on federal inmates, discovering criminals treated more harshly committed more crimes upon their release. But S-D also studied students on either side of the admissions cut-off for the prestigious Stuyvesant High School: those who attended Stuyvesant did not perform significantly differently in later life from students who did not.

Apparently Stephens-Davidowitz went into data science because of Freakonomics, the bestselling book by Steven D. Levitt. He believes that many of the next generation of scientists in every field will be data scientists. I did finish the audiobook, another study he took note of in the last pages. Apparently few readers finish ‘treatises’ by economists. He believes this is his big contribution to our knowledge base, and there is no doubt his contrariness did highlight ways big data can be used effectively.

If I may be so bold, I might be able to suggest a reason why many female readers may not be as interested in the material presented, or in Stephens-Davidowitz himself (he was/is apparently looking for a girlfriend). Stay away from the deviant sex stuff, Seth. It may interest you but I can guarantee that fewer women are going to find that appealing or reassuring conversation or reading material.

An interesting corollary to this economists’ data view is the question of whether the truth matters, which is how I came to pick up this book. Recently on PBS’ The Third Rail with Ozy, Carlos Watson asked whether the truth matters. At first blush the answer seems obvious, and two sides debated this question. One side said of course truth matters…but most of us know one man’s truth to be another man’s lie. The other side said ‘everybody lies.’ It got me to thinking…I do think the two ways of coming to the notion of lying dovetail at some point, and one has to conclude that truth may not matter as much as we think. What matters is what we believe to be true.

Finally, it appears Stephens-Davidowitz agrees to some degree with Cathy O'Neil, author of Weapons of Math Destruction, in that he agrees you had best not let algorithms run without human tweaking and interference. The best outcomes are delivered when humans apply their particular observations and knowledge and expertise along with big data.

** S-D describes it this way:
“Any time there is a precise number that divides people into two different groups, a discontinuity, economists can compare, or regress, the outcomes of people very very close to the cut off.”
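The idea in that quote can be illustrated with a small simulation. This is a sketch under invented assumptions, not the study the book describes: the applicants, scores, cutoff, and outcome formula are all hypothetical, and a real regression discontinuity analysis would fit regressions on each side of the cutoff rather than simply compare two averages. Here admission is given no true effect at all, yet a naive comparison still shows a large gap.

```python
import random


def simulate_applicants(n, cutoff, true_effect, seed=0):
    """Invented data: later outcome rises smoothly with the entrance score,
    plus `true_effect` for anyone at or above the cutoff, plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        score = rng.uniform(0, 100)
        admitted = score >= cutoff
        outcome = 0.5 * score + (true_effect if admitted else 0.0) + rng.gauss(0, 5)
        rows.append((score, admitted, outcome))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_gap(rows):
    """Average outcome of everyone admitted minus everyone rejected."""
    admitted = [o for _, a, o in rows if a]
    rejected = [o for _, a, o in rows if not a]
    return mean(admitted) - mean(rejected)


def discontinuity_gap(rows, cutoff, bandwidth):
    """Same comparison, restricted to people within `bandwidth` of the cutoff."""
    just_above = [o for s, a, o in rows if a and s < cutoff + bandwidth]
    just_below = [o for s, a, o in rows if not a and s >= cutoff - bandwidth]
    return mean(just_above) - mean(just_below)


rows = simulate_applicants(n=50_000, cutoff=70, true_effect=0.0)
naive = naive_gap(rows)
local = discontinuity_gap(rows, cutoff=70, bandwidth=1.0)
print(f"naive gap (all admitted vs all rejected): {naive:.1f}")
print(f"gap among those within 1 point of cutoff: {local:.1f}")
```

The naive gap mostly reflects that admitted students scored higher to begin with. Near the cutoff the two groups are nearly identical, so the remaining gap is a fairer estimate of what admission itself changes.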
Profile Image for Atila Iamarino.
411 reviews4,428 followers
June 12, 2017
I hit the bullseye with this read! Seth Stephens-Davidowitz presents an analysis of how people behave, along the same lines as The Signal and the Noise: Why So Many Predictions Fail - But Some Don't and Dataclisma: Quem somos quando achamos que ninguém está vendo (Dataclysm). But while The Signal and the Noise deals with trends in data and Dataclysm deals with people's behavior on OkCupid, Everybody Lies deals with how people behave in general.

The author uses a range of data in quite innovative ways, such as Google search trends (where he works), PornHub searches, Facebook, and other big data sources, to do what he calls "real sociology", or evidence-based sociology. The data he shows on prejudice (searches for prejudiced topics), insecurity about self-image, insecurities about one's children, and the like paint a much rawer and uglier picture of society than the one we paint with posts on Facebook and Instagram.

Other data reveal information that is interesting, to say the least: the difference that graduating from Harvard can make (none; what matters seems to be who graduates), where to raise your children, how to increase the chances of success on a date... The book reads a lot like a newer and, in my opinion, more curious version of the innovative approach of Freakonomics.

If you are not interested in the revolution that the recording and availability of data is causing in the world, or in the damage that companies and governments can do with the control they have over information, you will at the very least enjoy the book for the curious and morbid facts it digs out of the data. Learning, for example, that the number of men who search for how to perform oral sex well on women is the same as the number who search for how to perform oral sex on themselves says a lot about how people think. A book for all tastes.
Profile Image for Fahime.
329 reviews245 followers
November 27, 2019
This book, with its appealing title, is about the applications of big data and data mining in the social and economic sciences. The author first shows, by comparing survey results with existing statistics and data, that surveys can be misleading (because of people's tendency to present themselves in a good light). He then explains the applications of data mining in detail over several chapters, and finally turns to the limitations and problems that arise as data mining becomes pervasive. Personally, I greatly enjoyed the discussions of intuition and of the difference between correlation and causation. It is written in a very engaging and gripping way, and the translation is superb. In my view, the topics covered are appealing enough that, in addition to data specialists, those interested in economic and social issues will also enjoy reading the book. Highly recommended.
Profile Image for Kuszma.
2,418 reviews200 followers
March 8, 2020
“Epimenides of Crete made the following immortal statement: All Cretans are liars.”

To this, the American Seth Stephens-Davidowitz made the following immortal statement: Everybody lies. Well, Epimenides, what's your move now?

And it's actually true. We lie about how much sex we have, what we think of our children, how tolerant we are of minorities. We lie on Facebook, in sociological research, at job interviews. We lie to others, we lie to ourselves. And to round it off: we lie in book titles too. All right, let me be lenient: we just fudge things a bit to achieve better sales figures.

Because this book is not really about everybody lying, whatever its title claims*. Of course it is about that too, since one of its conclusions is that in our Google searches we confess things we would confess nowhere else. But in itself that is still a fairly soft hypothesis, since who knows whether all the nonsense we type into Google can be taken seriously. OK, I accept that when applying for a bank loan we tend to paint our circumstances as far more stable than they really are, and we rarely post openly on social media that we pick our bedtime story from PornHub. (Not me, of course. But YOU do. Heh.) At the same time, if a man searches Google for how he could, ahem, with his own mouth, ahem, I won't go on, it does not necessarily mean he is seriously preoccupied with carrying it out; maybe he just read in Stephens-Davidowitz that many people are curious about such things, and he is simply dying of curiosity about who on earth can pull this off. So who is lying and who is telling the truth is a complicated question; at the very least, we should treat with caution the author's claim that his method is a kind of "truth serum"**.

In reality, this book is a hymn to the data researchers of Big Data. Seth Stephens-Davidowitz's real central claim (in my view, and thank God) concerns not lies but the fact that Big Data is a tool that, by its sheer size, completely transforms scientific research. Until now, it was incredibly complicated and costly for social scientists to run experiments: questionnaire-based systems and studies using treatment and control groups alike required lengthy preparation, and moreover the results were often just as open to doubt, or not worth the capital invested. By contrast, a tech company today can run as many as several thousand so-called "A/B tests***" a day more or less automatically, without any trouble, and thanks to the spread of the internet, researchers can work from Google's database of several billion entries, pulling information out of the system with a few keystrokes (and some creativity) that a few decades ago they would not even have dared to dream of. Do we have a hypothesis that Trump's voters are more racist than others? Let's look at where people typed the phrase "n***** jokes" into the Google search box, and compare the resulting map with the electoral maps. Voilà. And the fact that we can do this revolutionizes social science, which until now was soft but is now, because we can test hypotheses this simply, becoming considerably more exact****. Of course (and the author does not deny this) this by no means implies that the familiar small-group experiments and surveys can be forgotten, only that the two used together will yield incredibly strong conclusions. And on that I agree with him completely.

Overall, this book is damn informative and effective in the passages where it talks about the hidden, explicit prejudices that still swarm beneath the camouflage net of Western civilization. It is worth reading for that alone. On the other hand, it places the subject of sexuality so squarely at the center that it is hard to interpret this as anything other than bait for buyers. All right, I admit, those parts entertained me too, but I still found myself wondering several times how relevant this kind of information is. If Seth Stephens-Davidowitz wanted to use them to prove that really, truly everybody lies, then I can report that he managed it after roughly 15 to 20 pages; the rest is just the whipped cream on the chestnut purée. Then again, whipped cream is tasty too. So I'm not complaining.

* Mind you, considering that the author originally wanted to bring this product to market under the title "How Big Is My Penis?", the claim "Everybody Lies" seems positively restrained. Still, it shows in itself that Stephens-Davidowitz does not shy away from using a little marketing dynamite.
** The example he cites in the conclusion is telling about the limits of the method. Here the author refers to a mathematician who, starting from the fact that the first halves of books are quoted more often than the second halves, concluded that people finish fewer books than they claim. He even produced a neat little statistic using Big Data, calculating from the page numbers of the quotations what percentage actually finished the works in question. (For Piketty's Capital, for example, 3%.) Now, I too quote more often from the first half of books than from the end, and that is because I tend to copy out everything I like from the first hundred pages, but after that I get lazy and filter more carefully what is worth it and what is not. Perhaps the hypothesis reveals more about its originator's own reading habits than about mine.
*** These are tests that, say, Google, Amazon, or Facebook have been running en masse for years. For example, for several thousand randomly selected users the page comes up in one particular shade of blue, and for other users in another shade. Whichever generates more clicks will be the optimal color.
**** Interestingly, the natural sciences meanwhile seem to be losing some of their exactness. At least quantum physics or string theory is becoming so theoretical that it can no longer even be verified experimentally.
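The A/B test described in the third footnote above can be sketched as a simulation. Everything here is an invented assumption for illustration: the click rates, the user count, and the function names are hypothetical, and real companies' testing systems are certainly more elaborate. Users are randomly shown one of two page variants, and the click-through rates are compared with a standard two-proportion z-score.

```python
import random
from math import sqrt


def run_ab_test(n_users, rate_a, rate_b, seed=0):
    """Randomly assign each simulated user to variant A or B and record clicks.

    rate_a and rate_b are made-up click probabilities for the two variants.
    """
    rng = random.Random(seed)
    shown = {"A": 0, "B": 0}
    clicks = {"A": 0, "B": 0}
    for _ in range(n_users):
        variant = rng.choice("AB")  # random assignment is what makes it an experiment
        shown[variant] += 1
        rate = rate_a if variant == "A" else rate_b
        if rng.random() < rate:
            clicks[variant] += 1
    return shown, clicks


def z_score(shown, clicks):
    """Two-proportion z-score for the difference in click rate (B minus A)."""
    p_a = clicks["A"] / shown["A"]
    p_b = clicks["B"] / shown["B"]
    pooled = (clicks["A"] + clicks["B"]) / (shown["A"] + shown["B"])
    std_err = sqrt(pooled * (1 - pooled) * (1 / shown["A"] + 1 / shown["B"]))
    return (p_b - p_a) / std_err


shown, clicks = run_ab_test(n_users=200_000, rate_a=0.10, rate_b=0.12)
z = z_score(shown, clicks)
for v in "AB":
    print(f"variant {v}: {clicks[v]} clicks / {shown[v]} views = {clicks[v] / shown[v]:.3f}")
print(f"z-score for B vs A: {z:.1f}")
```

A large positive z-score suggests variant B's higher click rate is unlikely to be a fluke of random assignment; with small samples or tiny differences the same code would give a z-score near zero.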
Profile Image for Yaaresse.
2,077 reviews16 followers
February 22, 2020
At 58%, I give up. DNF.
I've seldom read anything that contained so many individually interesting (if shallow) sentences and still bored the hell out of me. I'm also tired of reading about the author's infatuations with baseball, Google, and porn. I am counting this book as read, however, because I should get some small (if valueless) reward for the time I lost reading it.

Some random, non-linear thoughts because I'm not interested enough in the book to try harder at this point:
1. The author worked for Google. Apparently, he's still smitten by them and thinks their massive collection of data on users is the most wondrous gift to humans ever. Of course he does. His degree and career depend on it.
I am of the opinion that the most, and perhaps the only, honest thing Google has ever done is when they got rid of their "Don't be evil" corporate slogan. Google is kind of like Walmart to me. I don't like it, trust it, or believe anything it claims and use it as little as possible.
2. The fact that the government turned over every American's tax data to whatever researchers want to dig through it--and in such detail that they can trace income changes for every address a single individual has reported from--is worrisome. "Oh, but the individuals weren't identified." Bullshit. Maybe not by name, but there is enough info there to cross-match with other data and individually identify people. That every search you've ever made is time/location stamped and available to whoever wants it is just effing creepy. While we all should know by now that we're just products to be data-mined for profit, seeing a time-stamped list of an individual's exact searches over a 24-hour period is disconcerting.
3. If this guy is right, I've been using search engines for all the wrong things. (Read that in snark font.) Do people really do Google searches for "Why are Jews cheap?" or "Is my daughter ugly?" or "Am I gay?" Gee, all this time, I've been using searches for things like "What is the capital of Latvia" or "What is the formula for calculating amortization?" Every now and then I sink to searching for David Bowie music videos. I think I once stooped to "Why the hell are the Kardashians famous?" Supposedly women everywhere are Googling "Does my vagina smell?" and "Is my husband cheating"? Well, honey, if you have to ask....
4. Data for data's sake serves little purpose. It just makes for more noise, not more clarity.
5. Apparently the author believes that the way to keep readers' interest is to use the most sensationalist examples he could come up with. Throw around a lot of racist terms and references to sex kinks or insecurities. When all else fails, talk baseball and throw in some frat boy humor. I get it: the porn industry drives nearly every internet innovation, from web design to security and data collection. Fine. But the tenth or 15th time he refers to women being worried about vaginal odor or how common Pornhub searches for incest videos are, it comes off as an attempt to be provocative rather than informative. For a chapter or two when that content is relevant to the topic, fine. Every single chapter? Boring. And really creepy after a while.
6. If there is any discussion about the ethics of data mining in this thing, it's so far buried in the back that I didn't get to it. If there is substantive material on statistical relevance, hypothesis testing or phantom populations, it's buried in the back.

He's the kid who walks in with his bright, shiny, non-traditional method and basically says he's right and everyone else who ever studied data is wrong.

(And maybe, if as the author claims, the last chapters of books don't get read as much as the beginning ones, it's because books are getting less and less informative and/or credible.)
Profile Image for Hossein.
238 reviews50 followers
November 3, 2019
It was an extraordinary book and essentially showed the power of data in the modern age.
You can find the answer to almost every question by analyzing data.
After reading this book, I came to believe more strongly in the prediction of Professor Harari, who in his book Homo Deus called dataism the dominant outlook of the world's future.
I hope the training of data specialists takes off in the country, since it already seems to be an urgent need of society.
Profile Image for Narges.
63 reviews12 followers
August 19, 2021
یک خبر از روزهای دور--> آپدیت جدید واتس‌اپ کاربران را مجبور می‌کند برای ادامه استفاده از این اپلیکیشن یا اطلاعات خود را با فیس‌بوک به اشتراک بگذارند یا دسترسی به واتس‌اپ را از دست می‌دهند. واتس‌اپ با آپدیت جدید کاربرانش را مجبور می‌کند تا با قانون جدید حفاظت از اطلاعات شخصی موافقت کنند و در غیر این صورت دسترسی‌شان به اپ را از دست می‌دهند. موافقت کردن با قوانین جدید واتس‌اپ به معنای آن است که اطلاعات خصوصی کاربر از جمله شماره موبایل وی با فیس‌بوک (مالک پیام رسان) به اشتراک گذاشته می‌شود. تمام کاربران این اپ باید تا ٨ فوریه ٢٠٢١ میلادی با قوانین جدید موافقت کنند

این خبر رو یادتونه؟ چند وقت پیش در گروه‌ها، این آپدیت واتساپ جنجالی شده بود و ملت از هم می‌پرسیدن چرا؟ اطلاعات ما به چه دردشون میخوره؟ چی کار می‌خوان بکنن با اطلاعات ما؟ به چی می‌رسن؟ و خلاصه کلی سوال این چنینی که خیلی از دوستان رو درگیر کرده بود!
برای جواب دادن به این سوالات به صورت عام و بدون هیچ بحث تخصصی، قطعا این کتاب رو توصیه می‌کنم به خوندن! نمونه‌های کوچیکی از تحلیل داده‌های ثبت شده از کاربران رو داره! از سایت‌های پورن بگیرین تا اطلاعات سرچ گوگل و غیره!
قطعا خیلی از سوالات این چنینی رو جواب میده و ذهن دوستانی که از دنیای دیتاست و بیگ‌دیتا به دورن رو روشن میکنه ! هرچند به قول نویسنده، اگر 50 صفحه رو خوندی و دیگه حوصله‌ی ادامش رو نداشتی کاملا طبیعی و مشکلی نداره! این یعنی نویسنده میدونه روده درازی کرده و شاید یذره متن رو به جاهای حوصله سر بر کشیده ولی در کل برای من تا آخرین خطش ارزشمند بود، چون حس میکردم با کتابی درگیرم که نویسنده به خاله زنک بازی مدرن علاقه‌منده و به من خواننده‌ی فضولش بدون زحمت، اطلاعاتی که از دیتاست‌های مختلف در آورده رو راحت کف دستم میذاره!

For example, imagine you grab a bowl of sunflower seeds and sit down, and a researcher comes and sits across from you and tells you what people in India searched for most on Pornhub?! Or that when abortion was banned in such-and-such state, people's top searches were for methods of abortion using vitamin C, parsley, and wire coat hangers, and then you feel sick and say "shame on you with these searches of yours," and then you ask again what else he found out, and you go on listening and reading!!!!
When I say the book was fascinating for a nosy reader like me, these are the things I mean!

Now, why did they give the book this title?
Perhaps the most prominent reason the author gives is that people say one thing in public, but in private, with the keys of their keyboards, they search for something other than what they said! He points out that you should look not at what people say but at what people do, and he backs this claim with plenty of data analyses!
In short, although the book was very engaging for me, it may be boring for you, so be prepared for this paradox before reading it.
Profile Image for JJ Khodadadi.
435 reviews, 109 followers
February 17, 2024
An excellent book about data mining on hot and controversial topics, written in simple and beautiful language, and one that may make you fall in love with data science.
Profile Image for Ali.
30 reviews, 8 followers
January 27, 2023
A few years ago a series called Sherlock was on the air. The main character, who was supposedly sharp-eyed and clever, concluded from the way his colleague stood that he must have been in the military, concluded from the sunburn on his hands and face that he must have served in Afghanistan or Iraq, and concluded from the wear on his phone's charging socket that his brother must be addicted to alcohol.
The arguments and conclusions of this book are about as flimsy (or, to put it better, short-sighted) as that. The book ends in its first few pages, and the rest is just filled with a large number of examples and a fair amount of nonsense and truisms.

***
Bahman 1401
Profile Image for Hamide meraj.
208 reviews, 142 followers
January 22, 2020
Reading and starting this book coincided with my entry into the very large field of data at a startup company.
Even though the book's material is viewed from the perspective of big data science, it is very, very exciting. If you want to know what big data is, what interesting information can be found in it, and how much the online world today reflects our moods, temperaments, and thoughts, be sure to read this book. This data analyst has an all-round view (in fact, I think all analysts gradually develop this all-round view in every aspect of life).
In short, reading it held a world of wonder for me and made me more interested in this field.