
The Alignment Problem: Machine Learning and Human Values

A jaw-dropping exploration of everything that goes wrong when we build AI systems and the movement to fix them.

Today’s "machine-learning" systems, trained by data, are so effective that we’ve invited them to see and hear for us—and to make decisions on our behalf. But alarm bells are ringing. Recent years have seen an eruption of concern as the field of machine learning advances. When the systems we attempt to teach will not, in the end, do what we want or what we expect, ethical and potentially existential risks emerge. Researchers call this the alignment problem.

Systems cull résumés until, years later, we discover that they have inherent gender biases. Algorithms decide bail and parole—and appear to assess Black and white defendants differently. We can no longer assume that our mortgage application, or even our medical tests, will be seen by human eyes. And as autonomous vehicles share our streets, we are increasingly putting our lives in their hands.

The mathematical and computational models driving these changes range in complexity from something that can fit on a spreadsheet to a complex system that might credibly be called “artificial intelligence.” They are steadily replacing both human judgment and explicitly programmed software.

In best-selling author Brian Christian’s riveting account, we meet the alignment problem’s “first-responders,” and learn their ambitious plan to solve it before our hands are completely off the wheel. In a masterful blend of history and on-the-ground reporting, Christian traces the explosive growth in the field of machine learning and surveys its current, sprawling frontier. Readers encounter a discipline finding its legs amid exhilarating and sometimes terrifying progress. Whether they—and we—succeed or fail in solving the alignment problem will be a defining human story.

The Alignment Problem offers an unflinching reckoning with humanity’s biases and blind spots, our own unstated assumptions and often contradictory goals. A dazzlingly interdisciplinary work, it takes a hard look not only at our technology but at our culture—and finds a story by turns harrowing and hopeful.

496 pages, Hardcover

First published October 6, 2020


About the author

Brian Christian

10 books · 829 followers
Brian Christian is the author of The Most Human Human, which was named a Wall Street Journal bestseller, a New York Times Editors’ Choice, and a New Yorker favorite book of the year. He is the author, with Tom Griffiths, of Algorithms to Live By, a #1 Audible bestseller, an Amazon best science book of the year, and an MIT Technology Review best book of the year.

Christian’s writing has been translated into nineteen languages, and has appeared in The New Yorker, The Atlantic, Wired, The Wall Street Journal, The Guardian, The Paris Review, and in scientific journals such as Cognitive Science. Christian has been featured on The Daily Show with Jon Stewart, Radiolab, and The Charlie Rose Show, and has lectured at Google, Facebook, Microsoft, the Santa Fe Institute, and the London School of Economics. His work has won several awards, including fellowships at Yaddo and the MacDowell Colony, publication in Best American Science & Nature Writing, and an award from the Academy of American Poets.

Born in Wilmington, Delaware, Christian holds degrees in philosophy, computer science, and poetry from Brown University and the University of Washington. A Visiting Scholar at the University of California, Berkeley, the Director of Technology at McSweeney’s Publishing, and an active open-source contributor to projects such as Ruby on Rails, he lives in San Francisco.

Ratings & Reviews


Community Reviews

5 stars: 1,605 (53%)
4 stars: 1,034 (34%)
3 stars: 299 (9%)
2 stars: 42 (1%)
1 star: 13 (<1%)
David Rubenstein
822 reviews · 2,663 followers
January 24, 2022
The biggest problem in artificial intelligence (AI) is to devise a reward function that gives you the behavior you want, while avoiding side effects or unforeseen consequences. This book examines the alignment problem from a number of fascinating perspectives.

This is a fascinating book, full of the implications of AI for philosophy, sociology, and psychology. The interaction between AI and sociology and psychology is a two-way street. Our understanding of psychology helps to improve AI in numerous ways, and AI in turn gives researchers many valuable insights into psychology and issues in sociology. After all, we want automated algorithms to be unbiased, to be fair. But who is to say exactly what is fair? Sometimes, the answer isn't easy.

The first problem, well known to workers in AI, is the inherent bias due to small training datasets. AI algorithms demonstrate bias, and can subtly perpetuate it. It seems like many of the biases are not the fault of the algorithms, but instead are a mirror of society and culture. In the 1950s, people tried to predict, using punch-card machines, which prisoners would succeed on parole. ProPublica studied the accuracy of COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), which is used to predict whether an inmate, if released, would commit a violent or a nonviolent crime within 1-3 years. The algorithm was found to be biased against blacks; it overpredicts recidivism among blacks, and underpredicts it for whites. A key factor is that it does not actually predict whether a released prisoner would commit a crime. It really predicts whether a released prisoner would be arrested and convicted of a crime. Higher rates of police profiling of blacks lead to an inherent bias.

There is a US antidiscrimination law that prohibits certain attributes--like race and gender--from being used in machine-learning models for hiring, criminal detention, and so on. Nevertheless, other unprotected variables are correlated with race and gender, so the algorithms can still be discriminatory. In addition, blocking these attributes can prevent us from even measuring, let alone mitigating, the discrimination!

Predicting whether or not a patient with pneumonia should be hospitalized as an inpatient is problematic. Models predict that if a patient has chest pain, heart disease, or asthma, or is over 100, then the patient is less likely to die! The reason is that patients with these conditions automatically receive more care, so they are less likely to die.

Many problems in AI are solved by looking at psychology. For example, BF Skinner taught a pigeon how to bowl in a miniature alley through incremental steps. This led researchers to teach an algorithm to play difficult video games by rewarding incremental steps. Basically, great video games train you how to play. Similarly, neural networks learn language translation by starting with simple sentences before graduating to more difficult ones. This approach is similar to language learning by children. The book Bobby Fischer Teaches Chess uses a similar approach.

AI is not just about automating tasks; it is also about how we can better understand human psychology. How can we best train ourselves? First, we should use sparse rewards. Second, we should incentivize a state, not an action. In real life, we can use gamification as an approach to reinforcement learning. Studies of toddlers show that toys that seemed to violate the laws of physics were the most novel, and held the interest of six-year-olds for the longest time. Infants use violations of prior expectations as special opportunities for learning.

Psychologists have studied overimitation in children and chimpanzees. People learning a new task will learn best through imitation. Sometimes we imitate behaviors that are not relevant to a task. A toddler might overimitate if he cannot figure out why an adult is doing something, so he does it too. As it turns out, chimpanzees do not purposely overimitate. But children can understand whether an adult is teaching or simply experimenting. If an adult is experimenting, the child does not overimitate.

A fascinating chapter on imitation describes the problems encountered by the first researchers in autonomous driving. Teaching an autonomous car in a video game to drive by imitation is best done by randomly alternating between human and machine drivers.

This book is fascinating on many levels. But it is not always an easy read. Some of the concepts are difficult, even subtle. It is such a pleasure to read a well-researched book that plumbs the depths of a complicated subject.
Krzysztof
89 reviews · 6 followers
April 1, 2021
There is a great book trapped inside this good book, waiting for a skillful editor to carve it out. The author did vast research in multiple domains, and it seems like he could neither build a cohesive narrative that connects all of it nor leave anything out.

This book is probably the best intro to the machine learning space for a non-engineer I've read. It presents its history, challenges, what can be done, and what can't be done (yet). It's both accessible and substantive, presenting complex ideas in a digestible form without dumbing them down. If you want to spark an interest in ML in anyone who hasn't been paying attention to this field, give them this book. It provides a wide background connecting ML to neuroscience, cognitive science, psychology, ethics, and behavioral economics that will blow their mind.

It's also very detailed, screaming at the reader "I did the research, I went where no one else dared to go!". It will not only present you with an intriguing ML concept but also: trace its roots to a 19th-century farming problem or biology breakthrough, present all the scientists contributing to this research, explain how they met and got along, cite the author's interviews with some of them, and describe their lives after they published their masterpiece, including completely unrelated information about their substance abuse and the dark circumstances of their premature death. It's written quite well, so there might be an audience who enjoys this, but sadly I'm not a part of it.

If this book were structured to address the subject of the alignment problem directly, it would be at least 3 times shorter. That doesn't mean the other 2/3 is bad - most of it is informative, some of it is entertaining, and a lot seems like ML things that the author found interesting and just added to the book without any specific connection to its premise. I really liked the first few chapters, where machine learning algorithms are presented as the first viable benchmark for the human thinking process and the mental models that we build. Spoiler alert: it very clearly shows our flaws, biases, and the lies that we tell ourselves (which are further embedded in the ML models that we create and the technology that uses them).

Overall, I enjoyed most of this book. I just feel a bit cheated by its title and premise, which advertise a different kind of book. This is the Machine Learning omnibus, presenting the most interesting scientific concepts of this field and the scientists behind them. If this is what you expect and need, you won't be disappointed!
Dan Elton
36 reviews · 20 followers
July 5, 2021
A well researched book on AI safety written to be enjoyed by experts and newbies alike!

This book is the culmination of *four years* of dedicated work and interviews with over 100 world-class experts. The brilliant thing about this book is that it is so information dense and full of interesting anecdotes that people of any level of expertise stand to gain something from it. He’s carefully tuned it so a wide variety of people can enjoy it without getting bored or overwhelmed.

This book covers the well-known problems of bias and brittleness in machine learning, including the following well-known cases: Richard Caruana's example of a pneumonia triage system that went haywire, the COMPAS parole recommendation system, the Google Photos "gorilla" tag fiasco, word2vec gender bias, and the 2018 fatal Uber car crash in Tempe, Arizona. You'd be mistaken to think of this as just another book warning about data bias, lack of robustness, and the potential for discrimination and the perpetuation of inequalities, however.

Sprinkled between the warnings and calls for action are remarkably clear descriptions of modern machine learning techniques and how they relate to and/or were inspired by recent developments in neuroscience, cognitive science, developmental psychology, and the social sciences. The author dives into the nitty gritty of how present day AI systems work and does not shy away from explaining current technical challenges.

The way he explains reinforcement learning and links it to research on dopamine in the brain was one of the highlights of the book for me (I had forgotten how dopamine is linked to temporal difference error, and his description of the history of the study of dopamine was fascinating). Not all of the concepts were new to me, but in every case the way he explained each concept was new to me and wonderful to read. I learned new concepts too. For instance, I never understood the difference between “on policy” and “off policy” RL systems until I read his explanation. Other concepts I picked up were “cooperative reinforcement learning”, “shaping”, and various “impact metrics”. If you haven’t heard of these terms and are interested in AI safety, I heartily recommend this book.
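For readers curious about the on-policy/off-policy distinction mentioned above, the whole difference fits in two update rules. A minimal sketch (my own illustration, not code from the book):

```python
# SARSA (on-policy): bootstraps from the action the current policy
# actually takes next, exploration included.
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

# Q-learning (off-policy): bootstraps from the best available next action,
# regardless of what the (possibly exploratory) behavior policy does.
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
```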

This book follows a trend of seamlessly linking near-term and far-term AI safety concerns, a trend that dates to the publication of Nick Bostrom's 2014 meditation on far-future AI, "Superintelligence". The book is very "down to earth" -- you may be surprised that the standard arguments about why we should be concerned about long-term AI risk that we've heard from Elon Musk, Sam Harris, etc. are largely absent from this book (most notoriously, the "paperclip maximizer"). This is refreshing, because those arguments draw on assumptions (such as fast takeoff) which are very hard to defend with empirical data or the current science on AI. (I still find those arguments convincing enough to warrant serious investment of resources to prevent risk, but they aren't necessarily the best first arguments to present to someone.) Instead the author follows an ingenious strategy: he starts with current problems in AI and some near-future concerns (for instance, driverless cars driving off the road, or home robots that refuse to be turned off). Then, by providing sufficient technical background, he proceeds to explain why these are really hard problems, some of the solutions that are being worked on, and the limitations of the solutions proposed so far. The book is cautiously optimistic, showing how meaningful progress on the alignment problem is already occurring. So far the problems with AI that we are encountering *right now* appear tractable, which should motivate more people and resources to flow into AI safety rather than trying to regulate progress to a standstill, which is impossible and likely to be harmful. At the same time, however, by the end of the book the reader will have a deep appreciation of the challenges ahead and the need for extreme caution as we move towards more and more intelligent and powerful AI.
aPriL does feral sometimes
1,992 reviews · 459 followers
April 9, 2022
'The Alignment Problem: Machine Learning and Human Values' by Brian Christian is a very interesting overview about the issues in developing useful computing machines. I found it very comprehensive and yet easy to understand. However, it does give me pause in any fantasy I may have had over the Singularity occurring.

The main goal of machine learning is teaching the computer to see, hear and do things without human oversight: to learn to categorize and make inferences on inputs like humans do, and to perform a job on those inputs similar to how the human brain functions. As for the amount and types of inputs necessary to think like a human being - well, ok, computers cannot actually be fed enough inputs, because of severe limitations in current hardware. Typically, inputs have to be identified first by an actual human, too, i.e., this is a cat, this is a shadow, this is a dress. Software has to be upgraded to make inferences, judgements, decisions. Which is why scientists are exploring machine learning instead. The computer will teach itself about what/who/why/where by identifying the inputs without help, and perform human-like brain processing on them. Theoretically.

Toddlers can do the job of learning about their environment, how to do social interaction (starting with what that is), and how to do a job and figure out actions and activities more quickly and comprehensively than any computer. Quantum computers might be the only hope of a computer thinking as well as a toddler. Meanwhile, computer scientists are making do with inventing new ways of programming machine learning on the computers we have today. The answer is having the computer program itself after starting with minimal basic programming.


I have copied the book blurb as it is accurate:

"Today’s “machine-learning” systems, trained by data, are so effective that we’ve invited them to see and hear for us—and to make decisions on our behalf. But alarm bells are ringing. Recent years have seen an eruption of concern as the field of machine learning advances. When the systems we attempt to teach will not, in the end, do what we want or what we expect, ethical and potentially existential risks emerge. Researchers call this the alignment problem.

Systems cull résumés until, years later, we discover that they have inherent gender biases. Algorithms decide bail and parole—and appear to assess Black and White defendants differently. We can no longer assume that our mortgage application, or even our medical tests, will be seen by human eyes. And as autonomous vehicles share our streets, we are increasingly putting our lives in their hands.

The mathematical and computational models driving these changes range in complexity from something that can fit on a spreadsheet to a complex system that might credibly be called “artificial intelligence.” They are steadily replacing both human judgment and explicitly programmed software.

In best-selling author Brian Christian’s riveting account, we meet the alignment problem’s “first-responders,” and learn their ambitious plan to solve it before our hands are completely off the wheel. In a masterful blend of history and on-the-ground reporting, Christian traces the explosive growth in the field of machine learning and surveys its current, sprawling frontier. Readers encounter a discipline finding its legs amid exhilarating and sometimes terrifying progress. Whether they—and we—succeed or fail in solving the alignment problem will be a defining human story.

The Alignment Problem offers an unflinching reckoning with humanity’s biases and blind spots, our own unstated assumptions and often contradictory goals. A dazzlingly interdisciplinary work, it takes a hard look not only at our technology but at our culture—and finds a story by turns harrowing and hopeful."



Computer scientists and mathematicians are trying to get computers not only to be useful for doing repetitive work that bores people, and for doing it more quickly, but to be useful the same way a human brain is useful.

One of the first concepts I learned in studying programming thirty years ago is "Garbage In, Garbage Out." As I turned the last page of 'The Alignment Problem' I realized that that was still true of inputs. However, machine learning has added more garbage, as in output 💩.

The book shows how computer scientists have become more cognizant that simple if-then-else modules won't do at all. For the last 70 years, the needle has moved from programming the computers to do everything by an explicitly created program for a job, to programming computers to "teach" themselves how to do a job, like that of driving a car, or flying an airplane, or face recognition, or mortgage and job applicant assessments, or judging if a convicted offender will reoffend, etc. It is too difficult to program a computer with everything necessary to perform a complex job like the ones I mentioned. But after reading this book, I think teaching a computer to teach itself is very difficult too. It amplifies our own biases, for one example, as explained in this book.

Think about gender and race discrimination. It's not the programmers' fault computers are racists and misogynists. If most of the professional photos programmers input into computers are of white males, or of white males performing a job, like being a doctor or a scientist or a plumber, the computer will 'learn' scientists and doctors and plumbers are all white males - an obvious conclusion to a computer. Most professional photos of many workers in the professions ARE of white males, including politicians.

First, as described in the book, most of the computer scientists didn't see the issue of discrimination at all as the computer worked (problem one). When it was pointed out, they realized the self-teaching computer was a "black box" - they didn't know WHY it was teaching itself that only white males were "good" for whatever the job was (problem two). The computer was teaching itself as it had been programmed to do, and however it was doing the job had become an invisible process, with the scientists out of the loop (problem three).

Another issue with photos is that until recently, cameras were calibrated with a photo of a blue-eyed blonde girl. ALL CAMERAS. Darker skin colors were completely ignored by manufacturers of cameras. The history of this is described in the book.

Another issue with self-teaching computers is that they clearly got the impression that black people who've been in prison are sure to return to prison, based on the statistics the computer was fed. Not only was the computer 'unaware' of black-only neighborhoods (it doesn't know about segregated black and white neighborhoods), it didn't know that black neighborhoods generally have a hell of a lot more police officers policing them, arresting black people far more often than in white neighborhoods (white people have a lot fewer police policing them). Computers do not know about any of the other systemic issues - black people getting arrested for walking or driving because they are black, etc. A lot of black people get arrested and rearrested - that's all the computer knows. Once scientists became aware of how the computer was teaching itself from its inputs, they then had a new problem: how to fix it?

Programming the computer to be blind to race and gender will not work, either. For example, without a gender tag, you cannot instruct the computer to ignore the nine-month gaps in women's employment histories, so women with such gaps will be labeled as terrible employees.

But in trying to resolve race and gender issues, a lot of ethical and political social issues come up - fairness is hard to program into software when we humans can't get it right in the real world.

Since computers were being taught to teach themselves, how were they coming up with their answers? What were they 'looking' at? This was often hard to discover, because once the computer began to teach itself it was a black box. But eventually programmers were sometimes able to figure it out through trial and error. For example, in one case, programmers were distressed to find the computer had decided shadows on the ground were more important than other objects in a photo, so it was giving answers based on the shadows. Or it was looking at measurement rulers as a key element in photos, because some photos had a ruler next to the object that the computer was supposed to be looking at. If the photo had a ruler, it was good, regardless of the object it had been intended to judge and regardless of any other factors.

Computers have been giving erroneous answers to questions people thought they were answering correctly, and people didn't know they were outputting crap. These computers had taught themselves, using the beginning algorithms they had been programmed with, and were coming up with completely off-the-wall outputs. Some of these programs are still being used by many companies and government agencies and police departments today.

Christian is much more scientific and circumspect than me, gentle reader. My own outrage colors my review. Christian writes like the educated scientist he is.

From his Goodreads bio:

"Born in Wilmington, Delaware, Christian holds degrees in philosophy, computer science, and poetry from Brown University and the University of Washington. A Visiting Scholar at the University of California, Berkeley, the Director of Technology at McSweeney’s Publishing, and an active open-source contributor to projects such as Ruby on Rails, he lives in San Francisco."

To know what is necessary to train a computer to use the same skillset we humans have, it has become necessary to involve specialists in psychology, sociology and philosophy to describe what skills we humans have in our braincases. The book includes the work of psychologists' tests on babies and toddlers that show some of the ways the human brain functions. Philosophers are necessary because of the issues of morality. Sociologists are necessary to explain as best they can the how and why of human behavior. These parts of the chapters are as fascinating as those describing how scientists are translating the art of being human to a computer!

So. Ok, then. Computer scientists are translating the work of psychologists, philosophers and sociologists on how the brain learns and other behaviors of people into machine-learning programs. This means a lot of what computer scientists are doing is translating biochemical brain responses (dopamine, serotonin) and electrical neuron-signaling into math. This is described in the book.

Machine learning is basically about the computer "earning" a +1 if it does good, or a -1 if it effs up - "rewards" and "demerits". This requires telling the computer the parameters for earning a +1 or a -1. And of course, when, or if, to stop.

There are, and were, a lot of funny outcomes due to the programmers' inability to foresee everything a computer needed as inputs to 'think', as well as the learning a computer had to do for itself to resolve a problem. Algorithms have had to change from checking and working with every inputted detail to being told to look for a more generalized thing, guided by earning a +1 for a right solution or a -1 for a wrong one. For example, finding a photo of a bicycle out of many photos of many objects without being told "this is a photo of a bicycle".

The chapters on game playing, which is a matter of earning points, had some hilarious outcomes because programmers neglected to program what winning the game meant. Instead, computers went into loops that never ended in order to rack up points forever! +1, +1, +1, ....

There were other amazing challenges computer programmers conquered in teaching a computer to teach itself how to win at games, too. The book tells the story of computers winning over real human players at chess, Go, and even the Super Mario video games.

My conclusions? I sincerely think the answer to when a computer will 'feel happy' or have any feelings is basically: it will never happen. How would we program that? We don't even know exactly what the boundaries of Life are, much less how being alive starts. Secondly, a computer is only as accurate as its inputs - garbage in, garbage out. However, today, it's also about how it has 'taught' itself - the machine's IQ.

Omg.

The book has extensive Acknowledgements, Notes, Bibliography and Index sections - over a hundred pages for these sections! I recommend 'The Alignment Problem', but I think nerds will enjoy it most.
Tariq Mahmood
Author of 2 books · 1,052 followers
November 14, 2020
My perception of AI as a superior technology that should be embraced unquestioningly, almost reverentially, was successfully challenged by the numerous examples in this book. By the end of the book, I was convinced that AI is better, and will get even more efficient, compared to human ingenuity, but needs to be constantly tested and questioned; any AI system depends upon the quality of the training data and the type of algorithms employed to solve the problem.
Sebastian Gebski
1,043 reviews · 1,021 followers
May 20, 2023
Uneven, maybe even very uneven.

It starts very well ("Prophecy") - the considerations regarding fairness & transparency are very good - maybe even the best I've seen in written form. The second part ("Agency") is dedicated to an interesting problem (value functions in reinforcement learning loops) - I found it generally interesting but far too shallow (over-simplified) for my taste. The third part ("Normativity") is a natural follow-up. It dives into the role of imitation (how it could simplify/improve learning). That part is also quite interesting, but I was disappointed with the chapter I was most interested in - the one on uncertainty (think: hallucinations, or the ability to say "I don't know").

The book is quite good at describing the problems but doesn't do much when it comes to practical answers to those.

It's a good book on interesting topics, but not a must-read. 4.2 stars.
Morgan Blackledge
696 reviews · 2,268 followers
October 11, 2023
GREAT BOOK.

A MUST READ!

Brian Christian’s RIVETING overview of ethical issues in artificial intelligence (AI) and machine learning (ML).

Moral Animals

The classic thought experiment in animal morality goes as follows: an invasive species of snake is released on an island with mice that have heretofore never encountered snakes as predators. The snakes make easy prey of the mice. And before too long the vulnerable mouse population on the island collapses and goes extinct.

The question is: are snakes moral or immoral?

Most ethicists agree that the answer is neither.

The snakes are simply behaving as they have evolved. As such, their predatory behaviors (and the unfortunate consequences) are neither moral nor immoral.

But rather “a-moral”.

In other words, the snakes cannot meaningfully or productively be held accountable to ethical systems or moral standards. The snakes are simply behaving as their evolutionarily conditioned genetic programming dictates.

See mouse, kill mouse, eat mouse.

Survive and reproduce.

The ethical and moral responsibility is more appropriately and productively assigned to the humans that released the snakes into the fragile ecosystem. We probably can’t retrain snakes to be vegetarians. But reprogramming the human ethical/moral system to consider the ecological consequences of our behavior may actually prevent mass extinction (including our own).

Ethical Machines

A similar thought experiment in machine ethics goes as follows: you set your thermostat to a sensible 68 degrees, and your pet snake dies of hypothermia. The question is: was the thermostat's behavior moral or immoral?

And the obvious answer is the same.

No.

The thermostat is an a-moral agent.

The thermostat is simply behaving as programmed.

The ethical/moral responsibility is on the dumdum who set the thermostat too cold, and didn’t put one of those heater things in the snake tank.

Ok.

So far so good.

But now let’s crank it up.

What about a self-driving car that is in a situation where a crash is unavoidable?

The car can either (a) steer into a pack of bicyclists and kill/maim 7-10 otherwise innocent people, or (b) drive off a cliff and kill the driver. If you said (b) you’re probably not alone. It’s better to kill one person rather than a baker’s dozen.

But what if you’re the driver?

Would you buy a car that was programmed to kill you to spare others? If not, how would you program the car?

Here’s another head scratcher.

What if you could create an algorithm that could predict who would be a good candidate for parole based on their crime records and demographics. What if it was (on average) a better predictor of recidivism than expert human judgment?

You would probably say yes to that.

In fact, wouldn’t it be irresponsible not to?

Well what if that same algorithm was biased in a racist way?

Still want to use it?

Here’s one more.

What if there was a very useful AI that could diagnose better than a human doctor?

But it was a TOTAL mystery how it made its conclusions.

A black box as it were.

Now let’s say that same AI diagnosed you with a mysterious illness and recommended emergency surgery to remove some otherwise valued organ.

Like for instance, your breasts or your penis.

Let’s say that nobody could explain why the AI was recommending the surgery, or how it came to its diagnostic conclusions.

Would you feel comfortable doing the procedure?

These are ACTUAL, current day issues in AI/ML.

But wait.

There’s more.

What if you programmed an AI to manage a hospital in such a way as to maximize lives saved and minimize deaths?

Sounds reasonable right?

Well what if three people needed various organ transplants or they were going to die. And one healthy kid comes in for a check up. The AI could save three lives for the price of one if it killed the healthy kid and harvested his organs.

Sound good?

My guess is probably not.

So who would we hold accountable if an AI did something HORRIBLE like kill an innocent person just to harvest their organs?

You could say, whoever programmed it.

But what if another AI programmed the murdering AI?

Or what if the murdering AI was self-programmed?

Then who’s responsible?

What if AI/ML could learn all about you and could manipulate you into buying whatever it wanted you to buy? Or voting for whoever it wanted you to vote for?

Then what?

So.

What’s the solution?

How do we program AI/ML to align with our human values?

This is the alignment problem.

And (as of this writing) nobody fucking knows.

Let that sink in.

NOBODY

FUCKING

KNOWS

HOW

TO

MAKE

AI/ML

ALIGN

WITH

HUMAN

VALUES

Additionally, AI/ML is getting exponentially more powerful by the minute, at rates that are exceeding even the most irrationally exuberant predictions, with no slowdown in sight.

AND!

We’re ALREADY dependent on AI/ML for a TON of shit.

It’s ALMOST too late to turn back.

And if we did. We would lose the important tactical and economic advantages we currently enjoy and take for granted. And we would suffer TREMENDOUSLY as a result.

So we are BARRELING HEADLONG toward an event horizon where we are no longer in control of something that is WAY smarter than we are, and which is a total mystery, and which we are utterly dependent upon, and which other people might use to manipulate, dominate and maybe even kill us if we don't stay competitive, and which at present has NO alignment with human values, and we have NO clue at all what to do about it, or even what that would look like if we did.

What could possibly go wrong 😑

Well.

You have to concede that SOMETHING could go wrong.

If you’re still not convinced.

Please read this book and tell me why/how.

5/5 ⭐️
Wick Welker
Author of 7 books · 482 followers
April 26, 2023
Teaching a child to understand the world.

I've read a few books like this but really enjoyed this one, as it connects with the reader regardless of their experience in this field. I know very little about machine learning and AI, and this book teaches in really simple ways how far the technology has progressed; it also goes into detail about how incredibly complex and difficult it is to properly train AI. The crux of this book is that one of the main challenges is aligning human and machine values. You can ask a machine to rack up as many points as possible in a boat racing game. What will happen is the machine just does loops around a pole with the boat, because this is the fastest way to get points.

What we need to teach machines is that the actual task was to be done while completing the race. The value is in sticking to the rules while achieving the task. It is apparently way, way more difficult to teach a machine along these lines. It's similar to when you praise a child for sweeping the floor, only to find out that the child is dumping the garbage can on the floor to sweep again, chasing the praise. The best route is to praise a certain state (a clean kitchen) rather than the specific tasks to achieve that state. Reading this book will help you understand the challenges. I found it very engrossing.
Rick Wilson
806 reviews · 320 followers
April 20, 2022
It’s a good overview of a brief moment in technological advancement.

There’s a common thread in machine learning (AI, I'm going to use these terms interchangeably) research that “oh man we got to be really careful and think about how we set up these machines because they may end the world as we know it.” Thankfully this seems to be counterbalanced by the actual empirical research being done, which mostly seems like a lot of fun tricks. Similar to impressing people with your ability to open a jar by smashing it on the ground.

I love the new models coming out. As of April 2022, OpenAI's DALL-E and GPT-3 models are super cool (hell, I used their Davinci model to help me write a homework assignment last week), but computer “intelligence“ is intelligence the way a stick you found on the ground is like a forest. I’m sure it represents a tiny little part of it, and there’s some really cool stuff happening in the AI field right now; there’s a phenomenal convergence between computing power and new research methods, just a mind-boggling amount of funding, and a lot of brilliant people going into the field. But every time I read a book like this, I get the impression that “intelligence“ is just brute force. It’s like breaking into a bank vault by unleashing a large nuclear explosive. Which is cool. But it’s not intelligence. And it’s not close to intelligence. And it always seems like the answer that these authors have is to dissect the wholeness of consciousness and human experience into constituent parts and then try to reconstruct the parts into the whole.

And that’s what this author does, compellingly. He breaks apart a lot of parts of human consciousness and thought and problem solving and then goes on to show how those have been deconstructed into machine learning algorithms. And I’m sure we can go back-and-forth with me saying that this isn’t intelligence, the author saying “ya ha,” and so on, but I find myself unconvinced that we are even on the right track. We are creating some really impressive tricks out of silicone chips, and the field is advancing it’s such a rapid state that it’s hard to keep up. But it seems like a combination of errors in that we don’t understand what’s happening anymore than we really understand ourselves. It’s like driving down a country road that says there’s a town in 10 miles. You drive on for what feels like 20 minutes, the town should be there, and then there’s another sign saying that the town is in 10 miles.


That said, this book was great. It’s a fascinating tour of the state of machine learning circa 2022. I feel like this field flips itself on its head every year, and in five years it will probably be quaint and mostly outdated. But for now I thought it was a great book. With the title “The Alignment Problem,” I expected it to be oriented a little more towards Nick Bostrom-type warnings about the dangers of AI. Instead it’s essentially a tour of an AI museum of modern machine learning models.

I thought it was well told and generally stays between the lines of speculation and hyperbole. There were some times, when talking about evolutionary psychology, when I thought the author was drifting a little from my impression of modern research. It seems like in psychology, whenever we say “only humans can do this“, that thing is contradicted by some sort of niche exception almost immediately. Tool use, language, generosity. We think we are really special as humans and are so willing to come up with reasons why we are unique. I just haven’t typically seen that backed up in significant ways in replicable research. That doesn’t necessarily contradict the core of the book, but it’s becoming a pet peeve of mine. I do think the point the author is trying to make is that what separates us from, say, a reptile or bird is potentially what would separate us from, on the other side of the spectrum, AGI or some sort of intelligent computer. I’ll grant that, but I think there’s a better and more truthful way to portray it.

That said, this is a good book if you’re willing to get into the weeds of how modern AI is set up, the types of different structures a system can be assembled in, who did what where, and why we’ve been using those structures. It’s a fantastic overview and a strong aggregation of what I understand to be an up-to-date tour of the field.

Also, if you made it this far, here’s a treat (https://arxiv.org/abs/2204.06974)
Max
70 reviews · 14 followers
January 7, 2021
Really nice introduction to AI & the alignment problem - Christian gives a great overview of some bigger trends in ML (e.g. curiosity, imitation learning, transparency) and the history of AI, often connecting it to insights from cognitive science, which really enriched the book, speaking as a human and cognitive scientist. I wonder what more refined thinkers on the future of AI think of the book*, but I found that it connects nicely to many of the looming challenges with building AI systems that are robust and whose workings will be appropriately aligned with human values. Even though similar in style and purpose, I found that it has little overlap with the recent The AI Does Not Hate You: Superintelligence, Rationality and the Race to Save the World and Human Compatible: Artificial Intelligence and the Problem of Control. I expect this triple to contribute a lot to introducing more smart cookies to this formidable challenge and heaving AI's longer-term developments onto many agendas as a Serious Issue. So here's to hoping that the ongoing AI revolution will be less of a naively hopeful leap than I'm afraid it will be.

*Rohin Shah from the Alignment Newsletter [liked it a lot](https://www.lesswrong.com/posts/gYfgW...)
Rishabh Srivastava
152 reviews · 191 followers
November 24, 2021
Strongly recommended if you're into Machine Learning. The first third of the book is accessible to all readers, but the rest of it is more enjoyable if you have some basic idea of how ML works.

Had some fascinating takeaways beyond machine learning that can be applied to decision making. My favorites were:

1. Simpler models tend to be the most generalizable. For example, when modeling the self-reported happiness of a couple, a simple metric (# of times they had sex - # of times they fought) was far more generally predictive than other, more complex indicators. More complex features can help predict things in a narrow domain better, but simpler features are more generalizable

2. Model attention and explainability is often more important than just predictive accuracy. Multitask networks with feature saliency and visualization techniques are great for understanding the features that a model considers important

3. We should strive to reward states of the world, rather than the actions of our agent (in reinforcement learning). Reward functions that are helpful in one environment ("always eat as much sugar and fat as you can" was good for a hunter-gatherer) are harmful in another (modern humans)

4. In reinforcement learning, points have to be assigned in such a way that when you undo something, you are "fined" the same number of points as you earned when you did it. If not, your model will promote short-term decision making (see the sketch after this list)

5. A novelty detection system that tells an agent that it's in a new situation, and hence should have weak priors, improves the generalizability and performance of an agent. Also, rewarding an agent for being wrong in surprising ways leads to better performance than just rewarding an agent when it's right
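A minimal sketch of point 4 (my own illustration of potential-based shaping, not code from the book): if every reward is the difference of a state "potential", undoing a step refunds exactly what the step earned, so loops can't farm points.

```python
GOAL = 10

def potential(state):
    # Any fixed state-based score works; higher means closer to the goal.
    return -abs(GOAL - state)

def shaped_reward(state, next_state, gamma=1.0):
    # Reward is the change in potential between successive states.
    return gamma * potential(next_state) - potential(state)

# Moving forward earns +1; undoing the move is "fined" exactly -1.
assert shaped_reward(3, 4) + shaped_reward(4, 3) == 0
```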
Jessica Dai
145 reviews · 60 followers
June 13, 2021
tldr worth a read!

Really solid overview of the research field that is typically referred to as "responsible AI" (fairness, explainability, deep learning, language models, RL) -- this book is therefore unique among other tech x society books in the sense that it is highly technical but also [I think] accessible, though I'm probably not the best person to judge that. I'd consider myself pretty familiar with the academic work that this book describes, but Christian packages a really nice story for the history of particular subfields/lines of inquiry, and draws connections to e.g. psych/neuro, and I feel like I learned a lot.

My personal thought on e.g. putting a values-aligned lens on RL agents has always been that I have trouble drawing a line from the academic work to what this means in practice (as opposed to e.g. fairness or language models, where these are related to systems already in production and which are therefore already shaping/reshaping people's lives). I sort of wish this was made clearer! But also nitpicking lol.

Reboot review (not written by me) here.
Karl Robert
2 reviews
February 13, 2021
Brilliant reading that covers numerous aspects of learning and teaching for both humans and programs, plus a bit of practical ethics and philosophy, all woven together under one topic: the development of machine learning programs. It demonstrates perfectly how, in order to teach, you must first understand the subject, and how you learn more as you teach it to someone.
If you have any interest in AI, its safety and real ethical problems or the history of how machine learning has developed hand in hand with psychology, computer science, social sciences and neurology, this book is well worth a read.
Baal Of
1,243 reviews · 60 followers
July 25, 2022
There are already dozens of excellent reviews summarizing the content of this book, so there's no need for me to write anything. This book is important and useful for anyone who wants to get a fairly deep layman's understanding of the problems inherent in machine learning AI development. These problems are difficult, but it is extremely important that they be confronted head on, since they can literally be a matter of life and death. Christian has written an excellent book, one I think should be widely read.
Poorna Kumar
24 reviews · 8 followers
June 11, 2022
Very nice! Superb technical writing and enjoyable (and I say this as someone who isn't particularly into science writing).

I was somewhat familiar with part 1 of the book (on fairness and transparency) from my work and studies, and can confidently say that the author has done a fabulous job of distilling the current understanding on these topics with nuance. This is a real feat when the subject is so complex. Even though I knew about these topics from before, the book still deepened my understanding and appreciation of them and put many results in perspective.

Parts 2 and 3 of the book, broadly around reinforcement learning, were fascinating and quite new to me. I enjoyed those parts as well, but not as deeply as Part 1, maybe because of my own ignorance/being new to the subject.

This book is carefully and comprehensively researched, and really well explained. It's hard to find something like this. If you care about machine learning, read this book.
Alexander Kutovyi
25 reviews · 11 followers
November 14, 2021
This book is an excellent read for DS professionals and those just wondering about machine learning's origins, limitations, and prospects. There is nothing particularly mind-blowing or too technical. Still, some cases and stories tracing back the evolution of things one otherwise takes for granted nowadays are fabulous, with many references to cognitive scientists and studies of human biology and anthropology, which I loved the most. Worth reading indeed.
23 reviews · 31 followers
December 15, 2020
This is an EXCELLENT book about one of the most important problems of our times. I was already fairly familiar with the alignment problem and the technical side of things, but I still got a lot out of it, especially in the earlier sections about the history of AI and of reinforcement learning. I also really liked the deeper links he drew between reinforcement learning, and how we make decisions.

This book had the rare delight of being half about unfamiliar topics, and half about topics I knew well, yet doing justice to the topics I knew well. Christian has a gift for simplifying complex topics, using good examples, breaking things down intuitively, but keeping true to the core of the idea. He peppers the book with insights from personal interviews with people relevant to the story, and fills a page with the names of the book's technical reviewers, and this clearly shows in the general accuracy and quality.

This is now one of my go-to books for people who want to understand the alignment problem, the historical context, and some paths to potential solutions.
Alex Railean
265 reviews · 39 followers
May 4, 2021
This is an excellent book, it is like a survey paper written in very understandable terms.



Notes for personal use:

- word2vec example: doctor - man + woman = nurse
- and so it went, with many examples placing women in household contexts
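The analogy probe above is usually run like this with gensim and the pretrained Google News vectors (illustrative sketch; the download is large, over a gigabyte):

```python
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")
# "doctor" - "man" + "woman": published analyses report "nurse" at or
# near the top here, one of the gender-bias examples the book discusses.
print(wv.most_similar(positive=["doctor", "woman"], negative=["man"], topn=3))
```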

- perceptron
- - bias in the camera itself, color calibration [could not adequately represent black people]
- Kodak employee and model, Shirley Page
- - "Shirley card" - the same principle applies to any data set used for training
- bias propagates easily now, by means of open source libraries or data sets that others reuse in their projects
- - orchestra auditions behind a screen, to avoid bias; later the candidates were also instructed to remove their shoes, because the sound of their walk could be used to infer gender, hence bias crept back in
- redundant encoding - some trait that can be used to infer something else that we're trying NOT to use in our calculations (e.g. race, gender)
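A toy demonstration of redundant encoding (my own, on made-up synthetic data): even with the protected attribute removed, a correlated proxy lets a model recover it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
gender = rng.integers(0, 2, n)                  # protected attribute
# Hypothetical proxy: career-gap length, correlated with gender here.
gap_months = rng.normal(3 + 6 * gender, 2.0, n).reshape(-1, 1)

clf = LogisticRegression().fit(gap_months, gender)
# The "blinded" proxy alone recovers gender far better than chance.
print(clf.score(gap_months, gender))            # ~0.93, not 0.5
```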

- fairness through blindness doesn't work


# transparency
A mountain of unstructured data is not transparency

- black box neural nets vs decision trees. The latter is easy to understand and follow
- - story: asthmatic patients -> send them home, they are safe. This rule was produced by a machine learning algorithm. A human doctor would treat this as a critical problem and move the patient to the ICU, where they get better care, hence the much higher survival rate. The machine got it completely wrong, building a model that actually endangers vulnerable patients.
- idea: when a company uses black boxes to make judgments, the verdict must be signed by a human, who is then responsible for answering the "why so?", if needed.
- - BOGSAT modeling technique: a Bunch Of Guys Sitting At a Table
- animal detection vs bokeh detection, because most photos of wildlife have artistically blurred backgrounds
- - saliency: design a neural net that shows you which part of the image contributed to the result the most
- this is how the animal/bokeh detector was caught
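A bare-bones gradient-saliency sketch (my own illustration; `model` and `image` stand in for a trained PyTorch classifier and a normalized 3xHxW input tensor):

```python
import torch

def saliency_map(model, image):
    model.eval()
    x = image.clone().unsqueeze(0).requires_grad_(True)
    scores = model(x)                         # shape [1, n_classes]
    scores[0, scores.argmax()].backward()     # grad of top class w.r.t. input
    return x.grad[0].abs().max(dim=0).values  # per-pixel influence map
```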


- multi-tasking TODO focus not only on the inputs but also on the outputs
- - deconvolution: visualize the intermediate layers of the neural network
- localization of training data : fire trucks in the USA are red, but in Canberra - neon yellow. Self driving cars trained in the USA might not recognize fire trucks elsewhere
- - todo: tcap method


## training
Credit assignment problem: answer the question "where did I go wrong?" (instead of just giving a pass/fail verdict at the end)


TD-learning (temporal differences): make intermediate predictions and learn from them, even before a game (or other process) ends and the final score is available. This always converges to the optimum, if it can train long enough. (The principle is to observe how predictions change over time.)
It seems that this is the role played by dopamine in our systems: track the error in the expectations of future rewards (not rewards themselves, and not just reward predictions)
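A minimal TD(0) sketch (my own toy example: a 5-state chain with a reward at the end). The value table updates from the change in successive predictions, long before any final score:

```python
import random

gamma, alpha = 0.9, 0.1
V = {s: 0.0 for s in range(5)}    # states 0..4; state 4 is terminal

def td_update(s, r, s_next):
    delta = r + gamma * V[s_next] - V[s]   # the TD error
    V[s] += alpha * delta
    return delta    # in the dopamine analogy, this error is the signal

for _ in range(2000):             # random walks toward the goal state
    s = 0
    while s < 4:
        s_next = min(s + random.choice([1, 2]), 4)
        td_update(s, 1.0 if s_next == 4 else 0.0, s_next)
        s = s_next
```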


## x
Skinner's variable rewards had the most effect: the reward will come, but only after a variable number of iterations.
This pattern is also what keeps gamblers glued to their addiction.


Shaping: reward behavior that at least somewhat resembles the desired one, in order to steer the subject towards the end goal. If you wait until the subject performs the desired action outright [before rewarding it], the moment might never come, or come much later. This is a "sparse reward", aka the **"sparsity problem"**.

Epsilon-greedy: be greedy [pick the highest-scoring known action] most of the time, but occasionally try a variation, doing something unusual even if it seems to earn fewer points.
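In code, the whole idea is a few lines (illustrative sketch):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    if random.random() < epsilon:                   # explore occasionally
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)  # else exploit
```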



Parenting: react promptly to a child's legitimate attention requests, and more slowly to the ones that are just attention-seeking.

### Key ingredients for good shaping:
**a good curriculum**: start with simple problems and actions that prepare you for more complex, upcoming challenges

Reference to the Super Mario example: you learn to avoid mushrooms because they kill you - this happens at an early stage in the game, so you learn it fast. Then you have to learn that the big mushrooms are good and should not be avoided. That type of mushroom is introduced in a moment in the game where you don't have enough room to maneuver - so you learn about the good mushrooms at an early stage too.

Thus, a good curriculum plays a crucial role in one's learning experience. If the challenges are not properly calibrated, the learner may never stumble upon the good behavior on their own.


**Well-chosen incentives**. If you get it wrong, you fall into the trap of "rewarding A, while hoping to get B".

This often applies to management of companies and employees



Reward functions: reward states, not actions. Otherwise you end up with agents that find loopholes to get easy points (example: child that cleans the room, then throws everything back on the floor, to pick it up again)
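The child-and-toys loophole in miniature (my own toy contrast):

```python
def action_reward(event):
    # +1 per pick-up: dump the toys back out and earn again, forever.
    return 1 if event == "pick_up_toy" else 0

def state_reward(toys_on_floor):
    # +1 only while the *state* is clean: no loop pays extra.
    return 1 if toys_on_floor == 0 else 0
```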


Gamification looks at the problem of finding rewards for certain behaviors that bring humans closer to their goals.


# curiosity
This is what made it possible to make a breakthrough in "Montezuma's revenge", which is a serious case of the sparsity problem.



Compression: a better understood world is more concisely compressible. That is, you can express the underlying principles in an elegant way that makes sense. Thus one can use compressibility as a metric for understanding
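A crude sketch of compressibility as a metric (my own illustration): structured data compresses far better than noise.

```python
import os, zlib

structured = b"abab" * 1000
noise = os.urandom(4000)

for name, data in [("structured", structured), ("noise", noise)]:
    ratio = len(zlib.compress(data)) / len(data)
    print(name, round(ratio, 3))   # structured << 1.0, noise ~ 1.0
```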


## imitation and over-imitation
Reference to the experiment where human babies would imitate everything, including redundant moves, when opening a puzzle box. Other animals would skip the unnecessary part and get straight to the point.

Perhaps the ability to over-imitate is what is needed to bootstrap a curious and self-driven intelligence that doesn't depend too much on external rewards?

However, a related experiment probes whether the child is aware that the action is redundant, and establishes that they are. Therefore we come to another potential explanation: "I know the action is unnecessary, but I assume the other human also knows it, and yet does it anyway; they probably know something I don't, so I had better do what they do".

In another variation of the experiment, an adult uses a toy while the baby observes. If the baby has reason to believe that the adult is unfamiliar with the toy, then the child does NOT perform the redundant action. They only do it when they are aware that the adult has seen the toy before and is familiar with it.



Knowing that a solution exists is sometimes a key factor in accomplishing something, or even accomplishing it more efficiently. Reference example: two climbers found a path to climb a geological formation in Yosemite Park (it is basically a flat wall). It took them 8 years to plan the path and come up with a strategy.
After this was done, another climber was able to do it after only a week of analysis.



**indirect normativity** - a way to align the system with our desires without articulating every tiny detail of the expected result.


Learning by observing: a beginner watching an expert never gets to see how the expert deals with "beginner mistakes", because the expert no longer makes them. A model trained this way will therefore be unable to handle basic trouble, which is a major weakness of the approach.
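One standard remedy (not named in these notes, but the usual fix in imitation learning) is DAgger, dataset aggregation: let the learner drive, have the expert label the states the learner actually reaches, beginner mistakes included, and retrain on everything collected so far. A minimal sketch; `expert_action`, `rollout`, and `fit` are hypothetical stand-ins:

```python
def dagger(policy, expert_action, rollout, fit, iterations=10):
    """Collect expert labels on the learner's own states, so the
    model also sees how to recover from its mistakes."""
    dataset = []
    for _ in range(iterations):
        states = rollout(policy)                        # the learner drives
        dataset += [(s, expert_action(s)) for s in states]
        policy = fit(dataset)                           # retrain on the aggregate
    return policy
```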

**possibilism** - always do the theoretically best thing for the current situation. However, it might not always be feasible: for example, a beginner might know what needs to be done in principle but have insufficient skill to do it right.

**actualism** - do what makes sense given what you think will actually happen.

Example: you want someone to review your paper. You can give it to a highly qualified professor who is very busy, so you might never get the review; but if you do get it, it will be very thorough. Alternatively, you can ask a less qualified colleague to look at it: the feedback will be of lower quality, but it will arrive quickly.
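The actualist's calculation can be made explicit with made-up numbers (the probabilities and quality scores below are purely illustrative):

```python
# Expected value of asking a reviewer = P(they actually deliver) * quality.
professor = {"p_delivers": 0.2, "quality": 9}
colleague = {"p_delivers": 0.9, "quality": 5}

def expected_value(reviewer):
    return reviewer["p_delivers"] * reviewer["quality"]

print(expected_value(professor))  # 1.8
print(expected_value(colleague))  # 4.5: the actualist asks the colleague
```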


### inverse reinforcement learning
Turn the matter around and ask: given the observed behavior, what is the reward being optimized?

Unlike a computer game, life is not easy: there is no obvious score. Suppose "walking" is a skill that was developed through reinforcement learning; in that case, what was the objective? What was being optimized?
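A toy version of the inverse question (entirely illustrative; real inverse RL uses feature matching or probabilistic models rather than brute-force search):

```python
# A 5-state corridor where the observed agent always walks right.
# Which candidate reward function would make that behavior optimal?
N_STATES, GAMMA = 5, 0.9
ACTIONS = (-1, +1)  # step left, step right

def step(s, a):
    return min(max(s + a, 0), N_STATES - 1)  # walls at both ends

def optimal_policy(reward):
    """Value iteration, then the greedy action in each state."""
    V = [0.0] * N_STATES
    for _ in range(200):
        V = [max(reward[step(s, a)] + GAMMA * V[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    return [max(ACTIONS, key=lambda a, s=s: reward[step(s, a)] + GAMMA * V[step(s, a)])
            for s in range(N_STATES)]

observed = [+1] * N_STATES  # the behavior we watched

# Candidate rewards: +1 in exactly one state, 0 elsewhere.
for goal in range(N_STATES):
    candidate = [1.0 if s == goal else 0.0 for s in range(N_STATES)]
    if optimal_policy(candidate) == observed:
        print("behavior is explained by a reward at state", goal)  # state 4
```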


### cross training
Switch roles: the trainer becomes the trainee (as in pair programming). This enables the trainer to learn something too.

To-do: review this

### open-category problem
A neural network trained to identify which of N classes a given object belongs to will always choose one of the N, without considering that the answer could also be "none of the above".

In other words, it will give you an answer even if you feed it garbage, and sometimes it will even be very confident in its verdict!
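The closed-world failure is easy to see numerically: a softmax output must spread probability 1.0 over the known classes, so even garbage gets a confident-looking verdict (toy numbers):

```python
import math

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits a 3-class network might emit for a nonsense input:
garbage_logits = [2.7, 0.1, -1.3]
probs = softmax(garbage_logits)
print(probs)       # ~[0.92, 0.07, 0.02]: "92% cat", for pure noise
print(sum(probs))  # 1.0: "none of the above" is not an option
```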

**Dropout** - run the same input through the same network multiple times, but each time switch off a random part of the network, then compare the results provided by this "ensemble of networks". The agreement between the runs indicates how trustworthy the output is.

When there is no consensus, the system can say "I know that I don't know" and perhaps involve a human for further investigation.
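A minimal sketch of that idea, test-time ("Monte Carlo") dropout, assuming PyTorch; the tiny model and the escalation threshold are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(),
                      nn.Dropout(p=0.3), nn.Linear(32, 3))

def mc_dropout_predict(x, n_passes=50):
    """Keep dropout active at prediction time and run many stochastic
    forward passes; the spread across passes signals uncertainty."""
    model.train()  # leaves the Dropout layer switched on
    with torch.no_grad():
        outs = torch.stack([model(x).softmax(-1) for _ in range(n_passes)])
    return outs.mean(0), outs.std(0)

mean, spread = mc_dropout_predict(torch.randn(1, 8))
if spread.max() > 0.2:  # illustrative threshold: the ensemble disagrees
    print("I know that I don't know: escalate to a human")
```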


**Corrigibility** - the property of an autonomous system that allows an operator to intervene in its operation and change its parameters, goals, etc.





### concluding remarks

Certain types of errors are less serious than others (for example, in Onlite, not knowing the exact number of business partners is not a big deal; a rough estimate is enough).
Tommy · 80 reviews · 10 followers
August 28, 2021
The Alignment Problem was phenomenal and I would highly recommend it to anyone who is even remotely interested in machine learning, how algorithms shape modern life, or even the parallels between psychology and artificial intelligence. My main background in AI is from an extensive article on Wait But Why, which focused much more on what artificial general intelligence would mean for our society in the future. The Alignment Problem, however, goes into the nuts and bolts of both the history and the current implementation—including successes as well as the multitude of pitfalls—of machine learning. Ultimately, this book gave me hope in the future of machine learning, not because AI itself is so cool, but because there are so many people working to make it ethical, just, and amazing.
We find ourselves at a fragile moment in history—where the power and flexibility of these models have made them irresistibly useful for a large number of commercial and public applications, and yet our standards and norms around how to use them appropriately are still nascent. (page 48)

I read this voraciously and enjoyed it so much that I think I might buy it so that I can reread it. I must also give the caveat that most of my reading of this book occurred in somewhat of a fugue state: sleep-deprived on a Greyhound bus. Nonetheless, I still believe The Alignment Problem to be enthralling.

I absolutely loved the way that Christian writes, equally erudite and strikingly approachable. When there is a new topic that he wants the reader to learn about, he has a unique way of bringing it up that I found to be extremely effective. First, he describes an everyday situation, then he gives a formal definition of the subject/topic/term, and finally he explains how it is relevant or how it applies in the real world. In essence, he invites the reader to build an intuition of a new topic, tells you that you kind of already know what this is—but he puts a new name to it—and then he shows you how it is quite a bit more amazing than you thought. I think more people ought to teach in this way; to me, this is near the Platonic ideal of how to teach.

Furthermore, it was quite clear that Christian did his research for The Alignment Problem. When he says that he did hundreds of interviews, I do not doubt him at all. I must also address my earlier comment about how this book is extremely approachable in its prose. Since a lot of this book was based not only on original research, but also relied heavily on personal interviews, Christian gave direct quotes of the way that people spoke (including their dialects/mannerisms of speaking) and also used syntactical tools such as ellipses to great effect.

I'll try not to gush too much more about this book, but I must also point out that I loved how much he integrated psychology into this book. He could almost write an entire book just on how our brains work and I would love it equally. Since this book was about machine learning and human values, Christian had to adequately address the latter portion of the subtitle, and boy did he deliver! I especially enjoyed the chapters on Imitation and Inference, where he described how we are trying to include human values in our AI either by—you guessed it—having the machines imitate us or infer what we are doing. Lengthy sections of the book spoke exclusively on neuroscience (such as how dopamine is a "reward chemical" based not on the reward itself, but actually on how reality differs from our expectation of the future).

Finally, I'll leave you with one of my favorite justifications for why you ought to learn more about this, from the conclusion, page 327:
Increasingly, our interaction with almost any system involves a formal model of our own behavior.... What we have seen in this book is the power of these models, the ways they go wrong, and the ways we are trying to align them with our interest.
120 reviews · 6 followers
February 22, 2021
If you’re plugged into the artificial intelligence world, you’ll immediately recognize the title. The “alignment problem” in AI is ensuring that artificial agents’ goals align with the goals of humans. That’s not an easy problem to solve, as Christian details through countless examples. The “reward function” for AI programs is often misspecified.

Early in the book Christian tells the story of AI researcher Dario Amodei, who in 2016 was working on a general-purpose AI to play computer games and had gotten stuck on a boat race. Instead of trying to win the race, the AI was spinning the boat around in circles, forever. The problem turned out to be simple. The AI was optimized to maximize in-game "points" rather than directly trying to win; the researchers thought points were a decent approximation, but the AI had found a part of the water where it could collect power-ups forever, and it just stayed there rather than trying to race.

The hardest part is that humans are not very good at articulating the reward function we want for our AI agents. We leave out important information — like “we actually want this boat to finish the race” — all the time.

Some of the most interesting parts of the book have nothing to do with alignment, per se, but instead chronicle the dramatic progress that deep learning, reinforcement learning, imitation learning, and other methods have made at improving AI performance — and the surprising parallels we’ve found between how they work and how the human brain works. The book keeps identifying moments where artificial neural networks are uncannily good at predicting how the literal neural network of the brain works; there’s a whole section on dopamine that’s particularly revealing.

As someone who identifies as an effective altruist and who has many EA friends (like my colleague Kelsey Piper) who count AI risk as one of the causes they care most about, I found the book incredibly useful as a crib sheet to get more up to date on what they’re talking about. It’s light on equations and heavy on clear examples. If I were to recommend one book to lay people to convince them to care more about the safety of the intelligent machines humans are building, it would be The Alignment Problem.

My only complaint is that the field moves fast enough that I could use regular Christian-y updates that de-mystify the latest developments.
David Steele · 483 reviews · 20 followers
June 30, 2023
This book changed my mind and taught me a thing or two along the way. I started out missing the point about the biases in the training data, thinking the author was claiming that the computers weren't 'woke' enough and that the AI was providing accurate information about society that the developers would rather were not true. It took me a while to get my head around the actual problem: for example, the feedback loops created when a computer makes predictions about the world based on incomplete or uncontextualized data, and those predictions drive decisions that feed even more skewed results back into the algorithm.
There were some fascinating stories and insights in this book. I particularly enjoyed the story of how early diffusion models were developed, and the chapter on motivation, which discussed how A.I. game players were encouraged to think their way across different playing experiences, was especially engaging and lively.
There was no shortage of thought-provoking philosophical and ethical questions, especially towards the end of the book, when I really started to grasp the true implications of the need for uncertainty mechanisms and the fact that there's an important distinction between truth and consensus. As the author says (more or less), the problem isn't about A.I., but about the simplified models that we think it will find useful. It's easy for people to develop working models of the world around them because we can adopt, change and ignore them based on what happens in the real world. A.I. systems might not have the ability to make that switch as easily as we do.
Having stuck with this book to the end (I probably wouldn't have put the work into the early chapters to get through a paper book, but this was on audio for me), I can absolutely get my head around Elon Musk's assertion that we need more than one model of "truth" for A.I. systems, and that no one organisation should ever have the monopoly on what we define as right and wrong.
There were some fun histories and narratives in this book, but as somebody who loves playing with ideas, I enjoyed the dialectic and taxonomical theory as much as anything.
Nora · 99 reviews
August 7, 2023
4/5
FINALLY finished this. I actually started in June 2022 and abandoned it. For someone with no knowledge of ML, this book is probably too dense and convoluted to get anything out of. There's a lot in there. After getting a bit more exposed and coming back to it, I was able to get all the way through. I get that Brian wanted to emphasize the different researchers and philosophers he interviewed, but the constant name dropping didn't do the book much good. Like, just tell me what they did. BUT this is also one of the books I have quoted the most in conversations about tech. The chapter on Fairness - the COMPAS model and the use of AI in criminal justice - was miles more interesting than the other ones. When fairness is mathematically impossible, where do we turn? This book is also pretty good if you want a background of the field of ML, not the pure technical stuff. Lots of history and connections to philosophy, neuroscience, psychology, etc. Just pick and choose your favorite ideas out of the million that are briefly mentioned in here and go down your own rabbit hole.
Fred Oliveira · 4 reviews · 3 followers
February 16, 2023
One of the best books - if not the actual best - on AI I've ever read. Perhaps a little dense at times, and potentially challenging for people who have never come across some of the topics. However, if you are in AI or a tangential industry - and one might argue that that's every industry, right now - this almost feels like required reading. Highly recommended.
Yubi · 57 reviews · 2 followers
March 8, 2023
Fascinating dive into how AI has developed over the last few decades and how we must be aware of how our unconscious biases can impact a system.

Also, I learned that human children will frequently over-imitate even when the action is illogical, which no other animal does.
Canyen Heimuli · 106 reviews
March 1, 2024
The way this book made zero sense to me before I understood neural networks, and became so engrossing after I watched a very brief explainer video on neural networks. 5/5 ⭐️
Jacob Williams · 512 reviews · 11 followers
October 16, 2021
"We are in danger of losing control of the world not to AI or to machines as such but to models."

This is full of interesting historical anecdotes (like that time William James kept a bunch of chickens in his basement to help out a student) and good high-level explanations of various approaches to machine learning.

Perhaps the most shocking issue discussed is how some US state justice systems used a model (called COMPAS) from a third-party provider for years to guide bail and sentencing decisions without doing any sort of validation of the efficacy or fairness of the model. Christian also gives compelling examples of how dangerous it can be to naively trust a model you don't understand, like the case where a pneumonia-diagnosis model was accurately predicting that some patients were less likely to die of pneumonia: it turned out the reason they were less likely to die is that they had extra health conditions which caused hospital staff to view them as higher-risk and give them additional care. So if the staff had started trusting the model's predictions instead, those patients would have likely been at even higher risk of dying than they were to begin with. Trying to act on the model's advice would have undermined the model's accuracy!

Still, although the description of this book on goodreads calls it a "jaw-dropping exploration of everything that goes wrong when we build AI systems", I found it to take a pretty measured attitude towards the problem, especially in parts 2 and 3. The general impression it left me with is that there are very smart people working hard on making AI safe, and that they've got some good ideas. The question, I guess, is whether society will listen when they urge caution, or if overeager deployment of stuff like COMPAS will be the norm.
LeastTorque · 784 reviews · 13 followers
June 13, 2023
Excellent interdisciplinary trip through the progress of A.I. over the years. This book might be an overly heavy slog for those who have no experience in the field. In fact, it’s unclear to me who the target audience really is. While the general public will get some insight into issues with A.I., both encountered and yet to be fully dealt with, this is not the book I would recommend for that. This book is ever so much richer if you are someone who could comfortably read the papers paraphrased here.

Anyhow, this was a real treat for me with many strolls down memory lane, so an extra star for that. It made up for the sometimes awkward writing and a seriously bad analogy example.
Maya Jacobs · 69 reviews · 2 followers
March 23, 2024
500 page BRICK that i read to get a job but it was actually really interesting
Abhi G · 5 reviews
January 18, 2024
Easy to digest introduction to the history of AI and the major problems that continue to pervade the field. I also really enjoyed the references to developments in psychology, neuroscience, and cognitive science as it helps contextualize the discoveries in AI. Would recommend for all readers regardless of technical knowledge.
1 review · July 12, 2021
This is a fantastic book for people who are interested in the ideas of machine learning in the past, present and near future. I have tried to read a few books on AI before but found them to be either too technical, too basic or too boring. Not this book.

The Alignment Problem is an engrossing and thought-provoking exploration of the intertwined development of machine learning and human learning. Through it all, the book stresses the "Alignment Problem": how can we develop AI that does what we want it to do, without doing what we don't want it to do? Many types of AI are explored; all of them have their own advantages and drawbacks.

Several times throughout the book, I had to pause to share a particularly insightful thought or idea with someone lest I forget what I just read. I encourage anyone who is interested in learning to read it now!
