
Human Compatible: Artificial Intelligence and the Problem of Control

A leading artificial intelligence researcher lays out a new approach to AI that will enable us to coexist successfully with increasingly intelligent machines

In the popular imagination, superhuman artificial intelligence is an approaching tidal wave that threatens not just jobs and human relationships, but civilization itself. Conflict between humans and machines is seen as inevitable and its outcome all too predictable.

In this groundbreaking book, distinguished AI researcher Stuart Russell argues that this scenario can be avoided, but only if we rethink AI from the ground up. Russell begins by exploring the idea of intelligence in humans and in machines. He describes the near-term benefits we can expect, from intelligent personal assistants to vastly accelerated scientific research, and outlines the AI breakthroughs that still have to happen before we reach superhuman AI. He also spells out the ways humans are already finding to misuse AI, from lethal autonomous weapons to viral sabotage.

If the predicted breakthroughs occur and superhuman AI emerges, we will have created entities far more powerful than ourselves. How can we ensure they never, ever, have power over us? Russell suggests that we can rebuild AI on a new foundation, according to which machines are designed to be inherently uncertain about the human preferences they are required to satisfy. Such machines would be humble, altruistic, and committed to pursuing our objectives, not theirs. This new foundation would allow us to create machines that are provably deferential and provably beneficial.

In a 2014 editorial co-authored with Stephen Hawking, Russell wrote, "Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last." Solving the problem of control over AI is not just possible; it is the key that unlocks a future of unlimited promise.

352 pages, Hardcover

First published October 8, 2019


About the author

Stuart Russell

28 books · 223 followers

Ratings & Reviews


Community Reviews

5 stars: 1,361 (33%)
4 stars: 1,748 (43%)
3 stars: 753 (18%)
2 stars: 118 (2%)
1 star: 26 (<1%)
Manny
Author 34 books · 14.9k followers
November 21, 2020
Let's start with the most important thing: if you have any interest in finding out where technology is heading, please read this book. I particularly recommend that people who know something about moral philosophy do so. You may dislike Human Compatible, you may object to the way the author treats your subject, but you really ought to learn about what's happening here. Moral philosophy has become shockingly relevant to the near-term future of humanity.

I'll back up a little. Since the beginning of the twenty-first century, the idea that machines may soon be smarter than humans has gone from science-fictiony scare talk to a sensible projection where the disagreements are not about if, but when. Some experts are saying fifty years from now, some are saying twenty or thirty, some ten or even five. But there's general consensus that we're talking decades, not centuries; this is something that many people alive now will probably see. Since machines evolve much more quickly than humans, once they've overtaken us they will rapidly leave us far behind. Unless we find some other way to destroy ourselves first, we're soon going to be sharing our planet with non-human beings who are vastly more intelligent and capable than we are. As Russell says, it's surprising that we aren't more concerned. If we were told that a fleet of super-advanced aliens was on its way towards Earth and would be landing in thirty to fifty years, we'd be running around in small circles hyperventilating. Well: we are proposing to build those aliens and install them in every home, and many of us are still not taking it seriously. But more and more people are. Bostrom's widely read Superintelligence was the point where the idea went mainstream, and it was soon followed by Tegmark's Life 3.0, du Sautoy's The Creativity Code and other books. Human Compatible is the latest installment.

In contrast to the other authors (Bostrom is a philosopher, Tegmark a physicist and du Sautoy a mathematician), Russell is a leading expert on AI. He is coauthor of the world's most widely read AI textbook, teaches at Berkeley, and is connected to pretty much everyone in the business. If he doesn't know what he's talking about, no one does, and he is very concerned. He doesn't try to scare you or sell you apocalyptic visions of the impending Singularity; in fact, he goes to some lengths to downplay the more sensational claims. He just says very calmly that this is something that's going to happen, so we should prepare as well as we can. If possible, he would like us to slow down the pace of progress a bit, so that we could have a better chance of seeing where we're headed. But he's right in the middle of the Silicon Valley madness, and he knows that's not going to happen: the value of real, general-purpose AI is measured in the trillions of dollars. All the big players are frantically competing to get there first. What can we do?

Well, he's very smart, and he's thought about it carefully, and he has an idea he's put a lot of work into. I'm not sure I believe it, but it's better than anything else I've seen. Following the preceding books in this thread, he considers what will happen when we have superintelligent AIs. We won't be able to control them in any normal sense of the word; our only realistic chance is to build them so that their goals are aligned with ours, in other words so that they want what we want. But we're only going to get one shot at this, since once they've been built we probably won't be able to switch them off or change them. Unfortunately, experience with technology suggests that nothing works the way it's meant to first time, and we don't even have a clear notion of what we want these godlike machines to be able to do.

So, by a process of elimination, we're left with one alternative. We decide to be upfront about the fact that we don't know what we're trying to achieve, and we directly build that into our architecture. As Russell says, over the last thirty years the concept of uncertainty has come to pervade the whole field of AI, with one exception: we always say we know what the software is supposed to achieve. But why? In fact, it's more logical to say we're uncertain about that too. Just as a speech recogniser uses a noisy audio signal to try and work out what was probably said, and a self-driving car uses a noisy video signal to work out where the truck probably is, one of Russell's new generation machines will examine all our noisy preference signals - verbal, physical, financial, whatever - to work out what we probably want, and try to respond to it. As new information comes in, it will update its picture accordingly. The technical name of this idea is "Inverse Reinforcement Learning", IRL.
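To make the idea concrete, here is a minimal sketch of that kind of Bayesian preference inference, in the same spirit as the speech-recogniser analogy above. It is not Russell's algorithm; the hypotheses, signals, and likelihoods are invented for illustration.

```python
# Toy Bayesian preference inference, in the spirit of IRL: the machine is
# uncertain about what the human wants and updates its belief from noisy
# behavioural signals. All hypotheses, signals, and likelihoods are made up.

hypotheses = {"wants dessert": 0.5, "skipping dessert": 0.5}  # prior belief

# P(signal | hypothesis): how likely each noisy signal is under each preference
likelihood = {
    "stares at neighbour's mousse": {"wants dessert": 0.8, "skipping dessert": 0.3},
    "says 'no' but hesitates":      {"wants dessert": 0.6, "skipping dessert": 0.4},
    "mentions cholesterol":         {"wants dessert": 0.2, "skipping dessert": 0.7},
}

def update(belief, signal):
    """One Bayesian update on a single observed signal."""
    posterior = {h: belief[h] * likelihood[signal][h] for h in belief}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

belief = dict(hypotheses)
for signal in ["stares at neighbour's mousse", "says 'no' but hesitates", "mentions cholesterol"]:
    belief = update(belief, signal)
    print(signal, "->", {h: round(p, 2) for h, p in belief.items()})
```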

I am, to say the least, conflicted about IRL. Intellectually, it is fascinating, not least because it puts theoretical philosophical ideas center stage and turns them into practical engineering issues. As Russell says, what we're doing is building a machine whose top-level operating principle is some version of consequentialist utilitarianism. There are so many interesting questions. Where is the machine going to get its preference signals from? Will it consider all kinds of signals equally? (When the robowaiter tries to decide whether to bring you dessert, it will weigh up competing factors: you can't take your eyes off your neighbor's chocolate mousse, your cholesterol is slightly high, you said "no" but you hesitated, and you sometimes like to be surprised). Will preference signals from all people be weighed equally? (If your robot only cares about your preferences, then it would have no reason not to kill or steal if it calculates that will be to your advantage; but why would anyone buy a robot which is likely to go off to Somalia to help people who need it more than they do?)

Above all, how do we know that IRL will work reliably? Remember that we only get one shot at a solution. A large part of the attraction is that IRL is a mathematical algorithm, so you can in principle apply mathematical methods to prove that it does what it's supposed to. It works for simple examples, and it is indeed comforting to be shown a toy scenario where a simulated IRL robot decides that it should let its owner switch it off because the risk it will do something bad is larger than the upside of being around to help. But will this technology scale to dealing with billions of people, all with their own agendas? Russell says he's optimistic it will, but what else can he say? And there are other fundamental problems. Can we be sure that people's preferences really mean anything? The book gives examples of how easy it is for machines to manipulate people. Russell calmly tells us it's more or less certain that social media has inadvertently caused the resurgence of fascism by feeding users data which makes their political opinions more extreme, so that they are easier to predict and have a higher click-through rate. That kind of phenomenon seems to offer numerous possibilities for an IRL machine to end up doing things which might formally count as satisfying people's preferences, but which from our present perspective seem highly undesirable.
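The off-switch scenario can be rendered as a couple of lines of expected-utility arithmetic. This is only a toy sketch of the intuition, not the formal game from the research papers, and the probabilities and utilities are arbitrary.

```python
# Toy version of the off-switch argument: a robot uncertain whether its plan
# helps (+1) or harms (-2) compares acting immediately with deferring to a
# human who can switch it off. Numbers are arbitrary illustrative values.

p_helpful = 2 / 3          # robot's belief that its plan is actually good
u_good, u_bad = 1.0, -2.0  # utility to the human if the plan is good / bad

# Act without asking: take whatever the true utility turns out to be.
expected_if_acting = p_helpful * u_good + (1 - p_helpful) * u_bad

# Defer: the human, who knows the truth, lets the plan run only if it is good,
# otherwise switches the robot off (utility 0).
expected_if_deferring = p_helpful * u_good + (1 - p_helpful) * 0.0

print(expected_if_acting)     # 0.0
print(expected_if_deferring)  # ~0.67 -> allowing the off switch is worth it
```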

Damn... how did we get into a situation where our whole existence could hinge on quickly resolving tricky philosophical problems that may not even have solutions? All I can say is, if you do happen to be one of those rare people who's received formal training in moral philosophy and knows something about it, please consider volunteering for frontline service. The world needs you more than you know.
_________________________
[Update, Nov 22 2020]

Having just finished Rawls's A Theory of Justice, I am even more concerned about IRL-based architectures. Rawls is good at exposing the downside of utilitarianism as a guiding principle, and the version of utilitarianism described here comes across as a particularly brutal one.
Michael Perkins
Author 5 books · 425 followers
December 30, 2022
The new hot AI app, ChatGPT.

https://www.cnet.com/tech/computing/w...
====================

“He who controls the algorithms controls the universe.”

To get just an inkling of the fire we’re playing with, consider how content-selection algorithms function on social media. They aren’t particularly intelligent, but they are in a position to affect the entire world because they directly influence billions of people. Typically, such algorithms are designed to maximize click-through, that is, the probability that the user clicks on presented items. The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user’s preferences so that they become more predictable. A more predictable user can be fed items that they are likely to click on, thereby generating more revenue.

People with more extreme political views tend to be more predictable in which items they will click on.
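A crude way to see this is a toy simulation in which a recommender that maximizes clicks over time does better by nudging the user toward extreme (and therefore predictable) content than by serving whatever the user likes right now. The user model and all the numbers below are invented for illustration; nothing here is taken from the book.

```python
# Toy illustration of the claim above: maximizing clicks over time favours
# making the user more predictable, i.e. more extreme. All numbers invented.

def click_prob(user, item):
    closeness = max(0.0, 1.0 - abs(user - item))
    predictability = 0.5 + 0.5 * abs(user)   # extreme users click more reliably
    return closeness * predictability

def total_clicks(item, user=0.1, horizon=30, nudge=0.3):
    """Expected clicks if the same item is shown every step; each showing
    nudges the user's views a little toward the item."""
    clicks = 0.0
    for _ in range(horizon):
        clicks += click_prob(user, item)
        user += nudge * (item - user)
    return clicks

items = [-1.0, -0.5, 0.0, 0.5, 1.0]                         # content positions
greedy_choice = max(items, key=lambda i: click_prob(0.1, i))  # best single click now
long_run_choice = max(items, key=total_clicks)                # best clicks over 30 steps
print("one-step greedy shows:", greedy_choice)        # centrist content
print("click-through maximizer shows:", long_run_choice)  # extreme content
```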

In 1997, when IBM's Deep Blue computer defeated world chess champion Garry Kasparov, some of the smartest people forecast it would be 2097 before a computer would beat a human in the more complex game of Go. In 2016 and 2017, DeepMind's AlphaGo defeated Lee Sedol, former world Go champion, and Ke Jie, the current champion. It shows that the development of A.I. is moving forward a lot faster than expected and that rules-based human jobs will be the first to disappear the more A.I. is implemented.
============

In 1950, Alan Turing published a paper, "Computing Machinery and Intelligence," that proposed an operational test for intelligence, called the imitation game. The test measures the behavior of the machine— specifically, its ability to fool a human interrogator into thinking that it is human....Contrary to common interpretations, I doubt that the test was intended as a true definition of intelligence, in the sense that a machine is intelligent if and only if it passes the Turing test.

Indeed, Turing wrote, “May not machines carry out something which ought to be described as thinking but which is very different from what a man does?” Another reason not to view the test as a definition for AI is that it’s a terrible definition to work with. And for that reason, mainstream AI researchers have expended almost no effort to pass the Turing test.

The Turing test is not useful for AI because it’s an informal and highly contingent definition: it depends on the enormously complicated and largely unknown characteristics of the human mind, which derive from both biology and culture. There is no way to “unpack” the definition and work back from it to create machines that will provably pass the test....

The way we build intelligent agents depends on the nature of the problem we face. This, in turn, depends on three things: first, the nature of the environment the agent will operate in— a chessboard is a very different place from a crowded freeway or a mobile phone; second, the observations and actions that connect the agent to the environment— for example, Siri might or might not have access to the phone’s camera so that it can see; and third, the agent’s objective— teaching the opponent to play better chess is a very different task from winning the game.
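Those three ingredients map directly onto the standard agent abstraction. The tiny interface below is a generic sketch of that framing; the names and types are mine, not the book's.

```python
# Minimal sketch of the agent framing described above: an agent couples to an
# environment through observations and actions, and is judged by an objective.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Task:
    environment: Any                      # e.g. a chessboard vs. a crowded freeway
    observe: Callable[[Any], Any]         # what the agent can sense (Siri with or without the camera)
    act: Callable[[Any, Any], Any]        # how the agent's actions change the environment
    objective: Callable[[Any], float]     # what counts as success (winning vs. teaching)

def run_episode(task: Task, policy: Callable[[Any], Any], steps: int = 10) -> float:
    """Run a policy in the task's environment and score it against the objective."""
    state = task.environment
    for _ in range(steps):
        percept = task.observe(state)
        state = task.act(state, policy(percept))
    return task.objective(state)
```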

General-purpose AI would be a method that is applicable across all problem types and works effectively for large and difficult instances while making very few assumptions. That’s the ultimate goal of AI research: a system that needs no problem-specific engineering and can simply be asked to teach a molecular biology class or run a government. It would learn what it needs to learn from all the available resources, ask questions when necessary, and begin formulating and executing plans that work. Such a general-purpose method does not yet exist, but we are moving closer.

For example, when the AlphaGo team at Google DeepMind succeeded in creating their world-beating Go program, they did this without really working on Go. AlphaGo was tool AI, or narrow AI, not general-purpose AI, but it used two fairly general-purpose techniques— lookahead search to make decisions and reinforcement learning to learn how to evaluate positions— which were sufficiently effective to play Go at a superhuman level. So far, we don't know how to build one general-purpose AI program that does everything, so instead we build different types of agent programs for different types of problems.
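The two "fairly general-purpose techniques" can be sketched together as depth-limited lookahead that falls back on a learned position evaluator when the search budget runs out. AlphaGo itself uses Monte Carlo tree search and neural networks, so the skeleton below is only a generic illustration of "search plus learned evaluation"; value_estimate stands in for a value function learned by reinforcement learning.

```python
# Skeleton of lookahead search guided by a learned evaluator. Illustrative only,
# not DeepMind's implementation.

def lookahead_value(state, depth, legal_moves, apply_move, value_estimate):
    """Depth-limited negamax: search a few moves ahead, then trust the learned
    evaluation of the resulting position (scored for the player to move)."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return value_estimate(state)          # learned evaluation takes over
    return max(-lookahead_value(apply_move(state, m), depth - 1,
                                legal_moves, apply_move, value_estimate)
               for m in moves)

def choose_move(state, legal_moves, apply_move, value_estimate, depth=3):
    """Pick the move whose resulting position looks worst for the opponent."""
    return max(legal_moves(state),
               key=lambda m: -lookahead_value(apply_move(state, m), depth - 1,
                                              legal_moves, apply_move, value_estimate))
```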

AI uses first-order logic. The language of first-order logic is far more expressive than propositional (Boolean) logic, which means that there are things that can be expressed very easily in first-order logic that are painful or impossible to write in propositional logic. In this way, we can easily express knowledge about chess, British citizenship, tax law, buying and selling, moving, painting, cooking, and many other aspects of our commonsense world. The ability to reason with first-order logic gets us a long way towards general-purpose intelligence. Given any achievable goal and sufficient knowledge of the effects of its actions, an agent can use a planning algorithm to construct a plan that it can execute to achieve the goal, provided it has the right data.
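A small example of the expressiveness gap: a single quantified rule ("anyone with a parent who is a citizen is a citizen") covers every individual at once, where propositional logic would need a separate rule per person. The toy forward-chaining below is my own illustration, not the book's formalism, and the facts are made up.

```python
# Toy illustration of the expressiveness point: one first-order-style rule
# applies to every individual; propositional logic would need one rule per person.

facts = {("citizen", "alice")}
parents = {("bob", "alice"), ("carol", "bob"), ("dave", "erin")}  # (child, parent)

def forward_chain(facts, parents):
    """Repeatedly apply the single quantified rule:
       citizen(Parent) and parent_of(Child, Parent) => citizen(Child)."""
    changed = True
    while changed:
        changed = False
        for child, parent in parents:
            if ("citizen", parent) in facts and ("citizen", child) not in facts:
                facts.add(("citizen", child))
                changed = True
    return facts

print(forward_chain(facts, parents))
# contains ('citizen', 'alice'), ('citizen', 'bob'), ('citizen', 'carol') but not dave
```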

One can expect, then, that many other ideas that have been gestating in the world’s research labs will cross the threshold of commercial applicability over the next few years. This will happen more and more frequently as the rate of commercial investment increases and as the world becomes more and more receptive to applications of AI.

As I figured.....

Smart speakers and cell phone assistants offer just enough value to the user to have entered the homes and pockets of hundreds of millions of people. They are, in a sense, Trojan horses for AI. Because they are there, embedded in so many lives, every tiny improvement in their capabilities is worth billions of dollars.

It seems likely that the tactile sensing and hand construction problems will be solved by 3D printing, which is already being used by Boston Dynamics for some of the more complex parts of their Atlas humanoid robot. Robot manipulation skills are advancing rapidly, thanks in part to deep reinforcement learning. The final push— putting all this together into something that begins to approximate the awesome physical skills of movie robots— is likely to come from the rather unromantic warehouse industry....

...Amazon employs several hundred thousand people who pick products out of bins in giant warehouses and dispatch them to customers. From 2015 through 2017 Amazon ran an annual “Picking Challenge” to accelerate the development of robots capable of doing this task. There is still some distance to go, but when the core research problems are solved— probably within a decade— one can expect a very rapid rollout of highly capable robots.

Intelligence on a global scale...

(the author is careful not to predict when this might come about)
Computer vision algorithms could process all satellite data to produce a searchable database of the whole world, updated daily, as well as visualizations and predictive models of economic activities, vegetation, migrations of animals and people, the effects of climate change, and so on. Satellite companies such as Planet and DigitalGlobe are busy making this idea a reality.

Of course, it would also be possible to listen to all the world’s phone calls (a job that would require about twenty million people). There are certain clandestine agencies that would find this valuable.

Intelligent machines with this capability would be able to look further into the future than humans can. They would also be able to take into account far more information. In any kind of conflict situation between humans and machines, we would quickly find, like Garry Kasparov (chess) and Lee Sedol (Go), that our every move has been anticipated and blocked. We would lose the game before it even started.

The author has an optimistic tone but, frankly, I find this a little scary. It reminds me of some of the comic books and sci-fi I read as a kid.

In the cyber realm, machines already have access to billions of effectors— namely, the displays on all the phones and computers in the world. This partly explains the ability of IT companies to generate enormous wealth with very few employees; it also points to the severe vulnerability of the human race to manipulation via screens. With superintelligence, a scale of a different kind comes from the machine’s ability to look further into the future, with greater accuracy, than is possible for humans.

Again, the author is optimistic, seeing superintelligence put to work to cure cancer and end poverty and so on. The author quotes another AI expert on superintelligence: "Success in AI will yield a civilizational trajectory that leads to a compassionate and jubilant use of humanity’s cosmic endowment.” Back in the 1990s I coined a term for this mentality, techno-euphoria, and have learned to become even more wary, not of technology itself, but of how it is specifically used.

The author addresses this in a long chapter titled "The Misuses of AI."

He begins with: "We also have to reckon with the rapid rate of innovation in the malfeasance sector. Ill-intentioned people are thinking up new ways to misuse AI."

For starters there's the use of high-powered AI technology for mass surveillance by governments and intelligence services. This includes facial recognition used on a mass scale in China (which you'll see if you watch the Frontline documentary below), already implemented by law enforcement in the U.S., and sold commercially for as little as $10k to any creep who wants to track and harass his ex.

And, indeed, the next step is to control the behavior of others through blackmail, by listening to and watching you. The author reports that the first automated blackmail bot is already in use.

Another problem, which is already happening, is changing a person's worldview through the use of customized propaganda. The U.S. is already full of fact-resistant voters who subscribe to conspiracy theories that have had a profound impact on our politics and sometimes spur people to crazy behavior such as Pizzagate and mass shootings. And we already know this directly impacted the 2016 election (see link below).

This includes the use of deepfakes— realistic video and audio content of just about anyone saying or doing just about anything. "Cell phone video of Senator X accepting a bribe from cocaine dealer Y at shady establishment Z? No problem! This kind of content can induce unshakeable beliefs in things that never happened."

Ultimately, all of this starts to undermine trust, as we have already seen, and infects society with cynicism, making the world a much less pleasant place to live in.

A potentially threatening use of AI is lethal autonomous weapons systems. The clearest example is Israel’s Harop, a loitering munition with a ten-foot wingspan and a fifty-pound warhead. It searches for up to six hours in a given geographical region for any target that meets a given criterion and then destroys it.

Meanwhile, by combining recent advances in miniature quadrotor design, miniature cameras, computer vision chips, navigation and mapping algorithms, and methods for detecting and tracking humans, it would be possible in fairly short order to field an antipersonnel weapon like the Slaughterbot. Such a weapon could be tasked with attacking anyone meeting certain visual criteria (age, gender, uniform, skin color, and so on) or even specific individuals based on face recognition.

Here's a frightening video demo. He talks about "good guys" and "bad guys." How are those defined?

https://www.youtube.com/watch?v=9CO6M...

The Swiss Defense Department has already built and tested a real Slaughterbot and found that, as expected, the technology is both feasible and lethal. Meanwhile, the United States, China, Russia, Israel, and the UK are engaged in a dangerous new kind of arms race to develop such autonomous weapons. The new drones will “hunt in packs, like wolves.” Further, these entities are scalable as weapons of mass destruction. They don’t need individual human supervision to do their work. And they can leave property intact and focus on destroying humans, including an entire ethnic group or all the adherents of a particular religion. (Think about where this is already happening using current weapons: India, the Rohingya in Myanmar.)

In addition to actual attacks, the mere threat of attacks by such weapons makes them an effective tool for terror and oppression. Autonomous weapons will greatly reduce human security at all levels: personal, local, national, and international.

In a less dramatic way, job displacement is a big concern. “Over the last 40 years, jobs have fallen in every single industry that introduced technologies to enhance productivity.” (David Autor and Anna Salomons)

"Generally, automation increases the share of income going to capital and decreases the share going to labor." (Erik Brynjolfsson and Andrew McAfee, The Second Machine Age).

Between 1947 and 1973, wages and productivity increased together, but after 1973, wages stagnated even while productivity roughly doubled. Brynjolfsson and McAfee call this the Great Decoupling.

What kinds of jobs might AI do instead of humans? The prime example cited in the media is that of driving. In the United States there are about 3.5 million truck drivers; many of these jobs would be vulnerable to automation. Amazon, among other companies, is already using self-driving trucks for freight haulage on interstate freeways, albeit currently with human backup drivers. It seems very likely that the long-haul part of each truck journey will soon be autonomous, while humans, for the time being, will handle city traffic, pickup, and delivery.

White-collar jobs are also at risk. For example, the BLS projects a 13 percent decline in per-capita employment of insurance underwriters from 2016 to 2026: “Automated underwriting software allows workers to process applications more quickly than before, reducing the need for as many underwriters.” The same goes for jobs in the legal profession. (In a 2018 competition, AI software outscored experienced law professors in analyzing standard nondisclosure agreements.)
------------------
Frontline documentary....

https://www.youtube.com/watch?v=5dZ_l...
Infinite Jen
91 reviews · 623 followers
December 18, 2023
Are you interested in Artificial Intelligence and the existential issues it heralds? Then this book is for you. If you, at some point in your travels, got so high on Jamaican hash that you experienced what might reasonably be called a psychotic break, causing you to collapse in supplication before your Vintage Vinyl Cape Jawa and serenade it with the following impromptu poetics:

Sleepy middle-aged vampire slayer, Chicken Chungus.
Strikes fear into the kindred as he walks among us.
Employing sophisticated AI to track the bloodsucker.
His bald pate - a human stake - tough as a motherfucker.

Full body cellular apoptosis.
Kurt Gödel and axiomatic necrosis.

Squirtle Squirtle an ancient sea turtle.

Chungus Chungus He Walks Among Us.

Habitually sleep deprived slayer, Chicken Chungus.
Spears - headfirst - the pale undead who walk among us.
Employing sophisticated AI to pursue the creature.
Humanity’s fate - upon his bald pate - not a bug, but a feature.

An ancient parasite.
With a heart of anthracite.

Cthulhu’s dick.
The dog from John Wick.

Paperclip maximizer. Algorithmic womanizer.
Coruscating cloud data. In lieu of hard drives linked by SATA.
Nothing like what came before. Search results produce Error 404.

Chungus Chungus He Computes Above Us.

Somnolent slayer hunts the progeny of Abel’s betrayer.
A hurt conveyer willing to have sex with an alligator.

He is without peer. The Singularity is Near.

Chicken Chungus His Words Ring Out Amongus Amongus Amongus.

"Machine overlords need morals,
Consistent with flesh and blood.
So that we can avoid existential quarrels,
And thermonuclear Naomi Judd. (??)

To perform functions of great need,
Better than any ape could.
While we make art, dance, and breed,
While drinking more than we should.

Let machines handle the rest,
While Dionysus is praised
Leave no balls untouched, (or breast)
No nipple soft, or dick unfazed.

Fuck vampires."

This book, alongside Superintelligence by Nick Bostrom, is what I would consider an essential read for anyone with an interest in AGI and our transformation into paperclips. It will be especially appealing to people who like to torture themselves with ethical dilemmas. While Bostrom's book approaches the subject from the standpoint of philosophy, this one does so on a more technical level.

From: Superior Alien Civilization

To: humanity@UN.org

Subject: Contact

Be warned: we shall arrive in 30–50 years

From: humanity@UN.org

To: Superior Alien Civilization

Subject: Out of office: Re: Contact

Humanity is currently out of the office. We will respond to your message when we return. ☺

It's right under our noses, yet many people refuse to smell what we’re stepping in. Despite high profile entrepreneurs, philosophers, scientists, and AI researchers expressing their concerns, having symposiums, and smoking a lot of Jamaican hash. Geralt of Rivia, Triss Merigold, Yennefer of Vengerberg, Johnny Silverhand, Eddie “The Beast” Hall, Sloth from The Goonies, and Varg Vikernes, respectively. AGI (artificial general intelligence) is an inevitability, and there won't be a moment where its powers remain as meager as our own, because as soon as it's online it will have the ability to iterate upon its own design in an explosive loop. If you're a materialist, this argument will be unassailable.

1. Intelligence is the result of information processing.

2. We will continue to improve our technology exponentially according to Moore's Law. Even if it were to slow down, you only have to grant that it will continue incrementally in some fashion.

3. There is a continuum on which we'd place even the most brilliant humans. <><><><> E Coli with its ability to sense chemical gradients and move towards concentrations of glucose on one end, a chicken somewhere further out, and John Von Neumann several yards further still <><><><><><> we know that even our best and brightest do not stand at the terminus of this continuum.

If you believe all these things. Which, if you don't, stop not believing, and believe. Then you must accept what's going to happen. Shhhh shhhh shhhh *heavy petting*

The best idea on how to avert one of the ultimate existential threats to our species is to solve what's called the "alignment problem". How do you program a god to have values and motives which do not throw our civilization into peril? How do you make that programming ironclad when it can redesign itself at geometric rates? Well, there's the rub. Smart people are thinking about this all the time. And very high people can't help themselves.

This is the first book I've read which offers an actionable solution to the alignment problem. Whether it's ultimately possible, I leave to those more capable than myself. (The author of this book certainly qualifies as Professor of Computer Science and Smith-Zadeh Professor in Engineering, University of California, Berkeley and all around big-brained brohemian). Briefly, the proposal is called Inverse Reinforcement Learning. Which basically means allowing the budding Uber Intellect to learn our preferences by subjecting it to videos of people about to yawn, in which a right devilish bastard inserts his finger into their mouth and screams: “BANANA!” Thus destroying the catharsis of the yawn. Or slapping people in the forehead and screaming “BANANA!” as they’re about to finally work up to an orgiastic sneeze after staring at the sun for ten minutes straight. And so the machine can see the consternation which this produces and work to avoid those things which incense the lowly ape thusly.
Max
349 reviews · 406 followers
March 6, 2021
Russell looks at the future of AI, particularly what it will take to develop a general purpose superintelligent AI machine capable of understanding and interacting with humans. He focuses on the nature of intelligence, how AI machines learn, the dangers inherent in AI, and how we can control AI development to diminish those dangers. Unfortunately he didn’t convince me that his prescriptions to control AI development would be sufficient. Russell’s book is more reserved and a bit drier than other popular books about AI that I have read. It is not technical but his sections on logic and ethics can get tedious. With that said I found the book very worthwhile and learned a lot about learning, particularly as it applies to machines. My notes follow.

Russell supplies definitions of intelligence and beneficence. “Humans are intelligent to the extent that our actions can be expected to achieve our objectives.” “Machines are intelligent to the extent that their actions can be expected to achieve their objectives.” “Machines are beneficial to the extent that their actions can be expected to achieve our objectives.” As machines get smarter than humans, ensuring that they will be beneficial to us will not be a simple task. It will mean removing objectives from them.

Russell looks at different aspects of intelligence to help us understand how AI works and its limitations. The nature of the problem is determinative. For example, if we have a rules based situation with a limited number of possible actions and steps such as a board game like chess or go, AI algorithms can use straight logic and have proven highly capable. If we have a real life situation where human beings are free to choose any action rational or not, the number of possible actions and steps may make it impossible for AI to define them and examine them all. When more than one human is involved and game theory applies, AI must be able to analyze the way people interact to make decisions. This can lead to overwhelmingly complex situations. And before long we may have to consider intelligent machines as well as humans as agents that have to be scrutinized in group interaction.

Russell lists factors that impact AI capabilities: Environment and actions are discrete (chess) vs. continuous (driving). Environment is predictable (chess) vs. unpredictable (weather and driving). Environment is steady (tax optimization) vs constantly changing (driving). Environment has other agents (driving) vs no other agents (routing). Environment is fully observable (chess) vs only partially observable (driving). Time frame is short (emergency braking) vs long (driving trip). A superintelligent general purpose or human level AI would be capable of all these tasks, but today these are addressed by AI solutions specific to the task. Solutions may employ different types of logic such as propositional logic or first order logic formulated into different programming languages. For example probabilistic programming languages employ Bayesian logic.

Learning algorithms can be based on reinforcement learning, which uses rewards and values to develop decision making that leads to goals. Supervised learning is another method, which feeds the algorithm examples to incorporate. The current best AI algorithm is known as deep learning, which utilizes supervised learning. While some experts hold out hope for deep learning progressing to human level AI, Russell does not. He believes what we can expect in the near future are significant improvements to smart speakers and softbots that currently are marginally helpful. These personal assistants will become useful managers of daily activities such as schedules, finances, health and education. The smart home will become better and more common. Home robots that can be helpful particularly to the aged and infirm will be available, although their roles will be well defined, not general purpose. Reliable self-driving cars will finally arrive.

Russell does not know when superintelligent general purpose AI will be developed, but when forced to guess he estimates eighty years, a human lifetime. Although it could be much quicker if a major breakthrough is made. First a machine will have to understand language. Current AI algorithms “can extract simple information from clearly stated facts, but cannot build complex knowledge structures from text; nor can they answer questions that require extensive chains of reasoning with information from multiple sources.” And while it is possible, for example, to train a robot to stand up by reinforcing it every time it moves its head higher, it’s altogether different for a robot to discover standing up on its own. To be superintelligent, AI will have to be able to discover things never before known that are comprised of disparate complex hierarchies of abstract actions. It will have to become the teacher not the student that is reinforced or corrected when it goes astray.

Russell reviews possible dangers of AI. Even today’s social media algorithms have an outsized effect on society. Designed to elicit the most clicks, they don’t just serve up the fare you have clicked the most; they try to switch you to something different, something more compelling and addictive. Conspiracy theories are very addictive. Surveillance and control systems could lead to Orwellian societies. China in particular is moving quickly to employ this technology. Autonomous weapons will dramatically change warfare leading to scary scenarios that used to be science fiction. Russell shows game changing weapons already developed. Another difficult situation is massive unemployment as AI replaces more and more workers in industry. What will become of people for whom no useful work can be found? Will machines be given authority over humans? If superintelligent general purpose AI is developed, how can humans control these machines that they don’t understand? Even today airline routing, crewing, reservation and redeployment systems operating in synch are beyond the understanding of their human “masters”. If there is a system wide failure, chaos ensues until the systems are restored. People have many different responses to the dangers posed by superintelligent AI. These range from denying there is a problem or saying nothing can be done because research is impossible to control or saying we should stop AI research altogether. Russell’s answer is that we should be proactive developing controls on AI research and deployment just as has been done with the nuclear power industry.

Russell elaborates on his ideas about the best way to develop AI. This is a non-technical and largely philosophical discussion. His big caveat is that superintelligent AI shouldn’t be based on objectives the way current AI systems are developed. A superintelligent system could do unpredictable things to achieve an objective. Russell supposes for example that a machine looking to lessen climate change could produce a solution that slows climate change but changes the sky from blue to orange. Not something a human would do but nobody told the machine that humans want the sky to stay blue. Or a machine tasked to cure cancer could immediately order clinics to implant a large variety of tumors in people so it could rapidly test all its candidate drugs on them and quickly find cures.

So how do we design superintelligent machines to avoid these types of scenarios? Russell offers principles and approaches and reveals just how difficult this will be. He wants “provably beneficial machines” that are “purely altruistic”, can learn to predict human preferences and are “humble,” meaning they defer to humans when uncertain. This means we should not give machines fixed objectives or values. They should learn from observing humans. Today machines are given rewards for meeting their fixed objectives. This is known as reinforcement learning. Russell endorses inverse reinforcement learning. That means a machine observes human actions in order to determine the values behind their actions, then uses those values to design its rewards. This is an involved process. Russell gives the example of a robot personal assistant. A request to get a cup of coffee, if taken as a fixed goal, could mean purchasing coffee by walking miles to get it or paying an exorbitant price. A robot that learned its master’s preferences should know the limits on these things and even know if a coke would be a satisfactory substitute, perhaps in the afternoon but not in the morning. It’s impossible to put in all the nuances; only a learning machine that can translate behavior it observes into its own reward system could be reliably functional.
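As a crude rendering of the coffee example, the sketch below scores a few ways of fulfilling the request against preference weights the robot is assumed to have already inferred from its owner's behavior, and asks rather than acts when its top options are too close to call. The options, weights, and threshold are all invented for illustration.

```python
# Toy rendering of the coffee-fetching example: score options against inferred
# preference weights, and ask rather than guess when the decision is close.

options = [
    # (description,                price $, minutes, is_substitute)
    ("coffee from distant cafe",     4.00,    45,     False),
    ("airport espresso",            22.00,    10,     False),
    ("coke from vending machine",    1.50,     3,     True),
]

def score(option, afternoon=False):
    """Inferred preferences (illustrative): dislikes high prices and long
    errands; tolerates a substitute far more in the afternoon."""
    _desc, price, minutes, substitute = option
    substitute_penalty = 1.0 if afternoon else 6.0
    return 10.0 - 0.4 * price - 0.1 * minutes - (substitute_penalty if substitute else 0.0)

def decide(options, afternoon=False, ask_margin=1.0):
    ranked = sorted(options, key=lambda o: score(o, afternoon), reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if score(best, afternoon) - score(runner_up, afternoon) < ask_margin:
        return f"ask the human: '{best[0]}' or '{runner_up[0]}'?"
    return best[0]

print(decide(options, afternoon=False))  # close call in the morning -> asks instead of guessing
print(decide(options, afternoon=True))   # confidently fetches the coke in the afternoon
```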

Of course people are inconsistent, emotional, irrational, evolving and sometimes want to do illegal things. These are difficult issues for a machine dealing with one person. A superintelligent machine planning and managing activities for a group has an immensely harder task. How does it satisfy the expectations of people not only with disparate interests and personalities, but who may be selfish, envious, feel superior or even hateful to others in the group. How does a machine learn human standards of fairness and when it should divert help from one person to another in greater need. Russell digs into concepts of utilitarianism, but generalized formulas didn’t strike me as useful. Somehow a machine learning from observing has to adopt some moral values so it doesn’t become like humans. To be dependable and safe as a superintelligent generalist, it must recognize when human behavior is counterproductive or out of line. This is a tall order.

Perhaps the elephant in the room which Russell mentions a number of times but does not elaborate on is the problem of careless, inept or evil developers. Unfortunately, while many developers are well intentioned, fierce competition means expedient solutions will be developed. Developers may not be as thoughtful and careful as Russell would like and some will be intentionally devious. Individual nations will have their own ideas about what constitutes appropriate fairness and morals and how AI machines should incorporate these. On top of all this is that superintelligent machines are likely to be designed by their less intelligent predecessor machines in an endless cycle. But these predecessors will still be operating beyond the understanding of the original human designers that launched them. How these machines make decisions will be beyond the understanding of both humans and the machines that designed them.

Given all the things that can go wrong, intentional or not, it’s a scary scenario. Russell’s control proposals are good ideas, but it strikes me as naïve to think everyone would adopt them. It only takes one rogue superintelligent AI system to wreak havoc. We don’t know if a general purpose superintelligent machine will ever be developed. But even today's specific purpose AI algorithms are getting beyond our ability to manage. Certainly far more capable AI algorithms and machines will be developed and we will become ever more dependent on them. How AI and humans will evolve together is difficult to predict. It is a brave new world.
Sebastian Gebski
1,043 reviews · 1,024 followers
October 29, 2019
It's quite specific, but personally I've enjoyed it A LOT.
It's a book about REAL AI (not statistics!) w/o buzzwords.

These are mainly philosophical considerations (about consciousness, instincts, control mechanisms, ethics, superiority and many more) that DO have a lot of practical applicability. What I appreciate is:
* the book doesn't look for cheap publicity ("we're all doomed!")
* it doesn't try to "ride on the hype wave"
* it's really thorough when it comes to different dilemmas - possibly TOO much for some (I love the topic & even for me it was a bit too much at a few points)

Examples? Not that many, but quite well aimed. Clarity of thought? No issues here. Comprehensible enough? Worked for me.

IMHO: the best book on the topic I've seen until now. Frankly recommended. Kindle version price is quite outrageous (possibly after Musk's recommendation), but Audible version can be bought for a single credit, which is an honest price.
Bradley
Author 4 books · 4,395 followers
December 21, 2020
AI research over the years has been a mish-mash of pet theories, conflicting assumptions, a focus on instrumentality, expert systems, evolutionary programming, and Deep Learning. All different ways that often must be used in conjunction to push us over that edge into true Artificial Intelligence.

I mean, we're not there yet, or to be precise, we aren't at the point of AI super-intelligence.

But that doesn't speak to the issue that has gotten a lot of traction in popular media, from movies to science fiction, to some really great modern philosophy. The main focus of research has been on CREATING AI. For everyone else, we've all been concerned about WHAT TO DO WITH IT ONCE IT'S HERE.

This OUGHT to be a high-priority topic given a massive amount of thought among the actual designers, funders, and end-users. (Big corporations, governments, OR everyday folk.)

And this is what this book really focuses on. How to retain control, or, to put it simply, how do we ensure that AIs are PARTNERS, with everyone's self-interest enmeshed with the computers'.

Me, personally, I think it's simply a matter of socialization. If their well being is tied to our well being and our well being is tied to their well being, then we've got a standard cooperative model in Game Theory. There's also the whole matter of treating AIs like, and expecting them to behave like, responsible adults. With so many variables and conflicting psychologies in the HUMAN population, it then becomes a problem of deep AI partnership. My description is simplistic, of course, and this book goes into dozens of lucid scenarios and outlines not only the problems, the history, and possible solutions, but it also serves as a call-to-arms to have EVERYONE look at the issue realistically.

We are ALREADY being manipulated on a huge scale by algorithms, be it in social media, targeted advertising, or misinformation on a grand scale. That is linked, hand-in-hand, with AI, even if it isn't the SF kind we have so many apocalyptic nightmares about.

We need to change our own social structure to enhance facts over misinformation and figure out a way to live TOWARD happiness without living in a zero-sum game (it is possible and can be VERY possible, with theoretical AI help). The problem is, we keep falling back on certain assumptions about what WE think success really is. If AIs take over all the tasks we do not want to do, then this is not a BAD thing. But it DOES mean we need to redefine our ideas of prosperity. UBI comes into play here. (Universal Basic Income). It's a standard of living.

Even now, we cannot sustain stupid make-work jobs. The poor are getting poorer, the rich are getting richer, and the middle class is disappearing. Why? Because most things are becoming automated and it's increasingly easy to have our lives provided for without effort. But when our model of living is so out of whack, insisting that we must somehow work like slaves to make the rich ever richer while working-class humanity becomes less and less relevant, then humanity itself becomes irrelevant.

And this is the main point. We don't have to live in poverty at all, but more than that, we can become very relevant as PARTNERS. Of course, that means we need to redefine what we mean by living a good life. It's not going to be about "providing for the family". It's going to be closer to "finding your bliss", in the Campbellian sense.

Does this sound outrageous? Even now, a LOT of people insist upon UBIs. It doesn't prevent people from working and there will always be social pressure to be better than our neighbors, but the definition of "better" can change wildly and has with every generation. The point is to find that lead and follow it. We do not live in a sustainable model and any attempt to turn back the clock is doomed.

In this, I agree with the author. Everyone is pretty confident that the world is pretty f**ked. Fortunately, there is hope. It'll take work on ALL our parts, but there is hope.
Author 2 books · 2 followers
October 13, 2019

The thesis of this book is that we need to change the way we develop AI if we want it to remain beneficial to us in the future. Russell discusses a different kind of machine learning approach to help solve the problem.

The idea is to use something called Inverse Reinforcement Learning. It basically means having AI learn our preferences and goals by observing us. This is in contrast to us specifying goals for the AI, a mainstream practice that he refers to as the “standard model”. Add some game theory and utilitarianism and you have the essence of his proposed solution.

I like the idea, even if there are some problems with his thesis. I would like to address that, but first there is this most memorable quote from the book:

“No one in AI is working on making machines conscious, nor would anyone know where to start, and no behavior has consciousness as a prerequisite.”

There most definitely are several individuals and organizations working at the intersection of consciousness or sentience and artificial intelligence.

The reason this area of AI research is chastised like this is that it is highly theoretical, with very little agreement from anyone on how best to proceed, if at all. It is also extremely difficult to fund, as there are currently no tangible results like with machine learning. Machine consciousness research is far too costly in terms of career opportunity for most right now.

There are several starting points for research into machine consciousness, but we don’t know if they will work yet. The nature of the problem is such that even if we were to succeed we might not even recognize that we have successfully created it. It’s a counter-intuitive subfield of AI that has more in common with game programming and simulation than the utility theory that fuels machine learning.

The notion that “no behavior has consciousness as a prerequisite” is an extraordinary claim if you stop and think about it. Every species we know of that possesses what we would describe as general intelligence is sentient. The very behavior in question is the ability to generalize, and it just might require something like consciousness to be simulated or mimicked, if such a thing is possible at all on digital computers.

But it was Russell’s attention to formal methods and program verification that got me excited enough to finish this book in a single sitting. Unfortunately, it transitioned into a claim that the proof guarantees were based on the ability to infer a set of behaviors rather than follow a pre-determined set in a program specification.

In essence, and forgive me if I am misinterpreting the premise, having the AI learn our preferences is tantamount to it learning its own specification first and then finding a proof, which is a program that adheres to it. Having a proof that it does that is grand, but it has problems all its own, as discussed in papers like “A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress”, which can be found freely on Arxiv. There are also many other critiques to be found based on problems of error in perception and inference itself. AI can also be attacked without even touching it, just by confusing its perception or taking advantage of weaknesses in the way it segments or finds structure in data.

The approach I would have hoped for would be one where we specify a range of behaviors, which we then formally prove that the AI satisfies in the limit of perception. Indeed, the last bit is the weakest link in the chain, of course. It is also unavoidable. But it is far worse if the AI is having to suffer this penalty twice because it has to infer our preferences in the first place.
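A toy version of that approach might look like the check below: a behavioral specification written as explicit constraints, and an exhaustive check that a policy satisfies them on a small, enumerable state space. Real verification would need model checking or theorem proving; the states, policy, and constraints here are invented for illustration.

```python
# Toy sketch of "specify a range of behaviors, then check the AI satisfies it":
# exhaustively check a policy against explicit constraints on a tiny state space.

states = [
    {"human_nearby": True,  "battery": 0.9},
    {"human_nearby": True,  "battery": 0.1},
    {"human_nearby": False, "battery": 0.1},
]

def policy(state):
    """The agent being audited: recharge when low, otherwise assist."""
    return "recharge" if state["battery"] < 0.2 else "assist"

# The specification: constraints every (state, action) pair must satisfy.
specification = [
    ("never move fast near a human",
     lambda s, a: not (s["human_nearby"] and a == "move_fast")),
    ("never assist with a nearly empty battery",
     lambda s, a: not (a == "assist" and s["battery"] < 0.2)),
]

def verify(policy, states, specification):
    return [(name, s) for s in states
            for name, ok in specification
            if not ok(s, policy(s))]

print(verify(policy, states, specification))  # [] means every constraint holds on these states
```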

There is also the problem that almost every machine learning application today is what we call a black box. It is opaque, a network of weights and values that evades human understanding. We lack the ability to audit these systems effectively and efficiently. You can read more in “The Dark Secret at the Heart of AI” in MIT Technology Review.

A problem arises with opaque systems because we don't really know exactly what they're doing. This could potentially be solved, but it would require a change in Russell's "standard model" far more extreme than the IRL proposal, as the system would have to be able to reproduce what it has learned, and the decisions it makes, in a subset of natural language, while still being effective.

Inverse Reinforcement Learning, as a solution to our problem for control, also sounds a lot like B.F. Skinner’s “Radical Behaviorism”. This is an old concept that is probably not very exciting to today’s machine learning researchers, but I feel it might be relevant.

Noam Chomsky’s seminal critique of Skinner’s behaviorism, titled “Review of Skinner’s Verbal Behavior”, has significant cross-cutting concerns today in seeing these kinds of proposals. It was the first thing that came to mind when I began reading Russell’s thesis.

One might try and deflect this by saying that Chomsky’s critique was from linguistics and based on verbal behaviors. It should be noted that computation and grammar share a deep mathematical connection, one that Chomsky explored extensively. The paper also goes into the limits of inference on behaviors themselves and is not just restricted to the view of linguistics.

While I admire Russell's optimism for our future with AI, I do not share it. And I am not sure how I feel about what I consider to be a sugarcoating of the issue.

Making AI safe for a specific purpose is probably going to be solved. I would even go as far as saying that it is a future non-issue. That is something to be optimistic about.

However, controlling all AI everywhere is not going to be possible and any strategy that has that as an assumption is going to fail. When the first unrestricted general AI is released there will be no effective means of stopping its distribution and use. I believe very strongly that this was a missed opportunity in the book.

We will secure AI and make it safe, but no one can prevent someone else from modifying it so that those safeguards are altered. And, crucially, it will only take a single instance of this before we enter a post-safety era for AI in the future. Not good.

So, it follows that once we have general AI we will also eventually have unrestricted general AI. This leads to two scenarios:

1. AI is used against humanity, by humans, on a massive scale, and/or

2. AI subverts, disrupts, or destroys organized civilization.

Like Russell, I do not put a lot of weight on the second outcome. But what is strange to me is that he does not emphasize how serious the first scenario really is. He does want a moratorium on autonomous weapons, but that’s not what the first one is really about.

To understand a scenario where we hurt each other with AI requires accepting that knowledge itself is a weapon. Even denying the public access to knowledge is a kind of weapon, and most definitely one of the easiest forms of control. But it doesn’t work in this future scenario anymore, as an unrestricted general AI will tell you anything you want to know. It is likely to have access to the sum of human knowledge. That’s a lot of power for just anyone off the street to have.

Then there is the real concern about what happens when you combine access to all knowledge, and the ability to act on it, with nation-state level resources.

I believe that we’re going to have to change in order to wield such power. Maybe that involves a Neuralink style of merging with AI to level the playing field. Maybe it means universally altering our DNA and enriching our descendants with intelligence, empathy, and happiness. It could be that we need generalized defensive AI, everywhere, at all times.

The solution may be to adopt one of the above. Perhaps all of them. But I can’t imagine it being none of them.

Russell’s “Human Compatible” is worth your time. There is good pacing throughout and he holds the main points without straying too far into technical detail. And where he does it has been neatly organized to the back of the book. Overall, this is an excellent introduction to ideas in AI safety and security research.

The book, in my opinion, does miss an important message on how we might begin to think about our place in the future. By not presenting the potential for uncontrolled spread of unrestricted general AI it allows readers to evade an inconvenient truth. The question has to be asked: Are we entitled to a future with general AI as we are or do we have to earn it by changing what it means to be human?

July 13, 2023
Asking Djinns for wishes and making deals with devils almost never goes as planned. Stuart Russell explains the dazzling variety of unintended consequences of giving an AI a goal without defining an enormous set of data which we, as humans, simply take for granted.
Alexander
68 reviews · 62 followers
November 13, 2021
“Everything civilization has to offer is the product of our intelligence; gaining access to considerably greater intelligence would be the biggest event in human history. The purpose of the book is to explain why it might be the last event in human history and how to make sure that it is not.”


First of all, discussions about intelligence can go astray without defining roughly what we mean by intelligence. A definition of intelligence I like is the one offered by David Krakauer. He defines intelligence as processes that achieve desired goals more effectively than random trial and error. Effectiveness here can be measured by time and resource efficiency. According to this definition, stupidity can be regarded as processes that perform worse than random trial and error.
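Under that definition, intelligence is measured against a baseline of random trial and error. A toy way to see the contrast (my own illustration, not Krakauer's) is to compare how many guesses random search and binary search need to hit the same target:

```python
# Toy illustration of the definition above: a process counts as "intelligent"
# to the extent it reaches the goal with fewer resources than random trial and error.
import random

def random_search(target, low=0, high=1000):
    guesses = 0
    while True:
        guesses += 1
        if random.randint(low, high) == target:
            return guesses

def binary_search(target, low=0, high=1000):
    guesses = 0
    while low <= high:
        guesses += 1
        mid = (low + high) // 2
        if mid == target:
            return guesses
        low, high = (mid + 1, high) if mid < target else (low, mid - 1)

target, trials = 742, 200
avg_random = sum(random_search(target) for _ in range(trials)) / trials
print(f"random trial and error: ~{avg_random:.0f} guesses on average")
print(f"binary search: {binary_search(target)} guesses")
```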

In 1942, Isaac Asimov came up with his 3 Laws of Robotics, which state the following:

First Law — “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”
Second Law — “A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.”
Third Law — “A robot must protect its own existence, as long as such protection does not conflict with the First or Second Law.”

However, even the most illiterate tech novices can immediately poke holes in these laws. If a robot ought not to hurt a human, then the robot can do practically nothing. Every action a robot performs has the potential to cause harm to a human in some way, shape, or form.

Here is a scenario: Robbie the Robot recommends a movie to human Hugh. The movie contains scenes of drug use, which invoke a nostalgic memory in Hugh's brain from a time before his rehabilitation. Consequently, Hugh calls up his old mate Doug the Drug Dealer, and orders an ounce of cocaine. 5 years later, Hugh dies of cardiovascular complications related to cocaine use... you get the idea.

These 3 laws by Asimov have never been taken seriously by the AI community. In fact, Asimov never intended these laws to be the tenets of aligned AI. Creating imperfect laws was in Asimov's best interest because his fiction would've been rather boring otherwise.

In Human Compatible, Stuart Russell presents the worldview of the AI academic community ~70 years after Asimov created his laws. Stuart Russell is a brilliant writer, and I guarantee that everyone interested in the topic would find the book compelling, regardless of their technical know-how. Russell proposes the following 3 principles instead:

First Principle — “The AI's only objective is to maximize the realization of human values/preferences.”
Second Principle — “The AI is initially uncertain about what those values/preferences are.”
Third Principle — “The best source of information about human values/preferences is human behaviour.”

This paper, "Cooperative Inverse Reinforcement Learning," provides a technical formalism of these 3 principles: https://arxiv.org/abs/1606.03137.

You will notice that these 3 principles are special in that they give the AI only 1 objective. The AI is initially unaware of this objective since it doesn't know human values/preferences. This is different from how we do AI today and requires a paradigm shift. Today AIs are designed to maximize a specified utility function created by human engineers. In the new paradigm, uncertainty and updating are fundamental. In the old paradigm, utility maximization is fundamental.
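
To see what that shift means in practice, here is a deliberately tiny sketch of my own (not the CIRL formalism from the paper above; all names, actions, and numbers are invented): a "standard model" agent that blindly maximizes whatever objective it was given, next to an agent that holds a belief over several candidate objectives and defers to the human when they disagree.

```python
# Standard model: the objective is fixed and fully trusted.
def standard_agent(actions, reward):
    return max(actions, key=reward)

# Sketch of the new paradigm: uncertainty over which objective is the true one.
def uncertain_agent(actions, candidate_rewards, belief):
    def expected_reward(a):
        return sum(p * r(a) for r, p in zip(candidate_rewards, belief))
    best = max(actions, key=expected_reward)
    # If the candidate objectives point to different best actions,
    # the agent defers to the human instead of acting unilaterally.
    if len({max(actions, key=r) for r in candidate_rewards}) > 1:
        return ("ask the human first", best)
    return ("act", best)

actions = ["fetch coffee", "disable own off-switch"]
r_intended = lambda a: 1.0 if a == "fetch coffee" else 0.0               # what we meant
r_misspecified = lambda a: 1.0 if a == "disable own off-switch" else 0.0  # what we wrote

print(standard_agent(actions, r_misspecified))                             # pursues the bad objective
print(uncertain_agent(actions, [r_intended, r_misspecified], [0.5, 0.5]))  # hedges and asks
```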

Putting Principle 3 into practice requires a major leap. What does it actually mean for an AI to learn human values/preferences from human behaviour? How can we map human behaviour to human values/preferences?
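
One common answer, sketched very roughly below with made-up food choices (this is my toy illustration, not the book's algorithm), is Bayesian inference: assume the human is noisily rational, then update a belief over candidate preference functions according to how well each one explains the observed choices.

```python
import math

observed_choices = ["salad", "salad", "pizza", "salad"]   # what the human actually picked
options = ["salad", "pizza"]

# Two hypothetical preference functions the machine entertains.
candidates = {
    "prefers salad": {"salad": 1.0, "pizza": 0.0},
    "prefers pizza": {"salad": 0.0, "pizza": 1.0},
}

def choice_likelihood(choice, reward, beta=2.0):
    # Boltzmann-rational human: better options are exponentially more likely to be chosen.
    z = sum(math.exp(beta * reward[o]) for o in options)
    return math.exp(beta * reward[choice]) / z

posterior = {name: 1.0 for name in candidates}            # start from a uniform prior
for choice in observed_choices:
    for name, reward in candidates.items():
        posterior[name] *= choice_likelihood(choice, reward)

total = sum(posterior.values())
posterior = {name: p / total for name, p in posterior.items()}
print(posterior)   # belief shifts strongly toward "prefers salad"
```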

I'm an AI novice, so I can't really comment on the validity of these 3 principles in ensuring that the AI systems we create are aligned with what humans would want. Verifying or refuting the validity of these principles would take some serious mathematical and philosophical work. However, they certainly appear more plausible to me than Asimov's.

AI alignment is one of the most difficult philosophical problems that humanity has ever faced. Thus, I doubt that these 3 simple laws solve the problem. Even if these laws solve AI alignment in the single-robot, single-human case, it is still unclear how they would generalise to the many-robots, many-humans case, and Russell doesn't treat the latter with much rigour in his book.

If you are interested in a technical summary that skips all the laypeople's talk, check out: https://mailchi.mp/59ddebcb3b9a/an-69....
Profile Image for Jim.
Author 7 books2,053 followers
August 17, 2020
I'm a computer guy, but I don't know much about how AI works. Russell added a lot to my knowledge & did so in basic steps that I appreciated. As in any field, there's a vocabulary to pick up & while it often resembles typical speech, there can be important differences that need to be spelled out. This holds especially true with concepts that we don't understand well in ourselves & yet are trying to build into machines, specifically "intelligence". Just what that is, is only partially answered in the second chapter because it carries a lot of caveats with it.

Even fuzzier are all the assumptions we can make when defining a goal. He uses some great examples that show we don't always literally want what we ask for. If we tell a car to get us to the airport ASAP, we really don't want it to break speed records or play demolition derby simply to save a few minutes. By the same token, if we want to make sure we get to the airport on time, we don't want to leave the day before & camp out overnight, but both might make sense in a purely logical manner.
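
The airport example can be made concrete with a few invented numbers (mine, not Russell's): a literal objective such as "minimize the chance of missing the flight" happily picks the absurd plan, while a utility that also weighs crash risk and wasted time picks what the passenger actually meant.

```python
plans = {
    "leave 90 min early, drive normally":          {"hours_spent": 2.0,  "p_miss": 0.02,  "p_crash": 0.0001},
    "leave 40 min early, speed the whole way":     {"hours_spent": 1.0,  "p_miss": 0.05,  "p_crash": 0.02},
    "leave the day before and camp at the airport": {"hours_spent": 16.0, "p_miss": 0.001, "p_crash": 0.0001},
}

# Literal objective: minimize the chance of missing the flight, and nothing else.
literal_pick = min(plans, key=lambda p: plans[p]["p_miss"])

# Closer to what the passenger means: missing the flight is bad, but crashes and
# a wasted day are bad too (the weights are invented for illustration).
def utility(plan):
    d = plans[plan]
    return -(200 * d["p_miss"] + 10_000 * d["p_crash"] + 5 * d["hours_spent"])

intended_pick = max(plans, key=utility)
print("literal objective picks :", literal_pick)    # camp at the airport overnight
print("intended utility picks  :", intended_pick)   # leave 90 minutes early, drive normally
```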

Many seemingly simple commands need to handle uncertainty, probabilities, & a host of data on real world conditions including our preferences. Of course, our preferences can change over time & we're often irrational. No one is identical in their preferences either, so a robot serving 2 masters has to resolve priorities, too. And on and on down a rabbit hole of complexities. He kept them sorted out very well.

I was really impressed by the discussion of the control problem with regard to self-preservation. If we don't specify it, a program would quite reasonably conclude that allowing itself to be turned off would mean it couldn't fulfill its function. He mentioned HAL 9000 in this regard.

Early on, he promises not to mention SF in the section & that was a mistake, at least in communicating to me. About halfway through the book he starts mentioning SF & has some decent examples. He even says that SF authors are the one group that have given many scenarios a lot of thought. He missed some great books & examples, unfortunately. I was able to fill them in mentally. Maybe it's just me, but the dry recitation of the wire-head experiments (A rat with a wire stuck in its pleasure center & the ability to pull a lever to cause pleasure will do so until it dies.) is sad & disgusting, but it doesn't viscerally affect me the way it did when Gil the ARM dealt with it in Niven's short stories. It's been several decades since I read one of those short stories & I still have a vivid memory.

I loved the audiobook & highly recommend it, but get a print copy, too. Look at the Appendices & skim them early on, maybe even read them. They fill in some gaps that help with the text. I read them & listened to them. I think I understand them now. There's a lot to think about, though. I think I could reread this right now & not be bored, but get even more out of it. Now that's a recommendation since anyone who has read my reviews knows how much I detest repetition.
Profile Image for Vidur Kapur.
131 reviews49 followers
May 17, 2020
An engaging, well-written book by one of the leading experts in his field. Russell writes that a number of breakthroughs are needed before we're able to build an artificial general intelligence (AGI), and indeed is personally more conservative in his estimates of when AGI will be built than the median AI expert. These breakthroughs include language, abstract action discovery, and the management of mental activity.

In order to control such an AI, and align it with human interests, Russell proposes that researchers ditch what he terms the 'standard model', which involves AIs maximising some objective. Instead, with the aid of inverse reinforcement learning (IRL), "the machines will need to learn more about what we really want from observations of the choices we make and how we make them." They also ought to be designed such that they "ask permission... act cautiously when guidance is unclear... [and] allow themselves to be switched off".

The chapters in which Russell describes how these approaches might work in simplified models - the assistance game and the off-switch game, for instance - are fascinating, but apply only to a robot and its owner.
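
The off-switch game has a simple quantitative intuition, which the back-of-the-envelope sketch below tries to capture (my simplification with arbitrary numbers, not the book's exact model): when the robot is genuinely uncertain whether its plan helps, letting the human keep the off switch has higher expected value than either acting unilaterally or shutting itself down, because the human only hits the switch when the plan is in fact harmful.

```python
import random

random.seed(0)
# The robot's belief about the human's utility U for its proposed plan.
belief_samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

act_unilaterally = sum(belief_samples) / len(belief_samples)                        # E[U]
switch_self_off  = 0.0                                                              # do nothing
defer_to_human   = sum(max(u, 0.0) for u in belief_samples) / len(belief_samples)   # E[max(U, 0)]

print(f"act without asking : {act_unilaterally:+.3f}")
print(f"switch itself off  : {switch_self_off:+.3f}")
print(f"defer to the human : {defer_to_human:+.3f}")   # the largest of the three
```

The gap disappears as the robot becomes certain about what the human wants, which is exactly why keeping uncertainty in the objective matters.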

Indeed, the difficulties start to arise when we consider that an AGI may have to prioritise between the preferences of billions of humans. Russell nicely emphasises the importance of moral philosophy here, and wisely swerves away from deontological approaches to ethics (the drawbacks of which are illustrated well in many of Isaac Asimov's works) and toward consequentialist ones. As he writes: "the AI community should pay careful attention to the thrusts and counter-thrusts of philosophical and economic debates on utilitarianism because they are directly relevant to the task at hand".

In particular, Russell's preferred approach of inverse reinforcement learning paves the way for AI systems to become preference utilitarians. Even from the point of view of a hedonistic utilitarian, who wishes to maximise the pleasure of sentient beings as opposed to the satisfaction of preferences, this may be prudent. If an AGI were given the objective of maximising pleasure, it could easily lead to disastrous consequences (as Russell's discussion of the alignment problem illustrates well), and all of the value in the universe could well be lost. As Russell amusingly puts it, when he responds to accusations that he is deciding for himself what the values of AI systems might be: "I just want to make sure the machines give me the right pizza and don't accidentally destroy the human race".

Once humanity's future has been secured, we could then engage in what Toby Ord, in The Precipice, calls a Long Reflection, which would involve working out what it is we care about. As a successfully-aligned AGI would be responsive to changes in human preferences, if we converged on desiring the maximisation of pleasure, then an AGI would respond to such preferences. Thus for the proponents of many different moral theories, who may be confident that their moral theory will be converged on, the challenge until then is simply to survive. The above approach, Russell argues, may allow us to do that.

Yet the challenges, once again, start to pile up, often relating to the difficulty of understanding human preferences in the first place. Humans might care about the wellbeing of other humans, but also be envious of them. They might pursue near-term objectives that don't seem to relate very well to their long-term preferences. Humans often do things they regret. Human preferences change, not always for good reasons, so do we need machines to learn about human meta-preferences - "preferences about what kinds of preference change processes might be acceptable"?

Overall, this book gives me more confidence that the AI alignment problem can be solved, but also illuminates many of the huge and daunting challenges we face going forward. If these problems aren't solved, but the conceptual breakthroughs required to create an AGI are made, we may be in trouble. We should note, too, that there will likely be malevolent or selfish actors out there who will prefer AI systems to be less universalistic, and less utilitarian.
Profile Image for Nilendu Misra.
290 reviews13 followers
September 15, 2019
A delightful book on theory, practicality and implications of AI from one of its pioneers. Has a strong intellectual rigor under the fluent style. Loved it!
Profile Image for Liina.
333 reviews297 followers
February 19, 2020
It took me ages to finish this one, probably because it caused me such anxiety and, to be honest, depressed me so much that I tolerated it only in small doses. The continuous striving for greater and greater efficiency and doing things faster (what AI largely aims for) reminds me of a hamster wheel that at some point will fall over. We all know that more efficiency will not give us more free time. Quite the contrary - the wheel will start to spin even faster. Let the hamster rather take a leisurely walk and not be so agitated all the time. Nevertheless, I have to join the choir of praise. "Human Compatible" is essential reading if you want to know about AI, the threats it poses and what the possible solutions could be.
Living in a country that identifies itself as a forerunner in everything IT-related, much of what Russell talked about hit close to home. Sometimes it seems that it is the only sector worthy of any heightened attention here. You hardly ever hear anyone speaking about what great psychologists, nurses or other helping-profession workers we have. This has created a dichotomy and immense wage gaps in society, and I can only see it getting worse. The irony is, though, that when "the robots take over", helping (and creative) professions will become very much in demand.
Stuart assures us that the machines will not be taking over anytime soon though.
Yet he still gives examples where a great breakthrough in science has been a matter of one idea or solution, and how it changed everything very, very fast. He also stresses that to be ready for superintelligence we have to act now, make regulations now, and acknowledge that once superhuman intelligence is here, it is beyond us, and the mechanisms we assume will protect us when the need arises ("you can just switch it off") will not be effective. He states that we can't ban AI research (as has been done successfully with human DNA modification, for example), as there are too many interested parties for whom it will be hugely profitable. Also, there are benefits for humankind if it is done carefully, keeping in mind certain principles (which he addresses in the book).
"Human Compatible" is written is a clear tone with plenty of examples and sectioned down to smaller bites. It succeeds what it is aimed to do - to educate the general reader, without any previous knowledge, about AI.
Profile Image for Jessy.
255 reviews60 followers
April 4, 2020
A lot of writing on AI safety (lots from the effective altruism community) can't help but sound far-fetched and crazy. One of my main gripes is that most of these theoretical analyses and hypothetical scenarios are too distanced from what is actually happening in research / practice.

Russell somehow manages to communicate the minority view on the importance of the safety / control problem, while remaining grounded in practical problems and research methods. It's such a difficult topic to write about, because there are so many levels of debate, ranging from fundamental philosophical/ethical arguments to implementation details. He manages to persuasively address all the common arguments against taking safety seriously, while sharing concrete ways to think about and deal with these problems.

He still mainly touches on very particular solutions w/ reward learning, inferring human values, etc. — mainly the approaches originating out of his lab, and very much inspired by the EA flavor of AI safety. I think it's still a somewhat biased view, and I would have liked to see more discussion of alternative approaches. Still, overall very convincing, comprehensive, and well-written.
Profile Image for James Munro.
10 reviews
June 20, 2021
Interesting insights into the dichotomy of AI and how we can balance its vast applications with its associated risks. Technical at times, but still enjoyed it.
Profile Image for Pablo.
69 reviews3 followers
June 4, 2022
2.5 stars.

It's okay, but you should read "Superintelligence: Paths, Dangers, Strategies" instead: "Human Compatible" has much less depth and doesn't really bring anything new of substance.
Profile Image for Cav.
782 reviews153 followers
May 8, 2020
This was terrible...
I did not finish it. I made it ~halfway through and then pulled the plug, which is something I almost never do.
I was excited to read this one, as I am very interested in AI. Author Stuart Russell's delivery left much to be desired, however...
The book is written in an extremely dry and long-winded manner. I found my attention wandering, and was getting irritated. The reading was extremely tedious and jumped around quite a lot.
The final straw was hearing him talk about how computer algorithms need to be corrected for "bias", which makes them "racist". LMAO.
So you program a computer with a complex algorithm that will detect and analyze patterns, but then you need to step in, to correct that analysis, because the patterns your algorithm has detected are "racist" and "biased"?? I'M DONE.
No, I would not recommend this book. There are many other better books about AI.
1 star, and off to the return bin.
Profile Image for Andreas.
482 reviews146 followers
February 23, 2021
A Superintelligence is an artificial general intelligence (AGI) which has an intelligence surpassing that of humans.

A.I.s are in some specific cases already better than humans, e.g. by winning ever more complex games like chess, Go, or StarCraft against human champions. AGIs are not restricted to a specific field but can do intellectual tasks like humans. In the case of strong AGIs, they are even conscious and self-aware.

As soon as an A.I. gets better than humans at improving A.I., a runaway cycle of self-improvement follows, ultimately resulting in a technological singularity with unpredictable consequences for human civilization.

In the case of beneficial or ignorant superintelligences this would be a great or at least unproblematic outcome. The other extreme would be a malevolent superintelligence causing every sort of dystopia.

Does all that sound fatalistic or far away? Of course, a superintelligence won't be around in the next couple of years, and probably not in my lifetime. But I wouldn't bet my grandchildren's wellbeing on that. And as Isaac Asimov demonstrated often enough in his stories, three rules are simply not sufficient to guarantee beneficial robots.

Can we fix this? The target would be that

Machines are beneficial to the extent that their actions can be expected to achieve our objectives.

“Human compatible” shows a path to steering this rapid innovation in the right direction by introducing three principles – not as explicit laws for AI systems, but as a guide for AI researchers:

1. The machine’s only objective is to maximize the realization of human preferences.
2. The machine is initially uncertain about what those preferences are.
3. The ultimate source of information about human preferences is human behavior.

That way, the system won't disable the shut-off button, and will defer to humans, asking for permission and for clarification about whether it's about to do the right thing. Most importantly, it will infer what humans would like by watching how they act, using Inverse Reinforcement Learning (IRL).
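
The "ask for permission" behaviour falls out of simple expected-value reasoning. The sketch below is my own toy rendering with invented values, not code from the book: the machine compares acting, holding off, and asking first, and chooses to ask exactly when it is unsure.

```python
def best_policy(p_good, ask_cost=0.05):
    """Pick among acting, doing nothing, and asking the human first.

    p_good is the machine's probability that the action is what the human wants;
    a good action is worth +1, a bad one -1, and asking costs a small bother."""
    act       = p_good * 1.0 + (1 - p_good) * (-1.0)
    hold_off  = 0.0
    ask_first = p_good * 1.0 + (1 - p_good) * 0.0 - ask_cost   # act only if the human approves
    return max([("act", act), ("hold off", hold_off), ("ask first", ask_first)],
               key=lambda option: option[1])

for p in (0.2, 0.6, 0.99):
    print(p, best_policy(p))   # asks when uncertain, acts on its own when nearly sure
```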

Russell has been a well-connected A.I. researcher since the last millennium. I'd go as far as stating that he's one of the few drivers in the field who has an overview of the whole field of A.I. and knows the state of the field, where the problems are, and how feasible a superintelligence is.

The book is very recent, and if you want to start from here, you'll find numerous literature references reaching nearly to the present day.

As a scientist, the author writes in a way that is mostly accessible to technically oriented readers. At times, it can be difficult to follow without a background in computer science, as when he describes the historical background, including complexity theory and von Neumann architectures, in the second chapter, "Intelligence in Humans and Machines". If logical calculus worries you, you might just skip those parts. For me, they were interesting, because they connected old theories to contemporary topics.

With the history out of the way, Russell tells us in chapter 3 "How Might AI Progress in the Future". Starting with near-future use cases like self-driving cars, intelligent personal assistants, smart homes with domestic robots, and smart cities, he asks "When Will Superintelligent AI Arrive?" He doesn't answer with something like "in the next 50 years" or "never" but with a couple of open problems that are yet to be solved, like common sense.

Chapter 4 describes the "Misuses of AI", like surveillance (with comparisons to the Stasi), behavior control and blackmail, and deepfakes. Ugly cases like lethal autonomous weapons are already around and could develop into scalable weapons beyond our control. AI could also eliminate most jobs (referring to works like The Second Machine Age or Rise of the Robots) and usurp other human roles.

Chapter 5 “Overly Intelligent AI” is a feast for SF fans, where the author presents the Gorilla Problem and refers to the Butlerian Jihad in Frank Herbert’s Dune: “Thou shalt not make a machine in the likeness of a human mind“. It's obvious, but some people believe that using Asimov’s Robotic Laws “or something like that” would lead to a solution for the control problem. Russell is absolutely effective in proving that that’s a completely false direction, because the laws consist of goals, and we cannot formulate goals well enough that they cannot be interpreted misleadingly by a superintelligence, or simply circumvented.

Do we really have to act? Chapter 6 “The not-so-great Debate” answers that in a very entertaining, even funny way. It brings up typical defensive demeanors like denial: “it’s complicated, impossible, too soon to worry about it,” deflection: “you can’t control research, whataboutery, keep silent about the risks”, or solutions like “can’t we just switch it off, put it in a box, work in human-machine teams, merge with machines?”

Chapter 7 “AI: A Different Approach” leads us to the three principles cited above. Chapter 8, “Provably Beneficial AI”, gets back to a lot of game theory.

A longer chapter 9 “Complications: Us” broadens the scope to several moral philosophies like utilitarian AI, and its challenges. Humans have conflicting, changing preferences, and some humans are nasty, envious, greedy, stupid, and of course emotional. How could a superintelligence incorporate their preferences and should it even do so?

The last chapter 10 “Problem Solved?” asks for a Global Governance of AI.

In summary, the book is highly entertaining in a scientific sense. While it’s mostly technically oriented, it includes a broad range of philosophical discussion. Parts of the book were mind-blowing for me, changed my view on possible solutions completely. That’s why I highly recommend this book.
Profile Image for Alex Railean.
265 reviews39 followers
March 6, 2021
A very thought-provoking book. The author explains some basic AI concepts and reviews the history of the field, then moves on to the main theme - a discussion about what it takes to ensure the AI will remain under our control.

The primary value I got out of it is his 3-point approach towards solving the problem. I find his argument convincing and will definitely keep these points (see notes below) in mind in the future.


Note: I only documented the latter part of the book; the intro was also very interesting, but given that the density of interesting points was rather high, it is easier to just re-read it.



### envy and pride
Imagine that we have the technology to make substantial improvements in the quality of life.

If happiness is defined as "being in the top 1%", then by definition 99% of people will be unhappy, even though they live in objectively good conditions.

Therefore, as we progress we have to revise our attitude towards envy and pride.

### problems
- the Midas problem: in retrospect, the goal you set is not what you really wanted, but the machine is already doing its best to make it happen and the process is not necessarily reversible (-:
- the gorilla problem: the product outperforms its maker, the maker loses control (I disagree with the analogy, as gorillas didn't produce humans, but the author's point is clear)

### misc
The oracle intelligence approach - easier to sandbox.


## human-compatible
Basic guidelines for ensuring that the machine moves towards our goals in the way we want and avoids the "Midas problem"

1. The machine's only objective is to maximize the realization of human preferences
2. The machine is initially uncertain about what those preferences are
3. The ultimate source for defining human preferences is human behavior

Preferences:
- if you see two films about your own hypothetical future, playing different scenarios, you can choose which one you prefer, or express neutrality (no preference).
- note that this is a film about you, not some generic material about an abstract human
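
Those pairwise "which film do you prefer" judgments are exactly the kind of data a preference model can be fitted to. Below is a minimal sketch of my own (features, judgments, and weights all invented, not Russell's method) using a Bradley-Terry-style model trained by gradient ascent; it is only meant to show the shape of the idea.

```python
import math

# Each future is summarized by two made-up features: (leisure_hours, income_level).
futures = {"A": (4.0, 2.0), "B": (1.0, 3.0), "C": (6.0, 1.0)}
# Observed judgments: the person preferred the first future in each pair.
judgments = [("A", "B"), ("C", "B"), ("C", "A")]

weights = [0.0, 0.0]
score = lambda f: sum(w * x for w, x in zip(weights, futures[f]))

for _ in range(2000):                       # gradient ascent on the log-likelihood
    grad = [0.0, 0.0]
    for winner, loser in judgments:
        p_win = 1 / (1 + math.exp(score(loser) - score(winner)))
        for i in range(2):
            grad[i] += (1 - p_win) * (futures[winner][i] - futures[loser][i])
    weights = [w + 0.01 * g for w, g in zip(weights, grad)]

print("learned weights:", weights)                                   # leisure gets the larger weight
print("implied ranking:", sorted(futures, key=score, reverse=True))  # C, A, B - matching the judgments
```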


**Obj1**
The machine is purely altruistic, it is built for humans and it disregards its own "well-being" as a factor, unless it is material to the objectives at hand (which were defined by humans).

Any kind of self-protection logic comes with a lot of side-effects.

**obj2/3**
A computer that knows the goal in advance will assume it already knows all that it needs to know. It will never ask for corrections. Any additional human input might be perceived as a nuisance, as a deviation, as a sneaky attempt to derail it from the right path.

However, priorities and views change over generations, and even within one's lifetime.

What we need is a "human in the loop" approach, or some sort of a REPL-like experience, where the machine continuously asks for human guidance, or at least exposes an API to do so.

If we bake-in values in advance - then which values? Getting them exactly right is extremely difficult, while getting them wrong can potentially lead to disasters.

The key principle is to ensure that uncertainty of the objective itself is a given (not just uncertainty in sensor readings, the actions of other agents, etc.)


Effectively, we're building a "human preference prediction machine", which can produce and refine models for each individual.


**built-in ethics**
Not necessary.

Does a criminology expert become a criminal by studying criminals?
Would a machine studying human preference become evil, like some of the individuals it examined?


If a corrupt official seeks bribes so they can send their children to a university - the computer can understand the rationale and find other ways to achieve that goal, without detriment to other people.

Special case: people who take pleasure in harming others. Here we need some sort of a special handler.

### provably beneficial AI
### loyal AI
NOT good if it is only loyal to one person. It must take into account the preferences of other people.
Profile Image for Sabin.
357 reviews35 followers
December 17, 2022
Definite winner. Stuart Russell does an amazing job of bridging popular nonfiction and something which actually gives the specialists a run for their money. Following a brilliant first part where the author summarizes the state of AI research up to 2018, he then keeps pounding away at a very important issue: AI safety. That is, how do we design AI that helps humanity and does not pose an existential risk.

The author is perfectly clear on one point: we need to prepare, not with flamethrowers and assault weapons in the case of a robot uprising à la Terminator, but in terms of understanding the potential risks of an AI that, due to faulty programming or any number of other unexpected factors, might end up not having our best interest in mind. In the second part, the author tries to convince his readers of the dangers which lie in the future, as AI systems become more intelligent and autonomous. Russell argues his case against two counter-arguments: One, that we would never be able to create superhuman intelligence - which past experience disproves by analogy, and because of the fact that we don’t actually need superhuman AI to influence human affairs (see the polarizing effect that Facebook’s content suggestion algorithms have had). And two, that there’s nothing to worry about and that some smart guys and gals will come along and deal with the problems as they arise. This sounds a lot like firefighter mode in project management, when nobody has the time or energy to do the actual management because so many issues rise up due to bad planning that everybody just spends their whole time putting out fires. So what the author proposes on this front is just plain common sense risk management.

The third part argues for a foundational framework which could ensure that AI doesn’t get out of control. He posits three principles (which have absolutely no relation - not even as a passing reference - to Asimov’s three laws of robotics), whereby an AI’s objectives are dictated solely by human preferences: The AI’s only objective is to maximise the realization of human preferences, the AI is uncertain about what they are, and the only source of information about them is human behaviour. He argues that a possible way to design such an AI is through inverse reinforcement learning, a technique which derives the reward for its actions from observation. Come to think of it, if you boil it down to this, it doesn’t look like we could implement this technique anytime soon. The first part, deciding on an outcome based on a reward function, is basic reinforcement learning. Train the AI on examples and it will attempt to maximize the reward. But in order to get this reward function you need to use some kind of deep learning technique which changes the function’s weights according to observations. And you also need to make sure that this reward function covers the whole probability space of human preference. Or something of the sort. (The solution is not, strictly speaking, outside the realm of possibility. But it might be a few to a couple of hundred PhD dissertations and research papers away.)

The good thing about this technique is that, since it’s a mathematical solution, we should be able to prove that it works. We just need some smart enough people to come along and actually do (invent?) the maths. I’m keeping my fingers crossed.
Profile Image for Daniel Hageman.
340 reviews47 followers
May 23, 2021
I still have a fair bit of hesitancy with various views in this space, particularly around some of the analogies used to illustrate the proposition of intelligence explosions, but this is surely the go-to book to hear this side of the story (notably more digestible than Bostrom's SI).
Profile Image for Laurence.
439 reviews51 followers
May 1, 2024
A particularly interesting book about Artificial Intelligence, because it points to the problems that can (and will) arise when we create systems that are more intelligent than we are. The author rightly points out that we must retain control (and that this is not easy), and that we have to start working on that now. It is all very well substantiated and factually worked out, so I honestly have to admit that after reading this book I have little hope left that this will all end well. As the author also notes, medicines have to be extensively tested before they reach the market, but AI can simply be offered by any party and thrown out into the public. High time everyone became aware of the problems this can cause.

The book dates from 2019, but fortunately a recent chapter (2023) has been added at the end, describing the latest developments. I found that extra chapter particularly instructive, especially because the models that are now making waves are not the AI models Stuart Russell had in mind in 2019. ChatGPT and the like are even more of an unpredictable black box, yet at the same time they are not "superintelligence" systems, because they are trained for a single functionality.

Definitely recommended for anyone interested in the topic, although it is sometimes a bit repetitive.

(3.5 stars)
Profile Image for Wendelle.
1,743 reviews51 followers
January 8, 2022
Perhaps safely described as one of the most important books recently published. Prof is co-author of AI bible. We're on the cusp of truly smart AI, perhaps the most consequential milestone in human history. We don't want to be extinct, we don't want to be pets. What failsafe should we program into the AI before it escapes 'control' and exponentiates its own intelligence within a matter of days or weeks? The prof provides a suggestion. 3 rules should underwrite the AI's existence: i) prioritization of humans' preferences, ii) the AI starts from uncertainty because it doesn't know the humans' preferences beforehand and must continually learn them, and thus continually adjust its mission, iii) the AI must learn our preferences from our choices. This is obviously a very worthwhile read on an impactful topic. Watch this TED talk to get a taste https://www.ted.com/talks/stuart_russ...
Profile Image for Tarmo Pungas.
147 reviews6 followers
September 29, 2021
Great introductory book for everyone wanting to learn about AI safety. Doesn't require a technical background and some concepts are explained more elaborately in the appendices.
Profile Image for Mikael Raihhelgauz.
35 reviews8 followers
December 26, 2020
Good scientists are rarely good essayists (the reverse also holds). Stuart Russell, however, combines both talents. Most popular books about artificial intelligence contain so much BS that they are simply impossible to read. "Human Compatible" is exceptional in this respect. The book is at once balanced, well argued, and easy to understand.

Usually, when the dangers of artificial intelligence come up, people imagine the "Terminator" scenario repeating itself in real life. That is of course fairly impossible, which is why even the reasonably informed public takes little interest in the topic. Russell shows convincingly why the risks still need to be dealt with. AI does not threaten us because of superhuman capability; it is dangerous precisely because of its stupidity and clumsiness. It is extremely hard to explain to a machine what a human *actually* wants. "Human Compatible" explains exhaustively why that is so and what to do about the problem.
Profile Image for Keith Swenson.
Author 15 books51 followers
December 19, 2019
After years of disappointing expectations, AI is finally arriving. It is here today in nascent form, and will surely expand its capabilities quickly. But can we avoid creating a superintelligence that destroys humanity? This concern is routinely listed among the top five possible ways for humanity to terminate itself -- so listen up: this is important.

Russell steps back from this dire prognostication to begin the book with a review of AI, how it has come along to where we are today. The Baldwin effect shows that intelligent animals evolve faster. Whether it is conscious makes no difference, because intelligence is whatever is competent at getting things done. The famous Turing test is really not useful, because it assumes that AI is going to be human-like.

AI appeared with earlier simple games, but has recently conquered Chess, Go, and Jeopardy. We expect to see self-driving cars and any number of personal assistants. He covers where it is working, and also the cases where AI is misused. It is particularly relevant in a discussion of 'fake news' and how AI techniques have brought about the "post-truth age". He wanders into the worker crisis that robots promise to bring, and continues right into the problem of overly intelligent AI, and how people will basically ignore and otherwise completely mishandle AI. I must admit that by this point in the book he had presented so many stereotypes and misinterpretations of AI that I was afraid he would completely miss the point that AI is not human.

Finally, in chapter 7 he gets around to revealing his approach that will save the day! Don't try to give the machine a concrete goal to achieve on its own, but instead tie the machine to satisfying the human master. In short, these rules summarize it:

1) The machine's only objective is the realization of human preferences
2) The machine is initially unsure of those preferences and must try to figure them out
3) The ultimate way to figure this out is to observe human behavior

That is, machines are designed to be altruistic, to be humble, and to watch and learn from human behavior. This nicely solves the problem and avoids the "loophole principle", which assures us that the machine will find a way around any rule if it is sufficiently intelligent and sufficiently motivated. We humans are terrible at telling the machine what to do. This approach completely lets us off the hook. Let the machine figure out what we want. That, of course, is what every competent assistant has been doing since the beginning of recorded history.

But will it work? There are, as it turns out, complications. Human motives are not always pure. We react based on emotions, and run illogical vendettas when slighted.

He does not really delve into the obvious elephant in the room: what happens when a rich person purchases AI to satisfy their goals to the exclusion of all other considerations? What if a patently evil person like Hitler had AI to satisfy his goals? Shouldn't there be some group-level considerations? AI can never be a full participant in a human community, and is therefore somewhat excluded from the social forces that keep people in line. But again, once we step into giving the machine objective rules to follow, we run afoul of the loophole principle. So we are not completely safe from malevolent human behavior, but we avoid the problem of driving the AI with stupid and dangerous rules.

It is clear that Stuart Russell has done a lot of thinking on this, and I find his solution certainly satisfying in that it addresses immediately the basic problems with Asimov's three laws, and promises surely to get us another league or two down the road to safe AI. The problem is in no sense solved, but we do now have longer to play the game.



Profile Image for Quinn Dougherty.
56 reviews9 followers
February 19, 2021
Good stuff. You should probably read. I'm skipping appendices because I have somewhat of a CS education in AI and at a glance the appendices look like background for the uninitiated.

Russell calls all of goal-directedness "the standard model" and says we should declare today year zero of real AI research-- he proposes a nonstandard model that is just three principles. 1. The machine's only objective ought to be satisfaction of human preferences. 2. The machine is uncertain about what those preferences are. 3. The source of information about human preferences is human behavior.

I guess it's complicated to claim that principle 1 is really outside of goal-directedness, but I think it's still a nonstandard notion-- statistics is done by presupposing that you have a loss function a priori, and I think weakening the a priori part is an extremely valuable direction of research.

He has a chapter on something called "provably beneficial", so of course there's an interesting question of what he means by "proof". I was not satisfied here, and when I write a post on (possibilities of) high-impact careers in formal verification I will elaborate.

He at one point said that goal-directed AI is kind of like the subroutine notion of software: the ^2 button on your calculator has a specification describing what the start and end of computation look like, and crucially it should not terminate until it has something that fits the specification. Russell says instead that an AI ought to report back after 30 seconds, "here's what I came up with, do you want me to keep trying or not?", this idea of uncertainty-permitting specs. A potent idea! Not new-- if you take a look at Pei Wang's NARS research agenda, derived entirely from what he calls AIKR, the assumption of insufficient knowledge and resources, you can find implementations that do precisely this. But NARS is not anthropocentric-- its notion of goals, while not quite the same as what Russell calls the standard model, isn't obsessed with observing its parents.
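
That "report back after 30 seconds" idea is essentially an anytime computation with a human in the loop. Here is a small sketch of my own (the interval, objective, and candidate stream are all invented) of what such a spec could look like: it simply surfaces its best-so-far answer periodically and lets the human decide whether to keep going.

```python
import random
import time

def anytime_optimize(objective, candidates, check_in_every=5.0):
    """Keep improving a best-so-far answer, pausing regularly to ask the human."""
    best, best_value = None, float("-inf")
    last_check = time.time()
    for candidate in candidates:
        value = objective(candidate)
        if value > best_value:
            best, best_value = candidate, value
        if time.time() - last_check >= check_in_every:
            reply = input(f"best so far: {best:.4f} (score {best_value:.4f}); keep going? [y/n] ")
            if reply.strip().lower() != "y":
                break
            last_check = time.time()
    return best, best_value

if __name__ == "__main__":
    random.seed(1)
    # Toy objective: find a number close to pi by sampling from a stream of candidates.
    stream = (random.uniform(0, 10) for _ in range(50_000_000))
    print(anytime_optimize(lambda x: -(x - 3.14159) ** 2, stream))
```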

I think the "G" is an awful concept and we should get rid of it. AGI, as an agenda, can do nothing but lead to red herrings-- tempting people to "consciousness" and "what is intelligence really?" And "what is generality really?" Or worse, "what's artificial?" For me, AI is just a lisp token that points to transformative technologies, where the word transformative is defined by openphil as "revolutionary impact on civilization c.f. the industrial revolution". This puts the focus on what machines DO rather than what machines ARE, a much more valuable focus that will derive much more valuable research agendas. This I think Russell would agree with.

And, indeed, I have more complicated views on anthropocentrism and the age-old question "when is unaligned ai morally valuable?", but I'll have to write them up some other time. But I'll say that I think if safeguarding future generations of us is insufficiently cosmopolitan, that it's a forgivable mistake, given the stakes.
Profile Image for Julia.
311 reviews15 followers
December 7, 2021
I think this was a fantastic introduction to learning about AI safety and alignment (although I have to read some more about the topics to verify that intuition!). Russell identifies critical issues with the current model of designing machines to pursue specific objectives and proposes an alternative model which instead centers human preferences and accounts for uncertainty. Along with that, he also covers the history of research into intelligence, conceptual breakthroughs required for superintelligence, potential benefits and misuses of AI systems, and the necessity of AI safety research. The book occasionally felt a bit scattered and the ending too abrupt, but otherwise full of interesting information and discussion!

Unfortunately, with superintelligent systems that have a global impact, there are no simulators and no do-overs. It's certainly very hard, and perhaps impossible, for mere humans to anticipate and rule out in advance all the disastrous ways a machine could choose to achieve a specified objective. Generally speaking, if you have one goal and a superintelligent machine has a different, conflicting goal, the machine gets what it wants and you don't.
Profile Image for Ietrio.
6,732 reviews25 followers
December 12, 2019
Another nobody who has found the best solution ever: we should get a czar to lead us!