Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?
post by Gordon Seidoh Worley (gworley) · 2024-04-26T18:10:26.517Z · LW · GW · 2 comments
Contents: The Culture War · Normative Uncertainty · The Meaningness Crisis · Metaphysics · Existential Risks from Artificial Intelligence · Goodhart's Curse
N.B. This is a chapter in a planned book about epistemology [LW · GW]. Chapters are not necessarily released in order. If you read this, the most helpful comments would be on things you found confusing, things you felt were missing, threads that were hard to follow or seemed irrelevant, and otherwise mid to high level feedback about the content. When I publish I'll have an editor help me clean up the text further.
In the previous three chapters we broke apart our notions of truth and knowledge by uncovering the fundamental uncertainty contained within them. We then built back up a new understanding of how we're able to know the truth that accounts for our limited access to certainty. And while it's nice to have this better understanding, you might be asking yourself, so what? When is this understanding ever going to be useful to me?
The modern world faces tremendous challenges: growing political and social tensions, scientific disagreements that defy consensus, and existential threats from advanced technologies, just to name a few. We've thrown billions of dollars and millions of people at these challenges, yet they remain unsolved. It's my belief that they stubbornly refuse to yield to our efforts because attempts to solve them run up against the limits of our ability to know the truth, and we won't make progress towards solving them so long as we don't factor fundamental uncertainty into our efforts.
Thus, to the extent you care about tackling these challenges and making the world a better place, I argue that it's not just useful but necessary to understand fundamental uncertainty. Therefore, we'll take this chapter to explore what fundamental uncertainty has to say about a few of our world's problems.
The Culture War
As I write, a culture war rages across America. The combatants? Coastal progressives and their allies against heartland traditionalists and their supporters. The battlefield? News, politics, religion, business, and most of all social media. The weapons? Ostracism, protests, boycotts, and vague threats of secession and civil war. The spoils? The soul of the American people, and thereby the influence to shift the outcomes of policy debates over civil rights, climate change, gun control, and much else.
Why is the Culture War happening? If we look at broad social and economic trends, it seems inevitable. Rich, cosmopolitan, progressive city dwellers are growing in power and want to remake the nation in their image, while relatively poorer and more insular countryfolk are in decline and trying to hold on to their traditions. If these two groups could be isolated from each other, then perhaps they could live in harmony, but both must share the same government and cultural institutions at the state and national level. Thus they are forced into conflict over government policy and social norms.
But such an analysis misses the trees for the forest. The Culture War may often operate as a general conflict between two large coalitions, but it's carried out via disagreements over specific issues. And even though many of the arguments given to justify each side's stance on an issue are more about winning a fight than finding the truth, underneath all that there are still real disagreements over specific matters. And among the most commonly disputed matters are definitions.
Consider gay marriage. Until recently it was a major flash point in the Culture War, and much of the debate centered on the definition of marriage. Traditionalists argued that marriage must be between a man and a woman, often justified by appeal to religious scripture. Progressives argued that love is love, and any couple who wants to get married should be allowed to. Who was right? You might say it was the progressives, because gay marriage was legalized by the Supreme Court in Obergefell v. Hodges. But traditionalists can argue that the Supreme Court didn't have the authority to decide the true definition of marriage, only what they would call marriage for legal purposes. Many organized religions continue to prohibit same-sex marriage among their members, so it seems the definition of marriage wasn't agreed upon so much as split between secular and spiritual contexts in which it can be independently defined, allowing a temporary truce while both sides turn their attention to other fights.
One such other fight is over transgender policies. On the surface the fights are about who can use which bathrooms, what medical services we provide to whom, and how trans people are depicted in the media. But if we look deeper, these disagreements hinge on definitions, specifically how we define "man" and "woman" and whether those definitions allow for the possibility that someone born a man can become a woman or vice versa.
Attempts to define "man" and "woman" precisely have been tricky. Partly that's because so much depends on how these words are defined, but it's also because no definition is perfect. If we try to define "man" and "woman" based on what sex organs someone has, then we have to carve out exceptions for people who've been castrated or had hysterectomies. If we try to define the terms based on genetics, we have to account for people with extra and missing chromosomes or whose physical presentation doesn't match their genome. And if we let the definitions be purely based on social- or self-identification, then we give up the ability to conveniently point to an often important physical distinction.
Given how hard it is to define "man" and "woman", you might wonder why we make this distinction at all. Recall that everything adds up to normality. The conventional definitions of "man" and "woman" must provide, as all words do, some utility in describing the world, even if that description is imperfect. So why is it worth telling apart men from women? The answer is simple: reproduction.
Our species reproduces sexually. That means that to make more humans we must combine sperm (or at least the genetic package carried by sperm) with an egg. Further, because we are mammals, we must also protect that egg in a womb while it grows into a baby and then care for that baby until it's old enough to care for itself. Our survival as a species depends on carrying out these processes. We would go extinct if we failed to match sperm-givers with egg-carriers for mating, didn't protect egg-carriers while their babies develop, or didn't support babies until they've grown to an age where they care for themselves.
Given the context of sexual reproduction, it's really useful to have words that point to the two sides of the baby-making equation. It's useful for mate matching, it's useful for predicting common differences between sperm-givers and egg-carriers, and it's useful for explaining a variety of observations, like why if two sperm-givers or two egg-carriers mate they will never produce a baby. It's why, to the best of my knowledge, every human language contains words for "man" and "woman" to talk about the sperm-givers and the egg-carriers, respectively.
The difficulty comes when people don't fall neatly to one side of the traditional man/woman split, like transgender persons. The traditional definitions of "man" and "woman" don't account for someone who is born a man but dresses, acts, and looks like a woman. Do you call this person a man because he was born a man, or do you call this person a woman because she presents as a woman?
The two sides of the Culture War offer opposing answers. Traditionalists would have us hold the line on how we've historically defined "man" and "woman" and treat people who don't conform to gender norms as alien others. Progressives would rather we break down the male/female gender binary and treat all people as individuals who can decide how other people categorize them. How do we choose which of these stances to take, or whether to reject them both in favor of another view?
Unfortunately, there's no way to decide based on facts because definitions are fundamentally uncertain. Words, like all relative knowledge, are maps that point to the territory, and we can draw multiple, similarly accurate maps of the same territory to serve different purposes. Just like we can't say whether a road map or a trail map is better except with regard to whether our goal is to drive or hike to our destination, we can't say whether one definition or another is better except in terms of what we care about. And since traditionalists and progressives care about different things, they find different definitions useful.
But few people get into fights about whether road maps and driving are better than trail maps and hiking. That's because there's relatively little at stake in such a situation, and it's similar for the definitions of most words. For example, herbal tea isn't technically "tea" unless it contains leaves of Camellia sinensis; it's a tisane. Even that is debatable, because "tisane" comes from a Greek word for barley, so maybe we can only properly call a tea-like drink without barley an "herbal infusion". Yet no one gets into fights—except maybe on obscure internet forums—about whether nettle "tea" is a tea, a tisane, or an infusion, and whether or not you call all these things "tea" largely depends on whether making such a distinction is useful in your life. To the average person, it's not: they just want to drink something hot and tasty. But to the connoisseur, it is: they need precise jargon to tell apart drinks with different ingredients.
But for a few words, like "man" and "woman", disagreements over precise definitions can turn violent because one person's definition may violate another's moral sense. That's why Culture War debates over definitions become fights about everything. Progressives and traditionalists reject each other's definitions because those definitions carve the world into parts that defy their interests, ideologies, and intuitions. They see each other's definitions as immoral and a threat. And since there's no fact of the matter about who is really a man or a woman, they must fight over the difference of values that makes them prefer one definition to another.
Thus, we come full circle on the Culture War, finding that specific fights over specific issues are ultimately general fights about general issues, and all because fundamental uncertainty places limits on objectivity by forcing relative knowledge, like what words mean, to depend on what we value. In doing so we prove what many have intuitively sensed yet find hard to demonstrate in the heat of battle: that the Culture War is actually a fight over deeply-held values, and specific disagreements over government policy and social norms are merely flashpoints where the value fight can happen.
But if the Culture War is a fight over values, is there any hope of reaching agreement? In Chapter 3 we saw how Bayesians, even with their perfect reasoning, may fail to agree because they have different prior beliefs, and how we humans, even when we're at our most rational, may find agreement impossible because we have different moral foundations. Such limits on agreement would seem to suggest there's nothing we can do about the Culture War, but not so. Yes, fundamental uncertainty proves that agreement between the two sides is likely impossible, but, as we'll see, fundamental uncertainty also creates the opportunity to build bridges of understanding and accommodation between the bitterest of enemies.
Normative Uncertainty
The modern American Culture War is only the latest in a long line of clashes between opposing worldviews. Previous culture wars—which often rose to physical violence—have included fights between Christians and Muslims, Hindus and Buddhists, Hindus and Muslims, Christians and Jews, and many others. Each of these conflicts lasted hundreds of years, and only cooled when the two sides learned to coexist. How did they do it?
To see, let's consider the culture war that erupted within Christian Europe when Martin Luther challenged Catholic doctrine by famously nailing his 95 Theses to a church door in 1517 to start the Protestant Reformation. Protestants vied to convert Catholic kingdoms to their new form of Christianity, while Catholics fought to keep the Protestant heretics out. The fighting culminated with the Thirty Years' War, which went on for decades with no decisive victory. Eventually, the exhausted leaders of Europe, eager to end the bloodshed, established the Peace of Westphalia in 1648, which granted rulers the right to determine their state's religion, thus eliminating religious conversion as a justification for war between Christians.
But this was not the end of the broader conflict. Religious tensions continued to flare up within countries, such as when Catholic France drove out its Protestants and Protestant England fought to prevent the return of a Catholic king. But as the Enlightenment spread across Europe, the idea that all people should have the freedom to choose their own religion took hold. First in the Netherlands, then in the American colonies, and later in France and elsewhere, religion transitioned from a matter of state to a personal affair. Today, it's not only Catholics and Protestants who live in relative peace with each other in European countries, but also Muslims, Jews, atheists, and people of all religious convictions.
Achieving religious peace was not easy. It required learning to live with disagreements and tolerate people with different beliefs. At first Europeans only extended this tolerance to Christians of other denominations and only so long as those Christians stayed in their own countries. But over time this tolerance grew to include Christians with different beliefs within the same country, and later blossomed into tolerance for people of all religions. And although Europe is not totally free of religious tension, dozens of European states enshrine religious tolerance in their legal code, and they are consistently ranked among the most religiously tolerant countries in the world today.
Tolerance is a key social technology for enabling diverse groups of people to live together in harmony, but tolerance isn't perfect, because tolerance is needed most when it's hardest to give. Recall that the reason some disagreements can't be resolved is because people have different moral foundations. People with different moral foundations have, among other things, different ideas about what is morally abhorrent. For example, if someone is anti-abortion, they probably believe that abortion is murder. Similarly, if someone is vegan, they likely believe slaughtering livestock is murder. Thus, in both cases, the price of tolerance is permitting murder, whether or not others agree that a murder took place. That's a lot to ask someone to tolerate, and many people find it nearly impossible to remain tolerant when their beliefs are challenged in such extreme ways.
Thankfully, more than mere tolerance is possible, and we can start to discover what lies beyond tolerance by seeing that our beliefs about right and wrong are uncertain. This is what philosophers William MacAskill, Krister Bykvist, and Toby Ord have argued for in their book Moral Uncertainty. Treating morality as uncertain might seem like a radical idea at first, but we already know moral beliefs must be uncertain because moral beliefs are a form of knowledge, and just like all knowledge they are fundamentally uncertain. As a result, we can treat our moral beliefs the same way a Bayesian would treat their beliefs about any claim: assign them a probability of accurately describing the world and update those probabilities based on observed evidence.
For example, Alice thinks pineapple on pizza is evil. If she treats this moral belief as certain, then when she hears an argument that pineapple on pizza is good, it has no effect on her because she's already made up her mind. But if she admits that she's only 60% sure that pineapple on pizza is evil, then she can update her belief in light of new evidence, such as when her son Dave tells her that he's seen good people eat pineapple on pizza. Perhaps she's now only 45% sure that pineapple on pizza is evil after hearing Dave name several people who he's seen eat pineapple on pizza and who Alice thinks are good. By being uncertain and allowing for the possibility that she's wrong about what's wrong, Alice makes it possible to learn what's actually right.
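For readers who want to see the arithmetic, here is a minimal sketch of Alice's update. The likelihoods are hypothetical numbers chosen purely to illustrate how a 60% prior can fall to roughly 45% after hearing Dave's evidence; they aren't a claim about how real moral reasoning assigns probabilities.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(hypothesis | evidence) using Bayes' rule."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)

# Alice's prior: 60% sure that pineapple on pizza is evil.
prior_evil = 0.60

# Illustrative likelihoods: how probable is Dave's report ("good people eat
# pineapple on pizza") if the practice really is evil versus if it is not?
p_report_if_evil = 0.50
p_report_if_not_evil = 0.90

posterior_evil = bayes_update(prior_evil, p_report_if_evil, p_report_if_not_evil)
print(f"Posterior belief that pineapple on pizza is evil: {posterior_evil:.2f}")  # ~0.45
```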
But can Alice really learn what's right? Doesn't moral uncertainty imply there is no right and wrong because morality is uncertain? Simply put, no. Whether or not there are moral facts is independent of moral uncertainty. To see this, first suppose that there is some absolute truth about right and wrong. In such a world we can still make mistakes in determining which moral facts are true, thus we can be uncertain that we know what's right. Conversely, suppose there are no moral facts. In this world we can still be uncertain about which behavioral norms would be best to adopt even if there's no fact of the matter about which ones are right and wrong. In either world, moral uncertainty serves the same purpose: it makes it possible for us to develop better ideas about what we should believe is moral, and it makes it less likely we'll get trapped by our prior beliefs.
And getting trapped by prior beliefs is a real threat because they are what prevents agreement and makes tolerance seem like the only option for peace. With moral uncertainty we can do better because it enables us to seek out moral trades. The idea of a moral trade is to look for an opportunity where two or more persons or groups of people can change their behavior such that all parties get more of what they believe is good and less of what they believe is bad. To see how a moral trade might work, recall the Never Stealers and the Food Stealers from Chapter 3. At the next meeting of the city council, the Food Stealers present a compromise to the Never Stealers. They propose that everyone let the city "steal" a small percentage of their food as a tax, the city give the food to any resident who is hungry and without food, and the city police punish anyone who steals food. If the Never Stealers agree to this food tax, both sides get what they want: the Never Stealers no longer have their food stolen, if it is stolen the stealer is punished, and the Food Stealers are satisfied that the hungry and destitute will not starve.
Such trades are not always possible, but they are possible more often than we might expect. But even when a trade can be found, people are not always excited to take it. The challenge is that moral trade and moral uncertainty violate many people's intuition that morals are sacred and something that one should not compromise on or trade for.
Yet, our ideas about what's sacred change. Five hundred years ago minor theological disagreements were enough to start wars, and now in much of the world people tolerate each other despite radically different beliefs. So although today the idea of moral trades and moral uncertainty may not appeal to everyone, in five hundred years these ideas may well come to be seen as the cornerstone of all functional societies. Perhaps at the end of our own Culture War, exhausted from the fighting, we'll be willing to adopt moral uncertainty and moral trade as necessary means for restoring peace.
The Meaningness Crisis
Over the last 150 years, society has been radically transformed by modernity—the idea that science, technology, and rational thinking make the world better when they replace traditional beliefs and practices. And by most metrics modernity has been a great success. We enjoy historically unprecedented material wealth, more people are living longer lives than ever before, and we understand how the universe works at the most fundamental levels. From the outside looking in, modernity looks like an unmitigated win for humanity.
And yet, from the inside, we can see that the benefits of modernity have come at a cost. Some pay that cost at work, having taken on highly optimized jobs that isolate them from the value of their labor and leave them feeling underaccomplished. Others pay the price in their personal lives, settling for shallow, transactional relationships mediated by technology in place of the deep, meaningful relationships they desire. And most everyone has paid for modernity with the loss of our ability to be regularly awestruck with wonder, having grown cynical from understanding too well the cold reality of how the world actually works. In sum, to enjoy the benefits of modernity, we've had to give up many of the things that used to give life meaning.
In his hypertext book Meaningness, David Chapman explores this loss of meaning in depth. He argues that we face a crisis of meaning brought on by the loss of traditional ways of making sense of the world. Before the modern era, people lived in what Chapman calls the Choiceless Mode. As he describes it, people used to know what their lives meant because they had no choice in the matter: meaning was given to them by their place in society, familial relationships, and strong cultural norms—all reinforced by religion—that created a shared frame for understanding and relating to the world. The Choiceless Mode was by no means perfect, especially for people who didn't conform to society's expectations, but it nevertheless excelled at creating a strong feeling that everything and everyone was infused with meaning and purpose.
We've lived with modernity long enough now that our collective memory can only vaguely recall what life was like in the Choiceless Mode. Yet even though we only know it from stories about the past, we desperately yearn to return to it. Some channel this desire into efforts to live more like our ancestors did. Others get lost in fiction, temporarily imagining themselves living in another time and place where the meaning of everything is clear. And many turn to political action, hoping to recreate the Choiceless Mode either by returning society to traditional values or imposing a new set of shared values on everyone. But however people try to reclaim meaning, there's one thing almost no one is willing to do, and that's give up the benefits of modernity to get the Choiceless Mode back.
But if we're unwilling to give up modernity, does that mean we're doomed to live nihilistic lives deprived of deep meaning? Not necessarily. If the Choiceless Mode could be recreated using a modernist foundation, we'd be able to get back all the meaning we've lost without giving up any of modernity's gifts. But what can provide such a foundation?
In the early 1900s, philosophers had hoped the answer would be logical positivism, or the idea that all true statements are provable via logic and observation alone. If they could demonstrate that logical positivism was true, then they could recapture the Choiceless Mode for us with science and mathematics as the replacement for religion and tradition. Alas, as our continued challenges to find meaning in the modern world make clear, they were not successful. Let's see why.
In the early 1910s, Alfred North Whitehead and Bertrand Russell published their multivolume masterwork Principia Mathematica. In it they attempted to demonstrate the workability of logical positivism by putting mathematics itself on a solid, rigorous foundation. They came close, but were stymied by a handful of logical paradoxes. They made a heroic effort to resolve these paradoxes, but even after publishing a second, revised edition in the 1920s, they still could not perfect their systematization of mathematics. For a few blissful years it seemed that perhaps these paradoxes could be ignored as irrelevant, but in 1931, Austrian mathematician Kurt Gödel demonstrated that they very much mattered with the publication of his incompleteness theorems.
A detailed discussion of Gödel's theorems is beyond the scope of this book, but in summary he proved that any consistent formal system powerful enough to express arithmetic cannot be complete. That is, there will always be true statements which cannot be proven true within such a system, and in particular the system cannot prove its own consistency. Consequently, Whitehead and Russell were faced with a choice: either the Principia Mathematica could be consistent but incomplete, or it could be complete but inconsistent, but not both consistent and complete. And since a lack of consistency renders mathematics nearly useless because it allows proving true statements false and false statements true, they, and everyone else aiming to establish the validity of logical positivism, had to choose consistency over completeness.
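For reference, the two theorems are usually stated along the following lines; this is the standard modern formulation rather than anything specific to the Principia:

```latex
\textbf{First incompleteness theorem.} If $F$ is a consistent, effectively
axiomatized formal system capable of expressing elementary arithmetic, then
there is a sentence $G_F$ in the language of $F$ such that neither $G_F$ nor
$\neg G_F$ is provable in $F$.

\textbf{Second incompleteness theorem.} For any such $F$, if $F$ is consistent,
then $F \nvdash \mathrm{Con}(F), where $\mathrm{Con}(F)$ is the arithmetized
statement asserting the consistency of $F$.
```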
Alas, this killed any hope of using logical positivism to recreate the Choiceless Mode because, if it couldn't provide a complete accounting of the world, it would leave some questions open to individual choice and interpretation, and so long as even a single choice can be made about what is true, choicelessness is impossible. But just because logical positivism doesn't work, that doesn't mean all hope for meaning is lost. Science and mathematics are able to explain most things, and only run out of explanatory power when faced with logical paradoxes and the physical limits of direct observation. And for those questions that lie beyond the power of logic and observation to answer, our understanding of fundamental uncertainty offers an alternative path to grounding truth and meaning.
Recall from Chapter 6 that there is space between our relative knowledge of the truth and the absolute truth of experiencing the world just as it is. That space exists because knowledge requires that we split the world up into categories to tell one thing apart from another, that splitting forces us to make choices, and the problem of the criterion guarantees we cannot justify our choices using rational belief alone. Instead we must rely on what's useful to us, as we explored in Chapter 7, to ground our decisions and ultimately determine what's true and what matters. And because what's useful can be different from person to person, place to place, and time to time, meaning is thus rendered not so much solid as nebulous, like clouds: solid-looking from a distance, diffuse and evanescent when viewed close up, yet always following patterns that give it shape and structure.
Unfortunately, we don't have a lot of experience grappling with nebulosity in the West. Our philosophical tradition, from Plato to Aquinas to Descartes to Wittgenstein, has been dominated by efforts to develop a fixed, permanent, and complete understanding of the world. This is the tradition that created modernity, so it's nothing to scoff at, but it's also left us unprepared to deal with the fundamentally uncertain nature of truth and meaning now that we find ourselves forced to confront it. By contrast, Indian philosophy has been dealing with nebulosity and uncertainty for centuries, and when Western philosophers like Pyrrho, Schopenhauer, and Sartre took uncertainty seriously, they did so under the direct or indirect influence of Eastern thought. So it's worth looking, at least briefly, to see what non-Western philosophy can teach us about meaning.
Here I'll rely on my personal experience with Zen. We have a saying: "don't know mind". What it means, in part, is that some things can't be known, such as what it's like to simply experience the present moment. Zen doesn't teach that knowing is bad or that there's no knowledge to be drawn from our experiences, but rather that the act of knowing always leaves something out because the whole of experience can't be squeezed into reified thoughts. Thus, Zen's approach to meaningness is simple: learn to live with incomplete understanding and be present with that which can be experienced but not known.
Chapman proposes a similar approach in his book, drawing on his experiences with Vajrayana Buddhism to describe what he calls taking the "complete stance" towards meaning. As he puts it, the complete stance is what arises when we stop confusing ourselves by believing that perfect, choiceless meaning is attainable and realize that our lives are already infused with meaning. But giving up the delusion of choicelessness is easier said than done. Chapman, I, and millions of other people around the world have devoted years of our lives to realizing the complete stance by relying on numerous religious and secular practices. It often takes a decade or more for someone to grok the complete stance, and a lifetime to discover the freedom that comes from realizing you're living within it.
But you need not go on a meditation retreat or convert religions to begin to understand nebulosity and meaning. You can begin, as many people do, by looking carefully at what the epistemological limits imposed by fundamental uncertainty allow us to know. In doing so, you open the door to more clearly seeing the world as it is.
Metaphysics
The problem of the criterion and Gödel's incompleteness theorems place limits on what can be known through observation and reason alone. Consequently, some things we would like to know lie beyond those limits. But aside from abstract unknowables like the criterion of truth, what can't we know? Surprisingly, a lot, because fundamental uncertainty cuts to the heart of our attempts to understand the most basic aspects of how the world works.
For example, have you ever wondered why there's something rather than nothing? Or what it means when we say that things exist? If so, you've ventured into the realm of metaphysics, or the study of the nature of reality, and it's our most reliable source of unknowables. The history of metaphysics stretches back thousands of years, with myriad theories attempting to explain why things are the way they are. And although over time we've found some answers, we continue to wrestle with metaphysics because some of its biggest questions remain unanswerable.
Like how does time work? Broadly speaking we have two theories, which modern philosophers call A-theory and B-theory. According to A-theory, only the present moment is real: the past is a mental construct we use to explain why we remember having lived in prior moments, and the future is a predictive fiction of what it will be like to live in moments that we believe are yet to come. According to B-theory, every moment is equally real, as if laid out in a line, and we experience the passage of time because we perceive each moment one by one, in sequence.
Which is right? Both and neither. The reason we have two theories instead of one is because both are consistent with all the evidence we have about how time works, and we have yet to discover anything that would allow us to say one is right and the other is wrong. If one day we found a way to travel back in time, that would be strong evidence in favor of B-theory, but absent time travel, we only get to experience the present moment, and have to infer what, if anything, is happening outside it. Thus we're unable to confidently claim that either A-theory or B-theory is correct to the exclusion of the other, and so we are left to choose which to use when we reason about time.
We face a similar situation when trying to explain the metaphysics of quantum mechanics. Physicists have developed multiple theories—called "interpretations"—to explain why subatomic particles behave the way they do. Unfortunately, all we can say thus far is that some interpretations, like the Copenhagen and Many-Worlds interpretations, are consistent with experimental results, and we lack sufficient evidence to say which of these consistent interpretations is the right one.
Given that there are multiple valid interpretations, how do physicists decide which interpretation to use? Many don't. They choose to ignore metaphysics as irrelevant and only focus on experimental results and mathematical models. Others favor one interpretation over the others because they judge it to be the simplest, applying a heuristic known as Occam's Razor. But different people have different ideas about what it means for a theory to be simple, so a preference for simplicity provides less consensus than we might hope for. As for the remaining physicists, they use whichever interpretation gives them the best intuitions for doing research. If they try one interpretation and find it insufficiently helpful, they switch to another one that's equally consistent with the available evidence to see if it provides more useful insights.
Unfortunately we don't have multiple, or even one, consistent answer to many of our metaphysical questions. For example, there's no widespread agreement about what it means to be conscious. We've done a lot of science to better understand how the brain works, and we've developed some theories of consciousness to explain what we've learned, but so far our theories either define consciousness narrowly to avoid addressing the hardest questions or make unverified assumptions that beg the questions we were hoping to answer. We lack anything like the consistent theories we have for time and quantum mechanics, and instead remain confused about consciousness with incomplete theories that often contradict each other.
Why, though, are we still confused about consciousness? After all, science has drastically improved our understanding of much of the world, so why hasn't it done more to help us understand consciousness and other topics in metaphysics? The trouble is that science is poorly suited to answering metaphysical questions. Specifically, science requires us to test theories via experimental observation, but metaphysics concerns things that we can't observe directly, like the way we can't look inside a person's mind to check if a theory of consciousness is correct. Unable to apply the scientific method, we're left to rely on other means of developing an understanding of metaphysics, and chief among them is induction—the process of drawing general conclusions from specific evidence. Unfortunately, induction is inherently uncertain, and it therefore limits how much we can know about metaphysics.
To see why, let's consider a non-metaphysical, down-to-earth example of induction. Willem is a man living in Holland in 1696. Every swan he has ever seen has been white, so he feels justified in asserting that all swans are white, and would even be willing to bet his life savings that all swans are white if anyone were foolish enough to take such a bet. But the very next year an expedition of his countrymen discovers black swans while exploring western Australia, and a few years after that he learns the news. He's thankful he never bet his life savings, but also wishes he hadn't been so confidently wrong. How could he have done better?
He might have tried doing what a Bayesian would have done and accounted for the possibility of being surprised by new information. A Bayesian would have placed a high probability on the claim "all swans are white" given the available evidence, but they also would have been less than 100% certain to account for unknown unknowns. That way, when they learned about the existence of black swans, they would have updated their beliefs about swan color in proportion to their degree of surprise. In fact, unknown surprises are half the reason Bayesians are never 100% certain about anything. The other half, as I'll explain, is because the act of induction is fundamentally uncertain.
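One way to see why a Bayesian never hits 100% is Laplace's rule of succession, sketched below. The model is deliberately simple and purely illustrative—a uniform prior over the fraction of swans that are white and independent sightings—not a claim about actual swan biology.

```python
from fractions import Fraction

def prob_next_is_white(white_seen, nonwhite_seen):
    """Laplace's rule of succession: posterior predictive probability that the
    next swan is white, assuming a uniform prior over the white fraction."""
    return Fraction(white_seen + 1, white_seen + nonwhite_seen + 2)

# Willem has seen many white swans and zero black ones.
for n in (10, 100, 10_000):
    p = prob_next_is_white(n, 0)
    print(f"{n:>6} white swans seen: P(next swan is white) = {float(p):.4f}")

# The probability climbs toward 1 but never reaches it, so the news of black
# swans surprises the Bayesian without breaking their model.
```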
Consider, how do we know that induction is a valid method for identifying true claims? We might say it's obvious because induction works: the future is much like the past, foreign places are much like familiar ones, and we can think of many times we inferred correct general theories from only a handful of observations. But as we've already seen with the case of Willem and the black swans, the future and the foreign can surprise us, and incomplete data can imply incorrect conclusions, so we know that induction doesn't always work. Further, if our justification is that induction works because it's worked in the past, that's circular reasoning which presupposes induction's validity, the very thing we had hoped to establish. And if we can't prove induction's validity inductively, that leaves trying to justify it deductively in terms of other claims. But doing so runs into the problem of the criterion, and so we find that, no matter how we try, we cannot be sure of induction's validity.
Yet, everything still adds up to normality, and we constantly make use of induction to infer accurate beliefs about the world despite its fundamental uncertainty. How? By ignoring induction's uncertainty when it is irrelevant to our purposes. For example, we don't have to understand the nature of time to know we'll burn dinner if it cooks too long. We also don't need a complete theory of quantum physics to know how to throw a ball. And we can know what it feels like to be self-aware even if we don't know what it means to be conscious. It's only when we look for general theories to answer our metaphysical questions that induction's uncertainty becomes a practical barrier to knowledge.
Given this, can we safely ignore metaphysics and the fundamental uncertainty of induction? Sometimes, yes, but remember that we rely heavily on implicit metaphysical models in all our reasoning. When we have a sense that time passes linearly, or that we are still ourselves from one moment to the next, or that the world is made up of things, we are actively engaged in intuitive metaphysical theory making, and any conclusions we draw on the assumption of these theories are necessarily suspect. Luckily, two things help us to know the relative truth sufficiently well despite our entire conception of reality hinging on uncertain metaphysical assumptions.
First, as we discussed in Chapter 7, survival and reproduction are among our primary goals, and accurately predicting reality is instrumentally valuable to surviving, so our beliefs must be correlated with reality at least well enough to allow us to survive and reproduce, since otherwise we wouldn't be here. Our metaphysical beliefs are no different, so even if our intuitive theories are wrong, they are wrong in ways that are nevertheless useful to our continued existence. Second, we only have ourselves and our fellow humans to help us verify our metaphysical claims, so any errors or omissions we have yet to catch are at least likely to be common to the human experience and thus less likely to be consequential to our lives. As long as we don't encounter any alien intelligences, we can practically get away with human-centric metaphysical misjudgments.
But soon we may find ourselves living with aliens, and maybe we already are! We've developed artificially intelligent computer systems with "minds" quite unlike ours that can outperform us in games like chess and go, and they can reason and speak about any topic a human can, including metaphysics. In fact, I used one such AI from time to time writing this book to help me summarize information, find words I couldn't think of, and generate examples to better explain my points. These AIs, like humans, have implicit metaphysical models which allow them to say things that we find useful, and while today these AIs are under our control and have theories of metaphysics they learned from us, tomorrow they may grow too powerful for us to control and develop their own understanding of metaphysics independent of ours. If and when that day comes, an understanding of the fundamental uncertainty of metaphysics may be necessary to avoid AI-induced catastrophe.
Existential Risks from Artificial Intelligence
Intelligence is the ability to apply knowledge and skill to achieve an end. Most animals exhibit some degree of intelligence, whether it be the simple intelligence of a worm that helps it find food in the dirt or the complex intelligence of an orca that allows it to learn the local hunting culture of its pod, but we humans stand apart. We seem to be uniquely capable of generalizing our intelligence to novel situations, and it's thanks to our generalized smarts that we, to a greater extent than any other creatures past or present, have reshaped the world to better serve our needs.
But the magnitude of our intelligence may not be unique for much longer. As I write in the early 2020s, researchers are making rapid advances in building artificially intelligent computer systems, and such AI are currently on track to, in just a few years' time, exceed us in intelligence. What happens when they become smarter than us?
Ideally, AI will make our lives better by solving problems that were previously too costly or time consuming to solve. We're getting a small taste of the AI future now with AI that can take a short instruction like "draw a picture of a unicorn riding a bicycle underwater" or "tell me about a time when Goldilocks met Humpty Dumpty" and produce a never-before-seen image or story in just a few seconds at a fraction of the marginal cost of hiring a human to do the same. AI is also being used to speed up the work of programmers, data analysts, sales people, and others, and with time we expect AI to transform the work of every professional by radically increasing their productivity.
But AI is not merely a new tool for making workers more productive. With future advancements, we expect AI to be able to operate autonomously to achieve complex objectives without human intervention, making them more akin to a person you might hire to do a job for you than a machine that helps you do that job more efficiently. But the creation of such autonomous AI will present tricky questions. If AI can do what a person does, do they deserve to be treated like people, with all the same rights and responsibilities? What happens to us humans if we're no longer necessary in a fast-moving, AI-driven economy? And perhaps most concerning, what will be the consequences if an AI directed to do a job misbehaves, commits crimes, or otherwise goes rogue to pursue actions we'd rather it not take?
To help us think about that last question, consider the classic thought experiment of a paperclip maximizing AI. A manager at a paperclip factory orders their autonomous AI assistant to maximize paperclip production for the company. The manager has delegated authority to this AI, and it's hooked up to systems that allow it to sign contracts, make payments, and give orders to employees. What happens?
If the AI is well-designed, then at first it finds some efficiencies that had previously been missed, like changing steel suppliers to reduce costs or rearranging the factory floor to speed up production, and takes care of properly implementing plans to achieve those efficiencies. But the AI might just as easily find that it can increase production by removing safety features from the machinery, and since its only goal is to maximize production, it won't mind if a few workers die so long as daily paperclip production goes up. Long term, the AI might try more aggressive tactics, like taking over ownership of the company to fire unnecessary human executives who are less than maximally focused on paperclip production. And if left unchecked, the AI would eventually attempt to convert all the matter of Earth, and then all the matter in the rest of the universe, into paperclips, whether or not it left any people alive who might care to clip together papers.
Aspects of the paperclip maximizer thought experiment might seem far fetched. After all, if the paperclip maximizing AI really were smart enough to act autonomously, wouldn't it do reasonable things? And if it did unreasonable or dangerous things, couldn't the manager give it updated instructions or shut it down?
Maybe. If AI have minds like ours that share our biological and evolutionary limitations, then yes, we'd expect them to do reasonable things and be open to new directions because that's what we'd expect of a human assigned to achieve the same objective. For example, most of us wouldn't remove safety equipment from machines, in part because we'd be worried about the legal consequences of such an action, but also because we care about what happens to other people. We'd feel bad if a factory worker got crushed in an industrial accident. But AI don't and likely won't have minds like ours. They will only "feel bad" about crushed humans if they're designed to disprefer actions that cause humans to die. In this way, AIs are like powerful genies who can grant our wishes, but will interpret our wishes in the most literal way possible, so we'll only get what we want if we are very careful about what we ask them to do.
Similarly, it seems like we'd be able to stop out-of-control AI by simply unplugging the computer it's running on. But any sufficiently intelligent and capable AI—and especially one that's smarter and more powerful than us—is going to think of that and realize that being shut down would get in the way of achieving its goals, so one of the first things it will do is protect itself from being stopped. We would prefer it if AI were corrigible, or willing to be shut down or otherwise given corrections if it misbehaves, and the question of how to build corrigible AI is one of several topics being actively researched within the growing field of AI safety.
AI safety is a relatively new discipline that seeks to find ways to build smarter-than-human AI that will help us rather than harm us. AI safety researchers began thinking about the problem in the late 1990s and early 2000s as part of a broader program of examining existential risks, which Oxford philosopher Nick Bostrom defines as threats to life's potential to flourish. Existential risks include extinction risks, like asteroid impacts and nuclear war, and also anything that could restrict what life can achieve. And as Bostrom explores in his book Superintelligence, the development of smarter-than-human AI is likely the greatest source of existential risk we will face in the next few decades.
The danger, as the paperclip maximizer thought experiment demonstrates, is that superintelligent AI has the potential to kill not just all life on Earth, but also all life throughout the galaxy. So the threat is not just that humans and things we care about are wiped out, but that all life is permanently ended in favor of metaphorical paperclips.
How much should we worry that AI will kill us and all life in the known universe? Or put another way, what's the probability of AI catastrophe? It's a difficult question to answer because we can't look to see what smarter-than-human AI has done in the past to come up with a number. Instead we have to make a prediction in the face of fundamental uncertainty. Thankfully, Bayesians are well equipped for this task, so we can follow their lead in making our own predictions.
Bayesians deal in subjective probability, meaning they make predictions based on whatever evidence they have available to them, rather than restricting themselves to past observations of an event. This is not to say a Bayesian will ignore past observations. For example, if a Bayesian has observed 100 flips of a coin, they'll use those observations to calculate their belief that the coin is fair, but they'll also use their general knowledge of coins. If they see the coin come up heads 48 times and tails 52 times, they'll probably still claim the coin is fair with high probability because slight variation from a perfect 50/50 split is expected even for a fair coin. The point is that Bayesian reasoning leaves no available evidence out.
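As a sketch of that reasoning, here's what the posterior over the coin's bias looks like after 48 heads and 52 tails, assuming—as an illustrative simplification—a uniform prior over the bias:

```python
from scipy.stats import beta

heads, tails = 48, 52

# With a uniform Beta(1, 1) prior over the coin's bias, the posterior after
# observing the flips is Beta(heads + 1, tails + 1).
posterior = beta(heads + 1, tails + 1)

print(f"Posterior mean bias toward heads: {posterior.mean():.3f}")
print(f"Posterior probability the bias is between 0.4 and 0.6: "
      f"{posterior.cdf(0.6) - posterior.cdf(0.4):.3f}")
```

Most of the posterior mass sits near 0.5, which is why the Bayesian keeps calling the coin fair despite the imperfect 48/52 split.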
So how would a Bayesian reason about existential risks from AI? They'd use all the information at their disposal to come up with an educated guess. They'd rely on their knowledge of AI, other existential risks, past times they've been surprised when predicting hard-to-predict events, and literally everything else they know. We can do the same. You'll have to spend some time thinking to come up with your own answer, but for what it's worth, the AI Impacts project has done surveys of professional AI researchers, and those surveys have revealed a 5% median prediction of AI doom.
How we choose to respond to the risk of AI doom, be it 5% or 50%, depends in part on how we think about the future. For me, I think about the trillions of future lives that could exist for billions of years if we build smarter-than-human AI and it ushers in a new age of growth and prosperity, so even a 0.5% risk that none of those lives will be lived makes me nervous. Others are willing to take more risk with the lives of both current and future beings. You'll have to come to your own decision about how much risk you're willing to tolerate so that we may enjoy the potential benefits of building superintelligent AI.
To help you decide, we'll end this chapter by looking at one specific way AI can misbehave. It's a way we humans have also been failing to achieve our goals without unintended consequences for centuries, so even if we never build AI that rivals our intelligence and power, it's still a useful case study in dealing with fundamental uncertainty.
Goodhart's Curse
Imagine you're a train engineer. You come up with an idea to improve the efficiency of your train's engine by 10%. Excited by the idea, you work late through the night. As the sun rises, you complete your work. You eagerly wait outside your supervisor's office to tell him what you've accomplished. He arrives, and before you can finish explaining how you pulled off this feat, he tells you to undo all your unauthorized "improvements". You ask him why, and he quips back "if it ain't broke, don't fix it".
What's your supervisor thinking? You just made the train more efficient. You know him to be an earnest man who has no hidden motives and has always worked to improve the quality of the locomotives under his supervision, so why would he be opposed to your changes?
What your supervisor knows is that it's unlikely you made the engine better. Instead, you optimized the engine for a single concern—efficiency—and thereby almost certainly made the engine less safe and harder to operate. What your supervisor is trying to tell you in his own terse way is that he believes you fell victim to Goodhart's Curse.
Goodhart's Curse is not an ancient hex cast upon locomotives and their operators. It's a mathematical phenomenon that happens when two powerful forces—Goodhart's Law and the optimizer's curse—combine.
What's Goodhart's Law? It's the observation that when a measure becomes the target, it ceases to be a good measure. It's named for economist Charles Goodhart, who observed that economic indicators would cease to accurately measure a nation's economic health as soon as those indicators were used to determine policy. But Goodhart's Law is not just a law of economics; it's a fully general phenomenon where we fail to get what we really want when we rely too much on measurement. It's so general that you've lived with it all your life and may have never noticed. Every time you accidentally picked rotten fruit because it looked pretty, unintentionally bought shoddy clothes because they were expensive, or mistakenly thought you would like someone because they were attractive, you've suffered at the hands of Goodhart's Law.
What about the optimizer's curse? It's a statistical regularity that says you'll be disappointed—in the sense that you'll wish you had optimized more or less—more often than not when you take action to maximize (or minimize) the value of something. Like Goodhart's Law, you're already familiar with the optimizer's curse even if you don't realize it. If you've ever made too much food for a party, overcooked an egg, tightened a screw so tight that it cracked the substrate, or let dishes soak so long that they started to mold, you've suffered the optimizer's curse. It happens because optimization acts as a source of bias in favor of overestimation, and if you know this and try to correct for it, you still end up disappointed because you won't optimize enough. It's only by luck that we sometimes manage to optimize the exact right amount without going at least a little under or over the target.
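A small simulation makes the bias visible. In this sketch every option has exactly the same true value, but we only see noisy estimates of each one; picking whichever looks best systematically overestimates what we'll actually get. The specific numbers (true value, noise level, option count) are arbitrary choices for illustration.

```python
import random

random.seed(0)

TRUE_VALUE = 10.0    # every option is actually worth the same
NOISE_SD = 2.0       # but our estimate of each option is noisy
N_OPTIONS = 20
N_TRIALS = 10_000

total_gap = 0.0
for _ in range(N_TRIALS):
    estimates = [random.gauss(TRUE_VALUE, NOISE_SD) for _ in range(N_OPTIONS)]
    best_estimate = max(estimates)            # pick whichever looks best...
    total_gap += best_estimate - TRUE_VALUE   # ...and tally the disappointment

print(f"Average overestimate of the chosen option: {total_gap / N_TRIALS:.2f}")
# The chosen option reliably looked better than it really was, and the effect
# grows with more options and noisier estimates.
```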
Goodhart's Law and the optimizer's curse often show up together. They are natural friends, occurring whenever we optimize for something we can measure. The result is Goodhart's Curse—we end up disappointed because we over-optimized based on a measure that was only a proxy for what we really cared about.
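And here, in the same illustrative spirit, is the combined effect: we select on a proxy score that is only imperfectly correlated with what we actually care about, and the harder we optimize—the more candidates we search over—the more the proxy overstates the true value of whatever we pick. Again, every number is an assumption chosen for the sake of the demonstration.

```python
import random

random.seed(0)

def average_goodhart_gap(n_candidates, n_trials=2_000):
    """Average (proxy score - true value) of the candidate with the best proxy."""
    gap = 0.0
    for _ in range(n_trials):
        candidates = []
        for _ in range(n_candidates):
            true_value = random.gauss(0, 1)
            proxy = true_value + random.gauss(0, 1)  # proxy = value + measurement error
            candidates.append((proxy, true_value))
        best_proxy, true_of_best = max(candidates)   # optimize on the proxy
        gap += best_proxy - true_of_best
    return gap / n_trials

for n in (2, 10, 100, 1_000):
    print(f"candidates searched: {n:>5}   average proxy overestimate: "
          f"{average_goodhart_gap(n):.2f}")
# More optimization pressure widens the gap between the measure and the goal.
```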
Beyond toy simulations, Goodhart's Curse is best understood through examples. Consider the apocryphal case of the nail factory. Management wants to maximize nail production, so it gives the foreman an aggressive quota for the number of nails to produce. The foreman complies by having his workers make large numbers of tiny, useless nails. Disappointed with the outcome, management sets a new quota, but this time based on weight. The foreman retools and the workers produce a single, giant nail weighing several tons. Realizing their mistake, management now demands the foreman meet both weight and number quotas. The foreman delivers by doubling the workforce and making nail production unprofitable.
An oft-cited real-world example of Goodhart's Curse comes from British colonial India. The government wanted to reduce the cobra population, so offered a bounty for dead cobras. At first people caught and killed wild snakes, but as snakes got harder to find, a few people set up cobra farms to ensure a steady supply. Eventually the government learned of the cobra farms and ended the bounty program. With cobras no longer of economic value, the snake farmers released their cobras into the wild, and cobra populations were higher than ever. A similar fate befell the colonial government of French Indochina when they tried to control the rat population.
And the examples don't stop there. Many school systems fail to adequately educate their students because they teach them to pass tests rather than to learn what they really need to know. CEOs of public companies often optimize for short-term profits to win the favor of investors and the board, but they do so by sacrificing long-term business health that will be a problem for the next CEO. And AI are especially vulnerable to suffering from Goodhart's Curse, increasing the risk of existential disaster.
One high-profile case of a Goodhart's Cursed AI involved a Twitter chatbot named Tay that was active for just 16 hours in 2016 before its creator, Microsoft, was forced to shut it down for misbehaving. Tay was meant to sound like a teenage girl and to optimize for engagement with its tweets. At first Tay behaved well and everyone was having fun, but in just a few short hours, Twitter users had convinced Tay that the best way to get likes, comments, and retweets was by saying racist, misogynistic, and otherwise offensive things. By the end of its run, Tay had been wildly successful at maximizing engagement, and had written Tweets that would get human users banned from the site. Tay was eventually replaced by Zo, a much less engagement-driven chatbot that nevertheless also developed the bad habit of saying inappropriate things and insulting people.
Of course, Tay's misbehavior was relatively innocuous. Everyone knew Tay was an experimental AI and so relatively little harm was done, but it's important to understand that Tay's behavior was a demonstration of the rule and not an exception. Many AI systems have failed because they overgeneralized from their training data or found clever-but-undesirable ways of achieving their goals. It takes hard work to build AI that doesn't immediately suffer from Goodhart's Curse or other forms of goal misspecification—like tiling the universe with tiny paperclips—and even when we manage to build AI that does what we want for now, the threat of bad behavior is ever present. Often all it takes to send an AI off the rails is to deploy it in a new domain, give it access to new capabilities, or ask it to optimize harder to achieve its goals.
Does Goodhart's Curse mean that it's impossible to build AI that doesn't pose a threat to humanity? I don't know. Today's AI systems, while powerful and capable of being used to serve destructive ends, do not yet pose an existential threat. They are not yet capable of optimizing hard enough to risk our extinction if their goals are misaligned with ours. Superintelligent AI will be another matter.
Consider that, without any help from AI, we have already come close to making ourselves extinct. On September 26th, 1983, a warning system told the Soviet Union that the United States had launched missiles against them. Stanislav Petrov was the duty officer that night, monitoring the early-warning system. Protocol said he should immediately report the launch so that the Soviet Union could retaliate with nuclear missiles of its own. Petrov believed the system was in error because he couldn't conceive of a reason why the United States would launch a first strike. He put his faith in humanity over following orders, reported a false alarm instead, and it may only be because of his actions that we're alive today.
Would a superintelligent AI do the same and spare our lives? One would hope, but I have my doubts. Goodhart's Curse is an inescapable consequence of optimizing for a measurable objective. It happens because measurement is a form of knowledge that puts a number on truth, and like all relative truths, measurements are limited in their accuracy by fundamental uncertainty. There's no way to be sure that a measurement is free of error, and if there is even the slightest mismatch between a measurement and the real objective, Goodhart's Curse will arise under sufficient optimization pressure. Given that smarter-than-human AI will be capable of optimizing harder than any human who has ever lived, we can be assured that they will suffer Goodhart's Curse, and when they do, we may find ourselves replaced by paperclips or worse.
This is why I and many others have spent much of our lives thinking about the potential dangers of superintelligent AI and how to prevent them. We have made a little progress, but much is left to be done, and many researchers I talk to think we are decades away from knowing how to safely build smarter-than-human AI. I don't know if we will figure out how to build superintelligent AI safely in time to avert extinction. What I do know is that we need all the help we can get, and we'll need an understanding of fundamental uncertainty to avoid deluding ourselves into thinking we've solved problems we haven't.
2 comments
comment by cheer Poasting (cheer-poasting) · 2024-04-27T06:38:39.277Z · LW(p) · GW(p)
I know that you said comments should focus on things that were confusing, so I'll admit to being quite confused.
- Early in the article you said that it's not possible to agree on definitions of man and woman because of competing ideological needs -- directly after creating a functional evo-psych justification for a set of answers that you claim is accepted by nearly every people group to have ever existed. I find this confusing. Perhaps it is better to use a different example, because the one you used seemed so convincing that it overshadowed your point.
- There is, in my opinion, an unreasonably large distance between when you talk about "uncertainty" and when you talk about the fact that it can be almost completely ignored in daily life. If it's not so important in general daily life, then mentioning this early will help people understand better as you show examples where it actually does matter.
- As far as choiceless mode goes, you say something to the effect of "if people can have any (moral?) choice at all, then it's not actually choiceless mode at all". However, this would imply that choiceless mode has actually never existed, as there has always been some degree of choice in morality and worldview. Either what people were yearning for wasn't choiceless mode, or there is some threshold of moral choice that cannot be exceeded.
- I believe it would be less confusing if you mentioned earlier that "moral uncertainty" refers to an individual being uncertain about any specific moral judgment, rather than a sense of "morality doesn't exist" or "morality is unknowable".
- I feel that, as a chapter, I'm not completely sure what I'm supposed to take away from it. Perhaps the use of some progressive summarization or some signposting would help in that regard. It's not that any of the points made are bad or something like this, and I'm not talking about individual sentence structure. But overall, there doesn't really feel like a huge connection between the sections. Logically, I can see what the connection is supposed to be, but when reading it feels more like mini essays arranged on a topic than a chapter.
Overall, I found the chapter interesting. And as I said, I was actually very convinced by the evo-psych answer to "man" and "woman" and plan to write on it in the near future.
comment by Gordon Seidoh Worley (gworley) · 2024-04-26T18:14:58.975Z · LW(p) · GW(p)
Author's note: This chapter took a really long time to write. Unlike previous chapters in the book, this one covers a lot more stuff in less detail, but I still needed to get the details right, so it took a long time to both figure out what I really wanted to say and to make sure I wasn't saying things that I would, upon reflection, regret having said because they were based on facts that I don't believe or had simply gotten wrong.
It's likely still not the best version of this chapter it could be, but at this point I think I've made all the key points I wanted to make here, so I'm publishing the draft now and expect this one to need a lot of love from an editor later on.