The last era of human mistakes

post by owencb · 2024-07-24T09:58:42.116Z · LW · GW · 2 comments

This is a link post for https://strangecities.substack.com/p/the-last-era-of-human-mistakes

Contents

  What will this era look like?
  Central challenges to be borne by humans
  Trying to help at far remove
  What to make of this
2 comments

Suppose we had to make moves in a high-stakes chess game, with thousands of lives at stake. We wouldn't just find a good chess player and ask them to play carefully. We would consult a computer. It would be deeply irresponsible to do otherwise. Computers are better than humans at chess, and more reliable.

We'd probably still keep some good chess players in the loop, to try to catch possible computer error. (Similarly we still have pilots for planes, even though the autopilot is often safer.) But by consulting the computer we'd remove the opportunity for humans to make a certain type of high-stakes mistake.

A lot of the high-stakes decisions people make today don't look like chess, or flying a plane. They happen in domains where computers are much worse than humans.

But that's a contingent fact about our technology level. If we had sufficiently good AI systems, they could catch and prevent significant human errors in whichever domains we wanted them to.

In such a world, I think that they would come to be employed for just about all suitable and important decisions. If some actors didn’t take advice from AI systems, I would expect them to lose power over time to actors who did. And if public institutions were making consequential decisions, I expect that it would (eventually) be seen as deeply irresponsible not to consult computers.

In this world, humans could still be responsible for taking decisions (with advice). And humans might keep closer to sole responsibility for some decisions. Perhaps deciding what, ultimately, is valued. And many less consequential decisions, but still potentially large at the scale of an individual’s life (such as who to marry, where to live, or whether to have children), might be deliberately kept under human control[1].

Such a world might still collapse. It might face external challenges which were just too difficult. But it would not fail because of anything we would parse as foolish errors.

In many ways I’m not so interested in that era. It feels out of reach. Not that we won’t get there, but that there’s no prospect for us to help the people of that era to navigate it better.

My attention is drawn, instead, to the period before it. This is a time when AI will (I expect) be advancing rapidly. Important decisions may be made in a hurry. And while automation-of-advice will be on the up, it seems like wildly unprecedented situations will be among the hardest things to automate good advice for. We might think of it as the last era of consequential human mistakes[2].

Can we do anything to help people navigate those? I honestly don’t know. It feels very difficult (given the difficulty at our remove in even identifying the challenges properly). But it doesn’t feel obviously impossible.

What will this era look like?

Perhaps AI progress is blisteringly fast and we move from something like the world of today straight to a world where human mistakes don’t matter. But I doubt it.

On my mainline picture of things, this era — the final one in which human incompetence (and hence human competence) really matters — might look something like this:

That's enough predictions that I'm probably wrong in some of the particulars. But I think the broad brush stroke picture is decently likely.

Central challenges to be borne by humans

What kind of challenges will people actually face at these times?

This is difficult to be particularly confident about. But here are some thoughts:

Trying to help at far remove

Even if we have some sense of their challenges and a desire to help, what can we do? A central difficulty is that, however much we can get a sense of their challenges, their own sense of the challenges will be much better. It is inefficient for us to focus too much on specific scenarios[3]. A related issue is that they will have better tools than we do: some work we might want to do could by then be automated.

I don't know how to think about this systematically, so I may well be missing things. But for now, there are three strategies which seem to me to have some promise — one about helping the future players to act wisely, and two about helping to get the gameboard in a good position.

First, deepening understanding of foundational matters. Having a good grounding in the basics (both theoretical and empirical) seems like it's helpful for understanding all sorts of situations. We have some disadvantage from distance of not knowing which areas of foundations are most relevant, but the space of possible foundations is much, much smaller than the space of possible applications, and we can make some educated guesses. In this case that means analysis of the nature of AI, of the senses in which different actors might have values, of the basic dynamics of game theory or bargaining in cases with partial information and partially defined preferences, and so forth. It seems to me that although we have models of all of these things, our models don't always feel like they're capturing all the important things. I wouldn't be surprised if improvements in these foundations were possible, were helpful, and were counterfactual (through the relevant moments).

Second, power seeking on behalf of values one likes. This can include trying to shape the values of various actors, or trying to empower actors with desirable values. Honestly I'm pretty nervous about this one, because (1) it's so common and human for people to delude themselves into thinking that their values are superior, even when they're not, and (2) society has good memetic immune responses against various types of power seeking, so it can be easy for this to backfire. But it definitely is a strategy which can work at this distance, and it has some types of robustness (it doesn’t rely on second-guessing future actors, but is just about setting the gameboard up well). I feel relatively less worried about versions of this which are focused on fundamental values like cooperativeness and a commitment to moral reflection and truth-seeking, and more worried about versions predicated on particular object-level views about which values are correct. 

Third, differential technological development. It seems quite possible that the position people are in will depend in various ways on the state of technologies. Work which facilitates desirable technological pathways coming sooner relative to less desirable ones seems like a good lever. This can include (as e.g. in the cases of AI alignment and control) work laying the groundwork for future automation of research, including conceptual work helping to inform what things, exactly, are good to automate. Differential technological development, as well as being a strategy in its own right (aiming to positively influence the tech available during the last era of human mistakes), can also be a tactic in service of the two other strategies above (e.g. perhaps differentially advancing research which helps us to think clearly about big novel issues).

What to make of this

Framing in terms of the last era of human mistakes feels to me like it’s capturing some important dynamics (although it may be confused about others). I feel glad to have found the perspective, and to get to interrogate it. It helps to remind me how strange the future will be. And it seems like it provides some seeds which I may later find helpful for my thinking.

At the same time, as of the time of writing I’m not sure how much this perspective will help. It shifts my view of things, but it doesn’t make it very transparent what to do. Still, I felt like there was enough here to be worth sharing. If other people find the perspective useful, or not-useful, I’d be interested to hear about that.

  1. ^

    Or not — there are possible futures where humans are removed from decision loops altogether.

  2. ^

    I've sometimes heard this period, or something close to it, called “crunch time”. I mildly dislike that name because although it points to the importance of the period it sort of obscures the mechanisms via which it's important.

  3. ^

    Although it often seems to be very productive to explore specific scenarios, to help keep general thinking grounded.

2 comments


comment by Wei Dai (Wei_Dai) · 2024-07-25T00:10:04.749Z

I think this is a very natural frame, but here is an alternative. Humans largely do not have large scale or long term goals/values relative to which they can be said to be making mistakes, that AI will help prevent in the future. Instead they care almost exclusively about local and short term concerns like satisfying physical desires and winning status games, and the fate of our universe depends largely on humanity's side effects as they go about acting on these parochial interests. (Consider e.g. Warren Buffett's interest in accumulating wealth and disinterest in how it's actually spent, suggesting that he only cares about it as a status marker.)

This state of affairs may well persist into the AGI era, with status games taking on even more importance as physical desires are largely satisfied/satiated. (This assumes we avoid a bunch of even worse failure modes.)

Since status games are competitive games that necessarily have winners and losers, in this scenario it doesn't seem to make sense to say that humans will stop making mistakes with AI assistance.

comment by Jan_Kulveit · 2024-07-24T12:00:43.173Z

I broadly agree with this - we tried to describe a somewhat similar set of predictions in Cyborg periods [LW · GW].