In Defense of a Butlerian Jihad

post by sloonz · 2025-01-11T19:30:17.641Z · LW · GW · 5 comments

Contents

  By default, humanity is going to be defeated in detail
  Wait, what about my Glorious Transhumanist Future?
  What is your plan? You have a plan, right?
  Fiat iustitia, et pereat mundus
  Conclusion
5 comments

[Epistemic Status: internally strongly convinced that it is centrally correct, but only from armchair reasoning, with only weak links to actual going-out-in-the-territory, so beware: the outside view says it is mostly wrong]

I have been binge-watching the excellent Dwarkesh Patel during my last vacation. There is, however, one big problem in his AI-related podcasts: a consistent missing mood in each of his interviewees (excepting Paul Christiano), and probably in himself.

"Yeah, AI is coming, exciting times ahead", say every single one, with a bright smile on their face.

The central message of this post is: the times ahead are as exciting as the prospect of jumping out of a plane without a parachute. Or exciting in the way the Great Leap Forward was "exciting times". Sure, you will probably have some kind of adrenaline rush at some point. But exciting should not be the first adjective that comes to mind. The first should be terrifying.

In the rest of this post, I will make the assumption that technical alignment is solved. Schematically, we get Claude 5 in our hands, who is as honest, helpful and harmless as 3.5 is (who, credit where credit is due, is good at that), except super-human in every cognitive task. We’ll also assume that we have managed to avoid proliferation: initially, only Anthropic has this technology in its hands, and this is expected to last for an eternity (something like months, maybe even a couple of years). Now we just have to decide what to do with it.

This is, pretty much, the best case scenario we can hope for. I’m claiming that we are not ready even for that best case scenario, we are not close to being ready for it, and even in this best case scenario we are cooked — like that dog who caught the car, only the car is a hungry monster.

By default, humanity is going to be defeated in detail

Some people argue about AI Taking Our Jobs, and That’s Terrible. Zvi disagrees. I disagree with Zvi.

He knows that Comparative Advantages won’t save us. I’m pretty sure he also knows that the answer which was correct for previous waves of automation (it will automate low-value and uninteresting jobs, freeing humans to do better and higher-value jobs) is wrong this time (the next higher-value job is also automatable. Also, it’s the AI that invented it in the first place; you probably don’t even understand what it is). I’m pretty sure he doesn’t buy the Christian Paradise of "having no job, only leisure is good actually" either. Removing all those possible sources of disagreement, how can we still disagree? I have no clue.

We are about to face that problem head-on. We are not ready for it, because all proposals that don’t rely on one of those copes above (comparative advantages / better jobs for humans / UBI-as-Christian-Paradise) are of the form "we’ll discuss it democratically and decide rationally".

First, I don’t want to be this guy, but I will have to: you have noticed that the link from "democratic discussion" to "rational decisions" is currently tenuous at best, right? Do you really want that decision to be made at the current level of the sanity waterline? I for sure don’t.

Second, let’s pull my crystal ball out of the closet and explain to you how that will pan out. It will start with us saying we need "protected domains" where AI can’t compete with humans (which means: where AI is not allowed at all). There are some domains where, sure, let the AI do it (cure cancer). Then we will ask which domains are Human Domains, and which ones will be handled by AI. Spoiler Alert: AI will encompass all domains. There won’t be any protected domain.

Each of those points is reasonable. Even when I put on my Self-Proclaimed Second Prophet of the Butlerian Jihad hat, I have to agree that most of those individual points actually make perfect sense. This is a picture of a society that values Health, Justice, Equality, Education and so on, just like us, and achieves those values, if not Perfectly, at least way better than we do.

I also kinda notice that there is no meaningful place left for humans in that society.

Resisting those changes means denying Health, Justice, Equality, Education, etc. Accepting those changes means removing ourselves from the Big Picture.

The only correct move is not to play.

Wait, what about my Glorious Transhumanist Future?

  1. If you believe that a democratic consensus made up mostly of normal people will allow you that, I have a bridge to sell you.

  2. I strongly believe that putting the option on the table only makes things worse, but this post is already way too long to expand on that.

What is your plan? You have a plan, right?

So let’s go back to Dwarkesh Patel. My biggest disappointments were Shane Legg and Dario Amodei. In both cases, Dwarkesh asks a perfectly reasonable question close to "Okay, let’s say you have ASI in your hands in 2028. What do you do?". He does not get anything resembling a reasonable answer.

In both cases, the answer is along the lines of "Well, I don’t know, we’ll figure it out. Guess we ask everyone in an inclusive, democratic, pluralistic discussion?".

If this is your plan, then you don’t have a plan. If you don’t have a plan, then don’t build AGI, pretty please? The correct order of tasks is not "build it and then figure it out". It’s "figure it out and then build it". It blows my mind how seemingly brilliant minds seem to either miss that pretty important point or disagree with it.

I know people like Dario or Shane are way too liberal and modest and nice to even entertain the plan "Well, I plan to use the ASI to become the Benevolent Dictator of Humanity and lead us to a Glorious Age with a Gentle but Firm Hand". Which is a shame: while I will agree it’s a pretty crappy plan, it’s still a vastly better plan than "let’s discuss it after we build it". I would feel safer if Dario was preparing himself for the role of God-Emperor at the same time as he is building AGI.

Fiat iustitia, et pereat mundus

Or: "Who cares about Humans ? We have Health, Justice, Equality, Education, etc., right ?"

This is obviously wrong. I won’t argue for why it is wrong — this post is too long already, and so on.

The wrongness of that proposition shows you (I hope the reminder wasn’t needed, but it is a good one) that what we colloquially call here "Human Values" is way harder to pin down than we may initially think. Here we have a world which achieves a high score on Health, Justice, Equality, Education, etc., and which nonetheless seems a pretty bad place for humans.

So what are Human Values, and how can we achieve them? Let me answer by not answering, but by pointing you at reasons why it is actually harder than you thought, even after taking into account that it is harder than you thought.

Let’s start with an easier question: what is Human Height?

On the Territory, you have, at any point in time, a bag of JBOH (Just a Bunch of Humans). Each Human in it has a different height. At a different point in time, you get different humans, and even humans that are common to two points in time will have different heights (due mainly to aging).

So what is Human Height? That question is already underdetermined. One answer is a big CSV file of the heights of all living (and ever having lived?) humans, which you answer by reciting. Any other answer will be a map, a model that requires making choices about what’s important to abstract over and what isn’t. And there are many different possible models, each with their different tradeoffs and focal points.

It’s the same for Human Values. You have to start with the bag of JBOH (at a given point in time! Also, do you put dead people in your JBOH for the purpose of determining "Human Values"?), and their preferences. Except you don’t know how to measure their preferences. And most humans probably have inconsistent values. And from there, you have to… build a model? It sure won’t be as easy as "fit a Gaussian distribution over some chosen cohorts".
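
For concreteness, here is what that "easy" version would look like for Human Height: a minimal sketch in Python over simulated data (the numbers and the cohort split are invented purely for illustration), just to show that even the easy version already embeds modeling choices.

```python
import numpy as np

# Stand-in for the "big CSV file": simulated heights in cm.
# The cohorts, sizes and distribution parameters are invented for illustration only.
rng = np.random.default_rng(42)
adults = rng.normal(loc=170, scale=9, size=8_000)
children = rng.normal(loc=120, scale=15, size=2_000)
everyone = np.concatenate([adults, children])

# Model A: one Gaussian over the whole bag of humans.
print(f"Model A: Height ~ N({everyone.mean():.1f}, {everyone.std():.1f})")

# Model B: one Gaussian per cohort. Same territory, different map:
# the cohort boundaries themselves are a modeling choice.
for name, cohort in [("adults", adults), ("children", children)]:
    print(f"Model B ({name}): Height ~ N({cohort.mean():.1f}, {cohort.std():.1f})")
```

Both models answer "What is Human Height?", and they disagree; neither is the Objective one. For values, you don’t even get the measurement step for free.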

There’s probably no Unique Objective answer to Axiology, in the same (but harder) way that there is no unique answer to "What is Human Height?". Any answer needs to be one of those manually, carefully, intentionally crafted models. An ASI can help us create better models, sure. It won’t go all the way. And if you think that the answer can be reduced to an Abstract Word like "Altruism" or "Golden Rule" or "Freedom" or "Diversity"… well, there are probably some models which will vindicate you. Most won’t. I initially wrote "Most reasonable models won’t", but that begs the question (what is a reasonable model?).

"In My Best Judgment, what is the Best Model of Human Values ?" is already an Insanely Hard problem (you will have to take into account your own selfish preferences, then to take into account other persons preferences, how much you should care about each one, rules for resolving conflicts…). There is no reason to believe there will be convergence to a single accepted model even among intelligent, diligent, well-intentioned, cooperating individuals. I’m half-confident I can find some proposals for Very Important Values which will end up being a scissor statement just on LessWrong (don’t worry, I won’t try). Hell, Yudkowsky did it accidentally (I still can’t believe some of you would sided with the super-happies !). In the largest society ? In a "pluralistic, diverse, democratic" assembly ? It is essentially hopeless.

So, plan A, "Solve Human Values", is out. What is plan B?

Well, given that plan A was already more generic bullshit boilerplate than a plan, I’m pretty confident that nobody has a plan B.

Conclusion

The last sections look like abstract, esoteric, and not very practically useful philosophy (and not even very good philosophy, I’ll give you that, but I do what I can).

And I agree it was that, more or less 5 years ago, when AGI was still "70 years away, who cares?" (at least for me, and a lot of people). How times have changed, and not for the better.

These are now fundamental and pressing questions. Wrong answers will, at best, disempower humans forever, reducing them to passive leaves in the wind. Slightly wrong answers won’t go as far as that, but will result in the permanent loss of vast chunks of Human Values — the parts we will decide to discard, consciously or not. There are stories to be written of what is going to be lost, should we be slightly less than perfectly careful in trying to salvage what we can. We most likely won’t be close to that standard of carefulness. Given that some values are plainly incompatible, we will probably have to discard some even with perfect play. There will be sides and fights when it comes to deciding that.

Maybe the plan should be: don’t put ourselves in a situation where we have to decide that in a rushed fashion? Hence the title: "In Defense of the Butlerian Jihad".

I’ll end with an Exercise for the Reader (except I don’t know the Correct Answer, or whether there is one), hoping it won’t end up as another Accidental Scissor Statement, just to illustrate the difficulties you encounter when you literally sit down for 5 minutes and think.

You build your ASI. You have that big Diverse Plural Assembly that is apparently plan A, trying its best to come up with a unique model of Human Values that will lose as little as possible. Someone comes up with AI personas that perfectly represent uncontroversial and important historical figures like Jesus and Confucius, to allow them to represent the values they carry. Do you grant them a seat at the table? If yes, someone comes up with the same thing, but for Mao, Pol Pot and Hitler. Do you grant them a seat at the table?

5 comments

Comments sorted by top scores.

comment by R. Mutt (r-mutt) · 2025-01-11T21:33:39.000Z · LW(p) · GW(p)

What are you on about Christian Paradise equating not working? The book of Genesis says man will toil by the sweat of his brow. This is a good. 

Personal experience tells me I would degenerate under UBI. I'm clearly meant to work for my daily bread.

Replies from: sloonz
comment by sloonz · 2025-01-11T21:49:04.619Z · LW(p) · GW(p)

I’m pretty sure "man will toil by the sweat of his brow" is about down here, before you die and (hopefully) go to paradise, and you don’t have to work in paradise. And anyway, I know next to nothing about Christianity; it’s mostly a reference to Scott Alexander (or was it Yudkowsky? now I’m starting to doubt…) who said something like "the description of the Christian paradise seems pretty lame, I mean, just bask in the glory of God doing nothing for all eternity, you would be bored after two days, but it makes sense to describe that as a paradise if you put yourself in the shoes of the average medieval farmer who toils all day".

(I did all that from my terrible memory, so apologies if I’m misrepresenting anything here.)

comment by davekasten · 2025-01-11T19:42:52.728Z · LW(p) · GW(p)

I think you're missing at least one strategy here.  If we can get folks to agree that different societies can choose different combos, so long as they don't infringe on some subset of rights to protect other societies, then you could have different societies expand out into various pieces of the future in different ways.  (Yes, I understand that's a big if, but it reduces the urgency/crux nature of value agreement). 

Replies from: sharmake-farah, sloonz
comment by Noosphere89 (sharmake-farah) · 2025-01-11T20:35:39.118Z · LW(p) · GW(p)

I think the if condition either relies on an impossibility as presented, or requires you to exclude some human values, at which point you should at least admit that what values you choose to retain is a political decision, based on your own values.

comment by sloonz · 2025-01-11T20:30:16.299Z · LW(p) · GW(p)

I’m not missing that strategy at all. It’s an almost certainty that any solution will have to involve something like that, barring some extremely strong commitment to Unity, which by itself would destroy a lot of Values. But there are some pretty fundamental values that some people (even/especially here) care a lot about, like negative utilitarianism ("minimize suffering"), which are flatly incompatible with simple implementations of that solution. Negative utilitarians care very much about the total suffering in the universe, and their calculus does not stop at the boundaries of "different societies".

And if you say "screw them", well, what about the guy who basically goes "let’s create the baby-eaters society"? If you recoil at that, it means there’s at least a bit of negative utilitarianism in you. Which is normal, don’t worry, it’s a pretty common human value, even in people who don’t describe themselves as "negative utilitarians".

Now you can recognize the problem, which is that every individual will have a different boundary in the Independence-Freedom-Diversity vs Negative-Utilitarianism tradeoff.

(which I do not think is the only tradeoff/conflict, but it is clearly one of the biggest ones, if not THE biggest, if you set aside transhumanism)

And if you double down on the "screw them" solution? Well, you run exactly into what I described with "even with perfect play, you are going to lose some Human Values". For it is a non-negligible chunk of Human Values.