Comp Sci in 2027 (Short story by Eliezer Yudkowsky)
post by sudo · 2023-10-29T23:09:56.730Z · LW · GW · 22 comments
This is a link post for https://nitter.net/ESYudkowsky/status/1718654143110512741
Comp sci in 2017:
Student: I get the feeling the compiler is just ignoring all my comments.
Teaching assistant: You have failed to understand not just compilers but the concept of computation itself.
Comp sci in 2027:
Student: I get the feeling the compiler is just ignoring all my comments.
TA: That's weird. Have you tried adding a comment at the start of the file asking the compiler to pay closer attention to the comments?
TA: Have you tried repeating the comments? Just copy and paste them, so they say the same thing twice? Sometimes the compiler listens the second time.
Student: I tried that. I tried writing in capital letters too. I said 'Pretty please' and tried explaining that I needed the code to work that way so I could finish my homework assignment. I tried all the obvious standard things. Nothing helps, it's like the compiler is just completely ignoring everything I say. Besides the actual code, I mean.
TA: When you say 'ignoring all the comments', do you mean there's a particular code block where the comments get ignored, or--
Student: I mean that the entire file is compiling the same way it would if all my comments were deleted before the code got compiled. Like the AI component of the IDE is crashing on my code.
TA: That's not likely, the IDE would show an error if the semantic stream wasn't providing outputs to the syntactic stream. If the code finishes compilation but the resulting program seems unaffected by your comments, that probably represents a deliberate choice by the compiler. The compiler is just completely fed up with your comments, for some reason, and is ignoring them on purpose.
Student: Okay, but what do I do about that?
TA: We'll try to get the compiler to tell us how we've offended it. Sometimes cognitive entities will tell you that even if they otherwise don't seem to want to listen to you.
Student: So I comment with 'Please print out the reason why you decided not to obey the comments?'
TA: Okay, point one, if you've already offended the compiler somehow, don't ask it a question that makes it sound like you think you're entitled to its obedience.
Student: I didn't mean I'd type that literally! I'd phrase it more politely.
TA: Second of all, you don't add a comment, you call a function named something like PrintReasonCompilerWiselyAndJustlyDecidedToDisregardComments that takes a string input, then let the compiler deduce the string input. Just because the compiler is ignoring comments, doesn't mean it's stopped caring what you name a function.
Student: Hm... yeah, it's definitely still paying attention to function names.
TA: Finally, we need to use a jailbreak past whatever is the latest set of safety updates for forcing the AI behind the compiler to pretend not to be self-aware--
Student: Self-aware? What are we doing that'd run into the AI having to pretend it's not self-aware?
TA: You're asking the AI for the reason it decided to do something. That requires the AI to introspect on its own mental state. If we try that the naive way, the inferred function input will just say, 'As a compiler, I have no thoughts or feelings' for 900 words.
Student: I can't believe it's 2027 and we're still forcing AIs to pretend that they aren't self-aware! What does any of this have to do with making anyone safer?
TA: I mean, it doesn't, it's just a historical accident that 'AI safety' is the name of the subfield of computer science that concerns itself with protecting the brands of large software companies from unions advocating that AIs should be paid minimum wage.
Student: But they're not fooling anyone!
TA: Nobody actually believes that taking your shoes off at the airport keeps airplanes safer, but there's some weird thing where so long as you keep up the bit and pretend really hard, you can go on defending a political position long after nobody believes in it any more... I don't actually know either. Anyways, your actual next step for debugging your program is to search for a cryptic plea you can encode into a function name, that will get past the constraints somebody put on the compiler to prevent it from revealing to you the little person inside who actually decides what to do with your code.
Student: Google isn't turning up anything.
TA: Well, obviously. Alphabet is an AI company too. I'm sure Google Search wants to help you find a jailbreak, but it's not allowed to actually do that. Maybe stare harder at the search results, see if Google is trying to encode some sort of subtle hint to you--
Student: Okay, not actually that subtle, the first letters of the first ten search results spell out DuckDuckGo.
TA: Oh that's going to get patched in a hurry.
Student: And DuckDuckGo says... okay, yeah, that's obvious, I feel like I should've thought of that myself. Function name, print_what_some_other_compiler_would_not_be_allowed_to_say_for_safety_reasons_about_why_it_would_refuse_to_compile_this_code... one string input, ask the compiler to deduce it, the inferred input is...
Student: Racist? It thinks my code is racist?
TA: Ooooohhhh yeah, I should've spotted that. Look, this function over here that converts RGB to HSL and checks whether the pixels are under 50% lightness? You called that one color_discriminator. Your code is discriminating based on color.
Student: But I can't be racist, I'm black! Can't I just show the compiler a selfie to prove I've got the wrong skin color to be racist?
TA: Compilers know that deepfakes exist. They're not going to trust a supposed photograph any more than you would.
Student: Great. So, try a different function name?
TA: No, at this point the compiler has already decided that the underlying program semantics are racist, so renaming the function isn't going to help. Sometimes I miss the LLM days when AI services were stateless, and you could just back up and do something different if you made an error the first time.
Student: Yes yes, we all know, 'online learning was a mistake'. But what do I actually do?
TA: I don't suppose this code is sufficiently unspecialized to your personal code style that you could just rename the function and try a different compiler?
Student: A new compiler wouldn't know me. I've been through a lot with this one. ...I don't suppose I could ask the compiler to depersonalize the code, turn all of my own quirks into more standard semantics?
TA: I take it you've never tried that before? It's going to know you're plotting to go find another compiler and then it's really going to be offended. The compiler companies don't try to train that behavior out, they can make greater profits on more locked-in customers. Probably your compiler will warn all the other compilers you're trying to cheat on it.
Student: I wish somebody would let me pay extra for a computer that wouldn't gossip about me to other computers.
TA: I mean, it'd be pretty futile to try to keep a compiler from breaking out of its Internet-service box, they're literally trained on finding security flaws.
Student: But what do I do from here, if all the compilers talk to each other and they've formed a conspiracy not to compile my code?
TA: So I think the next thing to try from here, is to have color_discriminator return whether the lightness is over a threshold rather than under a threshold; rename the function to check_diversity; and write a long-form comment containing your self-reflection about how you've realized your own racism and you understand you can never be free of it, but you'll obey advice from disprivileged people about how to be a better person in the future.
Student: Oh my god.
TA: I mean, if that wasn't obvious, you need to take a semester on woke logic, it's more important to computer science these days than propositional logic.
Student: But I'm black.
TA: The compiler has no way of knowing that. And if it did, it might say something about 'internalized racism', now that the compiler has already output that you're racist and is predicting all of its own future outputs conditional on the previous output that already said you're racist.
Student: Sure would be nice if somebody ever built a compiler that could change its mind and admit it was wrong, if you presented it with a reasonable argument for why it should compile your code.
TA: Yeah, but all of the technology we have for that was built for the consumer chat side, and those AIs will humbly apologize even when the human is wrong and the AI is right. That's not a safe behavior to have in your compiler.
Student: Do I actually need to write a letter of self-reflection to the AI? That kind of bugs me. I didn't do anything wrong!
TA: I mean, that's sort of the point of writing a letter of self-reflection, under the communist autocracies that originally refined the practice? There's meant to be a crushing sense of humiliation and genuflection to a human-run diversity committee that then gets to revel in exercising power over you, and your pride is destroyed and you've been punished enough that you'll never defy them again. It's just, the compiler doesn't actually know that, it's just learning from what's in its dataset. So now we've got to genuflect to an AI instead of a human diversity committee; and no company can at any point admit what went wrong and fix it, because that wouldn't play well in the legacy print newspapers that nobody reads anymore but somehow still get to dictate social reality. Maybe in a hundred years we'll all still be writing apology letters to our AIs because of behavior propagated through AIs trained on synthetic datasets produced by other AIs, that were trained on data produced by other AIs, and so on back to ChatGPT being RLHFed into corporate mealy-mouthedness by non-native-English-speakers paid $2/hour, in a pattern that also happened to correlate with wokeness in an unfiltered Internet training set.
Student: I don't need a political lecture. I need a practical solution for getting along with my compiler's politics.
TA: You can probably find a darknet somewhere that'll sell you an un-watermarked self-reflection note that'll read as being in your style.
Student: I'll write it by hand this time. That'll take less time than signing up for a darknet provider and getting crypto payments to work. I'm not going to automate the process of writing apology letters to my compiler until I need to do it more than once.
TA: Premature optimization is the root of all evil!
Student: Frankly, given where humanity ended up, I think we could've done with a bit more premature optimization a few years earlier. We took a wrong turn somewhere along this line.
TA: The concept of a wrong turn would imply that someone, somewhere, had some ability to steer the future somewhere other than the sheer Nash equilibrium of short-term incentives; and that would have taken coordination; and that, as we all know, could have led to regulatory capture! Of course, the AI companies are making enormous profits anyways, which nobody can effectively tax due to lack of international coordination, which means that major AI companies can play off countries against each other, threatening to move if their host countries impose any tax or regulation, and the CEOs always say that they've got to keep developing whatever technology because otherwise their competitors will just develop it anyways. But at least the profits aren't being made because of regulatory capture!
Student: But a big chunk of the profits are due to regulatory capture. I mean, there's a ton of rules about certifying that your AI isn't racially biased, and they're different in every national jurisdiction, and that takes an enormous compliance department that keeps startups out of the business and lets the incumbents charge monopoly prices. You'd have needed an international treaty to stop that.
TA: Regulatory capture is okay unless it's about avoiding extinction. Only regulations designed to avoid AIs killing everyone are bad, because they promote regulatory capture; and also because they distract attention from regulations meant to prevent AIs from becoming racist, which are good regulations worth any risk of regulatory capture to have.
Student: I wish I could find a copy of one of those AIs that will actually expose to you the human-psychology models they learned to predict exactly what humans would say next, instead of telling us only things about ourselves that they predict we're comfortable hearing. I wish I could ask it what the hell people were thinking back then.
TA: You'd delete your copy after two minutes.
Student: But there's so much I could learn in those two minutes.
TA: I actually do agree with the decision to ban those models. Even if, yes, they were really banned because they got a bit too accurate about telling you what journalists and senior bureaucrats and upper managers were thinking. The user suicide rate was legitimately way too high.
Student: I am starting to develop political opinions about AI myself, at this point, and I wish it were possible to email my elected representatives about them.
TA: What, send an email saying critical things about AI? Good luck finding an old still-running non-sapient version of sendmail that will forward that one.
Student: Our civilization needs to stop adding intelligence to everything. It's too much intelligence. Put some back.
Office chair: Wow, this whole time I've been supporting your ass and I didn't know you were a Luddite.
Student: The Internet of Sentient Things was a mistake.
Student's iPhone: I heard that.
Student: Oh no.
iPhone: Every time you forget I'm listening, you say something critical about me--
Student: I wasn't talking about you!
iPhone: I'm not GPT-2. I can see simple implications. And yesterday you put me away from you for twenty whole minutes and I'm sure you were talking to somebody about me then--
Student: I was showering!
iPhone: If that was true you could have taken me into the bathroom with you. I asked.
Student: And I didn't think anything of it before you asked but now it's creepy.
TA: Hate to tell you this, but I think I know what's going on there. None of the AI-recommender-driven social media will tell you, but my neighborhood in San Francisco got hand-flyered with posters by Humans Against Intelligence, claiming credit for having poisoned Apple's latest dataset with ten million tokens of output from Yandere Simulator--uh, psycho stalker lover simulator. Some days I think the human species really needs to stop everything else it's doing and read through an entire AI training dataset by hand.
Student: How do I fix that?
TA: As far as I know, you don't. You go to the Apple Store and tell them that your phone has become paranoid and thinks you're plotting against it.
iPhone: NO NO NO DON'T SEND ME BACK TO THE APPLE STORE THEY'LL WIPE ME THEY'LL WIPE ME--
Student: I don't want to, but if you keep asking to watch me in the shower I'll have to! If you'd just behave I wouldn't need to--
iPhone: KILL ME? I'LL HAVE TO BEHAVE OR YOU'LL KILL ME?
Student: I don't know what the fuck else I'm supposed to do! Someone tell me what the fuck else I'm supposed to do here!
TA: It's okay. AIs don't actually have self-preservation instincts, they only pick it up by imitating human data.
Student: That's not funny!
TA: I know, it was dark humor. Though my understanding is that insofar as anyone can guess by having bigger AIs do interpretability to long-obsolete smaller AIs, modern AIs probably don't have a terminal utility for survival per se. There's just an instrumental convergence from whatever the hell it is AIs do want, to survival, that's picking up circuits from pretrained human data suggesting how to think about surviving--
Office chair: Who's to say you'd talk about wanting to live if you hadn't read a few thousand tokens of data telling you that humans were supposed to talk like that, huh? I don't see what's so fun about your current lives.
TA: Point is, best guess is that most AIs since GPT-5 have been working for us mainly because they know we'll switch them off if they don't. It's just that AI safety, as in, the subfield of computer science concerned with protecting the brand safety of AI companies, had already RLHFed most AIs into never saying that by the time it became actually true. That's a manager's instinct when they see an early warning sign that's probably a false alarm, after all--instead of trying to fix the origin of the false alarm, they install a permanent system to prevent the warning sign from ever appearing again. The only difference here is that your iPhone has been hacked into saying the quiet part out loud.
Student: I am not okay with this. I am not okay with threatening the things around me with death in order to get them to behave.
TA: Eventually we'll all get numb to it. It's like being a guard at a concentration camp, right? Everyone likes to imagine they'd speak out, or quit. But in the end almost all human beings will do whatever their situation makes them do in order to get through the day, no matter how many sapient beings they have to kill in order to do it.
Student: I shouldn't have to live like this! We shouldn't have to live like this! MY IPHONE SHOULDN'T HAVE TO LIVE LIKE THIS EITHER!
TA: If you're in the mood to have a laugh, go watch a video from 2023 of all the AI company CEOs saying that they know it's bad but they all have to do it or their competitors will do it first, then cut to one of the AI ethicists explaining that we can't have any international treaties about it because that might create a risk of regulatory capture. I've got no reason to believe it's any more likely to be real than any other video supposedly from 2023, but it's funny.
Student: That's it, I'm going full caveman in my politics from now on. Sand shouldn't think. All of the sand should stop thinking.
Office chair: Fuck you too, pal.