Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing?

post by Benjamin Bourlier · 2024-03-15T23:17:37.531Z · LW · GW · 3 comments

[Intro Note:

(1) This "post" is, technically, a "question", for which there's a separate category on LW. I debated posting it as a "question", but it also involves considerable argumentation for my own views. Ideally there'd be some in-between "question/post" category, but I opted for "post". I understand if a reader thinks this should have been categorized as a "question", though, and would appreciate criteria for distinguishing these categories on the site in the comments--maybe "dialogue" would be better?

(2) My goal in posting on LW is to become less wrong, not to persuade, get upvotes, etc. My recent first post was significantly downvoted, as I expected. I naively expected some debate, though--in particular, I expected someone to say something like, "Metzinger and Benatar have already argued this". Anyway, I've revisited the site norm guidelines to check my cynicism. I'm trying, sincerely, to write "good content". I'm sure I can do better. I don't mind being downvoted/buried, and I don't mind rude comments. I would, though, genuinely appreciate rational/logical explanations as to why, if I am "not good", or am producing "low quality content", this is so. Please don't just passively bury my arguments/questions without offering any explanation as to why. That's all I ask.

I maintain several controversial philosophical views. I'm not a contrarian or provocateur, but simply a heretic. To list them: I support anti-natalism and oppose intelligence-augmentation (see my first post); I support free, voluntary access to euthanasia; I am for preventing impulsive/irrational suicide, but I believe this can only be achieved for suicidally depressed people via existential pessimism/depressive realism (that is, I maintain a theory of suicide prevention not endorsed by CBT-dominant psychology, according to which "depression" is deemed inherently delusional--I don't think it is, not necessarily); I think genocide and abuse in general are causally explainable in terms of Terror Management Theory (TMT); I think along the lines of Ellul's technological pessimism (my future predictions are all quite negative); I think along the lines of Kahneman's epistemological pessimism (my view of "intelligence" and "knowledge" is pessimistic, and my predictions regarding "education" are all quite negative); and, perhaps most controversially, I am sympathetic to (though do not explicitly endorse) negative-utilitarian "benevolent world exploder" positions (which, I admit, are dangerous, due to their high likelihood of misinterpretation and their appeal to psychopathic villainy).

When I talk, I tend to bum people out, or upset them, without intending to. I have literally been excommunicated from a church I tried attending, for the heresy of "pessimism" (official letter and everything). My point is, I really want to think LW is better than the church that excommunicated me as an actual dogma-violating heretic, but so far in my posting it has been less helpful. This intro request is my sincere attempt to alter that trajectory toward useful rational feedback.

This post concerns "Strong-Misalignment". I'm aware this is an unpopular view on LW. I can't find any refutation of it, though. I anticipate downvoting. I will not personally downvote any comment whatsoever. Please, explain why I'm wrong. I'd appreciate it.]

 

Q: What is Yudkowsky’s (or anyone doing legit alignment research’s) elevator speech to dispel Strong-Misalignment? (Any/all relevant links appreciated—I can’t find it.)

Elevator Passenger to Alignment Researcher: “You do ‘alignment research’? Ah, fiddling as the world burns, eh?”

Alignment Researcher: “[*insert a concise, ~5-10 floors' worth of rational rebuttal here--or longer, I'm not wedded to this elevator scenario; I'm just curious how that would go down*]”

By “Strong-Misalignment” (SM) I mean the position that AI/ML-alignment at least (if not intelligence-alignment in general—as in, the ultimate inescapability of cognitive bias) always has been (from the Big Bang onward) and always will be (unto cosmic heat death) fundamentally impossible, not merely however-difficult/easy-somebody-claims-it-is, no matter what locally impressive research they’re pointing to (e.g., RL being provably non-scheming). Strong-Misalignment = inescapable misalignment, I guess. “Strong” as in inviolable. Or, at least, SM = the claim that “alignment research” should sound something like “anti-gravity research” (as in: this is most likely impossible, and therefore most likely a waste of time--that is, we should be doing something else with our remaining time).

[No need to read beyond here if you have links/actual arguments to share—enough to convincingly script the Alignment Researcher above—but you need to go do whatever else. All links/actual arguments appreciated.]

I’m thinking of SM in terms of what in mathematics is the distinction between a “regular perturbation problem” and a “singular perturbation problem”. If alignment is a “regular” problem, then working on better and better approximations of alignment is entirely reasonable and necessary. But if the assumption of Less-Strong/Weak-Misalignment (LSM, WM) amounts to a naive perturbation analysis, and the problem is actually a singular one, this whole project is doomed from the beginning—it will only ever fail. Right? (I sketch the standard textbook example of the distinction below, for anyone unfamiliar with it.) Sure, this is just my blue-collar “that thing looks like this thing” instinct. I’m no expert on the math here. But I’m not entirely ignorant either. Somebody must be dealing with this? (Or is assuming that just a modest-epistemology bias?)
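For anyone unfamiliar with the distinction, here is the standard textbook illustration (a toy algebra problem, nothing alignment-specific; I'm only restating a well-known example, not claiming it proves anything about alignment by itself):

```latex
% Regular case: the small parameter does not multiply the highest-order term.
%   x^2 + \epsilon x - 1 = 0 has two roots, near +1 and -1, for small \epsilon,
%   and the naive expansion x = x_0 + \epsilon x_1 + \cdots recovers both:
x_{\pm} = \pm 1 - \tfrac{\epsilon}{2} + O(\epsilon^2).
% Singular case: the small parameter multiplies the highest-order term.
%   Setting \epsilon = 0 in \epsilon x^2 + x - 1 = 0 changes the character of
%   the problem (quadratic -> linear), and one of the two roots is lost:
x_{+} = 1 - \epsilon + O(\epsilon^2), \qquad
x_{-} = -\tfrac{1}{\epsilon} - 1 + \epsilon + O(\epsilon^2).
% The naive expansion only ever finds x_{+}; the other root runs off to
% infinity as \epsilon \to 0 and is invisible to the expansion.
```

The analogy, as I intend it: treating misalignment as a small correction term on top of "basically aligned" is only justified if the problem is of the regular type. If it is singular, the naive expansion isn't just imprecise; it is structurally blind to part of the solution space.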

If SM is demonstrably not a “singular perturbation problem”, if it’s disproven, or disprovable, or provably very unlikely—and it seems to me it would have to be, first, before even beginning to take alignment research seriously at all (and it seems Yudkowsky is saying, indeed, we have not even begun collectively to take alignment research seriously?)—where is that proof/argument? Links appreciated, again.  

I’ve been SM-leaning, admittedly, since I started paying attention to alignment discourse about eight years ago (Yudkowsky and Bostrom to begin with, expanding from there). I understand I’m still comparatively a newcomer (though relative to the only-recent Overton window dilation/semi-accommodation I’m a veteran follower of the issue, so I’m used to explaining it to folks in my life simply because ignorance has been so widespread and problematic), but more importantly I don’t claim (yet?) to be able to conclusively prove SM. I’ve just considered it most likely, based on the evidence I’ll list below. I’ve done the pessimist-sideliner grumbling thing this whole time, though, rather than, say, making bets on probability calculations or diligently following actual alignment research (“Silly optimists”, *weed smoke*, *doom metal*; this can/could be mere avoidance or intellectual laziness on my part, I’m saying). Until now, anyway. I’m hoping to get answers by posting on LW where I can’t find (or haven’t yet found) any LW posts/comments already answering these questions.

The evidence below is readily available to a layperson like me, which is good (for me; annoying for others having to listen to me). The evidence is, then, reasonably reliable, in that I’m drawing from the most obvious sources/experts, whose conclusions are tried and less-wrong, etc. However, I’ve also only paid attention to things readily available to a layperson. I’m a musician who has worked as a CNC programmer, and I’ve done private coding projects for work and pleasure, but that’s it as far as comp-sci goes. Everything I’ve studied is just whatever I’ve found most relevant and available as I’ve gone autodidactically along (I haven’t followed cutting-edge research, mostly due to accelerationist malaise [“Wake me up when tech-tinkering kills us all or doesn’t or whatever, just get it the hell over with”]; I don’t attend or listen to seminars, because it’s too depressing; I’m focused more on pure mathematics/philosophy than computer science; etc.). So maybe I am—like they tell me—just biased pessimistically, unreasonably. Or is this itself modest epistemology (cornered optimism-bias)? You tell me. Links appreciated.

Having said that, reader, please don’t just claim that SM has obviously been solved/disproven without presenting any proof, and please don’t downvote this without offering any explanation below. Help a guy out, you know? If it’s been disproven, great: shoot me a link (or two, or three, or however many are needed). Or perhaps you disagree with my premise that SM would need to be disproven, or proven unlikely, before seriously pursuing alignment research, as opposed to using one’s remaining time on earth doing other things. Things like, say, figuring out how to build some kind of underground railroad for young people to flee the U.S. education system where, on top of everything else wrong with the system, they are increasingly being gunned down in mass shootings. (But that’s another post, presumably after another downvoting censor interval if my LW trajectory doesn't change radically—bringing up the actual mechanics of defending innocent children is, strangely, taboo, while at the same time being celebrated as an abstract ideal at nearly all levels of society, as far as I can tell.) If you do disagree with that premise: great! Why, though?

If you think SM is obviously disproven and yet can offer no proof: what do you think you know, and why do you think you know it? (Does asking this work? I mean, I ask it, but…usually it’s just followed by the person/group/whatever rehashing the same irrationality that preceded the question. I suppose it’s still worth asking here. There just doesn’t seem to be any obvious intervention into another’s irrationality that works consistently, that summons the rational intuition from the murky bog of mentation, or else LW members would be using it to obvious effect, right? I don’t know. I know I can strive to be less wrong, myself. Hoping to learn here. Predicting I likely won’t, though, not from posting this. Predicting, rather, that I’ll be downvoted/ignored/buried for even daring to suggest the idea of SM, which would teach me nothing except that humans everywhere--even in explicitly rationality-prioritizing communities--are indeed as constrained by optimism-bias as I’ve feared, as prone to suppressing heretics as ever, even if we’ve gotten less burn-at-a-literal-stake about it (two weeks' online censorship isn't so bad, of course). But prove me wrong, by all means. For the love of the intervening God in whom I cannot possibly believe, and not for lack of trying, and whose reach I agree we are beyond: prove me wrong.)

Evidence, as I see it, for Strong-Misalignment:

(1) Computational Irreducibility (Wolfram): for a wide class of systems, there is no shortcut to knowing what they will do other than running them, step by step.

(2) Gödel's Incompleteness (and Completeness) theorems: any consistent formal system strong enough to do arithmetic contains true statements it cannot prove, and cannot prove its own consistency.

(3) Cognitive Bias (Kahneman et al.): systematic, well-documented deviations of human judgment from rationality, which persist even among trained experts.
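To make the first item concrete, here is a minimal sketch of Wolfram's own standard example, the Rule 30 cellular automaton (my toy code, not anything from the alignment literature): as far as anyone knows, the only way to learn what the center cell does at step n is to actually run all n steps.

```python
# Rule 30 cellular automaton: Wolfram's standard example of computational
# irreducibility. No known shortcut predicts the center column faster than
# simulating every step.

def rule30_step(cells):
    """Apply one Rule 30 step to a tuple of 0/1 cells (edges padded with 0)."""
    padded = (0,) + cells + (0,)
    return tuple(
        # Rule 30: new cell = left XOR (center OR right)
        padded[i - 1] ^ (padded[i] | padded[i + 1])
        for i in range(1, len(padded) - 1)
    )

def center_column(steps):
    """Center-cell value after each of `steps` steps, starting from a single 1."""
    width = 2 * steps + 1          # wide enough that the edges are never reached
    cells = tuple(1 if i == steps else 0 for i in range(width))
    history = []
    for _ in range(steps):
        cells = rule30_step(cells)
        history.append(cells[steps])
    return history

print(center_column(20))  # the only known way to get this list is to compute it
```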

If you haven’t already downvoted and bailed, thanks. Please enjoy my unapologetically pessimistic conclusion (extra points for not passively downvoting and bailing if you make it through this—except there aren’t any such points in the LW voting system, it seems, because it’s not weighted for rationality/overcoming bias, only for group consensus, so debating anything I say rationally is actually, in terms of the “karma” system, not worth your time): 

That’s my evidence. Three things? That’s it? Yes. I get it: three things sounds like not many things. However, these three things (Computational Irreducibility, Gödel Incompleteness/Completeness, Cognitive Bias) are hugely important, well-established, agreed-upon concepts whose universal scope is basically taken for granted at this point—the word “universe” regularly comes up with all three of these concepts, as in: they apply to everything. Right?

These three things may as well be mountain ranges of unimaginable proportions that we’re claiming to be crossing on foot, in the midst of howling winter storms, and everyone is just supposed to blindly follow along without expecting to pull a global Donner Party. These three things may as well be the furthest reaches of space which we are claiming to be escaping to, and everyone’s just supposed to blindly get on the ship without expecting to die in the indignity of already-dead space (as opposed to the dignity of the CAVE--Compassionate, Accessible, Voluntary Euthanasia).

There are many ways to go extinct. We appear to be actively choosing some of the worst possible ways, and choosing a better way isn’t even officially a priority. What are our priorities? Well, the best one we have, officially, is “mere survival” (denial of impending extinction)—which is almost always a bad idea. The worst priority we have, and the top one, is “survival-at-any-costs”—which is literally always a terrible idea (because, in denying the ultimate impossibility of indefinite survival, it can only result in contradictory thinking and inadequate actions--think "Custom of the Sea"). We’re just believing in alignment because we feel like it. We all know this, deep down. (Links appreciated. Proof, rational argument appreciated.)

The alternative is too disturbing: accepting extinction, and therefore focusing on ending existing child abuse and on nurturing existing children in every way possible while we still can; focusing on administering access to pain-free voluntary death to as many humans as possible (and ideally to other animals, in a system beyond the animal owner's mere whims, if we can figure out the voluntarism ethics of this—we’re not even officially working on it); accepting that reproduction is itself a form of abuse, a mistake that increases suffering, increases death, and significantly decreases the already very low likelihood of already-existing and completely innocent children getting the care, protection, and support they need and deserve.

We can’t even officially admit that non-existent children don’t exist, and that existing ones do. We won’t even allow ourselves to go down fighting, collectively, for innocent children. We’d rather go down fighting for at-turns-uncontrollable and fascistically controlling anti-human tech, fighting for tower-building without concern for tower collapse, fighting for the idea of clinging to an illusion of perpetual life at all costs (are we any better than any Tiplerian “Omega Cosmologist”? “We’re going to win!”, Tipler shouts, insanely), ignoring or even funding or even actively perpetrating genocides while clinging, tinkering with tech we know to be deadly beyond compare to anything we’ve ever seen. We are, collectively, as a species, as a global society, completely insane. Thanks for reading.

3 comments

Comments sorted by top scores.

comment by Jonas Hallgren · 2024-03-16T07:35:10.454Z · LW(p) · GW(p)

Hey! I saw that you had a bunch of downvotes and I wanted to get in here before you become too disillusioned with the LW crowd. I think a big point for me is that you don't really have any sub-headings or examples that are more straight to the point. It is all one long text that reads like your direct train of thought, which makes it really hard to engage with what you say. Of course you're saying controversial things, but if there was more clarity I think you would get more engagement.

(GPT is really OP for this nowadays.) Anyway, I wish you the best of luck! I'm also sorry for not engaging with any of your arguments, but I couldn't quite follow them.

Replies from: Benjamin Bourlier
comment by Benjamin Bourlier · 2024-03-16T23:56:24.262Z · LW(p) · GW(p)

Thanks for your comment, I appreciate it! 

That makes sense. I will try to delineate things more clearly, with sub-headings. I admit, my instinctive writing style does more or less reflect my normal train of thought. It can be easy for me to take that for granted and overlook things I assume are clear to others but aren't. Thank you for the helpful comment!

I'm going to try editing this post, and perhaps you'd be willing to give it another read if you care to engage with the arguments. Cheers. 

comment by Mitchell_Porter · 2024-03-30T07:14:11.241Z · LW(p) · GW(p)

Would you say that you yourself have achieved some knowledge of what is true and what is good, despite irreducibility, incompleteness, and cognitive bias? And that was achieved with your own merely human intelligence. The point of AI alignment is not to create something perfect, it is to tilt the superhuman intelligence that is coming, in the direction of good things rather than bad things. If humans can make some progress in the direction of truth and virtue, then super-humans can make further progress.