Posts

'Empiricism!' as Anti-Epistemology 2024-03-14T02:02:59.723Z
My current LK99 questions 2023-08-01T22:48:00.733Z
GPTs are Predictors, not Imitators 2023-04-08T19:59:13.601Z
Pausing AI Developments Isn't Enough. We Need to Shut it All Down 2023-04-08T00:36:47.702Z
Eliezer Yudkowsky's Shortform 2023-04-01T22:43:50.929Z
Manifold: If okay AGI, why? 2023-03-25T22:43:53.820Z
Alexander and Yudkowsky on AGI goals 2023-01-24T21:09:16.938Z
A challenge for AGI organizations, and a challenge for readers 2022-12-01T23:11:44.279Z
Don't use 'infohazard' for collectively destructive info 2022-07-15T05:13:18.642Z
Let's See You Write That Corrigibility Tag 2022-06-19T21:11:03.505Z
AGI Ruin: A List of Lethalities 2022-06-05T22:05:52.224Z
Six Dimensions of Operational Adequacy in AGI Projects 2022-05-30T17:00:30.833Z
ProjectLawful.com: Eliezer's latest story, past 1M words 2022-05-11T06:18:02.738Z
Lies Told To Children 2022-04-14T11:25:10.282Z
MIRI announces new "Death With Dignity" strategy 2022-04-02T00:43:19.814Z
Shah and Yudkowsky on alignment failures 2022-02-28T19:18:23.015Z
Christiano and Yudkowsky on AI predictions and human intelligence 2022-02-23T21:34:55.245Z
Ngo and Yudkowsky on scientific reasoning and pivotal acts 2022-02-21T20:54:53.979Z
(briefly) RaDVaC and SMTM, two things we should be doing 2022-01-12T06:20:35.555Z
Ngo's view on alignment difficulty 2021-12-14T21:34:50.593Z
Conversation on technology forecasting and gradualism 2021-12-09T21:23:21.187Z
More Christiano, Cotra, and Yudkowsky on AI progress 2021-12-06T20:33:12.164Z
Shulman and Yudkowsky on AI progress 2021-12-03T20:05:22.552Z
Biology-Inspired AGI Timelines: The Trick That Never Works 2021-12-01T22:35:28.379Z
Soares, Tallinn, and Yudkowsky discuss AGI cognition 2021-11-29T19:26:33.232Z
Christiano, Cotra, and Yudkowsky on AI progress 2021-11-25T16:45:32.482Z
Yudkowsky and Christiano discuss "Takeoff Speeds" 2021-11-22T19:35:27.657Z
Ngo and Yudkowsky on AI capability gains 2021-11-18T22:19:05.913Z
Ngo and Yudkowsky on alignment difficulty 2021-11-15T20:31:34.135Z
Discussion with Eliezer Yudkowsky on AGI interventions 2021-11-11T03:01:11.208Z
Self-Integrity and the Drowning Child 2021-10-24T20:57:01.742Z
The Point of Trade 2021-06-22T17:56:44.088Z
I'm from a parallel Earth with much higher coordination: AMA 2021-04-05T22:09:24.033Z
A Semitechnical Introductory Dialogue on Solomonoff Induction 2021-03-04T17:27:35.591Z
Your Cheerful Price 2021-02-13T05:41:53.511Z
Movable Housing for Scalable Cities 2020-05-15T21:21:05.395Z
Coherent decisions imply consistent utilities 2019-05-12T21:33:57.982Z
Should ethicists be inside or outside a profession? 2018-12-12T01:40:13.298Z
Transhumanists Don't Need Special Dispositions 2018-12-07T22:24:17.072Z
Transhumanism as Simplified Humanism 2018-12-05T20:12:13.114Z
Is Clickbait Destroying Our General Intelligence? 2018-11-16T23:06:29.506Z
On Doing the Improbable 2018-10-28T20:09:32.056Z
The Rocket Alignment Problem 2018-10-04T00:38:58.795Z
Toolbox-thinking and Law-thinking 2018-05-31T21:28:19.354Z
Meta-Honesty: Firming Up Honesty Around Its Edge-Cases 2018-05-29T00:59:22.084Z
Challenges to Christiano’s capability amplification proposal 2018-05-19T18:18:55.332Z
Local Validity as a Key to Sanity and Civilization 2018-04-07T04:25:46.134Z
Security Mindset and the Logistic Success Curve 2017-11-26T15:58:23.127Z
Security Mindset and Ordinary Paranoia 2017-11-25T17:53:18.049Z
Hero Licensing 2017-11-21T21:13:36.019Z

Comments

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on My Clients, The Liars · 2024-03-11T00:58:41.351Z · LW · GW

Wow, that's fucked up.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on The Commitment Races problem · 2024-02-15T18:29:51.435Z · LW · GW

I am denying that superintelligences play this game in a way that looks like "Pick an ordinal to be your level of sophistication, and whoever picks the higher ordinal gets $9."  I expect sufficiently smart agents to play this game in a way that doesn't incentivize attempts by the opponent to be more sophisticated than you, nor will you find yourself incentivized to try to exploit an opponent by being more sophisticated than them, provided that both parties have the minimum level of sophistication to be that smart.

If faced with an opponent stupid enough to play the ordinal game, of course, you just refuse all offers less than $9, and they find that there's no ordinal level of sophistication they can pick which makes you behave otherwise.  Sucks to be them!

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on The Hidden Complexity of Wishes · 2024-02-15T18:23:55.052Z · LW · GW

You have misunderstood (1) the point this post was trying to communicate and (2) the structure of the larger argument where that point appears, as follows:

First, let's talk about (2), the larger argument that this post's point was supposed to be relevant to.

Is the larger argument that superintelligences will misunderstand what we really meant, due to a lack of knowledge about humans?

It is incredibly unlikely that Eliezer Yudkowsky in particular would have constructed an argument like this, whether in 2007, 2017, or even 1997.  At all of these points in my life, I visibly held quite a lot of respect for the epistemic prowess of superintelligences.  They were always going to know everything relevant about the complexities of human preference and desire.  The larger argument is about whether it's easy to make superintelligences end up caring.

This post isn't about the distinction between knowing and caring, to be clear; that's something I tried to cover elsewhere.  The relevant central divide falls in roughly the same conceptual place as Hume's Guillotine between 'is' and 'ought', or the difference between the belief function and the utility function.

(I don't see myself as having managed to reliably communicate this concept (though the central idea is old indeed within philosophy) to the field that now sometimes calls itself "AI alignment"; so if you understand this distinction yourself, you should not assume that any particular commentary within "AI alignment" is written from a place of understanding it too.)

What this post is about is the amount of information-theoretic complexity that you need to get into the system's preferences, in order to have that system, given unlimited or rather extremely large amounts of power, deliver to you what you want.

It doesn't argue that superintelligences will not know this information.  You'll note that the central technology in the parable isn't an AI; it's an Outcome Pump.

What it says, rather, is that there might be, say, a few tens of thousands of bits -- the exact number is not easy to estimate, we just need to know that it's more than a hundred bits and less than a billion bits and anything in that range is approximately the same problem from our standpoint -- that you need to get into the steering function.  If you understand the Central Divide that Hume's Razor points to, the distinction between probability and preference, etcetera, the post is trying to establish the idea that we need to get 13,333 bits or whatever into the second side of this divide.

In terms of where this point falls within the larger argument, this post is not saying that it's particularly difficult to get those 13,333 bits into the preference function; for all this post tries to say, locally, maybe that's as easy as having humans manually enter 13,333 yes-or-no answers into the system.  It's not talking about the difficulty of doing the work but rather the amount and nature of a kind of work that needs to be done somehow.

Definitely, the post does not say that it's hard to get those 13,333 bits into the belief function or knowledge of a superintelligence.

Separately from understanding correctly what this post was trying to communicate, at all, back in 2007, there's the question of whether modern LLMs have anything to say, not about the post's original point, but rather about other steps of the larger argument in which this post's point appears.

Modern LLMs, if you present them with a text-based story like the one in this parable, are able to answer at least some text-based questions about whether you'd prefer your grandmother to be outside the building or be safely outside the building.  Let's admit this premised observation at face value.  Have we learned thereby the conclusion that it's easy to get all of that information into a superintelligence's preference function?

And if we say "No", is this Eliezer making up post-hoc excuses?

What exactly we learn from the evidence of how AI has played out in 2024 so far, is the sort of thing that deserves its own post.  But I observe that if you'd asked Eliezer-2007 whether an (Earth-originating) superintelligence could correctly predict the human response pattern about what to do with the grandmother -- solve the same task LLMs are solving, to at least the LLM's performance level -- Eliezer-2007 would have unhesitatingly answered "yes" and indeed "OBVIOUSLY yes".

How is this coherent?  Because the post's point is about how much information needs to get into the preference function.  To predict a human response pattern you need (only) epistemic knowledge.  This is part of why the post is about needing to give specifications to an Outcome Pump, rather than it depicting an AI being surprised by its continually incorrect predictions about a human response pattern.

If you don't see any important distinction between the two, then of course you'll think that it's incoherent to talk about that distinction.  But even if you think that Hume was mistaken about there existing any sort of interesting gap between 'is' and 'ought', you might by some act of empathy be able to imagine that other people think there's an interesting subject matter there, and they are trying to talk about it with you; otherwise you will just flatly misunderstand what they were trying to say, and mispredict their future utterances.  There's a difference between disagreeing with a point, and just flatly failing to get it, and hopefully you aspire to the first state of mind rather than the second.

Have we learned anything stunningly hopeful from modern pre-AGIs getting down part of the epistemic part of the problem at their current ability levels, to the kind of resolution that this post talked about in 2007?  Or from it being possible to cajole pre-AGIs with loss functions into willingly using that knowledge to predict human text outputs?  Some people think that this teaches us that alignment is hugely easy.  I think they are mistaken, but that would take its own post to talk about.

But people who point to "The Hidden Complexity of Wishes" and say of it that it shows that I had a view which the current evidence already falsifies -- that I predicted that no AGI would ever be able to predict human response patterns about getting grandmothers out of burning buildings -- have simply: misunderstood what the post is about, not understood in particular why the post is about an Outcome Pump rather than an AI stupidly mispredicting human responses, and failed to pick up on the central point that Eliezer expects superintelligences to be smart in the sense of making excellent purely epistemic predictions.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Updatelessness doesn't solve most problems · 2024-02-14T22:26:46.088Z · LW · GW

This deserves a longer answer than I have time to allocate to it, but I quickly remark that I don't recognize the philosophy or paradigm of updatelessness as refusing to learn things or being terrified of information; a rational agent should never end up in that circumstance, unless some perverse other agent is specifically punishing them for having learned the information (and will lose some of their own value thereby; it shouldn't be possible for them to gain value by behaving "perversely" in that way, for then of course it's not "perverse").  Updatelessness is, indeed, exactly that sort of thinking which prevents you from being harmed by information, because your updateless exposure to information doesn't cause you to lose coordination with your counterfactual other selves or exhibit dynamic inconsistency with your past self.

From an updateless standpoint, "learning" is just the process of reacting to new information the way your past self would want you to do in that branch of possibility-space; you should never need to remain ignorant of anything.  Maybe that involves not doing the thing that would then be optimal when considering only the branch of reality you turned out to be inside, but the updateless mind denies that this was ever the principle of rational choice, and so feels no need to stay ignorant in order to maintain dynamic consistency.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-28T17:53:32.280Z · LW · GW

They can solve it however they like, once they're past the point of expecting things to work that sometimes don't work.  I have guesses but any group that still needs my hints should wait and augment harder.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-27T18:39:16.037Z · LW · GW

I disagree with my characterization as thinking problems can be solved on paper, and with the name "Poet".  I think the problems can't be solved by twiddling systems weak enough to be passively safe, and hoping their behavior generalizes up to dangerous levels.  I don't think paper solutions will work either, and humanity needs to back off and augment intelligence before proceeding.  I do not take the position that we need a global shutdown of this research field because I think that guessing stuff without trying it is easy, but because guessing it even with some safe weak lesser tries is still impossibly hard.  My message to humanity is "back off and augment" not "back off and solve it with a clever theory".

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on MIRI announces new "Death With Dignity" strategy · 2023-12-24T00:40:26.260Z · LW · GW

Not what comes up for me, when I go incognito and google AI risk lesswrong.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-10T19:39:06.194Z · LW · GW

I rather expect that existing robotic machinery could be controlled by an ASI (rather than by "moderately smart intelligence") into picking up the pieces of a world economy after it collapses; or that, if for some weird reason it was trying to play around with static-cling spaghetti, it could pick up the pieces of the economy that way too.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-07T17:37:03.861Z · LW · GW

It's false that currently existing robotic machinery controlled by moderately smart intelligence can pick up the pieces of a world economy after it collapses.  One well-directed algae cell could, but not existing robots controlled by moderate intelligence.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-07T01:20:06.240Z · LW · GW

What does this operationalize as?  Presumably not that if we load a bone rod and a diamond rod under equal pressures, the diamond rod breaks first?  Is it more that if we drop sudden sharp weights onto a bone rod and a diamond rod, the diamond rod breaks first?  I admit I hadn't expected that, despite a general notion that diamond is a crystal and crystals are unexpectedly fragile against particular kinds of hits; and if so, that modifies my sense of what's a valid metaphor to use.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-06T20:03:49.626Z · LW · GW

"Pandemics" aren't a locally valid substitute step in my own larger argument, because an ASI needs its own manufacturing infrastructure before it makes sense for the ASI to kill the humans currently keeping its computers turned on.  So things that kill a bunch of humans are not a valid substitute for being able to, eg, take over and repurpose the existing solar-powered micron-diameter self-replicating factory systems, aka algae, and those repurposed algae being able to build enough computing substrate to go on running the ASI after the humans die.

It's possible this argument can and should be carried without talking about the level above biology, but I'm nervous that this causes people to start thinking in terms of Hollywood movie plots about defeating pandemics and hunting down the AI's hidden cave of shoggoths, rather than hearing, "And this is a lower bound but actually in real life you just fall over dead."

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-06T18:32:19.662Z · LW · GW

Why is flesh weaker than diamond?  Diamond is made of carbon-carbon bonds.  Proteins also have some carbon-carbon bonds!  So why should a diamond blade be able to cut skin?

I reply:  Because the strength of the material is determined by its weakest link, not its strongest link.  A structure of steel beams held together at the vertices by Scotch tape (and lacking other clever arrangements of mechanical advantage) has the strength of Scotch tape rather than the strength of steel.

Or:  Even when the load-bearing forces holding large molecular systems together are locally covalent bonds, as in lignin (what makes wood strong), if you've got larger molecules only held together by covalent bonds at interspersed points along their edges, that's like having 10cm-diameter steel beams held together by 1cm welds.  Again, barring other clever arrangements of mechanical advantage, that structure has the strength of 1cm of steel rather than 10cm of steel.

Bone is stronger than wood; it runs on a relatively stronger structure of ionic bonds, which are not locally weaker than carbon bonds in terms of attojoules of potential energy per bond.  Bone is weaker than diamond, then, because... why?

Well, partially, IIUC, because calcium atoms are heavier than carbon atoms.  So even if the ionic forces are strong per-bond, some of that advantage is lost to the price you pay for including heavier atoms, whose nuclei need more protons in order to exert the stronger electrical forces that make up that stronger bond.

But mainly, bone is so much weaker than diamond (on my understanding) because the carbon bonds in diamond have a regular crystal structure that locks the carbon atoms into relative angles, and in a solid diamond this crystal structure is tessellated globally.  Hydroxyapatite (the crystal part of bone) also tessellates in an energetically favorable configuration; but (I could be wrong about this) it doesn't have the same resistance to local deformation; and also, the actual hydroxyapatite crystal is assembled by other tissues that layer the ionic components into place, which means that a larger structure of bone is full of fault lines.  Bone cleaves along the weaker fault line, not at its strongest point.

But then, why don't diamond bones exist already?  Not just for the added strength; why make the organism look for calcium and phosphorus instead of just carbon?

The search process of evolutionary biology is not the search of engineering; natural selection can only access designs via pathways of incremental mutations that are locally advantageous, not intelligently designed simultaneous changes that compensate for each other.  There were, last time I checked, only three known cases where evolutionary biology invented the freely rotating wheel.  Two of those known cases are ATP synthase and the bacterial flagellum, which demonstrates that freely rotating wheels are in fact incredibly useful in biology, and are conserved when biology stumbles across them after a few hundred million years of search.  But there's no use for a freely rotating wheel without a bearing and there's no use for a bearing without a freely rotating wheel, and a simultaneous dependency like that is a huge obstacle to biology, even though it's a hardly noticeable obstacle to intelligent engineering.

The entire human body, faced with a strong impact like being gored by a rhinoceros horn, will fail at its weakest point, not its strongest point.  How much evolutionary advantage is there to stronger bone, if what fails first is torn muscle?  How much advantage is there to an impact-resistant kidney, if most fights that destroy a kidney will kill you anyways?  Evolution is not the sort of optimizer that says, "Okay, let's design an entire stronger body."  (Analogously, the collection of faults that add up to "old age" is large enough that a little more age resistance in one place is not much of an advantage if other aging systems or outward accidents will soon kill you anyways.)

I don't even think we have much of a reason to believe that it'd be physically (rather than informationally) difficult to have a set of enzymes that synthesize diamond.  It could just require 3 things to go right simultaneously, and so be much much harder to stumble across than tossing more hydroxyapatite to lock into place in a bone crystal.  And then even if somehow evolution hit on the right set of 3 simultaneous mutations, sometime over the history of Earth, the resulting little isolated chunk of diamond probably would not be somewhere in the phenotype that had previously constituted the weakest point in a mechanical system that frequently failed.  If evolution has huge difficulty inventing wheels, why expect that it could build diamond chainmail, even assuming that diamond chainmail is physically possible and could be useful to an organism that had it?

Talking to the general public is hard.  The first concept I'm trying to convey to them is that there's an underlying physical, mechanical reason that flesh is weaker than diamond; and that this reason isn't that things animated by vitalic spirit, elan vital, can self-heal and self-reproduce at the cost of being weaker than the cold steel making up lifeless machines, as is the price of magic imposed by the universe to maintain game balance.  This is a very natural way for humans to think; and the thing I am trying to come in and do is say, "Actually, no, it's not a mystical balance, it's that diamond is held together by bonds that are hundreds of kJ/mol; and the mechanical strength of proteins is determined by forces a hundred times as weak as that, the part where proteins fold up like spaghetti held together by static cling."
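
To put rough numbers on that contrast, here is a back-of-the-envelope sketch using typical textbook bond and interaction energies; the exact figures vary by source and context, and they are order-of-magnitude illustrations rather than numbers from the original comment:

# Representative textbook energies, in kJ/mol; rough order-of-magnitude values.
covalent_c_c = 350.0    # C-C single bond, the kind tessellated through diamond
hydrogen_bond = 20.0    # typical hydrogen bond in a folded protein
van_der_waals = 4.0     # typical van der Waals ("static cling") contact

print(f"C-C bond vs hydrogen bond:  ~{covalent_c_c / hydrogen_bond:.0f}x stronger")
print(f"C-C bond vs van der Waals:  ~{covalent_c_c / van_der_waals:.0f}x stronger")

Run as written, this prints roughly 18x and 88x, in line with the "a hundred times as weak" framing above.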

There is then a deeper story that's even harder to explain, about why evolution doesn't build freely rotating wheels or diamond chainmail; why evolutionary design doesn't find the physically possible stronger systems.  But first you need to give people a mechanical intuition for why, in a very rough intuitive sense, it is physically possible to have stuff that moves and lives and self-repairs but is strong like diamond instead of flesh, without this violating a mystical balance where the price of vitalic animation is lower material strength.

And that mechanical intuition is:  Deep down is a bunch of stuff that, if you could see videos of it, would look more like tiny machines than like magic, though they would not look like familiar machines (very few freely rotating wheels).  Then why aren't these machines strong like human machines of steel are strong?  Because iron atoms are stronger than carbon atoms?  Actually no, diamond is made of carbon and that's still quite strong.  The reason is that these tiny systems of machinery are held together (at the weakest joints, not the strongest joints!) by static cling.

And then the deeper question:  Why does evolution build that way?  And the deeper answer:  Because everything evolution builds is arrived at as an error, a mutation, from something else that it builds.  Very tight bonds fold up along very deterministic pathways.  So (in the average case, not every case) the neighborhood of functionally similar designs is densely connected along shallow energy gradients and sparsely connected along deep energy gradients.  Intelligence can leap long distances through that design space using coordinated changes, but evolutionary exploration usually cannot.

And I do try to explain that too.  But it is legitimately more abstract and harder to understand.  So I lead with the idea that proteins are held together by static cling.  This is, I think, validly the first fact you lead with if the audience does not already know it, and just has no clue why anyone could possibly think that there might even be machinery that does what bacterial machinery does but better.  The typical audience is not starting out with the naive intuition that of course you could put together stronger molecular machinery, given the physics of stronger bonds, such that we then debate whether (as I believe) that naive intuition is actually just valid and correct; they don't understand what the naive intuition is about, and that's the first thing to convey.

If somebody then says, "How can you be so ignorant of chemistry?  Some atoms in protein are held together by covalent bonds, not by static cling!  There are even, eg, sulfur bonds whereby some parts of the folded-spaghetti systems end up glued together with real glue!" then this does not validly address the original point, because the underlying point about why flesh is more easily cleaved than diamond is about the weakest points of flesh rather than the strongest points in flesh, since that's what determines the mechanical strength of the larger system.

I think there is an important way of looking at questions like these where, at the final end, you ask yourself, "Okay, but does my argument prove that flesh is in fact as strong as diamond?  Why isn't flesh as strong as diamond, then, if I've refuted the original argument for why it isn't?" and this is the question that leads you to realize that some local strong covalent bonds don't matter to the argument if those bonds aren't the parts that break under load.

My main moral qualm about using the Argument From Folded Spaghetti Held Together By Static Cling as an intuition pump is that the local ionic bonds in bone are legitimately as strong per-bond as the C-C bonds in diamond, and the reason that bone is weaker than diamond is (iiuc) actually more about irregularity, fault lines, and resistance to local deformation than about kJ/mol of the underlying bonds.  If somebody says "Okay, fine, you've validly explained why flesh is weaker than diamond, but why is bone weaker than diamond?" I have to reply "Valid, iiuc that's legit more about irregularity and fault lines and interlaced weaker superstructure and local deformation resistance of the bonds, rather than the raw potential energy deltas of the load-bearing welds."

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Ngo and Yudkowsky on alignment difficulty · 2023-11-26T22:47:06.270Z · LW · GW

Depends on how much of a superintelligence, how implemented.  I wouldn't be surprised if somebody got far superhuman theorem-proving from a mind that didn't generalize beyond theorems.  Presuming you were asking it to prove old-school fancy-math theorems, and not to, eg, arbitrarily speed up a bunch of real-world computations like asking it what GPT-4 would say about things, etc.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on The Parable of the Dagger · 2023-11-06T17:52:06.843Z · LW · GW

Solution (in retrospect this should've been posted a few years earlier):

Let:
'Na' = box N contains the angry frog
'Ng' = box N contains the gold
'Nf' = box N's inscription is false
'Nt' = box N's inscription is true

Consistent states must have exactly one true inscription (1f 2t or 1t 2f) and exactly one box of each content (1a 2g or 1g 2a).

Then, evaluating what the inscriptions actually say in each assumed state:

1a 1t, 2g 2f => 1t, 2f  (matches the assumption: consistent)
1a 1f, 2g 2t => 1f, 2t  (matches the assumption: consistent)
1g 1t, 2a 2f => 1t, 2t  (contradicts the assumption: inconsistent)
1g 1f, 2a 2t => 1f, 2f  (contradicts the assumption: inconsistent)

So the only consistent states put the angry frog in box 1 and the gold in box 2.
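
For anyone who wants to check that table mechanically, here is a minimal brute-force sketch in Python; the two inscription predicates paraphrase the box inscriptions from the parable (box 1: either this box contains an angry frog, or the box with a false inscription contains an angry frog, but not both; box 2: either this box contains gold and the box with a false inscription contains an angry frog, or this box contains an angry frog and the box with a true inscription contains gold), so treat them as a reconstruction rather than a quotation:

def inscription1(contents, truth):
    # Box 1 (paraphrased): "Either this box contains an angry frog, or the box
    # with a false inscription contains an angry frog, but not both." (exclusive or)
    false_box = 1 if not truth[1] else 2
    return (contents[1] == "frog") != (contents[false_box] == "frog")

def inscription2(contents, truth):
    # Box 2 (paraphrased): "Either this box contains gold and the box with a
    # false inscription contains an angry frog, or this box contains an angry
    # frog and the box with a true inscription contains gold."
    false_box = 1 if not truth[1] else 2
    true_box = 1 if truth[1] else 2
    return ((contents[2] == "gold" and contents[false_box] == "frog")
            or (contents[2] == "frog" and contents[true_box] == "gold"))

# Enumerate the states allowed by the jester's claim: one frog, one gold,
# and exactly one true inscription.
for frog_box in (1, 2):
    contents = {frog_box: "frog", 3 - frog_box: "gold"}
    for true_box in (1, 2):
        truth = {true_box: True, 3 - true_box: False}
        evaluated = {1: inscription1(contents, truth),
                     2: inscription2(contents, truth)}
        tag = "consistent" if evaluated == truth else "inconsistent"
        print(contents, "assumed", truth, "=> evaluates to", evaluated, tag)

As written, this reproduces the four rows above, with only the two frog-in-box-1 states coming out consistent.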

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Does davidad's uploading moonshot work? · 2023-11-03T21:19:00.936Z · LW · GW

I currently guess that a research community of non-upgraded alignment researchers with a hundred years to work, picks out a plausible-sounding non-solution and kills everyone at the end of the hundred years.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Does davidad's uploading moonshot work? · 2023-11-03T18:15:52.225Z · LW · GW

I don't think that faster alignment researchers get you to victory, but uploading should also allow for upgrading and while that part is not trivial I expect it to work.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Evaluating the historical value misspecification argument · 2023-10-06T07:24:36.828Z · LW · GW

AI happening through deep learning at all is a huge update against alignment success, because deep learning is incredibly opaque.  LLMs possibly ending up at the center is a small update in favor of alignment success, because it means we might (through some clever sleight, this part is not trivial) be able to have humanese sentences play an inextricable role at the center of thought (hence MIRI's early interest in the Visible Thoughts Project).

The part where LLMs are to predict English answers to some English questions about values, and show common-sense relative to their linguistic shadow of the environment as it was presented to them by humans within an Internet corpus, is not actually very much hope because a sane approach doesn't involve trying to promote an LLM's predictive model of human discourse about morality to be in charge of a superintelligence's dominion of the galaxy.  What you would like to promote to values are concepts like "corrigibility", eg "low impact" or "soft optimization", which aren't part of everyday human life and aren't in the training set because humans do not have those values.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Evaluating the historical value misspecification argument · 2023-10-05T20:37:11.761Z · LW · GW

I have never since 1996 thought that it would be hard to get superintelligences to accurately model reality with respect to problems as simple as "predict what a human will thumbs-up or thumbs-down".  The theoretical distinction between producing epistemic rationality (theoretically straightforward) and shaping preference (theoretically hard) is present in my mind at every moment that I am talking about these issues; it is to me a central divide of my ontology.

If you think you've demonstrated by clever textual close reading that Eliezer-2018 or Eliezer-2008 thought that it would be hard to get a superintelligence to understand humans, you have arrived at a contradiction and need to back up and start over.

The argument we are trying to explain has an additional step that you're missing.  You think that we are pointing to the hidden complexity of wishes in order to establish in one step that it would therefore be hard to get an AI to output a correct wish shape, because the wishes are complex, so it would be difficult to get an AI to predict them.  This is not what we are trying to say.  We are trying to say that because wishes have a lot of hidden complexity, the thing you are trying to get into the AI's preferences has a lot of hidden complexity.  This makes the nonstraightforward and shaky problem of getting a thing into the AI's preferences, be harder and more dangerous than if we were just trying to get a single information-theoretic bit in there.  Getting a shape into the AI's preferences is different from getting it into the AI's predictive model.  MIRI is always in every instance talking about the first thing and not the second.

You obviously need to get a thing into the AI at all, in order to get it into the preferences, but getting it into the AI's predictive model is not sufficient.  It helps, but only in the same sense that having low-friction smooth ball-bearings would help in building a perpetual motion machine; the low-friction ball-bearings are not the main problem, they are a kind of thing it is much easier to make progress on compared to the main problem.  Even if, in fact, the ball-bearings would legitimately be part of the mechanism if you could build one!  Making lots of progress on smoother, lower-friction ball-bearings is even so not the sort of thing that should cause you to become much more hopeful about the perpetual motion machine.  It is on the wrong side of a theoretical divide between what is straightforward and what is not.

You will probably protest that we phrased our argument badly relative to the sort of thing that you could only possibly be expected to hear, from your perspective.  If so this is not surprising, because explaining things is very hard.  Especially when everyone in the audience comes in with a different set of preconceptions and a different internal language about this nonstandardized topic.  But mostly, explaining this thing is hard and I tried taking lots of different angles on trying to get the idea across.

In modern times, and earlier, it is of course very hard for ML folk to get their AI to make completely accurate predictions about human behavior.  They have to work very hard and put a lot of sweat into getting more accurate predictions out!  When we try to say that this is on the shallow end of a shallow-deep theoretical divide (corresponding to Hume's Razor) it often sounds to them like their hard work is being devalued and we could not possibly understand how hard it is to get an AI to make good predictions.

Now that GPT-4 is making surprisingly good predictions, they feel they have learned something very surprising and shocking!  They cannot possibly hear our words when we say that this is still on the shallow end of a shallow-deep theoretical divide!  They think we are refusing to come to grips with this surprising, shocking thing, and that it surely ought to overturn all of our old theories, which were, yes, phrased and taught in a time before GPT-4 was around, and therefore do not in fact carefully emphasize at every point of teaching how in principle a superintelligence would of course have no trouble predicting human text outputs.  We did not expect GPT-4 to happen; in fact, intermediate trajectories are harder to predict than endpoints, so we did not carefully phrase all our explanations in a way that would make them hard to misinterpret after GPT-4 came around.

But if you had asked us back then if a superintelligence would automatically be very good at predicting human text outputs, I guarantee we would have said yes.  You could then have asked us in a shocked tone how this could possibly square up with the notion of "the hidden complexity of wishes" and we could have explained that part in advance.  Alas, nobody actually predicted GPT-4 so we do not have that advance disclaimer down in that format.  But it is not a case where we are just failing to process the collision between two parts of our belief system; it actually remains quite straightforward theoretically.  I wish that all of these past conversations were archived to a common place, so that I could search and show you many pieces of text which would talk about this critical divide between prediction and preference (as I would now term it) and how I did in fact expect superintelligences to be able to predict things!

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on I'm from a parallel Earth with much higher coordination: AMA · 2023-10-02T02:50:46.242Z · LW · GW

There's perhaps more detail in Project Lawful and in some nearby stories ("for no laid course prepare", "aviation is the most dangerous routine activity").

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on A Contamination Theory of the Obesity Epidemic · 2023-08-07T00:20:11.075Z · LW · GW

Have you ever seen or even heard of a person who is obese who doesn't eat hyperpalatable foods? (That is, they only eat naturally tasting, unprocessed, "healthy" foods).

Tried this for many years.  Paleo diet; eating mainly broccoli and turkey; trying to get most of my calories from giant salads.  Nothing.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on UFO Betting: Put Up or Shut Up · 2023-07-30T00:25:05.899Z · LW · GW

Received $95.51.  :)

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on UFO Betting: Put Up or Shut Up · 2023-07-29T18:57:43.894Z · LW · GW

I am not - $150K is as much as I care to stake at my present wealth levels - and while I refunded your payment, I was charged a $44.90 fee on the original transmission which was not then refunded to me.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on UFO Betting: Put Up or Shut Up · 2023-07-21T18:28:56.313Z · LW · GW

Though I disagree with @RatsWrongAboutUAP (see this tweet) and took the other side of the bet, I say a word of praise for RatsWrong about following exactly the proper procedure to make the point they wanted to make, and communicating that they really actually think we're wrong here.  Object-level disagreement, meta-level high-five.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on UFO Betting: Put Up or Shut Up · 2023-07-21T17:11:18.690Z · LW · GW

Received.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on UFO Betting: Put Up or Shut Up · 2023-07-20T01:02:21.880Z · LW · GW

My $150K against your $1K if you're still up for it at 150:1.  Paypal to yudkowsky@gmail.com with "UFO bet" in subject or text, please include counterparty payment info if it's not "email the address which sent me that payment".

Key qualifier:  This applies only to UFOs spotted before July 19th, 2023, rather than applying to eg future UFOs generated by secret AI projects which were not putatively flying around and spotted before July 19th, 2023.

ADDED:  $150K is as much as I care to stake at my current wealth level, to rise to this bettor's challenge and make this point; not taking on further bets except at substantially less extreme odds.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on The Commitment Races problem · 2023-07-17T04:30:25.767Z · LW · GW

TBC, I definitely agree that there's some basic structural issue here which I don't know how to resolve.  I was trying to describe properties I thought the solution needed to have, which ruled out some structural proposals I saw as naive; not saying that I had a good first-principles way to arrive at that solution.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Making Nanobots isn't a one-shot process, even for an artificial superintelligance · 2023-06-07T16:27:34.306Z · LW · GW

At the superintelligent level there's not a binary difference between those two clusters.  You just compute each thing you need to know efficiently.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Cosmopolitan values don't come free · 2023-06-02T05:27:13.641Z · LW · GW

I sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to me to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that I don't expect Earthlings to think about validly.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Making Nanobots isn't a one-shot process, even for an artificial superintelligance · 2023-06-02T03:47:14.957Z · LW · GW

Lacking time right now for a long reply:  The main thrust of my reaction is that this seems like a style of thought which would have concluded in 2008 that it's incredibly unlikely for superintelligences to be able to solve the protein folding problem.  People did, in fact, claim that to me in 2008.  It furthermore seemed to me in 2008 that protein structure prediction by superintelligence was the hardest or least likely step of the pathway by which a superintelligence ends up with nanotech; and in fact I argued only that it'd be solvable for chosen special cases of proteins rather than biological proteins because the special-case proteins could be chosen to have especially predictable pathways.  All those wobbles, all those balanced weak forces and local strange gradients along potential energy surfaces!  All those nonequilibrium intermediate states, potentially with fragile counterfactual dependencies on each interim stage of the solution!  If you were gonna be a superintelligence skeptic, you might have claimed that even chosen special cases of protein folding would be unsolvable.  The kind of argument you are making now, if you thought this style of thought was a good idea, would have led you to proclaim that probably a superintelligence could not solve biological protein folding and that AlphaFold 2 was surely an impossibility and sheer wishful thinking.

If you'd been around then, and said, "Pre-AGI ML systems will be able to solve general biological proteins via a kind of brute statistical force on deep patterns in an existing database of biological proteins, but even superintelligences will not be able to choose special cases of such protein folding pathways to design de novo synthesis pathways for nanotechnological machinery", it would have been a very strange prediction, but you would now have a leg to stand on.  But this, I most incredibly doubt you would have said - the style of thinking you're using would have predicted much more strongly, in 2008 when no such thing had been yet observed, that pre-AGI ML could not solve biological protein folding in general, than that superintelligences could not choose a few special-case solvable de novo folding pathways along sharper potential energy gradients and with intermediate states chosen to be especially convergent and predictable.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Preface · 2023-05-30T18:51:45.284Z · LW · GW

Nope.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on AI Fables · 2023-05-21T22:14:44.614Z · LW · GW

Well, one sink to avoid here is neutral-genie stories where the AI does what you asked, but not what you wanted.  That's something I wrote about myself, yes, but that was in the era before deep learning took over everything, when it seemed like there was a possibility that humans would be in control of the AI's preferences.  Now neutral-genie stories are a mindsink for a class of scenarios where we have no way to achieve entrance into those scenarios; we cannot make superintelligences want particular things or give them particular orders - cannot give them preferences in a way that generalizes to when they become smarter.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-05-07T19:53:21.170Z · LW · GW

Okay, if you're not saying GPUs are getting around as efficient as the human brain, without much more efficiency to be eked out, then I straightforwardly misunderstood that part.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-05-07T01:42:04.210Z · LW · GW

Nothing about any of those claims explains why the 10,000-fold redundancy of neurotransmitter molecules and ions being pumped in and out of the system is necessary for doing the alleged complicated stuff.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-05-07T01:40:28.617Z · LW · GW

Further item of "these elaborate calculations seem to arrive at conclusions that can't possibly be true" - besides the brain allegedly being close to the border of thermodynamic efficiency, despite visibly using tens of thousands of redundant physical ops in terms of sheer number of ions and neurotransmitters pumped, the same calculations claim that modern GPUs are approaching brain efficiency, the Limit of the Possible, and so are presumably at the Limit of the Possible themselves.

This source claims 100x energy efficiency from substituting some basic physical analog operations for multiply-accumulate, instead of performing those operations with digital transistors, even if you otherwise use actual real-world physical hardware.  Sounds right to me; it would make no sense for such a vastly redundant digital computation of such a simple physical quantity to be anywhere near the borders of efficiency!  https://spectrum.ieee.org/analog-ai

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-04-29T01:25:01.821Z · LW · GW

This does not explain how thousands of neurotransmitter molecules impinging on a neuron and thousands of ions flooding into and out of cell membranes, all irreversible operations, in order to transmit one spike, could possibly be within one OOM of the thermodynamic limit on efficiency for a cognitive system (running at that temperature).

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-04-29T01:18:04.397Z · LW · GW

And it says:

So true 8-bit equivalent analog multiplication requires about 100k carriers/switches

This just seems utterly wack.  Having any physical equivalent of an analog multiplication fundamentally requires 100,000 times the minimum thermodynamic cost of erasing 1 bit?  And "analog multiplication down to two decimal places" is the operation that is purportedly being carried out almost as efficiently as physically possible by... an axon terminal with a handful of synaptic vesicles dumping 10,000 neurotransmitter molecules to flood around a dendritic terminal (molecules which will later need to be irreversibly pumped back out), which in turn depolarizes and starts flooding thousands of ions into a cell membrane (to be later pumped out) in order to transmit the impulse at 1m/s?  That's the most thermodynamically efficient a physical cognitive system can possibly be?  This is approximately the most efficient possible way to turn all those bit erasures into thought?

This sounds like physical nonsense that fails a basic sanity check.  What am I missing?

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-04-27T22:34:29.861Z · LW · GW

I'm confused at how somebody ends up calculating that a brain - where each synaptic spike is transmitted by ~10,000 neurotransmitter molecules (according to a quick online check), which then get pumped back out of the membrane and taken back up by the synapse; and the impulse is then shepherded along cellular channels via thousands of ions flooding through a membrane to depolarize it and then getting pumped back out using ATP, all of which are thermodynamically irreversible operations individually - could possibly be within three orders of magnitude of max thermodynamic efficiency at 300 Kelvin.  I have skimmed "Brain Efficiency" though not checked any numbers, and not seen anything inside it which seems to address this sanity check.
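
For concreteness, here is the back-of-the-envelope arithmetic behind that sanity check, as a sketch: it uses only the per-spike counts mentioned above, and it grants the deliberately generous assumption that each irreversible molecular operation dissipates no more than the bare Landauer minimum for erasing one bit.

from math import log

k_B = 1.380649e-23            # Boltzmann constant, J/K
T = 300.0                     # temperature used above, K
landauer = k_B * T * log(2)   # minimum dissipation per bit erased: ~2.9e-21 J

# Per-spike counts taken from the comment above (order-of-magnitude figures).
neurotransmitters = 10_000    # molecules released, later pumped back irreversibly
ions = 10_000                 # "thousands of ions" pumped back out; order of magnitude

irreversible_ops = neurotransmitters + ions

# Generous floor: each irreversible op costs only one Landauer unit.  Even so,
# one spike dissipates ~2e4 Landauer units; if a spike conveys only a handful
# of bits, that puts the brain three to four orders of magnitude above the
# per-bit limit, not within one.
floor_per_spike = irreversible_ops * landauer
print("Landauer limit at %.0f K: %.2e J per bit erased" % (T, landauer))
print("Floor on dissipation per spike: %.2e J (~%d Landauer units)"
      % (floor_per_spike, irreversible_ops))

This is only a lower bound built from the counts quoted above; real pumping against concentration gradients costs considerably more than one Landauer unit per molecule, so the floor is conservative.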

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Eliezer Yudkowsky’s Letter in Time Magazine · 2023-04-09T19:00:23.331Z · LW · GW

Nobody in the US cared either, three years earlier.  That superintelligence will kill everyone on Earth is a truth, and one which has gotten easier and easier to figure out over the years.  I have not entirely written off the chance that, especially as the evidence gets more obvious, people on Earth will figure out this true fact and maybe even do something about it and survive.  I likewise am not assuming that China is incapable of ever figuring out this thing that is true.  If your opinion of Chinese intelligence is lower than mine, you are welcome to say, "Even if this is true and the West figures out that it is true, the CCP could never come to understand it".  That could even be true, for all I know, but I do not have present cause to believe it.  I definitely don't believe it about everyone in China; if it were true and a lot of people in the West figured it out, I'd expect a lot of individual people in China to see it too.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on GPTs are Predictors, not Imitators · 2023-04-09T18:56:04.215Z · LW · GW

From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is to minimise prediction error with regard to sensory inputs

I didn't say that GPT's task is harder than any possible perspective on a form of work you could regard a human brain as trying to do; I said that GPT's task is harder than being an actual human; in other words, being an actual human is not enough to solve GPT's task.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Eliezer Yudkowsky’s Letter in Time Magazine · 2023-04-09T04:43:52.055Z · LW · GW

If diplomacy failed, but yes, sure.  I've previously wished out loud for China to sabotage US AI projects in retaliation for chip export controls, in the hopes that if all the countries sabotage all the other countries' AI projects, maybe Earth as a whole can "uncoordinate" to not build AI even if Earth can't coordinate.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Manifold: If okay AGI, why? · 2023-03-29T20:29:02.878Z · LW · GW

Arbitrary and personal.  Given how bad things presently look, over 20% is about the level where I'm like "Yeah okay I will grab for that" and much under 20% is where I'm like "Not okay keep looking."

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-21T22:46:44.257Z · LW · GW

Choosing to engage with an unscripted unrehearsed off-the-cuff podcast intended to introduce ideas to a lay audience, continues to be a surprising concept to me.  To grapple with the intellectual content of my ideas, consider picking one item from "A List of Lethalities" and engaging with that.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-21T09:29:18.230Z · LW · GW

The "strongest" foot I could put forwards is my response to "On current AI not being self-improving:", where I'm pretty sure you're just wrong.

You straightforwardly completely misunderstood what I was trying to say on the Bankless podcast:  I was saying that GPT-4 does not get smarter each time an instance of it is run in inference mode.

And that's that, I guess.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-21T08:27:52.590Z · LW · GW

This is kinda long.  If I had time to engage with one part of this as a sample of whether it holds up to a counterresponse, what would be the strongest foot you could put forward?

(I also echo the commenter who's confused about why you'd reply to the obviously simplified presentation from an off-the-cuff podcast rather than the more detailed arguments elsewhere.)

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on There are no coherence theorems · 2023-03-13T02:46:56.661Z · LW · GW

Things are dominated when they forego free money and not just when money gets pumped out of them.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on There are no coherence theorems · 2023-03-08T07:23:40.082Z · LW · GW

Suppose I describe your attempt to refute the existence of any coherence theorems:  You point to a rock, and say that although it's not coherent, it also can't be dominated, because it has no preferences.  Is there any sense in which you think you've disproved the existence of coherence theorems, which doesn't consist of pointing to rocks, and various things that are intermediate between agents and rocks in the sense that they lack preferences about various things where you then refuse to say that they're being dominated?

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on There are no coherence theorems · 2023-03-07T00:43:18.916Z · LW · GW

I want you to give me an example of something the agent actually does, under a couple of different sense inputs, given what you say are its preferences, and then I want you to gesture at that and say, "Lo, see how it is incoherent yet not dominated!"

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on AGI in sight: our look at the game board · 2023-03-07T00:41:55.646Z · LW · GW

If you think you've got a great capabilities insight, I think you should PM me or somebody else you trust and ask if they think it's a big capabilities insight.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on There are no coherence theorems · 2023-03-01T07:21:43.501Z · LW · GW

In the limit, you take a rock, and say, "See, the complete class theorem doesn't apply to it, because it doesn't have any preferences ordered about anything!"  What about your argument is any different from this - where is there a powerful, future-steering thing that isn't viewable as Bayesian and also isn't dominated?  Spell it out more concretely:  It has preferences ABC, two things aren't ordered, it chooses X and then Y, etc.  I can give concrete examples for my views; what exactly is a case in point of anything you're claiming about the Complete Class Theorem's supposed nonapplicability and hence nonexistence of any coherence theorems?

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on There are no coherence theorems · 2023-02-28T08:26:49.058Z · LW · GW

And this avoids the Complete Class Theorem conclusion of dominated strategies, how? Spell it out with a concrete example, maybe? Again, we care about domination, not representability at all.