Posts

Evidential Cooperation in Large Worlds: Potential Objections & FAQ 2024-02-28T18:58:25.688Z
Everett branches, inter-light cone trade and other alien matters: Appendix to “An ECL explainer” 2024-02-24T23:09:27.147Z
Cooperating with aliens and AGIs: An ECL explainer 2024-02-24T22:58:47.345Z
AI Risk & Policy Forecasts from Metaculus & FLI's AI Pathways Workshop 2023-05-16T18:06:54.931Z

Comments

Comment by _will_ (Will Aldred) on Six Plausible Meta-Ethical Alternatives · 2025-01-28T13:43:20.385Z · LW · GW

Great post! I find myself coming back to it—especially possibility 5—as I sit here in 2025 thinking/worrying about AI philosophical competence and the long reflection.

On 6,[1] I’m curious if you’ve seen this paper by Joar Skalse? It begins:

I present an argument and a general schema which can be used to construct a problem case for any decision theory, in a way that could be taken to show that one cannot formulate a decision theory that is never outperformed by any other decision theory.

  1. ^

    Pasting here for easy reference (emphasis my own):

    6. There aren’t any normative facts at all, including facts about what is rational. For example, it turns out there is no one decision theory that does better than every other decision theory in every situation, and there is no obvious or widely-agreed-upon way to determine which one “wins” overall.

Comment by _will_ (Will Aldred) on johnswentworth's Shortform · 2025-01-11T10:44:15.787Z · LW · GW

See also ‘The Main Sources of AI Risk?’ by Wei Dai and Daniel Kokotajlo, which puts forward 35 routes to catastrophe (most of which are disjunctive). (Note that many of the routes involve something other than intent alignment going wrong.)

Comment by _will_ (Will Aldred) on The Field of AI Alignment: A Postmortem, and What To Do About It · 2025-01-04T16:50:17.285Z · LW · GW

Any chance you have a link to this tweet? (I just tried control+f'ing through @Richard's tweets over the past 5 months, but couldn't find it.)

Comment by _will_ (Will Aldred) on RobertM's Shortform · 2024-12-06T11:36:06.538Z · LW · GW

On your second point, I think MacAskill and Ord were saying something closer to “It would be worth it to spend thousands of years figuring out moral philosophy / figuring out what to do with the cosmos, if that’s how long it takes to be ~sure we’ve reached the ‘correct’ answer before locking things in, on account of the astronomical waste argument” than “I literally predict it will take today-humans thousands of years to figure out moral philosophy, even if we make a serious and coordinated effort to do so.” Somewhat relatedly, quoting from the ‘Long Reflection Reading List’ I wrote earlier this year (fn. 4):

Original discussion of the long reflection indicated that it could be a lengthy process of 10,000 years or more. More recent discussion I’m aware of (which is nonpublic, hence no corresponding reading) i) takes seriously the possibility that the long reflection could last just weeks rather than years or millennia, and ii) notes that wall-clock time is probably not the most useful way to think about the length of reflection, given that the reflection process, if it happens at all, will likely involve many superfast AIs doing the bulk of the cognitive labor.

On your first point, I continue to be curious about your perspective. I basically agree with the following (written by Zach Stein-Perlman), but, based on what you said in your parentheses, it sounds like you view it as a bad plan?

The outline of the best [post-AGI] plan I’ve heard is build human-obsoleting AIs which are sufficiently aligned/trustworthy that we can safely defer[1] to them (before building wildly superintelligent AI). Assume it will take 5-10 years after AGI to build such systems and give them sufficient time. To buy time (or: avoid being rushed by other AI projects[2]), inform the US government and convince it to enforce nobody builds wildly superintelligent AI for a while (and likely limit AGI weights to allied projects with excellent security and control).

(I could be off, but it sounds like either you expect solving AI philosophical competence to come pretty much hand in hand with solving intent alignment (because you see them as similar technical problems?), or you expect not solving AI philosophical competence (while having solved intent alignment) to lead to catastrophe (thus putting us outside the worlds in which x-risks are reliably ‘solved’ for), perhaps in the way Wei Dai has talked about?)

  1. ^

    We don't need these human-obsoleting AIs to be able to implement CEV. We want to be able to defer to them on tricky wisdom-loaded questions like “what should we do about the overall AI situation?” They can ask us questions as needed.

  2. ^

    To avoid being rushed by your own AI project, you also have to ensure that your AI can't be stolen and can't escape, so you have to implement excellent security and control.

Comment by _will_ (Will Aldred) on MIRI 2024 Communications Strategy · 2024-10-30T04:40:12.440Z · LW · GW

Thanks, that’s helpful!

(Fwiw, I don’t find the ‘caring a tiny bit’ story very reassuring, for the same reasons as Wei Dai, although I do find the acausal trade story for why humans might be left with Earth somewhat heartening. (I’m assuming that by ‘game-theoretic reasons’ you mean acausal trade.))

Comment by _will_ (Will Aldred) on MIRI 2024 Communications Strategy · 2024-10-30T04:08:35.013Z · LW · GW

I don't think [AGI/ASI] literally killing everyone is the most likely outcome

Huh, I was surprised to read this. I’ve imbibed a non-trivial fraction of your posts and comments here on LessWrong, and, before reading the above, my shoulder Daniel definitely saw extinction as the most likely existential catastrophe.

If you have the time, I’d be very interested to hear what you do think is the most likely outcome. (It’s very possible that you have written about this before and I missed it—my bad, if so.)

Comment by _will_ (Will Aldred) on johnswentworth's Shortform · 2024-10-28T14:37:12.956Z · LW · GW

Hmm, the ‘making friends’ part seems the most important (since there are ways to share new information you’ve learned, or solve problems, beyond conversation), but it also seems a bit circular. Like, if the reason for making friends is to hang out and have good conversations(?), but one has little interest in having conversations, then doesn’t one have little reason to make friends in the first place, and therefore little reason to ‘git gud’ at the conversation game?

Comment by _will_ (Will Aldred) on LTFF and EAIF are unusually funding-constrained right now · 2023-08-31T19:53:14.614Z · LW · GW

So basically I don't think it's possible to do robustly positive actions in longtermism with high (>70%? >60%?) probability of being net positive for the long-term future

This seems like an important point, and it's one I've not heard before. (At least, not outside of cluelessness or specific concerns around AI safety speeding up capabilities; I'm pretty sure that most EAs I know have ~100% confidence that what they're doing is net positive for the long-term future.)

I'm super interested in how you might have arrived at this belief: would you be able to elaborate a little? For instance, is there a theoretical argument going on here, like a weak form of cluelessness? Or is it more empirical, for example, did you get here through evaluating a bunch of grants and noticing that even the best seem to carry 30-ish percent downside risk? Something else?

Comment by _will_ (Will Aldred) on How to have Polygenically Screened Children · 2023-06-03T22:49:29.031Z · LW · GW

"GeneSmith"... the pun just landed with me. nice.

Comment by _will_ (Will Aldred) on Open Thread With Experimental Feature: Reactions · 2023-05-25T15:40:45.758Z · LW · GW

Very nitpicky (sorry): it'd be nice if the capitalization of the epistemic status reactions were consistent. Currently, some are in title case, for example "Too Harsh" and "Hits the Mark", while others are in sentence case, like "Key insight" and "Missed the point". The autistic part of me finds this upsetting.

Comment by _will_ (Will Aldred) on AI Risk & Policy Forecasts from Metaculus & FLI's AI Pathways Workshop · 2023-05-17T12:53:11.913Z · LW · GW

Thanks for this comment. I don't have much to add, other than: have you considered fleshing out and writing up this scenario in a style similar to "What 2026 looks like"?

Comment by _will_ (Will Aldred) on AI Risk & Policy Forecasts from Metaculus & FLI's AI Pathways Workshop · 2023-05-17T12:42:25.541Z · LW · GW

Thanks for this question.

Firstly, I agree with you that firmware-based monitoring and compute capacity restrictions would require similar amounts of political will to happen. Then, in terms of technical challenges, I remember one of the forecasters saying they believe that "usage-tracking firmware updates being rolled out to 95% of all chips covered by the 2022 US export controls before 2028" is 90% likely to be physically possible, and 70% likely to be logistically possible. (I was surprised at how high these stated percentages were, but I didn't have time then to probe them on why exactly they landed on those numbers—I may do so at the next workshop.)

Assuming the technical challenges of compute capacity restrictions aren't significant, fixing the probability of compute capacity restrictions at 15%, and applying the following crude calculation:

P(firmware) = P(compute) × P(firmware technical challenges are met)

= 0.15 × (0.9 × 0.7) = 0.15 × 0.63 = 0.0945 ≈ 9%

9% is a little above the reported 7%, which I take to mean that the other forecasters on this question believe the firmware technical challenges are a little, but not massively, harder than the 90%–70% breakdown given above.
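
(For concreteness, here's a minimal Python sketch of that back-of-the-envelope calculation; the variable names are mine, and the numbers are just the assumptions stated above, not additional workshop outputs.)

```python
# Crude back-of-the-envelope estimate (my sketch, not a workshop output).
p_compute = 0.15                # assumed P(compute capacity restrictions)
p_physically_possible = 0.90    # forecaster: firmware rollout physically possible
p_logistically_possible = 0.70  # forecaster: firmware rollout logistically possible

p_firmware = p_compute * p_physically_possible * p_logistically_possible
print(round(p_firmware, 4))     # 0.0945, i.e. roughly 9% (vs. the reported 7%)
```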

Comment by Will Aldred on [deleted post] 2023-05-07T15:16:53.695Z

There is a vibe that I often get from suffering focused people, which is a combo of

a) seeming to be actively stuck in some kind of anxiety loop, preoccupied with hell in a way that seems more pathological to me than well-reasoned. 

b) something about their writing and vibe feels generally off,

...

I agree that this seems to be the case with LessWrong users who engage in suffering-related topics like quantum immortality and Roko's basilisk. However, I don't think any(?) of these users are/have been professional s-risk researchers; the few (three, iirc) s-risk researchers I've talked to in real life did not give off this kind of vibe at all.

Comment by _will_ (Will Aldred) on What fact that you know is true but most people aren't ready to accept it? · 2023-02-03T15:12:49.938Z · LW · GW

there is no heaven, and god is not real.