Towards_Keeperhood's Shortform

post by Towards_Keeperhood (Simon Skade) · 2022-11-25T11:50:41.595Z · LW · GW · 2 comments

2 comments

Comments sorted by top scores.

comment by Towards_Keeperhood (Simon Skade) · 2022-11-25T12:02:09.206Z · LW(p) · GW(p)

I feel like many people look at AI alignment like they think the main problem is being careful enough when we train the AI so that no bugs cause the objective to misgeneralize.

This is not the main problem. The main problem is that it is likely significantly easier to build an AGI than to build an aligned AI or a corrigible AI. Even if it's relatively obvious that AGI design X destroys the world, and all the wise actors don't deploy it, we cannot prevent unwise actors to deploy it a bit later.

We currently don't have any approach to alignment that would work even if we managed to implement everything correctly and had perfect datasets.

comment by Towards_Keeperhood (Simon Skade) · 2022-11-25T12:38:39.492Z · LW(p) · GW(p)

In case some people relatively new to lesswrong aren't aware of it. (And because I wish I found that out earlier): "Rationality: From AI to Zombies" does not nearly cover all of the posts Eliezer published between 2006 and 2010.

Here's how it is:

  • "Rationality: From AI to Zombies" probably contains like 60% of the words EY has written in that timeframe and the most important rationality content.
  • The original sequences [? · GW] are basically the old version of the collection that is now "Rationality: A-Z", containing a bit more content. In particular a longer quantum physics sequence and sequences on fun theory and metaethics.
  • All EY posts from that timeframe [LW · GW] (or here [LW · GW] for all EY posts until 2020 I guess) (also can be found on lesswrong, but not in any collection I think).

So a sizeable fraction of EY's posts are not in a collection.

I just recently started reading the rest.

I strongly recommend reading:

And generally a lot of posts on AI (i.e. primarily posts in the AI foom debate) are not in the sequences. Some of them were pretty good.