In the context of deceptive alignment, would the ultimate goal of an AI system appear, from a human perspective, random and uncorrelated with the objectives of the training distribution? Or would humans be able to recognize that the goal is at least somewhat correlated with those training objectives?
For instance, in the article below, it is written that "the model just has some random proxies that were picked up early on, and that's the thing that it cares about." To what extent does it learn random proxies?
https://www.lesswrong.com/posts/A9NxPTwbw6r6Awuwt/how-likely-is-deceptive-alignment
If an AI system pursues an ultimate goal such as power or curiosity, there would seem to be a spurious correlation with the base objective regardless of what that base objective is.
On the other hand, could it learn to pursue a goal completely unrelated to the training distribution, such as mass-producing objects of some peculiar shape?
I would like to know about the history of the term "AI alignment". I found an article written by Paul Christiano in 2018; did use of the term start around that time? Also, what is the difference between AI alignment and value alignment?
https://www.alignmentforum.org/posts/ZeE7EKHTFMBs8eMxn/clarifying-ai-alignment