Comment by Yoshua Bengio (yoshua-bengio) on A case for AI alignment being difficult · 2024-01-07T15:53:21.225Z · LW · GW

If the AI is modeling the real world, then it might in some ways care about it

I am not at all convinced that this is true. Consider an AI whose training objective simply makes it want to model how the world works as well as possible, like a pure scientist that does not try to acquire more knowledge via experiments but only reasons and explores explanatory hypotheses in order to build a distribution over theories of the observed data. It is agency, and utilities or rewards, that induce a preference over certain states of the world.
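The "pure scientist" described above can be sketched minimally as Bayesian inference over a set of candidate theories: the system only updates beliefs from observations and has no reward, utility, or action space that would induce a preference over world states. This is an illustrative assumption of the commenter's description, not code from any actual system; the toy "theories" here are coin biases.

```python
# Minimal sketch (illustrative, not from the comment): a purely epistemic
# modeler that only maintains a distribution over theories of observed data.
# It has no reward signal, no utility function, and no action space.

def posterior(theories, likelihood, data, prior=None):
    """Bayesian update over a finite set of theories given observed data."""
    if prior is None:
        prior = [1.0 / len(theories)] * len(theories)  # uniform prior
    weights = []
    for theory, p in zip(theories, prior):
        w = p
        for x in data:
            w *= likelihood(theory, x)  # accumulate likelihood of each datum
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]  # normalized posterior

# Toy example: theories are possible coin biases; data are flips (1 = heads).
theories = [0.1, 0.5, 0.9]
like = lambda bias, flip: bias if flip == 1 else 1.0 - bias
post = posterior(theories, like, data=[1, 1, 0, 1])
# The system only predicts; nothing here induces preferences over world states.
```

The point of the sketch is what is absent: there is no argmax over actions and no reward term, so updating this distribution never constitutes "caring about" any state of the world.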

Comment by Yoshua Bengio (yoshua-bengio) on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-26T12:23:31.831Z · LW · GW

Karl, thanks for the very good summary and interesting analysis. There is one factual error, though, that I would appreciate you fixing: my 10 to 50% estimate (2nd row in the table) was not for x-risk but for superhuman AI. FYI, it was obtained by a show-of-hands poll of a group of RL researchers at a workshop (most of whom had little or no exposure to AI safety). Another (mild) error is that although I have been a reader of (a few) AI safety papers for about a decade, it is only recently that I started writing about the topic.