Comments
Are astronomical suffering risks (s-risk) considered a subset of existential risks (x-risk) because they "drastically curtail humanity’s potential"? Or is this concern not taken into account for this research program?
I appreciate the response and stand corrected.
The point about it being an iterated prisoner's dilemma is a good one, and I would rather there be more such ACX instances where he shares even more of his thinking because of our cooperative/trustworthy behavior than have this be the last one or have the next ones be filtered PR-speak.
A small number of people in the alignment community repeatedly getting access to better information and being able to act on it beats the value of this one single post staying open to the world. And even in the case of "the cat being out of the bag," hiding/removing the post would probably do good as a gesture of cooperation.
Another point about "defection" is which action is a defection with respect to whom.
Sam Altman is the leader of an organization with a real chance of bringing about the literal end of the world, and I find any and all information about his thoughts and his organization to be of the highest interest for the rest of humanity.
Not disclosing whatever such information one comes into contact with, except in the case of speeding up potentially even-less-alignment-focused competitors, is a defection against the rest of us.
If this were an off-the-record meeting with a head of state discussing plans for expanding and/or deploying nuclear weapons capabilities, nobody would dare suggest taking it down, inaccuracy and incompleteness notwithstanding.
Now, Sam Altman appeals emotionally to a lot of us (me included) as much more relatable, being an apparently prosocial nerdy tech guy, but in my opinion he's a head of state in control of WMDs and should be (and expect to be) treated as such.
Update: honestly curious about reasons for downvotes if anyone is willing to share. I have no intention to troll or harm the discussion and am willing to adapt writing style. Thank you.
Did he really speak that little about AI Alignment/Safety? Does anyone have additional recollections on this topic?
The only relevant parts so far seem to be these two:
Behavioral cloning probably much safer than evolving a bunch of agents. We can tell GPT to be empathic.
And:
Chat access for alignment helpers might happen.
Both of which are very concerning.
"We can tell GPT to be empathetic" assumes it can be aligned in the first place so you "can tell" it what to do, and "be empathetic" is a very vague description of what a good utility function would be, assuming one would be followed at all. Of course it's all in a conversational tone, not a formal paper, but it seems very dismissive to me.
GPT-based "behavioral cloning" itself has been brought up by Vitalik Buterin and criticized by Eliezer Yudkowsky in this exchange between the two:
For concreteness: One can see how AlphaFold 2 is working up towards world-ending capability. If you ask how you could integrate an AF2 setup with GPT-3 style human imitation, to embody the human desire for proteins that do nice things... the answer is roughly "Lol, what? No."
As for "chat access for alignment helpers," I mean, where to even begin? It's not hard to imagine a deceptive AI using this chat to perfectly convince human "alignment helpers" that it is whatever they want it to be while being something else entirely. Or even "aligning" the human helpers themselves into beliefs/actions that are in the AI's best interest.
Very interested in this, especially looking out for how to balance or resolve trade-offs between high inner coordination (people agree fast and completely on actions and/or beliefs) and high "outer" coordination (with reality, i.e. converging fast and strongly on the right things), aka how to avoid echo-chambers/groupthink without devolving into bickering and splintering into factions.