Some Intuitions for the Ethicophysics
post by MadHatter, mishka · 2023-11-30T06:47:55.145Z · 4 comments
comment by TAG · 2023-12-01T11:23:39.804Z
I guess one thing I am curious about is, who would I have to get to check my derivation of the Golden Theorem in order for people to have any faith in it? It should be checkable by any physics major, just based on how little physics I actually know.
If it actually is physics. As far as I can see, it is decision/game theory.
comment by MadHatter · 2023-12-01T12:20:21.995Z
Yes, it is a specification of a set of temporally adjacent computable Schelling points. It thus constitutes a trajectory through the space of moral possibilities that agents can use to coordinate and to punish defectors from a globally consistent morality, whose only moral stipulations are such reasonable-sounding statements as "actions have consequences" and "act more like Jesus and less like Hitler".
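As a purely illustrative toy (not the ethicophysics itself), the coordination mechanism can be sketched in a few lines of Python: a publicly computable rule picks a focal action at each time step, so every agent can tell cooperators from defectors and coordinate on punishing the latter. The function names and payoff numbers below are made up for the example.

```python
# Toy sketch only: a shared, computable rule picks a "Schelling action" at each
# time step; agents who deviate are identifiable and can be punished jointly.
# All names and numbers here are illustrative inventions, not the formalism.

def schelling_action(t: int) -> int:
    """A commonly computable focal action for time step t (a stand-in for the
    ethicophysical trajectory; any deterministic public rule would do)."""
    return t % 3

def play_round(t: int, chosen: dict[str, int]) -> dict[str, str]:
    """Label each agent as 'cooperator' or 'defector' relative to the focal action."""
    focal = schelling_action(t)
    return {name: ("cooperator" if a == focal else "defector")
            for name, a in chosen.items()}

def payoffs(labels: dict[str, str], base: float = 1.0, punishment: float = 2.0) -> dict[str, float]:
    """Cooperators keep the base payoff; defectors are punished by the coordinated majority."""
    return {name: (base if lab == "cooperator" else base - punishment)
            for name, lab in labels.items()}

if __name__ == "__main__":
    chosen = {"alice": 1, "bob": 1, "carol": 2}   # carol deviates at t = 1
    labels = play_round(1, chosen)
    print(labels)                                  # carol is flagged as the defector
    print(payoffs(labels))                         # defection is unprofitable
```

The point of the sketch is only that "temporally adjacent computable Schelling points" buy you a common standard to defect from; the actual ethicophysical trajectory is meant to supply a much richer rule than `t % 3`.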
comment by mishka · 2023-11-30T21:39:47.088Z
So, to summarize, I think the key upside of this dialogue is a rough preliminary sketch of a bridge between the formalism of ethicophysics and how one might hope to use it in the context of AI existential safety.
As a result, it should be easier for readers to evaluate the overall approach.
At the same time, I think the main open problem for anyone interested in this (or in any other) approach to AI existential safety is how well it holds up under recursive self-improvement.
Both powerful AIs and ecosystems of powerful AIs have an inherently high potential for recursive self-improvement. That potential might not be unlimited (it might hit thresholds where it saturates, at least for some periods of time), but it is still likely to produce a period of rapid change in which not only the capabilities but the very nature of the AI systems in question (their architecture, algorithms, and, unfortunately, their values) might change dramatically.
So any approach to AI existential safety (this one or any other) eventually needs to be evaluated against this likely rapid self-improvement and the various forms of self-modification that come with it.
Basically: is the coming self-improvement trajectory completely unpredictable, or can we hope for some invariants to be preserved? And specifically, can we find invariants that are both feasible to preserve during rapid self-modification and likely to result in outcomes we would consider reasonable?
E.g., if the resulting AIs are mostly "supermoral", can we just rely on them to ensure that their successors and creations are "supermoral" as well, or are extra efforts on our part required to make this more likely? We would probably want to look closely at the "details of the ethicophysical dynamics" in connection with this, rather than just relying on high-level "statements of hope"...
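To make the invariant question a bit more concrete, here is a deliberately crude toy sketch of what "preserving an invariant across self-modification" could look like. The predicate `is_supermoral`, the modification rule, and all the numbers are invented placeholders, not anything derived from the ethicophysics.

```python
# Toy sketch (an illustration, not part of the dialogue): model self-improvement
# as a chain of successors, each deployed only if a checkable "supermoral"
# invariant still holds. The predicate and the modification step are placeholders.

import random

Agent = dict  # stand-in: an agent is just a dict of properties here

def is_supermoral(agent: Agent) -> bool:
    """Placeholder invariant check; the real question is whether such a predicate
    can remain both checkable and meaningful under self-modification."""
    return agent["honors_consequences"] and agent["capability"] < 100 * agent["oversight"]

def self_modify(agent: Agent) -> Agent:
    """Placeholder modification step: capability grows; other properties may drift."""
    child = dict(agent)
    child["capability"] *= random.uniform(1.1, 2.0)
    child["oversight"] *= random.uniform(0.9, 1.5)
    return child

def improvement_trajectory(seed: Agent, steps: int) -> list[Agent]:
    """Each generation deploys a successor only if it still passes the invariant."""
    chain = [seed]
    for _ in range(steps):
        candidate = self_modify(chain[-1])
        if is_supermoral(candidate):
            chain.append(candidate)   # invariant preserved; successor deployed
        # else: candidate rejected; this generation keeps its current version
    return chain

if __name__ == "__main__":
    seed = {"honors_consequences": True, "capability": 1.0, "oversight": 1.0}
    chain = improvement_trajectory(seed, steps=20)
    print(f"{len(chain) - 1} of 20 proposed successors preserved the invariant")
```

The interesting (and unresolved) part is, of course, whether anything like `is_supermoral` can stay both checkable and meaningful while the agents rewrite themselves; that is exactly where the "details of the ethicophysical dynamics" would have to do the work that this toy leaves as a stub.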