Comment by astrobiscuit on Risks from Learned Optimization: Introduction · 2019-07-23T12:21:56.002Z · LW · GW
To me it seems that the mesa-optimizer in the toy example is:
- aligned with its goal - it reaches the door (which it incorrectly identifies)
- dysfunctional - it incorrectly identifies doors.
From a consequentialist perspective this may be irrelevant, but from a safety point of view the distinction is an important one.
In the context of this article, I believe that misalignment (pseudo-alignment) would occur when the mesa-optimizer's goal diverges from its original goal (changes completely, gets extended, etc.).
(As a secondary point that I haven't thought much about: it seems problematic to discuss alignment unless the mesa-optimizer's goal literally contains the base goal, e.g. "find doors in order to achieve O_base".)