PabloAMC's Shortform

post by PabloAMC · 2023-09-02T16:34:52.909Z · LW · GW · 1 comments

comment by PabloAMC · 2023-09-02T16:34:53.207Z · LW(p) · GW(p)

The main problem with wireheading, manipulation, and similar failure modes seems to stem from a confusion between the goal in the world and its representation inside the agent. Perhaps one way to deal with this problem is to use the fact that the agent may be aware that it is an embedded agent. It could then be aware that the goal represents an external fact about the world, and we could potentially penalize the divergence between the goal and its representation during training.
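
As a rough illustration of what "penalizing the divergence" could look like, here is a minimal sketch. It assumes that during training we can actually measure the goal in the world, which is a strong assumption, and it uses a squared-error penalty purely for concreteness. The names `penalized_loss`, `goal_in_world`, `goal_representation`, and `weight` are hypothetical and not taken from any existing method.

```python
def penalized_loss(task_loss, goal_in_world, goal_representation, weight=1.0):
    """Task loss plus a penalty on the gap between goal and representation.

    goal_in_world: measured values of the external goal (assumed to be
        available during training; this is the strong assumption).
    goal_representation: the agent's internal estimate of those values.
    weight: hypothetical coefficient trading off the penalty against the task.
    """
    # Squared-error divergence, chosen only for illustration.
    divergence = sum(
        (g - r) ** 2 for g, r in zip(goal_in_world, goal_representation)
    )
    return task_loss + weight * divergence


# An agent whose representation tracks the world pays no penalty.
honest = penalized_loss(0.5, [1.0, 0.0], [1.0, 0.0], weight=0.1)

# An agent that inflates its internal representation (a crude stand-in
# for wireheading) pays a penalty that grows with the gap.
inflated = penalized_loss(0.5, [1.0, 0.0], [3.0, 0.0], weight=0.1)
```

In this toy setup the inflated representation yields a strictly larger loss than the honest one, so training pressure would push the representation back toward the external fact. Whether such a measurement of the goal in the world is available in practice is exactly the hard part.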