Subagents and impact measures: summary tables
post by Stuart_Armstrong · 2020-02-17T14:09:32.029Z · LW · GW · 2 comments
These tables will summarise the results of this whole sequence, checking whether subagents can neutralise the impact penalty.
First of all, given a subagent, here are the results for various impact penalties and baselines, and various "value difference summary functions":
Another way of phrasing a "decreasing" summary function: it penalises too little power, not too much. Conversely, an "increasing" summary function penalises too much power, not too little. Thus, unfortunately:
- Subagents do allow an agent to get stronger than the indexical impact penalty would allow.
- Subagents don't allow an agent to get weaker than the indexical impact penalty would allow.
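To make the two directions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the definitions used in the sequence: it assumes the summary function is applied to the difference between the agent's attainable value now and its attainable value under the baseline, one difference per value measure, and that the penalty is the sum of the results. The names `f_decreasing`, `f_increasing`, `f_absolute`, and `penalty` are hypothetical, and the two-sided `f_absolute` is included only for comparison.

```python
from typing import Callable, Sequence


def f_decreasing(x: float) -> float:
    """Assumed one-sided form: penalise only drops below baseline (too little power)."""
    return max(0.0, -x)


def f_increasing(x: float) -> float:
    """Assumed one-sided form: penalise only gains above baseline (too much power)."""
    return max(0.0, x)


def f_absolute(x: float) -> float:
    """Assumed two-sided form: penalise any change, in either direction."""
    return abs(x)


def penalty(
    f: Callable[[float], float],
    value_now: Sequence[float],
    value_baseline: Sequence[float],
) -> float:
    """Sum f over (current value - baseline value), one term per value measure."""
    return sum(f(now - base) for now, base in zip(value_now, value_baseline))


# Hypothetical attainable values for two value measures.
baseline = [1.0, 1.0]
stronger = [3.0, 2.0]  # the agent has gained power relative to baseline
weaker = [0.5, 0.0]    # the agent has lost power relative to baseline

for name, f in [("decreasing", f_decreasing),
                ("increasing", f_increasing),
                ("absolute", f_absolute)]:
    print(name, penalty(f, stronger, baseline), penalty(f, weaker, baseline))
```

Under these assumptions, the decreasing form ignores the stronger agent and penalises only the weaker one, while the increasing form does the reverse. The two bullets above then say that a subagent can escape the "too much power" side of a penalty but not the "too little power" side.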
This table presents, for three specific examples, whether the agent could actually build a subagent, and whether doing so would neutralise its impact penalty in practice (under the inaction baseline):
Now, whether the RR or AU penalties are undermined technically depends on the value difference summary function, not on what measure is being used for value. However, I feel that the results undermine the spirit of AU much more than the spirit of RR. AU attempted to control an agent by limiting its power; this effect is mainly neutralised. RR attempted to control the side-effects of an agent by ensuring it had enough power to reach a lot of states; this effect is not neutralised by a subagent.