Moloch games 2020-10-16T15:19:04.722Z
Subspace optima 2020-05-15T12:38:32.444Z
Risks from Learned Optimization: Conclusion and Related Work 2019-06-07T19:53:51.660Z
Deceptive Alignment 2019-06-05T20:16:28.651Z
The Inner Alignment Problem 2019-06-04T01:20:35.538Z
Conditions for Mesa-Optimization 2019-06-01T20:52:19.461Z
Risks from Learned Optimization: Introduction 2019-05-31T23:44:53.703Z
Alignment problems for economists 2018-07-10T23:43:56.662Z


Comment by chavam on Moloch games · 2020-10-18T09:02:45.074Z

Yes that's what I meant, thanks.

Comment by chavam on Subspace optima · 2020-05-16T11:06:33.306Z

I made up the term on the spot, so I don't think so.

Comment by chavam on Tabooing 'Agent' for Prosaic Alignment · 2019-08-23T06:21:36.359Z

I endorse this. I like the framing, and it's very much in line with how I think about the problem. One point I'd make: I'd replace the word "model" with "algorithm", to be even more agnostic. For many people, "model" already seems to carry an implicit, intuitive interpretation of what the learned algorithm is doing, namely "trying to faithfully represent the problem", or something similar.

Comment by chavam on Two agents can have the same source code and optimise different utility functions · 2018-07-11T00:44:06.954Z

Here are some counterarguments:

There can be scenarios where the agent cannot change his source code without processing observations; e.g., the agent may need to reprogram himself via some external device.

The agent may not be aware that there are multiple copies of him.

It seems that, for many plausible agent designs, changing his utility function would require a significant change in the architecture. E.g., if two human sociopaths wanted to change their utility functions into a weighted average of the two, they couldn't do so without significantly changing their brain architecture. A TDT agent could do this, but I think it is not prudent to assume that all the AGIs we will actually deal with in the future will be TDT agents (in fact, it seems to me that most of them probably won't be).

So I don't think your comment invalidates the relevance of the point made by the poster.

Comment by chavam on Two agents can have the same source code and optimise different utility functions · 2018-07-11T00:22:04.825Z

You don't necessarily need "explicit self-reference". The difference in utility functions can also arise from a difference in the agents' locations in the universe. Two identical worms placed in different locations will have different utility functions because their atoms are not in exactly the same places, even though neither has explicit self-reference. Similarly, in a computer simulation, agents with the same source code will be called by the universe-program in different contexts. (If they weren't, I don't see how it would even make sense to speak of them as "different instances of the same source code"; there would just be one instance of the source code.)
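To make the simulation point concrete, here is a toy sketch in Python. Everything in it is hypothetical and only for illustration: the grid world, the food counts, and the functions `agent_utility` and `preferred_world`. Every instance runs identical code that never refers to itself by name; it only values food at whatever position the universe-program passes in. Called from two different positions, the same code ranks the same two world-states in opposite orders, i.e. the instances effectively optimise different utility functions.

```python
from typing import Dict, Tuple

# Hypothetical toy world: food units at each grid cell.
Position = Tuple[int, int]
World = Dict[Position, int]


def agent_utility(my_position: Position, world: World) -> int:
    """Identical source code for every instance of the (hypothetical) agent.

    No explicit self-reference: the code only values food at the position
    the surrounding universe-program supplies as context.
    """
    return world.get(my_position, 0)


def preferred_world(my_position: Position, options: Dict[str, World]) -> str:
    # Return the name of the world-state this instance ranks highest.
    return max(options, key=lambda name: agent_utility(my_position, options[name]))


# Two candidate world-states: food concentrated near the origin, or far away.
options = {
    "A": {(0, 0): 3, (5, 5): 1},
    "B": {(0, 0): 1, (5, 5): 3},
}

# The same source code, called by the universe-program in two different contexts.
print(preferred_world((0, 0), options))  # prints "A"
print(preferred_world((5, 5), options))  # prints "B"
```

The only thing distinguishing the two instances is the context argument, which plays the role of the worms' different atom locations.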

So in fact, I think this is probably a property of almost all possible agents. It seems to me that you would need a very complex and specific ontological model in the agent to prevent these effects and make the two agents have the same utility function.