Posts

Save the princess: A tale of AIXI and utility functions 2013-02-01T15:38:00.256Z
A definition of wireheading 2012-11-27T19:31:51.515Z
Universal agents and utility functions 2012-11-14T04:05:38.614Z

Comments

Comment by Anja on Save the princess: A tale of AIXI and utility functions · 2013-02-06T21:14:35.994Z · LW · GW

Super hard to say without further specification of the approximation method used for the physical implementation.

Comment by Anja on Save the princess: A tale of AIXI and utility functions · 2013-02-06T21:11:57.724Z · LW · GW

So I would only consider the formulation in terms of semimeasures to be satisfactory if the semimeasures are specific enough that the correct semimeasure plus the observation sequence is enough information to determine everything that's happening in the environment.

Can you give an example of a situation in which that would not be the case? I think the semimeasure AIXI and the deterministic-programs AIXI are pretty much equivalent; am I overlooking something here?

If we're going to allow infinite episodic utilities, we'll need some way of comparing how big different nonconvergent series are.

I think we need that even without infinite episodic utilities. I still think there might be possibilities involving surreal numbers, but I haven't found the time yet to develop this idea further.

Why?

Because otherwise we definitely end up with a non-enumerable utility function, and every approximation will be blind to the differences between infinitely many futures with infinitely large utility differences, I think. The set of all binary strings of infinite length is uncountable; how would we feed that into an enumerable/computable function? Your approach avoids this via the use of policies p and q, which are by definition computable.
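
To make that worry concrete (a rough sketch): if a computable U is supposed to return an exact (say rational) value, it has to halt after reading only some finite prefix x_{1:n} of its infinite input, so

U(x_{1:n}z) = U(x_{1:n}z') \quad \text{for all infinite continuations } z, z',

i.e. U is constant on an entire uncountable cylinder set of futures and cannot register any utility differences within it, let alone infinitely large ones.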

Comment by Anja on Save the princess: A tale of AIXI and utility functions · 2013-02-06T20:57:55.206Z · LW · GW

I think you are proposing to privilege some hypotheses at the beginning of Solomonoff induction, but not too strongly, because the remaining uncertainty helps fight wireheading by providing knowledge about the existence of an idealized, "true" utility function and world model. Is that a correct summary? (Just trying to test whether I understand what you mean.)

In particular they can make positive use of wire-heading to reprogram themselves even if the basic architecture M doesn't allow it

Can you explain this more?

Comment by Anja on Interpersonal and intrapersonal utility comparisons · 2013-01-05T14:01:03.269Z · LW · GW

They just do interpersonal comparisons; lots of their ideas generalize to intrapersonal comparisons though.

Comment by Anja on Interpersonal and intrapersonal utility comparisons · 2013-01-04T14:35:49.841Z · LW · GW

I recommend the book "Fair Division and Collective Welfare" by H. J. Moulin; it discusses some of these problems and several related ones.

Comment by Anja on A utility-maximizing varient of AIXI · 2012-12-20T21:36:35.070Z · LW · GW

True. :)

Comment by Anja on A utility-maximizing varient of AIXI · 2012-12-20T21:35:12.625Z · LW · GW

I get that now, thanks.

Comment by Anja on A utility-maximizing varient of AIXI · 2012-12-20T18:29:57.765Z · LW · GW

you forgot to multiply by 2^-l(q)

I think then you would count that twice, wouldn't you? Because my original formula already contains the Solomonoff probability...
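
Spelled out roughly (suppressing the exact interleaving of actions and observations): the Solomonoff probability already carries the length weights,

M(x_{1:m_k} \mid y_{1:m_k}) = \sum_{q:\, q(y_{1:m_k}) = x_{1:m_k}} 2^{-\ell(q)} ,

so multiplying each summand by a further factor of 2^{-\ell(q)} would apply the complexity penalty twice.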

Comment by Anja on A utility-maximizing varient of AIXI · 2012-12-20T18:25:49.904Z · LW · GW

Let's stick with delusion boxes for now, because assuming that we can read off from the environment whether the agent has wireheaded breaks dualism. So even if we specify utility directly over environments, we still need to master the task of specifying which action/environment combinations contain delusion boxes to evaluate them correctly. It is still the same problem, just phrased differently.

Comment by Anja on A utility-maximizing varient of AIXI · 2012-12-19T18:08:56.625Z · LW · GW

I think there is something off with the formulas that use policies: If you already choose the policy

p(x_{<k}) = y_{<k}y_k

then you cannot choose a y_k in the argmax.

Also for the Solomonoff prior you must sum over all programs q with

q(y_{1:m_k}) = x_{1:m_k} .

Could you maybe expand on the proof of Lemma 1 a little bit? I am not sure I get what you mean yet.

Comment by Anja on A utility-maximizing varient of AIXI · 2012-12-19T17:41:48.948Z · LW · GW

I like how you specify utility directly over programs; it describes very neatly how someone who sat down and wrote a utility function

u(yx_{1:m_k})

would do it: First determine how the observation could have been computed by the environment and then evaluate that situation. This is a special case of the framework I wrote down in the cited article; you can always set

u(yx_{1:m_k}) = \sum_{q:\,q(y_{1:m_k})=x_{1:m_k}} U(q, y_{1:m_k})

This solves wireheading only if we can specify which environments contain wireheaded (non-dualistic) agents, delusion boxes, etc.

Comment by Anja on A definition of wireheading · 2012-11-29T23:36:37.457Z · LW · GW

You are a wirehead if you consider your true utility function to be genetic fitness.

Comment by Anja on A definition of wireheading · 2012-11-29T03:41:42.130Z · LW · GW

To what extent does our response to Nozick's Experience Machine Argument typically reflect status quo bias rather than a desire to connect with ultimate reality?

I think the argument that people don't really want to stay in touch with reality, but rather want to stay in touch with their past, makes a lot of sense. After all, we construct our model of reality from our past experiences. One could argue that this is another example of a substitute measure used to save computational resources: instead of caring about reality, we care about our memories making sense and being meaningful.

On the other hand I assume I wasn't the only one mentally applauding Neo for swallowing the red pill.

Comment by Anja on A definition of wireheading · 2012-11-29T03:27:56.704Z · LW · GW

Thank you.

Comment by Anja on A definition of wireheading · 2012-11-29T03:27:17.302Z · LW · GW

What would happen if we set an algorithm inside the AGI assigning negative infinite utility to any action which modifies its own utility function and said algorithm itself?

There are several problems with this approach: First of all, how do you specify all actions that modify the utility function? How likely do you think it is that you can exhaustively specify, in a practical implementation, all sequences of actions that lead to modification of the utility function? Experience with cryptography has taught us that there is almost always some side-channel attack the original developers have not thought of, and that is just in the case of human versus human intelligence.

Forbidden actions in general seem like a bad idea with an AGI that is smarter than us; see, for example, the AI Box experiment.

Then there is the problem that we actually don't want any part of the AGI to be unmodifiable. The agent might revise its model of how the universe works (like we did when we went from Newtonian physics to quantum mechanics), and then it has to modify its utility function accordingly, or it is left with gibberish.

All that said, I think what you described corresponds to the hack evolution has used on us: we have acquired a list of things (or schemas) that mess up our utility functions and reduce agency, and those just feel icky to us, like the experience machine or electrical stimulation of the brain. But we don't have the luxury, which evolution had, of learning by making lots and lots of mistakes.

Comment by Anja on A definition of wireheading · 2012-11-29T02:54:11.704Z · LW · GW

You might be right. I thought about this too, but it seemed people on LW had already categorized the experience machine as wireheading. If we rebrand, we should maybe say "self-delusion" instead of "pornography problem"; I do really like the term "utility counterfeiting", though, and the example about counterfeit money in your essay.

Comment by Anja on A definition of wireheading · 2012-11-28T01:51:43.437Z · LW · GW

The word "value" seems unnecessarily value-laden here.

Changed it to "number".

Comment by Anja on A definition of wireheading · 2012-11-27T23:37:17.238Z · LW · GW

You are correct in pointing out that for human agents the evaluation procedure is not a deliberate calculation of expected utility, but some messy computation we have little access to. In many instances this can, however, be translated reasonably well into the framework of (partial) utility functions, especially if our preferences approximately satisfy transitivity, continuity and independence.

For noticing discrepancies between true and substitute utility it is not necessary to know both functions exactly; it suffices to have an icky feeling telling you that you are acting in a way that is detrimental to your (true) goals.

If all else fails, we can time-index world states and equip the agent with a utility function by pretending that it assigned utility 1 to the world state it actually brought about and 0 to all others. ;)
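
In symbols, writing w_t for a candidate world state at time t and \dot{w}_t for the one actually brought about (labels I am making up just for this remark):

u_t(w_t) = \begin{cases} 1 & \text{if } w_t = \dot{w}_t \\ 0 & \text{otherwise.} \end{cases}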

Comment by Anja on Universal agents and utility functions · 2012-11-19T04:44:44.552Z · LW · GW

There is also a more detailed paper by Lattimore and Hutter (2011) on discounting and time consistency that is interesting in that context.

Comment by Anja on Universal agents and utility functions · 2012-11-19T04:26:07.776Z · LW · GW

I am starting to see what you mean. Let's stick with utility functions over histories of length m_k (whole sequences) like you proposed and denote them with a capital U to distinguish them from the prefix utilities. I think your Agent 4 runs into the following problem: modeled_action(n,m) actually depends on the actions and observations yx_{k:m-1} and needs to be calculated for each combination, so y_m is actually

\hat{y}_{m,k}(yx_{1:m-1}),

which, written out in full, clutters up the notation so much that I don't want to write it down anymore.

We also get into trouble with taking the expectation: the observations x_{k+1:n} are only considered in modeling the actions of the future agents, but not now. What is M(yx_{<k}, yx_{k:n}) even supposed to mean? Where do the x's come from?

So let's torture some indices:

\hat{y}_{n,k}(yx_{1:n-1}) = \arg\max_{y_n}\sum_{x_{n:m_k}} U_n(yx_{1:n}\,\hat{y}_{n+1,k}(yx_{1:n})\,x_{n+1}\dots x_{m_k})\, M(\dot{y}\dot{x}_{<k}\, yx_{k:n-1}\, \hat{y}\underline{x}_{n:m_k})

where n>=k and

This is not really AIXI anymore and I am not sure what to do with it, but I like it.

Comment by Anja on Universal agents and utility functions · 2012-11-18T03:06:23.409Z · LW · GW

I second the general sentiment that it would be good for an agent to have these traits, but if I follow your equations I end up with Agent 2.

Comment by Anja on Universal agents and utility functions · 2012-11-17T22:03:11.972Z · LW · GW

First, replace the action-perception sequence with an action-perception-utility sequence u1,y1,x1,u2,y2,x2,etc.

This seems unnecessary. The information u_i is already contained in x_i.

modeled_action(n, k) = argmax(y_k) uk(yx_<k, yx_k:n)*M(uyx_<k, uyx_k:n)

This completely breaks the expectimax principle. I assume you actually mean something like

\arg\max_{y_k}\sum_{x_k} u_k(\dot{y}\dot{x}_{<k}\,y\underline{x}_{k:n})\, M(\dot{y}\dot{x}_{<k}\,y\underline{x}_{k:n})

which is just Agent 2 in disguise.
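
(For reference, and up to notational details: the dot/underline shorthand stands for the usual expectimax chain of alternating maximization over actions and expectation over observations, roughly

\arg\max_{y_k}\sum_{x_k}\max_{y_{k+1}}\sum_{x_{k+1}}\cdots\max_{y_n}\sum_{x_n} u_k(\dot{y}\dot{x}_{<k}\,yx_{k:n})\, M(\dot{y}\dot{x}_{<k}\,yx_{k:n})

with the past \dot{y}\dot{x}_{<k} held fixed.)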

Comment by Anja on Universal agents and utility functions · 2012-11-17T21:49:53.691Z · LW · GW

This generalizes to the horizon problem: if at time k you only look ahead to time step m_k but have an unlimited life span, you will make infinitely large mistakes.
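
A toy illustration with made-up numbers: suppose that at every step the agent can either take utility 1 immediately or set up utility 10 that arrives just beyond the current horizon m_k. The horizon-limited agent never sees the delayed payoff, so it grabs the 1 every time, and its total loss relative to a far-sighted policy grows without bound:

\sum_{k=1}^{\infty} (10 - 1) = \infty .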

Comment by Anja on Universal agents and utility functions · 2012-11-16T08:46:36.025Z · LW · GW

I would assume that it is not smart enough to foresee its own future actions and is therefore dynamically inconsistent. The original AIXI does not allow the agent to be part of the environment. If we tried to relax the dualism, then your question depends strongly on the approximation to AIXI we would use to make it computable. If this approximation can be scaled down in a way such that it is still a good estimator of the agent's future actions, then maybe an environment containing a scaled-down, more abstract AIXI model will, after a lot of observations, become one of the consistent programs with lowest complexity. Maybe. That is about the only way I can imagine right now that we would not run into this problem.

Comment by Anja on Universal agents and utility functions · 2012-11-15T01:30:01.727Z · LW · GW

I am pretty sure that Agent 2 will wirehead in the Simpleton Gambit, although this depends heavily on the number of time cycles to follow, the comparative advantage that can be gained from wireheading, and the negative utility the current utility function assigns to the change.

Agent 1 will have trouble modeling how its decision to change its utility function now will influence its own decisions later, as described in "AIXI and existential despair". So basically the two futures look very similar to the agent, except for the part where the screen says something different, and then it all comes down to whether the utility function has preferences over that particular fact.

Comment by Anja on Universal agents and utility functions · 2012-11-15T01:12:08.857Z · LW · GW

I am quite sure that Pareto optimality is untouched by the proposed changes, but I haven't written down a proof yet.

Comment by Anja on 2012 Less Wrong Census/Survey · 2012-11-03T23:45:03.698Z · LW · GW

Took the survey. Does the god question include simulators? I answered under the assumption that it did not.