## Posts

Decision Theory is multifaceted 2020-09-13T22:30:21.169Z · score: 6 (3 votes)
Goals and short descriptions 2020-07-02T17:41:52.578Z · score: 14 (8 votes)
Wireheading and discontinuity 2020-02-18T10:49:42.030Z · score: 22 (6 votes)
Thinking of tool AIs 2019-11-20T21:47:36.660Z · score: 6 (5 votes)

## Comments

Comment by michele-campolo on Decision Theory is multifaceted · 2020-09-17T08:40:04.437Z · score: 1 (1 votes) · LW · GW

Ok, if you want to clarify—I'd like to—we can have a call, or discuss in other ways. I'll contact you somewhere else.

Comment by michele-campolo on Decision Theory is multifaceted · 2020-09-16T22:08:54.503Z · score: 1 (1 votes) · LW · GW
> Omega, a perfect predictor, flips a coin. If it comes up heads, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up tails and you were told it was tails. If it comes up tails, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up heads and you were told it was heads.

Here there is no question, so I assume it is something like: "What do you do?" or "What is your policy?"

That formulation is analogous to standard counterfactual mugging, stated in this way:

> Omega flips a coin. If it comes up heads, Omega gives you 10000 if you would pay 100 when tails. If it comes up tails, Omega asks you to pay 100. What do you do?

According to these two formulations, the correct answer seems to be the one corresponding to the first intuition.

Now consider instead this formulation of counterfactual PD:

> Omega, a perfect predictor, tells you that it has flipped a coin, and it has come up heads. Omega asks you to pay 100 (here and now) and gives you 10000 (here and now) if you would pay in case the coin landed tails. Omega also explains that, if the coin had come up tails—but note that it hasn't—Omega would tell you such and such (symmetrical situation). What do you do?

The answer of the second intuition would be: I refuse to pay here and now, and I would have paid in case the coin had come up tails. I get 10000.

And this formulation of counterfactual PD is analogous to the corresponding formulation of counterfactual mugging, in which the second intuition refuses to pay.

Is your opinion that

> The answer of the second intuition would be: I refuse to pay here and now, and I would have paid in case the coin had come up tails. I get 10000.

is false/not admissible/impossible? Or are you saying something else entirely? In any case, if you could motivate your opinion, whatever that is, you would help me understand. Thanks!

Comment by michele-campolo on Decision Theory is multifaceted · 2020-09-16T11:21:28.522Z · score: 3 (2 votes) · LW · GW

It seems you are arguing for the position that I called "the first intuition" in my post. Before knowing the outcome, the best you can do is (pay, pay), because that leads to 9900.

On the other hand, as in standard counterfactual mugging, you could be asked: "You know that, this time, the coin came up tails. What do you do?". And here the second intuition applies: the DM can decide to not pay (in this case) and to pay when heads. Omega recognises the intent of the DM, and gives 10000.

Maybe you are not even considering the second intuition because you take for granted that the agent has to decide one policy "at the beginning" and stick to it, or, as you wrote, "pre-commit". One of the points of the post is that it is unclear where this assumption comes from and what exactly it means. It's possible that my reasoning in the post was not clear, but I think that if you reread the analysis you will see the situation from both viewpoints.

Comment by michele-campolo on Decision Theory is multifaceted · 2020-09-15T21:21:33.517Z · score: 1 (1 votes) · LW · GW

If the DM knows the outcome is heads, why can't he not pay in that case and decide to pay in the other case? In other words: why can't he adopt the policy (not pay when heads; pay when tails), which leads to 10000?

Comment by michele-campolo on Decision Theory is multifaceted · 2020-09-15T10:41:36.456Z · score: 1 (1 votes) · LW · GW

The fact that it is "guaranteed" utility doesn't make a significant difference: my analysis still applies. After you know the outcome, you can avoid paying in that case and get 10000 instead of 9900 (second intuition).

Comment by michele-campolo on Decision Theory is multifaceted · 2020-09-14T14:31:18.901Z · score: 1 (1 votes) · LW · GW

Hi Chris!

> Suppose the predictor knows that if it writes M on the paper you'll choose N, and if it writes N on the paper you'll choose M. Further, if it writes nothing you'll choose M. That isn't a problem since regardless of what it writes it would have predicted your choice correctly. It just can't write down the choice without making you choose the opposite.

My point in the post is that the paradoxical situation occurs when the prediction outcome is communicated to the decision maker. We have a seemingly correct prediction—the one that you wrote about—that ceases to be correct after it is communicated. And later in the post I discuss whether this problematic feature of prediction extends to other scenarios, leaving the question open. What did you want to say exactly?

> I was quite skeptical of paying in Counterfactual Mugging until I discovered the Counterfactual Prisoner's Dilemma which addresses the problem of why you should care about counterfactuals given that they aren't factual by definition.

I've read the problem and the analysis I did for (standard) counterfactual mugging applies to your version as well.

The first intuition is that, before knowing the toss outcome, the DM wants to pay in both cases, because that gives the highest utility (9900) in expectation.

The second intuition is that, after the DM knows (wlog) the outcome is heads, he doesn't want to pay anymore in that case—and wants to be someone who pays when tails is the outcome, thus getting 10000.
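For concreteness, the expected payoffs of the four possible policies can be enumerated directly. This is only a sketch, assuming a fair coin, a 100 payment, a 10000 prize, and that Omega's prediction simply tracks the agent's policy:

```python
from itertools import product

def payoff(policy, outcome):
    """Payoff in counterfactual PD when the coin lands on `outcome`.

    `policy` maps each side of the coin to True (pay) or False (refuse).
    Omega asks for 100 on the actual outcome, and pays 10000 iff it
    predicts the agent would pay on the counterfactual side.
    """
    other = "tails" if outcome == "heads" else "heads"
    u = -100 if policy[outcome] else 0
    if policy[other]:
        u += 10_000
    return u

for heads, tails in product([True, False], repeat=2):
    pol = {"heads": heads, "tails": tails}
    expected = (payoff(pol, "heads") + payoff(pol, "tails")) / 2
    print(pol, "expected:", expected, "| given heads:", payoff(pol, "heads"))
```

Ex ante, (pay, pay) is best with expected utility 9900; once the outcome is known to be heads, the policy (refuse when heads, pay when tails) yields 10000 in that case, which is exactly the tension between the two intuitions.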

Comment by michele-campolo on Goals and short descriptions · 2020-07-04T08:55:39.049Z · score: 1 (1 votes) · LW · GW

I wouldn't say goals as short descriptions are necessarily "part of the world".

Anyway, locality definitely seems useful to make a distinction in this case.

Comment by michele-campolo on Goals and short descriptions · 2020-07-03T17:59:26.094Z · score: 1 (1 votes) · LW · GW

No worries, I think your comment still provides good food for thought!

Comment by michele-campolo on Goals and short descriptions · 2020-07-03T11:06:18.705Z · score: 1 (1 votes) · LW · GW

I'm not sure I understand the search vs discriminative distinction. If my hand touches fire and thus immediately moves backwards by reflex, would this be an example of a discriminative policy, because an input signal directly causes an action without being processed in the brain?

About the goal of winning at chess: in the case of minimax search, the algorithm generates the complete tree of the game and then selects the winning policy; as you said, this is probably the simplest agent (in terms of Kolmogorov complexity, given the environment) that wins at chess, and it actually wins at any game that can be solved using minimax/backward induction. In the second case, the algorithm reads the environmental data about chess to assign reward 1 to winning states and 0 elsewhere, and represents an ideal RL procedure that exploits interaction with the environment to generate the optimal policy maximising that reward function. The main feature is that, in both cases, when the environment gets bigger, the description length of the two algorithms doesn't change: you could use minimax or the ideal RL procedure to generate a winning policy even for chess on a larger board, for example. If instead you wanted to use a giant lookup table, you would have to extend your algorithm each time a new state gets added to the environment.
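As a sketch of the description-length point: a generic backward-induction solver never mentions the size of the game, so its code stays the same when the environment grows. The class and names below are mine, with a toy subtraction game standing in for chess:

```python
def minimax(state, game, maximizing):
    """Solve any finite two-player zero-sum game by backward induction."""
    if game.terminal(state):
        return game.value(state), None
    best_val, best_move = None, None
    for move in game.moves(state):
        val, _ = minimax(game.result(state, move), game, not maximizing)
        if best_val is None or (val > best_val if maximizing else val < best_val):
            best_val, best_move = val, move
    return best_val, best_move

class SubtractGame:
    """Players alternately remove 1 or 2 stones; taking the last stone wins.

    A state is (pile, max_to_move); value() is +1 if the maximizing
    player has won, -1 otherwise."""
    def terminal(self, state):
        return state[0] == 0
    def value(self, state):
        pile, max_to_move = state
        # The player to move faces an empty pile, so the previous player won.
        return -1 if max_to_move else 1
    def moves(self, state):
        return [m for m in (1, 2) if m <= state[0]]
    def result(self, state, move):
        return (state[0] - move, not state[1])

game = SubtractGame()
value, move = minimax((4, True), game, maximizing=True)
```

The same unchanged solver handles a pile of any size: only the runtime grows with the environment, not the program.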

I guess the confusion may come from the fact that the second procedure is underspecified. I tried to formalise it more precisely by using logic, but there were some problems and it's still work in progress.

By the way, thanks for the links! I hope I'll learn something new about how the brain works, I'm definitely not an expert on cognitive science :)

Comment by michele-campolo on Goals and short descriptions · 2020-07-03T09:58:02.854Z · score: 1 (1 votes) · LW · GW

The others in the AISC group and I discussed the example that you mentioned more than once. I agree with you that such an agent is not goal-directed, mainly because it doesn't do anything to ensure that it will be able to perform action A even if adverse events happen.

It is still true that action A is a short description of the behaviour of that agent and one could interpret action A as its goal, although the agent is not good at pursuing it ("robustness" could be an appropriate term to indicate what the agent is lacking).

Comment by michele-campolo on Dutch-Booking CDT: Revised Argument · 2020-06-16T22:10:38.692Z · score: 1 (1 votes) · LW · GW

The part that I don't get is why "the agent is betting ahead of time" implies evaluation according to EDT, while "the agent is reasoning during its action" implies evaluation according to CDT. Sorry if I'm missing something trivial, but I'd like an explanation, because this seems to be a fundamental part of the argument.

Comment by michele-campolo on Dutch-Booking CDT: Revised Argument · 2020-06-14T10:05:54.666Z · score: 1 (1 votes) · LW · GW

I've noticed that one could read the argument and say: "Ok, an agent evaluates a parameter U differently at different times. Thus, a bookmaker exploits the agent with a bet/certificate whose value depends on U. What's special about this?"

Of course the answer lies in the difference between cdt(a) and edt(a), specifically you wrote:

> The key point here is that because the agent is betting ahead of time, it will evaluate the value of this bet according to the conditional expectation E(U|Act=a).

and

> Now, since the agent is reasoning during its action, it is evaluating possible actions according to cdt(a); so its evaluation of the bet will be different.

I think developing these two points would be useful to readers since, usually, the pivotal concepts behind EDT and CDT are considered to be "conditional probabilities" and "(physical) causation" respectively, while here you seem to point at something different about the times at which decisions are made.
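To illustrate, with my own toy numbers (nothing from the post), why the time of evaluation can matter: conditioning on Act = a keeps the correlation between the action and a hidden state, while intervening during the action does not.

```python
# Hypothetical joint distribution over a hidden state S and an action A
# that is correlated with S; U is the payoff. The numbers are chosen
# only so that the two valuations differ.
P = {("good", "take"): 0.4, ("good", "pass"): 0.1,
     ("bad", "take"): 0.1, ("bad", "pass"): 0.4}
U = {("good", "take"): 10, ("good", "pass"): 0,
     ("bad", "take"): -10, ("bad", "pass"): 0}

def edt_value(a):
    """E(U | Act = a): valuation ahead of time, by conditioning."""
    cond = {sa: p for sa, p in P.items() if sa[1] == a}
    total = sum(cond.values())
    return sum(p / total * U[sa] for sa, p in cond.items())

def cdt_value(a):
    """Intervene on the action: keep the marginal over S, force A = a."""
    marg = {}
    for (s, _), p in P.items():
        marg[s] = marg.get(s, 0) + p
    return sum(p * U[(s, a)] for s, p in marg.items())

print(edt_value("take"))  # 6.0: conditioning keeps the S-A correlation
print(cdt_value("take"))  # 0.0: intervening breaks it
```

A bet on U is valued at 6 ahead of time but at 0 during the action: that kind of gap is what a Dutch book can exploit.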

***

Unrelated to what I just wrote:

> XXX insert the little bit about free will and stuff that I want to remove from the main argument... no reason to spend time justifying it there if I have a whole section for it here

I guess here you wanted to say something interesting about free will, but it was probably lost from the draft to the final version of the post.

Comment by michele-campolo on Focus: you are allowed to be bad at accomplishing your goals · 2020-06-13T10:14:15.542Z · score: 5 (2 votes) · LW · GW

I want to show an example that seems interesting for evaluating, and potentially tweaking/improving, the current informal definition.

Consider an MDP with states s_0, …, s_n; initial state s_0; from each s_i an action allows going back to s_0, and another action goes to s_{i+1} (what happens in s_n is not really important for the following). Consider two reward functions that are both null everywhere, except for one state that has reward 1: s_1 in the first function, s_k in the second function, for some 1 < k ≤ n.

It's interesting (problematic?) that two agents, trained on the first reward function and on the second, have similar policies but different goals (defined as sets of states). Specifically, I expect the distance d(π_1, π_2) between the two optimal policies to become small as n grows (for various possible choices of the agents and different ways of defining the distance d). In words: with respect to the environment size, the first agent is extremely close to the second, and vice versa, but the two agents have different goals.

Maybe this is not a problem at all: it could simply indicate that there exists a way of saying that the two considered goals are similar.
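A sketch of this example with value iteration (the chain length n = 100, the rewarded states s_1 and s_5, and the discount factor are my own choices, just to make the point computable):

```python
def solve(n, rewarded_state, gamma=0.9, iters=400):
    """Optimal policy for the chain MDP: from state i, action "back"
    returns to s_0 and action "forward" moves to s_{min(i+1, n)};
    reward 1 is received on entering `rewarded_state`, 0 elsewhere."""
    def step(i, a):
        j = 0 if a == "back" else min(i + 1, n)
        return j, (1.0 if j == rewarded_state else 0.0)

    # Value iteration, then greedy policy extraction.
    V = [0.0] * (n + 1)
    for _ in range(iters):
        V = [max(r + gamma * V[j] for j, r in (step(i, "back"), step(i, "forward")))
             for i in range(n + 1)]
    return [max(("back", "forward"),
                key=lambda a: step(i, a)[1] + gamma * V[step(i, a)[0]])
            for i in range(n + 1)]

n = 100
p1 = solve(n, rewarded_state=1)   # reward at s_1
p2 = solve(n, rewarded_state=5)   # reward at s_5
agreement = sum(a == b for a, b in zip(p1, p2)) / (n + 1)
print(agreement)  # the two policies agree on all but a handful of states
```

The two optimal policies disagree only on the few states between the two rewarded ones, so their distance (as a fraction of the state space) shrinks as n grows, while the rewarded states stay different.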

Comment by michele-campolo on Wireheading and discontinuity · 2020-02-28T10:03:10.912Z · score: 1 (1 votes) · LW · GW

That's an interesting example I had not considered. As I wrote in the observations: I don't think the discontinuity check works in all cases.

Comment by michele-campolo on Wireheading and discontinuity · 2020-02-26T11:54:39.908Z · score: 1 (1 votes) · LW · GW

I'm not sure I understand what you mean—I know almost nothing about robotics—but I think that, in most cases, there is a function whose discontinuity gives a strong indication that something went wrong. A robotic arm has to deal with impulsive forces, but its movement in space is expected to be continuous wrt time. The same happens in the bouncing ball example, or in the example I gave in the post: velocity may be discontinuous in time, but motion shouldn't be.
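A sketch of that kind of check on a simulated bouncing ball (step size and tolerance are arbitrary choices of mine): sampled velocity jumps at the bounces, while sampled position stays continuous.

```python
def jump_indices(samples, tol=10.0):
    """Indices where the one-step difference is far larger than typical,
    i.e. where the sampled signal looks discontinuous."""
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    typical = sorted(diffs)[len(diffs) // 2] + 1e-12   # median difference
    return [i for i, d in enumerate(diffs) if d > tol * typical]

# Ball dropped from 1 m with perfectly elastic bounces (Euler integration).
dt, g = 1e-3, 9.8
x, v = 1.0, 0.0
xs, vs = [], []
for _ in range(3000):
    v -= g * dt
    x += v * dt
    if x < 0:                 # bounce: reflect position and velocity
        x, v = -x, -v
    xs.append(x)
    vs.append(v)

print(jump_indices(vs))   # non-empty: velocity is discontinuous at bounces
print(jump_indices(xs))   # empty: position stays continuous
```

The check flags the velocity signal but not the position signal, matching the point that motion in space should stay continuous even when forces are impulsive.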

Thanks for the suggestion on hybrid systems!

Comment by michele-campolo on Thinking of tool AIs · 2019-11-21T22:15:17.128Z · score: 3 (2 votes) · LW · GW

Maybe I should have used different words: I didn't want to convey the message that catastrophes are easy to obtain. The purpose of the fictional scenario was to make the reader reflect on the usage of the word "tool". Anyway, I'll try to consider non-technical feedback mechanisms more often in the future. Thanks!