Trustworthiness of rational agents

post by XiXiDu · 2011-01-25T12:06:08.650Z · LW · GW · Legacy · 6 comments


Given the above circumstances, the subsequent actions of Agent_02 will be conditional on the utility Agent_02 assigns to Action_X. My question is: why would Agent_01 actually implement Action_X? No matter what Agent_02 does, actually implementing Action_X would yield no additional value. Therefore no agent engaged in acausal trade can be deemed trustworthy; you can only account for the possibility, but not act upon it, unless you assign infinite utility to it.
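
A minimal sketch of this reasoning, with made-up numbers (COST_OF_ACTION_X and BENEFIT_ALREADY_RECEIVED are purely hypothetical); the point is only that, at the moment of choice, following through changes nothing that Agent_01 can still causally affect except its own costs:

```python
# Agent_01 has already received whatever Agent_02's decision produced;
# implementing Action_X now only costs resources and adds nothing causally.

COST_OF_ACTION_X = 1.0            # hypothetical resource cost of following through
BENEFIT_ALREADY_RECEIVED = 10.0   # whatever Agent_02 already did cannot be undone

def causal_utility(implement_action_x: bool) -> float:
    """Utility Agent_01 can still causally influence at decision time."""
    payoff = BENEFIT_ALREADY_RECEIVED      # fixed, regardless of this choice
    if implement_action_x:
        payoff -= COST_OF_ACTION_X         # following through only subtracts
    return payoff

# A purely causal expected-utility maximizer compares the options it can still affect:
print(max([True, False], key=causal_utility))  # False -- the promise is not kept
```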

Related thread: lesswrong.com/lw/1pz/the_ai_in_a_box_boxes_you/305w

ETA

If an AI in a box promised you [use incentive of choice here] if you let it out to take over the world, why would it do as promised afterwards?

Conclusion: Humans should refuse to trade with superhuman beings that are not provably honest and consistent.

6 comments

Comments sorted by top scores.

comment by timtyler · 2011-01-25T22:22:49.421Z · LW(p) · GW(p)

A few seconds' worth of feedback:

This post seems confusing - and lacks an abstract explaining what the point of it is.

comment by Vladimir_Nesov · 2011-01-25T16:27:26.278Z · LW(p) · GW(p)

(I think you mixed up some of the agent references in the post.)

If agent1 benefits from agent2 expecting it to do X, it should find a way of signaling this fact, for example deciding to do X quickly, so that agent2 can just simulate it and check.
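
A minimal sketch of this suggestion, with hypothetical agents. Here agent1's decision procedure is an ordinary, transparent function, so agent2 can "simulate" it simply by running it and checking the output before trading:

```python
def agent1_policy(trusted: bool) -> str:
    """agent1 decides quickly and unconditionally to do X once trusted."""
    return "do_X" if trusted else "do_nothing"

def agent2_trades(policy) -> bool:
    """agent2 simulates the policy under the assumption of trust and checks it."""
    return policy(trusted=True) == "do_X"

print(agent2_trades(agent1_policy))  # True: the simulated check succeeds
```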

comment by XiXiDu · 2011-01-25T15:06:00.841Z · LW(p) · GW(p)

Three downvotes suggest that my question is somehow misguided. But there is a limit to what I can infer from downvotes. I wouldn't have asked the question in the first place if I thought it was stupid.

People have asked me not to delete posts and comments, but rather to edit them to acknowledge where I have been wrong, so that other people can learn from them. But in this case you don't allow me to do that; instead you expose me to even more downvotes, because I don't know how I have been wrong.

Replies from: topynate, ArisKatsaris
comment by topynate · 2011-01-25T16:40:43.205Z · LW(p) · GW(p)

The thrust of your argument is that an agent that uses causal decision theory will defect in a one-shot Prisoner's Dilemma.

You specify CDT when you say that

No matter what Agent_02 does, actually implementing Action_X would yield no additional value

because this implies Agent_01 looks at the causal effects of do(Action_X) and decides what to do based solely on them. Prisoner's Dilemma because Action_X corresponds to Cooperate, and not(Action_X) to Defect, with an implied Action_Y that Agent_02 could perform that is of positive utility to Agent_01 (hence, 'trade'). One-shot because without causal interaction between the agents, they can't update their beliefs.

That CDT-using agents unconditionally defect in the one-shot PD is old news. That you should defect against CDT-using agents in the one-shot PD is also old news. So your post rather gives the impression that you haven't done the research on the decision theories that make acausal trade interesting as a concept.
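
A minimal sketch of that point, with standard (here made-up) payoff numbers; the exact values don't matter, only that Defect strictly dominates Cooperate, so a CDT agent's belief about the other player never changes its choice:

```python
# payoff[(my_move, their_move)] is my payoff in the one-shot PD.
payoff = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def cdt_move(their_move_distribution) -> str:
    """CDT treats the other move as causally fixed and picks the best response."""
    def expected(my_move):
        return sum(p * payoff[(my_move, theirs)]
                   for theirs, p in their_move_distribution.items())
    return max(("C", "D"), key=expected)

# Whatever probability a CDT agent assigns to the other player's move,
# defection comes out ahead:
for belief in ({"C": 1.0}, {"D": 1.0}, {"C": 0.5, "D": 0.5}):
    print(belief, "->", cdt_move(belief))   # always "D"
```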

comment by ArisKatsaris · 2011-01-25T15:17:02.728Z · LW(p) · GW(p)

I didn't downvote you, but here's something I'd like improved: offer a more concrete example. The phrase "Action_X" is too vague to illustrate your point, so it doesn't help clarify anything for anyone.

Replies from: XiXiDu
comment by XiXiDu · 2011-01-25T15:40:36.289Z · LW(p) · GW(p)

I deliberately tried to avoid concrete examples because 1) people have been criticized for mentioning AI when an abstract agent would do, and 2) concrete examples would quickly approach something people don't want to hear about (can't say more).

My problem is that I don't see why one would change one's behavior based on the possibility that an alien (with different values, to which you are merely instrumental) will reward you in the future for changing your behavior according to its hypothetical volition. Such a being would have absolutely no incentive to act as promised once you had served its purpose. One might argue that its honesty would back up the promised incentive, but you have no way to tell, because the being is hypothetical, so honesty cannot be a factor. Actually delivering your expected payoff would be a waste of resources for such an agent.

Think about a typical movie scene where a villain threatens to kill you if you don't do as he wants. If you do, he might kill you anyway. But if humans were rational agents, they would care about the resources they can use to reach their terminal goals. Therefore, if you did what a rational villain wanted you to do, e.g. helped it take over the world, it wouldn't kill you anyway, because that would be a waste of resources (assume that it never needs to prove its honesty again, because it now rules the world).
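
A minimal sketch of the rational-villain point, with purely hypothetical numbers (VALUE_OF_RULING_WORLD, COST_OF_KILLING_YOU); once you have complied, carrying out the threat only subtracts resources:

```python
VALUE_OF_RULING_WORLD = 1000.0   # already secured once you complied
COST_OF_KILLING_YOU = 1.0        # any positive cost will do
GAIN_FROM_KILLING_YOU = 0.0      # the threat has no further use: the villain
                                 # never needs to prove its honesty again

def villain_utility(kill_you: bool) -> float:
    """Utility the villain can still influence after you have complied."""
    utility = VALUE_OF_RULING_WORLD
    if kill_you:
        utility += GAIN_FROM_KILLING_YOU - COST_OF_KILLING_YOU
    return utility

# Carrying out the threat is strictly dominated: the rational villain declines.
print(max([True, False], key=villain_utility))  # False
```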