Desiderata for an Adversarial Prior

shmi

Desiderata for an Adversarial Prior

post by Shmi (shminux) · 2022-11-09T23:45:16.331Z · LW · GW · 2 comments

2 comments

Based on the discussion in the comments on https://www.lesswrong.com/posts/6XsZi9aFWdds8bWsy/is-there-any-discussion-on-avoiding-being-dutch-booked-or [LW · GW].

Epistemic status: I am not an expert in the subject matter, so not confident, seems useful to have, I could not find anything discussion online.

As I understand it, Pascal Mugging [? · GW] can either be an honest opportunity for a large gain in utility (e.g. a Powerball ticket?) if the mugger is honest, or an adversarial attempt to exploit the agent (e.g. a Nigerian Prince Advance Fee Scam) if not. Most people have some skills in telling the two apart, usually erring on the side of caution. However, it is not clear how to formalize this approach. Most of the discussion focuses on ignoring super-tiny probabilities, and accepting a utility loss penalty, where a proverbial $10 on a busy sidewalk gets ignored because if it were real someone would have picked it up already, a situation humans navigate quite successfully most of the time.

I suspect that Pascal Mugging detection might be better approached from a different direction: instead of starting from "I know a liar when I see one" and the rule of thumb "Most too-good-to-be-true proposals are just that" and trying to formalize it for the case of "tiny probabilities of vast utilities", one could try to start with a more general logic of "adversarial detection", and then apply it to the special case of Pascal Mugging.

How would one go about telling apart an indifferent interaction, where something like the Solomonoff's universal prior is appropriate, from an adversarial one, where it runs into trouble? Whatever the approach, it is likely to have the following properties:

In a non-adversarial case it reduces to the universal prior.
An attack by a "less-intelligent" adversary is severely penalized, the larger the difference in "intelligence" the harsher.
Conversely, an attack by a "vastly more intelligent" adversary is indistinguishable from the non-adversarial case.

In the Pascal Mugger's case, the really clever one will get your money without you realizing what happened (see point 3 above), the dumb one will go away empty handed, and there is a whole range in between. There are probably some other desiderata that I am missing here.

2 comments

Comments sorted by top scores.

comment by JBlack · 2022-11-10T01:20:49.688Z · LW(p) · GW(p)

Solomonoff priors work fine for adversarial interaction. After all, the prior distribution does include lots of adversarial agents. The main problem is that all such priors are uncomputable even in theory, and so are equally useless for any interaction whether adversarial or not.

If you happen to have a hypercomputer in your mind, then you could ignore that inconvenient fact.^[1] You can use a Solomonoff prior to estimate the outcome distribution over your actions for every possible model of the world, weighted by complexity, which naturally includes everything about the agent you're interacting with.

Agents with bounded rationality can't do this, and have to rely on much cruder heuristics and social structures. For example, people who say "I've got a device that will kill or maim a great many people unless you give me money" tend to be dealt with very harshly by society if their threat is credible, and still pretty harshly if their threat is not credible.

^{^}
You also need to know that nothing else in your universe involves a hypercomputer of comparable strength, which does seem unjustifiable. The nice thing about absurd hypotheticals is that we can just declare this to be true by fiat anyway.

comment by lalaithion · 2022-11-10T17:17:50.748Z · LW(p) · GW(p)

One adversarial prior would be "my prior for bets of expected value X is O(1/X)".

Desiderata for an Adversarial Prior

Contents

2 comments