# Probability Space & Aumann Agreement

post by Wei_Dai · 2009-12-10T21:57:02.812Z · score: 43 (39 votes) · LW · GW · Legacy · 72 comments

The first part of this post describes a way of interpreting the basic mathematics of Bayesianism. Eliezer already presented one such view at http://lesswrong.com/lw/hk/priors_as_mathematical_objects/, but I want to present another one that has been useful to me, and also show how this view is related to the standard formalism of probability theory and Bayesian updating, namely the probability space.

The second part of this post will build upon the first, and try to explain the math behind Aumann's agreement theorem. Hal Finney had suggested this earlier, and I'm taking on the task now because I recently went through the exercise of learning it, and could use a check of my understanding. The last part will give some of my current thoughts on Aumann agreement.

#### Probability Space

In http://en.wikipedia.org/wiki/Probability_space, you can see that a probability space consists of a triple:

- Ω – a non-empty set – usually called sample space, or set of states
- F – a set of subsets of Ω – usually called sigma-algebra, or set of events
- P – a function from F to [0,1] – usually called probability measure

F and P are required to have certain additional properties, but I'll ignore them for now. To start with, we’ll interpret Ω as a set of possible world-histories. (To eliminate anthropic reasoning issues, let’s assume that each possible world-history contains the same number of observers, who have perfect memory, and are labeled with unique serial numbers.) Each “event” A in F is formally a subset of Ω, and interpreted as either an actual event that occurs in every world-history in A, or a hypothesis which is true in the world-histories in A. (The details of the events or hypotheses themselves are abstracted away here.)

To understand the probability measure P, it’s easier to first introduce the probability mass function p, which assigns a probability to each element of Ω, with the probabilities summing to 1. Then P(A) is just the sum of the probabilities of the elements in A. (For simplicity, I’m assuming the discrete case, where Ω is at most countable.) In other words, the probability of an observation is the sum of the probabilities of the world-histories that it doesn't rule out.

A payoff of this view of the probability space is a simple understanding of what Bayesian updating is. Once an observer sees an event D, he can rule out all possible world-histories that are not in D. So, he can get a posterior probability measure by setting the probability masses of all world-histories not in D to 0, and renormalizing the ones in D so that they sum up to 1 while keeping the same relative ratios. You can easily verify that this is equivalent to Bayes’ rule: P(H|D) = P(D ∩ H)/P(D).
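This update rule is easy to sketch in code. Below is a minimal illustration, with an invented three-element Ω and invented weights, checking that "zero out and renormalize" agrees with Bayes' rule:

```python
from fractions import Fraction

# p: probability mass function over a toy set of world-histories
p = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 4)}

def P(event):
    """Probability measure: sum the masses of the world-histories in the event."""
    return sum(p[w] for w in event)

def update(p, D):
    """Seeing event D: zero out histories outside D, renormalize those inside."""
    norm = sum(m for w, m in p.items() if w in D)
    return {w: (m / norm if w in D else Fraction(0)) for w, m in p.items()}

D = {"w1", "w2"}                    # the observed event
H = {"w2", "w3"}                    # some hypothesis
posterior = update(p, D)
lhs = sum(posterior[w] for w in H)  # posterior probability of H
rhs = P(D & H) / P(D)               # Bayes' rule
assert lhs == rhs                   # both equal Fraction(1, 3)
```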

To sum up, the mathematical objects behind Bayesianism can be seen as

- Ω – a set of possible world-histories
- F – information about which events occur in which possible world-histories
- P – a set of weights on the world-histories that sum up to 1

#### Aumann's Agreement Theorem

Aumann's agreement theorem says that if two Bayesians share the same probability space but possibly different information partitions, and have common knowledge of their information partitions and posterior probabilities of some event A, then their posterior probabilities of that event must be equal. So what are information partitions, and what does "common knowledge" mean?

The information partition I of an observer-moment M divides Ω into a number of subsets that are non-overlapping, and together cover all of Ω. Two possible world-histories w1 and w2 are placed into the same subset if the observer-moments in w1 and w2 have the exact same information. In other words, if w1 and w2 are in the same element of I, and w1 is the actual world-history, then M can't rule out either w1 or w2. I(w) is used to denote the element of I that contains w.

Common knowledge is defined as follows: If w is the actual world-history and two agents have information partitions I and J, an event E is common knowledge if E includes the member of the meet I∧J that contains w. The operation ∧ (meet) means to take the two sets I and J, form their union, then repeatedly merge any of its elements (which you recall are subsets of Ω) that overlap until it becomes a partition again (i.e., no two elements overlap).
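As a sanity check, the merging procedure can be transcribed directly into code (the two example partitions here are made up for illustration):

```python
def meet(I, J):
    """Form the union of partitions I and J, then repeatedly merge any two
    overlapping elements until no two elements overlap."""
    blocks = [set(b) for b in I] + [set(b) for b in J]
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if blocks[i] & blocks[j]:   # overlapping: lump them together
                    blocks[i] |= blocks[j]
                    del blocks[j]
                    merged = True
                    break
            if merged:
                break
    return {frozenset(b) for b in blocks}   # dedupe identical leftover blocks

I = [{1, 2}, {3, 4}, {5}]
J = [{1}, {2, 3}, {4}, {5}]
print(meet(I, J))   # {frozenset({1, 2, 3, 4}), frozenset({5})}
```

Note that the result is coarser than either input: the chain of overlaps {1,2} ~ {2,3} ~ {3,4} forces all of 1 through 4 into a single element.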

It may not be clear at first what this meet operation has to do with common knowledge. Suppose the actual world-history is w. Then agent 1 knows I(w), so he knows that agent 2 must know one of the elements of J that overlap with I(w). And he can reason that agent 2 must know that agent 1 knows one of the elements of I that overlap with one of those elements of J. Carrying this inference out to infinity, he'll find that both agents know that the actual world-history is in (I∧J)(w), both know that the other knows this, both know that the other knows that they know it, and so on. In other words, it is common knowledge that the actual world-history is in (I∧J)(w). Since event E includes (I∧J)(w), E occurs in every world-history in (I∧J)(w), so it's common knowledge that E occurs in the actual world-history.

The proof of the agreement theorem then goes like this. Let E be the event that agent 1 assigns a posterior probability (conditioned on everything he knows) of q1 to event A and agent 2 assigns a posterior probability of q2 to event A. If E is common knowledge at w, then both agents know that P(A | I(v)) = q1 and P(A | J(v)) = q2 for every v in (I∧J)(w). But this implies P(A | (I∧J)(w)) = q1 and P(A | (I∧J)(w)) = q2, and therefore q1 = q2. (To see this, suppose you currently know only (I∧J)(w), and you know that no matter what additional information I(v) you obtain, your posterior probability will be the same q1; then your current probability must already be q1.)
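The parenthetical claim is just the law of total probability: P(A | (I∧J)(w)) is a weighted average of P(A | I(v)) over the cells of I inside (I∧J)(w), and an average of a constant q1 is q1. A quick numerical check, where the block, the masses, and the event are all invented for the example:

```python
from fractions import Fraction

# A block B of the meet, partitioned by agent 1's information into cells.
# The masses are invented; what matters is that P(A | cell) is constant.
p = {1: Fraction(1, 8), 2: Fraction(1, 8), 3: Fraction(2, 8),
     4: Fraction(2, 8), 5: Fraction(1, 8), 6: Fraction(1, 8)}
B = {1, 2, 3, 4, 5, 6}
cells = [{1, 2}, {3, 4}, {5, 6}]   # agent 1's partition restricted to B
A = {1, 3, 5}                       # the event: P(A | cell) = 1/2 in every cell

def cond(A, S):
    """P(A | S) under the mass function p."""
    return sum(p[v] for v in A & S) / sum(p[v] for v in S)

q1 = Fraction(1, 2)
assert all(cond(A, c) == q1 for c in cells)   # same posterior in every cell...
assert cond(A, B) == q1                        # ...so the same posterior given B
```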

#### Is Aumann Agreement Overrated?

Having explained all of that, it seems to me that this theorem is less relevant to a practical rationalist than I thought before I really understood it. After looking at the math, it's apparent that "common knowledge" is a much stricter requirement than it sounds. The most obvious way to achieve it is for the two agents to simply tell each other I(w) and J(w), after which they share a new, common information partition. But in that case, agreement itself is obvious and there is no need to learn or understand Aumann's theorem.

There are some papers that describe ways to achieve agreement in other ways, such as iterative exchange of posterior probabilities. But in such methods, the agents aren't just moving closer to each other's beliefs. Rather, they go through convoluted chains of deduction to infer what information the other agent must have observed, given his declarations, and then update on that new information. (The process is similar to the one needed to solve the second riddle on this page.) The two agents essentially still have to communicate I(w) and J(w) to each other, except they do so by exchanging posterior probabilities and making logical inferences from them.
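To make this concrete, here is a toy simulation of such an iterative exchange, in the spirit of Geanakoplos and Polemarchakis' "we can't disagree forever" process. Everything below (the nine-element space, uniform prior, event, and partitions) is invented for illustration, and each announcement is modeled as revealing the announcer's full level sets, which is exactly what a listener who knows the announcer's partition can deduce from the declared probability:

```python
from fractions import Fraction

Omega = set(range(1, 10))
prior = {v: Fraction(1, 9) for v in Omega}     # common uniform prior
A = {3, 4}                                     # the event of interest
I = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]          # agent 1's partition
J = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9}]          # agent 2's partition

def cell(partition, v):
    return next(c for c in partition if v in c)

def post(S):
    """Posterior probability of A, given that the world is in S."""
    return sum(prior[v] for v in A & S) / sum(prior[v] for v in S)

def announce(partition):
    """Level sets of v -> P(A | cell(v)): what a posterior announcement
    reveals to someone who knows the announcer's partition."""
    levels = {}
    for v in Omega:
        levels.setdefault(post(cell(partition, v)), set()).add(v)
    return list(levels.values())

def refine(partition, levels):
    return [c & l for c in partition for l in levels if c & l]

w = 1                                          # the actual world
P1, P2 = I, J
q1, q2 = post(cell(P1, w)), post(cell(P2, w))  # 1/3 vs 1/2: they disagree
while q1 != q2:
    P2 = refine(P2, announce(P1))              # agent 2 hears q1 and updates
    q2 = post(cell(P2, w))
    P1 = refine(P1, announce(P2))              # agent 1 hears q2 and updates
    q1 = post(cell(P1, w))
print(q1, q2)                                  # prints 1/3 1/3
```

The agents never send their raw observations; each declared probability, combined with knowledge of the other's partition, narrows down which cell the other must be in — which is the convoluted deduction described above.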

Is this realistic for human rationalist wannabes? It seems wildly implausible to me that two humans can communicate all of the information they have that is relevant to the truth of some statement just by repeatedly exchanging degrees of belief about it, except in very simple situations. You need to know the other agent's information partition exactly in order to narrow down which element of the information partition he is in from his probability declaration, and he needs to know that you know so that he can deduce what inference you're making, in order to continue to the next step, and so on. One error in this process and the whole thing falls apart. It seems much easier to just tell each other what information the two of you have directly.

Finally, I now see that until the exchange of information completes and common knowledge/agreement is actually achieved, it's rational for even honest truth-seekers who share common priors to disagree. Therefore, two such rationalists may persistently disagree just because the amount of information they would have to exchange in order to reach agreement is too great to be practical. This is quite different from the understanding of Aumann agreement I had before I read the math.

## 72 comments


I think there's another, more fundamental reason why Aumann agreement doesn't matter in practice. It requires each party to assume the other is completely rational and honest.

*Acting as if* the other party is rational is good for promoting calm and reasonable discussion. *Seriously considering the possibility* that the other party is rational is certainly valuable. But *assuming that the other party is in fact totally rational* is just silly. We *know* we're talking to other flawed human beings, and either or both of us might just be totally off base, even if we're hanging around on a rationality discussion board.

I believe Hanson's paper on 'Bayesian wannabes' shows that even only partially rational agents must agree about a lot.

Jaw-droppingly (for me), that paper apparently uses "Bayesians" to refer to agents whose primary goal involves seeking (and sharing) the truth.

IMO, "Bayesians" *should* refer to agents that employ Bayesian statistics, *regardless* of what their goals are.

That Hanson casually employs this other definition without discussing the issue or defending his usage says a lot about his attitude to the subject.

I assume this just means that their primary *epistemic* goal is such, not that this is their utility function.

That's why I used the word "involves".

However, surely there are possible agents who are major fans of Bayesian statistics who don't have the time or motive to share their knowledge with other agents. Indeed, they may actively spread disinformation to other agents in order to manipulate them. Those folk are not bound to agree with other agents when they meet them.

Won't the utility function eventually update to match?

Maybe I lack imagination - is it possible for a strict Bayesian to do anything but seek and share the truth (assuming he is interacting with other Bayesians)?

Bayes rule is about how to update your estimates of the probability of hypotheses on the basis of incoming data. It has nothing to say about an agent's goal, or how it behaves. Agents can employ Bayesian statistics to update their world view while pursuing literally *any* goal.

If you think the term "Bayesian" implies an agent whose goal necessarily involves spreading truth to other agents, I have to ask for your references for that idea.

I am looking at the world around me, at the definition of Bayesian, and assuming the process has been going on in an agent for long enough for it to be properly called "a Bayesian agent", and think to myself - the agent space I end up in, has certain properties.

Of course, I'm using the phrase "Bayesian agent" to mean something slightly different than what the original poster intended.

Of course the agent space you end up in has certain properties - but the issue is whether those properties necessarily involve sharing the truth with others.

I figure you can pursue *any* goal using Bayesian statistics - *including* goals that include attempting to deceive and mislead others.

For example, a Bayesian public relations officer for big tobacco would not be bound to agree with other agents that she met.

You're speaking of Bayesian agents as a general term to refer to anyone who happens to use Bayesian statistics for a specific purpose - and in that context, I agree with you. In that context, your statements are correct, *by definition*.

I am speaking of Bayesian agents using the idealized, Hollywood concept of agent. Maybe I should have been more specific and referred to super-agents, equivalent to super-spies.

I claim that someone who has lived and breathed the Bayes way will be significantly different than someone who has applied it, even very consistently, within a limited domain. For example, I can imagine a Bayesian super-agent working for big tobacco, but I see the probability of that event actually coming to pass as too small to be worth considering.

I don't really know what you mean. A "super agent"? Do you really think Bayesian agents are "good"?

Since you haven't really said what you mean, what do you mean? What are these "super agents" of which you speak? Would you know one if you met one?

Super-agent. You know, like James Bond, Mr. and Mrs. Smith. Closer to the use, in context - Jeffreyssai.

Right... So: how about Lex Luthor or General Zod?

I've seen the paper, but it assumes the point in question in the definition of partially rational agents in the very first paragraph:

If these agents agree that their estimates are consistent with certain easy-to-compute consistency constraints, then... [conclusion follows].

But peoples' estimates generally *aren't* consistent with his constraints, so even for someone who is sufficiently rational, it doesn't make any sense whatsoever to assume that everyone else is.

This doesn't mean Robin's paper is wrong. It just means that faced with a topic where we would "agree to disagree", you can *either* update your belief about the topic, *or* update your belief about whether both of us are rational enough for the proof to apply.

One question on your objections: how would you characterize the state of two human rationalist wannabes who have failed to reach agreement? Would you say that their disagreement is common knowledge, or instead are they uncertain if they have a disagreement?

ISTM that people usually find themselves rather certain that they are in disagreement and that this is common knowledge. Aumann's theorem seems to forbid this even if we assume that the calculations are intractable.

The rational way to characterize the situation, if in fact intractability is a practical objection, would be that each party says he is unsure of what his opinion should be, because the information is too complex for him to make a decision. If circumstances force him to adopt a belief to act on, maybe it is rational for the two to choose different actions, but they should admit that they do not really have good grounds to assume that their choice is better than the other person's. Hence they really are not certain that they are in disagreement, in accordance with the theorem. Again this is in striking contrast to actual human behavior even among wannabes.

One question on your objections: how would you characterize the state of two human rationalist wannabes who have failed to reach agreement?

I would say that one possibility is that their disagreement is common knowledge, but they *don't know how* to reach agreement. From what I've learned so far, disagreements between rationalist wannabes can arise from 3 sources:

- different priors
- different computational shortcuts/approximations/errors
- incomplete exchange of information

Even if the two rationalist wannabes agree that in principle they should have the same priors and the same computations, and full exchange of information, as of today they do not have general methods to solve any of these problems, can only try to work out their differences on a case-by-case basis, with high likelihood that they'll have to give up at some point before they reach agreement.

Again this is in striking contrast to actual human behavior even among wannabes.

Your suggestion of what rationalist wannabes should do intuitively makes a lot of sense to me. But perhaps one reason people don't do it is because they don't know that it is what they should do? I don't recall a post here or on OB that argued for this position, for example.

You mean "common knowledge" in the technical sense described in the post?

If so, your questions do not appear to make sense.

Why not? They both know they disagree, they both know they both know they disagree, etc... Perhaps Agent 1 doesn't know 2's partitioning, or vice versa. Or perhaps their partitionings are common knowledge, but they lack the computational ability to actually determine the meet, for example, no?

Wei was hypothesising disagreement due to an incomplete exchange of information. In which case, the parties both know that they disagree, but don't have the time/energy/resources to sort each other's opinions out. Then Aumann's idea doesn't really apply.

Aaah, okay. Though presumably at least one would know the probabilities that both assigned (and said "I disagree"...) that is, it would generally take a bit of a contrived situation for them to know they disagree, but neither to know anything about the other's probability other than that it's different.

(What happens if they successfully exchange probabilities, have unbounded computing power, and have shared common knowledge priors... but they don't know each other's partitioning? Or would the latter automatically be computed from the rest?)

Just one round of comparing probabilities is not normally enough for the parties involved to reach agreement, though.

Well, if they *do* know each other's partitions and are computationally unbounded, then they would reach agreement after one step, wouldn't they? (or did I misunderstand the theorem?)

Or do you mean that if they don't know each other's partitions, iterative exchange of updated probabilities effectively transmits the needed information?

Sure all by itself this first paper doesn't seem very relevant for real disagreements, but there is a whole literature beyond this first paper, which weakens the assumptions required for similar results. Keep reading.

I already scanned through some of the papers that cite Aumann, but didn't find anything that made me change my mind. Do you have any specific suggestions on what I should read?

Seen Hanson's own http://hanson.gmu.edu/deceive.pdf - and its references?

Yes, I looked at that paper, and also Agreeing To Disagree: A Survey by Giacomo Bonanno and Klaus Nehring.

How about Scott Aaronson:

http://www.scottaaronson.com/papers/agree-econ.pdf

He shows that you do not have to exchange very much information to come to agreement. Now maybe this does not address the question of the potential intractability of the deductions to reach agreement (the wannabe papers may do this) but I think it shows that it is not necessary to exchange all relevant information.

The bottom line for me is the flavor of the Aumann theorem: that there must be a reason why the other person is being so stubborn as not to be convinced by your own tenacity. I think this insight is the key to the whole conclusion and it is totally overlooked by most disagreers.

I haven't read the whole paper yet, but here's one quote from it (page 5):

The dependence, alas, is exponential in 1 / (δ^3 ε^6), so our simulation procedure is still not practical. However, we expect that both the procedure and its analysis can be considerably improved.

Scott is talking about the computational complexity of his agreement protocol here. Even if we can improve the complexity to something that is considered practical from a computer science perspective, that will still likely be impractical for human beings, most of whom can't even multiply 3 digit numbers in their heads.

To quote from the abstract of Scott Aaronson's paper:

"A celebrated 1976 theorem of Aumann asserts that honest, rational Bayesian agents with common priors will never 'agree to disagree': if their opinions about any topic are common knowledge, then those opinions must be equal."

Even "honest, rational, Bayesian agents" seems too weak. Goal-directed agents who are forced to signal their opinions to others can benefit from voluntarily deceiving themselves in order to effectively deceive others. Their self-deception makes their opinions more credible - since they honestly believe them.

If an agent honestly believes what they are saying, it is difficult to accuse them of dishonesty - and such an agent's understanding of Bayesian probability theory may be immaculate.

Such agents are not constrained to agree by Aumann's disagreement theorem.

Goal-directed agents who are forced to signal their opinions to others can benefit from voluntarily deceiving themselves in order to effectively deceive others. Their self-deception makes their opinions more credible - since they honestly believe them.

This seems to reflect human cognitive architecture more than a general fact about optimal agents or even most/all goal-directed agents. That humans are not optimal is nothing new around here, nor that the agreement theorems have little relevance to real human arguments. (I can't be the only one to read the papers and think, 'hell, I don't trust myself as far as even the weakened models, much less Creationists and whatnot', and have little use for them.)

The reason is often that you regard your own perceptions and conclusion as trustworthy and in accordance with your own aims - whereas you don't have a very good reason to believe the other person is operating in your interests (rather than selfishly trying to manipulate you to serve their own interests). They may reason in much the same way.

Probably much the same circuitry continues to operate even in those very rare cases where two truth-seekers meet, and convince each other of their sincerity.

Uh oh, it looks like you guys are doing the Aumann "meet" operation to update your beliefs about Aumann. Make sure to keep track of the levels of recursion...

I too found my understanding changed dramatically when I looked into Aumann's original paper. Basically, the result has a misleading billing - and those citing the result rarely seemed to bother explaining much about the actual result or its significance.

I also found myself wondering why people remained puzzled about the high observed levels of disagreement. It seems obvious to me that people are poor approximations of truth-seeking agents - and instead promote their own interests. If you understand that, then the existence of many real-world disagreements is explained: people disagree in order to manipulate the opinions and actions of others for their own benefit.

Should people *really* adopt the "common knowledge" terminology? Surely that terminology is highly misleading and is responsible for many misunderstandings.

If people take common English words and give them an esoteric technical meaning that differs dramatically from a literal reading, then shouldn't they *at least* capitalise them?

Sorry, I think I got a bit confused about the "meet" operation, mind clarifying?

is (I^J)(w) equal to the intersection of I(w) and J(w) (which seems to be the implied way it works based on the overall description here) or something else? (Since the definition of meet you gave involved *unions* rather than intersections, and some sort of merging operation)

Thanks.

EDIT: whoops. am stupid today. Meant to say intersection, not disjunction

Meet of two partitions (in the context of this post) is the finest common coarsening of those partitions.

Consider the coarsening relation on the set of all partitions of the given set. Partition A is a coarsening of partition B if A can be obtained by "lumping together" some of the elements of B. Now, for this order, a "meet" of two partitions X and Y is a partition Z such that

- Z is a coarsening of X, **and** it is a coarsening of Y
- Z is the finest such partition, that is, for any other Z' that is a coarsening of both X and Y, Z' is also a coarsening of Z.

Meet of two partitions is the finest common coarsening of those partitions.

Under the usages familiar to me, the common coarsening is the *join*, not the meet. That's how "join" is used on the Wikipedia page for set partitions. Using "meet" to mean "common refinement" is the usage that makes sense to me in the context of the proof in the OP. [ETA: I've been corrected on this point; see below.]

Of course, what you call "meet" or "join" depends on which way you decide to direct the partial order on partitions. Unfortunately, it looks like both possibilities are floating around as conventions.

See for example on Wikipedia: Common knowledge (logic)

It is not difficult to see that the common knowledge accessibility function [...] corresponds to the finest common coarsening of the partitions [...], which is the finitary characterization of common knowledge also given by Aumann in the 1976 article.

The idea is that the partitions define what each agent is able to discern, so no refinement of what a given agent can discern is possible (unless you perform additional communication). Aumann's agreement theorem is about a condition for when the agents *already* agree, without any additional discussion between them.

Hmm. Then I am in a state of confusion much like Psy-Kosh's. These opposing convention aren't helping, but, at any rate, I evidently need to study this more closely.

It was confusing for me too, which is why I gave an imperative definition: first form the union of I and J, then merge any overlapping elements. Did that not help?

Did that not help?

It should have. The fault is certainly mine. I skimmed your definition too lightly because you were defining a technical term ("meet") in a context (partitions) where I was already familiar with the term, but I hadn't suspected that it had any other usages than the one I knew.

The term "meet" would correspond to considering a coarser partition as "less" than a finer partition, which is natural enough if you see partitions as representing "precision of knowledge". The coarser partition is able to discern less. Greatest lower bound is usually called "meet".

Greatest lower bound is usually called "meet".

It's always called that, but the greatest lower bound and the least upper bound switch places if you switch the direction of the partial order. And there's a lot of literature on set partitions in which finer partitions are lower in the poset. (That's the convention used in the Wikipedia page on set partitions.)

The justification for taking the meet to be a *refinement* is that refinements correspond to intersections of partition elements, and intersections are meets in the poset of sets. So the terminology carries over from the poset of sets to the poset of set partitions in a way that appeals to the mathematician's aesthetic.

But I can see the justification for the opposite convention when you're talking about precision of knowledge.

Ah, thanks. In that case... wouldn't the meet of A and B often end up being the entire space?

For that matter, why this coarsening operation rather than the set of all the possible pairwise intersections between members of I and members of J?

ie, why coarsening instead of "fineing" (what's the appropriate word there anyways?)

When two rationalists exchange information, shouldn't their conclusions then sometimes be finer rather than coarser since they have, well, each gained information they didn't have previously?

If I've got this right...

When two rationalists exchange all information, their new partition is the 'join' of the two old partitions, where the join is the "coarsest common fining". If you plot omega as the rectangle with corners at (-1,-1) and (1,1) and the initial partitions are the x axis for agent A and the Y axis for agent B, then they share information and 'join' and then their common partition separates all 4 quadrants.

"Common knowledge" is the set of questions that they can *both* answer *before* sharing information. This is the 'meet', which is the finest common coarsening. In the previous example, there is no information that they both share, so the meet becomes the whole square.

If you extend omega down to y = -2 and modify the original partitions to both fence off this new piece on its own, then the join would be the original four squares plus this lower rectangle, while the meet would be the square from (-1,-1) to (1,1) plus this lower rectangle (since they now have this as common knowledge).

Does this help?

wait, what? is it coarsest common fining or finest common coarsening that we're interested in here?

And isn't common knowledge the set of questions that not only they can both answer, but that they both know that both can answer, and both know that both know, etc etc etc?

Actually, maybe I need to reread this a bit more, but now am more confused.

Actually, on rereading, I *think* I'm starting to get the idea about meet and common knowledge (given that before exchanging info, they do know each other's partitioning, but not which particular partition the other has observed to be the current one).

Thanks!

Nope; it's the limit of I(J(I(J(I(J(I(J(...(w)...), where I(S) for a set S is the union of the elements of I that have nonempty intersections with S, i.e. the union of I(x) over all x in S, and J(S) is defined the same way.

Alternately if instead of I and J you think about the sigma-algebras they generate (let's call them sigma(I) and sigma(J)), then sigma(I meet J) is the intersection of sigma(I) and sigma(J). I prefer this somewhat because the machinery for conditional expectation is usually defined in terms of sigma-algebras, not partitions.
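For anyone who wants to check this identity in the finite case, a small script (the partitions are invented, and sigma(P) here is just the set of all unions of cells of P):

```python
from itertools import combinations

def sigma(partition):
    """All unions of cells of the partition, including the empty union:
    the (finite) sigma-algebra the partition generates."""
    cells = list(partition)
    out = set()
    for r in range(len(cells) + 1):
        for combo in combinations(cells, r):
            out.add(frozenset().union(*combo))
    return out

I = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5})]
J = [frozenset({1}), frozenset({2, 3}), frozenset({4}), frozenset({5})]
meet_IJ = [frozenset({1, 2, 3, 4}), frozenset({5})]  # I meet J, per the post's merging rule

# sigma(I meet J) is exactly the intersection of the two sigma-algebras
assert sigma(meet_IJ) == sigma(I) & sigma(J)
```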

Then... I'm having trouble seeing why I^J wouldn't very often converge on the entire space.

ie, suppose a super simplification in which both agent 1 and agent 2 partition the space into only two parts, agent 1 partitioning it into I = {A1, B1}, and agent 2 partitioning into J = {A2, B2}

Suppose I(w) = A1 and J(w) = A2

Then, unless the two partitions are identical, wouldn't (I^J)(w) = the entire space? or am I completely misreading? And thanks for taking the time to explain.

That simplification is a situation in which there is no common knowledge. In world-state w, agent 1 knows A1 (meaning knows that the correct world is in A1), and agent 2 knows A2. They both know A1 union A2, but that's still not common knowledge, because agent 1 doesn't know that agent 2 knows A1 union A2.

I(w) is what agent 1 knows, if w is correct. If all you know is S, then the only thing you know agent 1 knows is I(S), and the only thing that you know agent 1 knows agent 2 knows is J(I(S)), and so forth. This is why the usual "everyone knows that everyone knows that ... " definition of common knowledge translates to I(J(I(J(I(J(...(w)...).
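This iterated definition can be computed directly as a fixed point; a sketch, with helper names and example partitions of my own invention:

```python
def reach(partition, S):
    """I(S): the union of the partition's elements that intersect S."""
    return set().union(*(set(c) for c in partition if set(c) & S))

def ck_cell(I, J, w):
    """(I meet J)(w) as the limit of I(J(I(J(...({w})...)."""
    S = {w}
    while True:
        T = reach(I, reach(J, S))
        if T == S:          # fixed point reached: nothing more to add
            return S
        S = T

I = [{1, 2}, {3, 4}, {5}]
J = [{2, 3}, {1}, {4}, {5}]
print(ck_cell(I, J, 1))     # {1, 2, 3, 4}
print(ck_cell(I, J, 5))     # {5}
```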

Well, how is it not the intersection then?

ie, Agent 1 knows A1 and knows that Agent 2 knows A2

If they trust each other's rationality, then they both know that w must be in A1 and be in A2

So they both conclude it must be in intersection of A1 and A2, and they both know that they both know this, etc etc...

Or am I missing the point?

As far as I understand, agent 1 doesn't know that agent 2 knows A2, and agent 2 doesn't know that agent 1 knows A1. Instead, agent 1 knows that agent 2's state of knowledge is in J and agent 2 knows that agent 1's state of knowledge is in I. I'm a bit confused now about how this matches up with the meaning of Aumann's Theorem. Why are I and J common knowledge, and {P(A|I)=q1} and {P(A|J)=q2} common knowledge, but I(w) and J(w) are not common knowledge? Perhaps that's what the theorem requires, but currently I'm finding it hard to see how I and J being common knowledge is reasonable.

Edit: I'm silly. I and J don't need to be common knowledge at all. It's not agent 1 and agent 2 who perform the reasoning about I meet J, it's us. We know that the true common knowledge is a set from I meet J, and that therefore if it's common knowledge that agent 1's posterior for the event A is q1 and agent 2's posterior for A is q2, then q1=q2. And it's not unreasonable for these posteriors to become common knowledge without I(w) and J(w) becoming common knowledge. The theorem says that if you're both perfect Bayesians and you have the same priors then you don't have to communicate your evidence.

But if I and J are not common knowledge then I'm confused about why any event that is common knowledge must be built from the meet of I and J.

Then agent 1 knows that agent 2 knows *one of* the members of J that have nonempty intersection with I(w), and similarly for agent 2.

Presumably they have to tell each other which of their own partitions w is in, right? i.e., presumably *SOME* sort of information sharing happens about each other's conclusions.

And, once that happens, it seems like the intersection of I(w) and J(w) would be their resultant common knowledge.

I'm confused still though what the "meet" operation is.

Unless... the idea is something like this: they exchange probabilities. Then agent 1 reasons, "J(w) is a member of J that both intersects I(w) *and* would assign that particular probability, so I can determine the subset of I(w) that intersects with those candidates and compute a probability from there." And similarly for agent 2. Then they exchange probabilities again, and go through an equivalent reasoning process to tighten the spaces a bit more... and the theorem ensures that they'd end up converging on the same probabilities? (Each time they state unequal probabilities, they each learn more information, and each one then comes up with a set that's a strict subset of the one they were previously considering, but each of their sets always contains the intersection of I(w) and J(w).)
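That iterative process can be simulated directly. Here is a toy sketch (the four-state example and all names are my own, assuming a uniform common prior): the agents alternately announce their posteriors for an event A, and each announcement publicly reveals the event "my posterior is q", shrinking the set of states consistent with everything said so far.

```python
from fractions import Fraction

def info_set(partition, w):
    """The block of the partition containing state w."""
    return next(set(b) for b in partition if w in b)

def posterior(A, K):
    """P(A | K) under a uniform prior on a finite state space."""
    return Fraction(len(A & K), len(K))

def dialogue(omega, A, P1, P2, w, rounds=12):
    """Agents alternately announce posteriors for A; each announcement
    reveals the event 'my posterior is q', shrinking the publicly
    consistent set C until the announcements agree."""
    C = set(omega)
    history = []
    for _ in range(rounds):
        for P in (P1, P2):
            q = posterior(A, info_set(P, w) & C)
            history.append(q)
            # Everyone discards states in which this announcement
            # would not have been made.
            C = {v for v in C if posterior(A, info_set(P, v) & C) == q}
        if history[-1] == history[-2]:  # announcements agree: stop
            break
    return history

omega = {1, 2, 3, 4}
A = {1, 4}
P1 = [{1, 2}, {3, 4}]     # agent 1 can't tell 1 from 2, or 3 from 4
P2 = [{1, 2, 3}, {4}]     # agent 2 can't tell 1, 2, 3 apart
history = dialogue(omega, A, P1, P2, w=1)
print(history)  # announcements go 1/2, 1/3, 1/2, 1/2: they converge
```

Each unequal exchange strictly shrinks the consistent set, so on a finite Ω the process must terminate with equal posteriors, matching the convergence intuition sketched above.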

Try a concrete example: Two dice are thrown, and each agent learns one die's value. In addition, each learns whether the other die is in the range 1-3 vs 4-6. Now what can we say about the sum of the dice?

Suppose player 1 sees a 2 and learns that player 2's die is in 1-3. Then he knows that player 2 knows that player 1's die is in 1-3. It is common knowledge that the sum is in 2-6.

You could graph it by drawing a 6x6 grid and circling the information partition of player 1 in one color, and player 2 in another color. You will find that the meet is a partition of 4 elements, each a 3x3 grid in one of the corners.
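That grid picture can also be checked mechanically. A minimal sketch (function names are mine): the meet, i.e. the finest common coarsening of the two information partitions, computed by union-find over overlapping blocks.

```python
def meet(partitions, omega):
    """Finest common coarsening of the given partitions ('meet' in
    Aumann's sense): two states land in the same block iff a chain of
    overlapping blocks links them. Computed with union-find."""
    parent = {w: w for w in omega}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for part in partitions:
        for block in part:
            block = list(block)
            for s in block[1:]:
                parent[find(s)] = find(block[0])
    groups = {}
    for w in omega:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())

# Two dice; each player sees their own die exactly and learns which
# half (1-3 or 4-6) the other die is in.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
half = lambda d: 0 if d <= 3 else 1

def partition_by(key):
    groups = {}
    for w in omega:
        groups.setdefault(key(w), set()).add(w)
    return list(groups.values())

P1 = partition_by(lambda w: (w[0], half(w[1])))   # player 1's information
P2 = partition_by(lambda w: (half(w[0]), w[1]))   # player 2's information
M = meet([P1, P2], omega)
print(len(M), {len(b) for b in M})  # four blocks of nine states each
```

Two states end up in the same meet block exactly when a chain of "player 1 can't distinguish / player 2 can't distinguish" steps connects them, which is why the blocks come out as the four 3x3 corners.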

In general, anything which is common knowledge will limit the meet - that is, the meet partition the world is in will not extend to include world-states which contradict what is common knowledge. If 2 people disagree about global warming, it is probably common knowledge what the current CO2 level is and what the historical record of that level is. They agree on this data and each knows that the other agrees, etc.

The thrust of the theorem though is not what is common knowledge before, but what is common knowledge after. The claim is that it cannot be common knowledge that the two parties disagree.

What I don't like about the example you provide is: what player 1 and player 2 know needs to be common knowledge. For instance if player 1 doesn't know whether player 2 knows whether die 1 is in 1-3, then it may not be common knowledge at all that the sum is in 2-6, even if player 1 and player 2 are given the info you said they're given.

This is what I was confused about in the grandparent comment: do we really need I and J to be common knowledge? It seems so to me. But that seems to be another assumption limiting the applicability of the result.

Not sure... what happens when the ranges are different sizes, or otherwise the type of information learnable by each player is different in non symmetric ways?

Anyways, thanks, upon another reading of your comment, I think I'm starting to get it a bit.

Different size ranges in Hal's example? Nothing in particular happens. It's ok for different random variables to have different ranges.

Otoh, if the players get different ranges about a single random variable, then they could have problems.
Suppose there is one d6. Player A learns whether it is in 1-2, 3-4, or 5-6. Player B learns whether it is in 1-3 or 4-6.

And suppose the actual value is 1.

Then A knows it's 1-2. So A knows B knows it's 1-3. But A reasons that B reasons that if it were 3 then A would know it's 3-4, so A knows B knows A knows it's 1-4. But A reasons that B reasons that A reasons that if it were 4 then B would know it's 4-6, so A knows B knows A knows B knows it's 1-6. So there is no common knowledge, i.e. I∧J=Ω. (Omitting the argument w, since if this is true then it's true for all w.)

And if it were a d12, with ranges still size 2 and 3, then the partitions line up at one point, so the meet stops at {1-6, 7-12}.
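Short of diagrams, both cases are easy to verify by computing the meet directly. A sketch (helper names are mine), taking the meet to be the finest common coarsening, found by union-find:

```python
def meet(partitions, omega):
    """Finest common coarsening ('meet' in Aumann's sense): states are
    merged whenever some agent's information fails to separate them,
    transitively. Computed with union-find."""
    parent = {w: w for w in omega}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for part in partitions:
        for block in part:
            block = list(block)
            for s in block[1:]:
                parent[find(s)] = find(block[0])
    groups = {}
    for w in omega:
        groups.setdefault(find(w), set()).add(w)
    return sorted(sorted(b) for b in groups.values())

def ranges(n, size):
    """Partition {1..n} into consecutive blocks of the given size."""
    return [set(range(i, i + size)) for i in range(1, n + 1, size)]

# d6: A sees which pair (1-2, 3-4, 5-6); B sees which triple (1-3, 4-6).
print(meet([ranges(6, 2), ranges(6, 3)], range(1, 7)))
# -> [[1, 2, 3, 4, 5, 6]]  (the trivial partition: no common knowledge)

# d12: the pair and triple partitions line up at the 6/7 boundary.
print(meet([ranges(12, 2), ranges(12, 3)], range(1, 13)))
# -> [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
```

With the d12, no pair or triple block crosses the 6/7 boundary, so the chain of merges cannot cross it either, and the meet stops at {1-6, 7-12} as described.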

Diagrams would be wonderful, anyone up to drawing them?

I think that I understand this proof now. Does the following dialogue capture it?

AGENT 1: My observations establish that our world is in the world-set *S*. However, as far as I can tell, any world in *S* could be our world.

AGENT 2: My observations establish that our world is in the world-set *T*. However, as far as I can tell, any world in *T* could be our world.

TOGETHER: So now we both know that our world is in the world-set *S* ∩ *T*—though, as far as we can tell, any world in *S* ∩ *T* could be our world. Therefore, since we share the same priors, we both arrive at the same value when we compute P(*E* | *S* ∩ *T*), the probability that a given event *E* occurred in our world.

ETA: janos's comment indicates that I'm missing something, but I don't have the time this second to think it through. Sounds like the set that they ultimately condition on isn't *S* ∩ *T* but rather a subset of it.

ETA2: Well, I couldn't resist thinking about it, even though I couldn't spare the time :). The upshot is that I don't understand janos's comment, and I agree with Psy-Kosh. As stated, for example, in this paper:

The meet [of partitions π1 and π2] has as blocks [i.e., elements] all nonempty intersections of a block from π1 with a block from π2.

From this it follows that the element of I∧J containing w is precisely I(w) ∩ J(w). So, unless I'm missing something, my dialogue above completely captures the proof in the OP.

ETA3: It turns out that both possible ways of orienting the partial order relation are in common use. Everything that I've seen discussing the theory of set partitions puts refinements lower in the lattice. This was the convention that I was using above. But, as Vladimir Nesov points out, it's natural to use the opposite convention when talking about epistemic agents, and this is the usage in Wei Dai's post. The clash between these conventions was a large part of the cause of my confusion. At any rate, under the convention that Wei Dai is using, the element of I∧J containing w is *not* in general I(w) ∩ J(w).

Your dialog is one way to achieve agreement, and it is what I meant when I said "simply tell each other I(w) and J(w)". However, it is not what Aumann's proof is about. The dialog shows that two Bayesians with the same prior would always agree if they exchange enough information.

Aumann's proof is not really about how to reach agreement, but why disagreements can't be "common knowledge". The proof follows a completely different structure from your dialog.

From this it follows that the element of I∧J containing w is precisely I(w) ∩ J(w).

No, this is wrong. Please edit or delete it to avoid confusing others.

The implication that I asserted is correct. The confusion arises because both possible ways of orienting the partial order on partitions are common in the literature. But I'll note that in the comment.

The problem is not in conventions and the literature, but in whether your interpretation captures the statement of the theorem discussed in the post. Ambiguity of the term is no excuse. By the way, "meet" is Aumann's usage as well, as can be seen from the first page of the original paper.

Ambiguity of the term is no excuse.

Indeed. I plead guilty to reading hastily. I saw the term "meet" being used in a context where I already knew its definition (the only definition it had, so far as I knew), so I only briefly skimmed Wei Dai's own definition. Obviously I was too careless.

However, it really bears emphasizing how strange it is to put refinements *higher* in the partial order of partitions, at least from the perspective of the general theory of partial orders. Under the category theoretic definition of partial orders, *P* ≤ *Q* means that there is a map *P* → *Q*. Now, to say that a partition *Q* is a coarsening of a partition *P* is to say that *Q* is a quotient *P*/~ of *P*. But such a quotient corresponds canonically to a map *P* → *Q* sending each element *p* of *P* to the equivalence class in *Q* containing *p*. Indeed, Wei Dai is invoking just such maps when he writes "I(w)". In this case, Ω is construed as the discrete partition of itself (where each element is in its own equivalence class) and I is used (as an abuse of notation) for the canonical map of partitions I: Ω → I. The upshot is that one of these canonical partition maps *P* → *Q* exists if and only if *Q* is a *coarsening* of *P*. Therefore, that is what *P* ≤ *Q* should mean. In the context of the general theory of partial orders, coarser partitions should be greater than finer ones.

Efforts to illuminate Aumann's disagreement result do seem rather rare - thanks for your efforts here.

It appears to me that reducing this to an equation is totally irrelevant, in that it obscures the premises of the argument, and an argument is only as good as the reliability of the premises. Moreover, the theorem appears faulty based on inductive logic, in that the premises can be true and the conclusion false. I'm really interested in why this thought process is wrong.

While I see your point, I wouldn't say that the agreement issue is overrated at all.

There are many disagreements that don't change at all over arbitrarily many iterations, which sure don't look right given AAT. Even if the beliefs don't converge exactly, I don't think it's too much to ask for some motion towards convergence.

I think the more important parts are the parts that talk about predicting disagreements.

Could robust statistics be relevant for explaining fixed points where disagreements do not change at all?

Roughly speaking, the idea of robust statistics is that the median or similar concepts may be preferable in some circumstances to the mean - and unlike the mean, the median routinely does not change at all, even when another datapoint changes.

I don't think that really helps. If you're treating someone's beliefs as an outlier, then you're not respecting that person as a rationalist.

Even if you did take the median of your metaprobability distribution (which is not the odds you want to bet on, though you may want to profess them for some reason), eventually you should change your mind (most bothersome disagreements involve people confidently on opposite sides of the spectrum, so the direction in which to update is obvious).

It could be that in practice most people update beliefs according to some more "robust" method, but to the extent that it freezes their beliefs under new real evidence, it's a sucky way of doing it and you don't get a 'get out of jail free' card for doing it.

The main problem I have always had with this is that the reference set is "actual world history" when in fact that is the exact thing that observers are trying to decipher.

We all realize that there is in fact an "actual world history"; however, if it were known, then this wouldn't be an issue. Using it as a reference set, then, seems spurious in all practicality.

The most obvious way to achieve it is for the two agents to simply tell each other I(w) and J(w), after which they share a new, common information partition.

I think that summary is a good way to interpret the problem I addressed in as practical a manner as is currently available; I would note, however, that most people arbitrarily weight observational inference, so there is a skewing of the data.

The sad part about the whole thing is that both or all observers exchanging information may be the same deviation away from w, such that their combined probabilities of I(w) are further away from w than either individually.

Huh? The reference set Ω is the set of possible world histories, out of which one element is the actual world history. I don't see what's wrong with this.

I suppose my post was poorly worded. Yes, in this case Ω is the reference set for possible world histories.

What I was referring to was the baseline of w as an accurate measure. It is a normalizing reference, though not a set.