A Rationality Condition for CDT Is That It Equal EDT (Part 1)

abramdemski

A Rationality Condition for CDT Is That It Equal EDT (Part 1)

post by abramdemski · 2018-10-04T04:32:49.483Z · LW · GW · 14 comments

  Hyperreal Probability
  Hyperreal Bayes Nets & CDT=EDT
  Are We Really Eliminating Exploration?
    Ways of Taking Counterfactuals are Somewhat Interchangeable
    Exploration is Always Necessary for Learning Guarantees
    You Still Explore in Logical Time
None
14 comments

[Epistemic Status: this series of two posts gives some arguments which, in my eyes, make it difficult to maintain a position other than CDT=EDT, but not impossible. As I explain at the end of the second post, it is still quite tenable to suppose that CDT and EDT end up taking different actions.]

Previously, I argued that fair comparisons of CDT and EDT (in which the same problem representation is given to both decision theories) will conclude that CDT=EDT, under what I see as reasonable assumptions. Recently, Paul Christiano wrote a post arguing that, all things considered, the evidence strongly favors EDT. Jessica Taylor pointed out that Paul didn't address the problem of conditioning on probability zero events, but she came up with a novel way of addressing that problem by taking the limit of small probabilities: COEDT [AF · GW].

Here, I provide further arguments that rationality constraints point in the direction of COEDT-like solutions.

Note that I argue for the conclusion that CDT=EDT, which is somewhat different from arguing directly for EDT; my line of reasoning suggests some additional structure which could be missed by advocating EDT in isolation (or CDT in isolation). Paul's post described CDT as a very special case of EDT, in which our action is independent of other things we care about. This is true, but, we can also accurately describe EDT is a very special case of CDT where all probabilistic relationships which remain after conditioning on what we know turn out to also be causal relationships. I more often think in the second way, because CDT can have all sorts of counterfactuals based on how causation works. EDT claims that these are only correct when they agree with the conditional probabilities.

(ETA: When I say "CDT", I'm pointing at some kind of steel-man of CDT which uses logical counterfactuals rather than physical counterfactuals. TDT is a CDT in this sense, whereas UDT could be either CDT or EDT.)

This post will be full of conjectural sketches, and mainly serves to convey my intuitions about how COEDT could fit into the larger picture.

Hyperreal Probability

Initially, thinking about COEDT, I was concerned that although something important had been accomplished, the construction via limits didn't seem fundamental enough that it should belong in our basic notion of rationality. Then, I recalled how hyperreal numbers (which can be thought of as sequences of real numbers) are a natural generalization of decision theory. This crops up in several different forms in different areas of Bayesian foundations, but most critically for the current discussion, in the question of how to condition on probability zero events. Quoting an earlier post of mine:

In What Conditional Probabilities Could Not Be, Alan Hajek argues that conditional probability cannot possibly be defined by Bayes’ famous formula, due primarily to its inadequacy when conditioning on events of probability zero. He also takes issue with other proposed definitions, arguing that conditional probability should instead be taken as primitive.

The most popular way of doing this are Popper’s axioms of conditional probability. In Learning the Impossible (Vann McGee, 1994), it’s shown that conditional probability functions following Popper’s axioms and nonstandard-real probability functions with conditionals defined according to Bayes’ theorem are inter-translatable. Hajek doesn’t like the infinitesimal approach because of the resulting non-uniqueness of representation; but, for those who don’t see this as a problem but who put some stock in Hajek’s other arguments, this would be another point in favor of infinitesimal probability.

In other words, there is an axiomatization of probability -- Popper's axioms -- which takes conditional probability to be fundamental rather than derived. This approach is relatively unknown outside philosophy, but often advocated by philosophers as a superior notion of probability, largely because it allows one to condition on probability zero events. Popper's axioms are in some sense equivalent to allowing hyperreal probabilities, which also means (with a little mathematical hand-waving; I haven't worked this out in detail) we can think of them as a limit of a sequence of strictly nonzero probability distributions.

All of this agrees nicely with Jessica's approach.

I take this to strongly suggest that reasonable approaches to conditioning on probability zero events in EDT will share the limit-like aspect of Jessica's approach, even if it isn't obvious that they do. (Popper's axioms are "limit-like", but this was probably not obvious to Popper.) The major contribution of COEDT beyond this is to provide a particular way of constructing such limits.

(Having the idea "counterfactuals should look like conditionals in hyperreal probability distributions" is not enough to solve decision theory problems alone, since it is far from obvious how we should construct hyperreal probability distributions over logic to get reasonable logical counterfactuals.)

Hyperreal Bayes Nets & CDT=EDT

(The following argument is the only justification of the title of the post which will appear in Part 1. I'll have a different argument for the claim in the title in Part 2.)

The CDT=EDT argument can now be adapted to hyperreal structures. My original argument required:

1. Probabilities & Causal Structure are Compatible: The decision problem is given as a Bayes net, including an action node (for the actual action taken by the agent) and a decision node (for the mixed strategy the agent decides on). The CDT agent interprets this as a causal net, whereas the EDT agent ignores the causal information and treats it as a probability distribution.

2. Exploration: all action probabilities are bounded away from zero in the decision; that is, the decision node is restricted to mixed strategies in which each action gets some minimal probability.

3. Mixed-Strategy Ratifiability: The agents know the state of the decision node. (This can be relaxed to approximate self-knowledge under some additional assumptions.)

4. Mixed-Strategy Implementability: The action node doesn't have any parents other than the decision node.

I justified assumption #2 as an extension of the desire to give EDT a fair trial: EDT is only clearly-defined in cases with epsilon exploration, so I argued that CDT and EDT should be compared with epsilon-exploration. However, if you prefer CDT because EDT isn't well-defined when conditioning on probability zero actions, this isn't much of an argument.

We can now address this by requiring conditionals on probability zero events to be limits of sequences of conditionals in which the event has greater than zero probability. Or (I think equivalently), we think of the probability distribution as being the real part of a hyperreal probability distribution.

Having done this, we can apply the same CDT=EDT result to Bayes nets with hyperreal conditional probability tables. This shows that CDT still equals EDT without restricting to mixed strategies, so long as conditionals on zero-probability actions are defined via limits.

This still leaves the other questionable assumptions behind the CDT=EDT theorem.

#1 (compatible probability & causality): I framed this assumption as the main condition for a fair fight between CDT and EDT: if the causal structure is not compatible with the probability distribution, then you are basically handing different problems to CDT and EDT and then complaining that one gets worse results than the other. However, the case is not so clear as I made it out to be. In cases where CDT/EDT are in specific decision problems which they understand well, the causal structure and probabilistic structure must be compatible. However, boundedly rational agents will have inconsistent beliefs, and it may be that beliefs about causal structure are sometimes inconsistent with other beliefs. An advocate of CDT or EDT might say that the differentiating cases are on exactly such inconsistent examples.

Although I agree that it's important to consider how agents deal with inconsistent beliefs (that's logical uncertainty!), I don't currently think it makes sense to judge them on inconsistent decision problems. So, I'll set aside such problems.

Notice, however, that one might contest whether there's necessarily a reasonable causal structure at all, and deny #1 that way.

#3 (ratifiability): The ratifiability assumption is a kind of equilibrium concept; the agent's mixed strategy has to be in equilibrium with knowledge of that very mixed strategy. I argued that it is as much a part of understanding the situation the agent is in as anything else, and that it is usually approximately achievable (IE, doesn't cause terrible self-reference problems or imply logical omniscience). However, I didn't prove that a ratifiable equilibrium always exists! Non-existence would trivialize the result, making it into an argument from false premises to a false conclusion.

Jessica's COEDT results address this concern, showing that this level of self-knowledge is indeed feasible.

#4 (implementability): I think of this as the shakiest assumption; it is easy to set up decision problems which violate it. However, I tend to think such setups get the causal structure wrong. Other parents of the action should instead be thought of as children of the action. Furthermore, if an agent is learning about the structure of a situation by repeated exposure to that situation, implementability seems necessary for the agent to come to understand the situation it is in: parents of the action will look like children if you try to perform experiments to see what happens when you do different things.

I won't provide any direct arguments for the implementability constraint in the rest of this post, but I'll be discussing other connections between learning and counterfactual reasoning.

Are We Really Eliminating Exploration?

Ways of Taking Counterfactuals are Somewhat Interchangeable

When thinking about decision theory, we tend to focus on putting the agent in a particular well-defined problem. However, realistically, an agent has a large amount of uncertainty about the structure of the situation it is in. So, a big part of getting things right is learning what situation you're in.

Any reasonable way of defining counterfactuals for actions, be it CDT or COEDT or something else, is going to be able to describe essentially any combination of consequences for the different actions. So, for an agent who doesn't know what situation it is in, any system of counterfactuals is possible no matter how counterfactuals are defined. In some sense, this means that getting counterfactuals right will be mainly up to the learning. Choosing between different kinds of counterfactual reasoning is a bit like choosing different priors -- you would hope it gets washed out by learning.

Exploration is Always Necessary for Learning Guarantees

COEDT eliminates the need for exploration in 5-and-10, which intuitively means cases where it should be really, really obvious what to do. It isn't clear to what extent COEDT helps with other issues. I'm skeptical that COEDT alone will allow us to get the right counterfactuals for game-theoretic reasoning [AF · GW]. But, it is really clear that COEDT doesn't change the fundamental trade-off between learning guarantees (via exploration) and Bayesian optimality (without exploration).

This is illustrated by the following problem:

Scary Door Problem. According to your prior, there is some chance that doors of a certain red color conceal monsters who will destroy the universe if disturbed. Your prior holds that this is not very strongly correlated to any facts you could observe without opening such a door. So, there is no way to know whether such doors conceal universe-destroying monsters without trying them. If you knew such doors were free of universe-destroying monsters, there are various reasons why you might sometimes want to open them.

The scary door problem illustrates the basic trade-off between asymptotic optimality and subjective optimality. Epsilon exploration would guarantee that you occasionally open scary doors. If such doors conceal monsters, you destroy the universe. However, if you refuse to open scary doors, then it may be that you never learn to perform optimally in the world you're in.

What COEDT does is show that the scary door and 5-and-10 really are different sorts of problem. If there weren't approaches like COEDT which eliminate the need for exploration in 5-and-10, we would be forced to conclude that they're the same: no matter how easy the problem looks, you have to explore in order to learn the right counterfactuals.

So, COEDT shows that not all counterfactual reasoning has to reduce to learning. There are problems you can get right by reasoning alone. You don't always have to explore; you can refuse to open scary doors, while still reliably picking up $10.

I mentioned that choosing between different notions of counterfactual is kind of like choosing between different priors -- you might hope it gets washed out by learning. The scary door problem illustrates why we might not want the learning to be powerful enough to wash out the prior. This means getting the prior right is quite important.

You Still Explore in Logical Time

If you follow the logical time [AF · GW] analogy, it seems like you can't ever really construct logical counterfactuals without exploration in some sense: if you reason about a counterfactual, the counterfactual scenario exists somewhere in your logical past, since it is a real mathematical object. Hence, you must take the alternate action sometimes in order to reason about it at all.

So, how does a COEDT agent manage not to explore?

COEDT can be thought of as "learning" from an infinite sequence of agents who explore less and less. None of those agents are COEDT agents, but they get closer and closer. If each of these agents exists at a finite logical time, COEDT exists at an infinite logical time, greater than any of the agents COEDT learns from. So, COEDT doesn't need to explore because COEDT doesn't try to learn from agents maximally similar to itself; it is OK with a systematic difference between itself and the reference class it logically learns from.

This systematic difference may allow us to drive a wedge between the agent and its reference class to demonstrate problematic behavior. I won't try to construct such a case today.

In the COEDT post, Jessica says:

I consider COEDT to be major progress in decision theory. Before COEDT, there were (as far as I know) 3 different ways to solve 5 and 10, all based on counterfactuals:

• Causal counterfactuals (as in CDT), where counterfactuals are worlds where physical magic happens to force the agent's action to be something specific.

• Model-theoretic counterfactuals (as in modal UDT), where counterfactuals are models in which false statements are true, e.g. where PA is inconsistent.

• Probabilistic conditionals (as in reinforcement learning and logical inductor based decision theories such as LIEDT/LICDT and asymptotic decision theory [LW · GW]), where counterfactuals are possible worlds assigned a small but nonzero probability by the agent in which the agent takes a different action through "exploration"; note that ADT-style optimism is a type of exploration.

COEDT is a new way to solve 5 and 10. My best intuitive understanding is that, whereas ordinary EDT (using ordinary reflective oracles) seeks any equilibrium between beliefs and policy, COEDT specifically seeks a not-extremely-unstable equilibrium (though not necessarily one that is stable in the sense of dynamical systems), where the equilibrium is "justified" by the fact that there are arbitrarily close almost-equilibria. This is similar to trembling hand perfect equilibrium. To the extent that COEDT has counterfactuals, they are these worlds where the oracle distribution is not actually reflective but is very close to the actual oracle distribution, and in which the agent takes a suboptimal action with very small probability.

Based on my picture, I think COEDT belongs in the modal UDT class. Both proposals can be seen as a special sort of exploration where we explore if we are in a nonstandard model. Modal UDT explores if PA is inconsistent. COEDT explores if a randomly sampled positive real in the unit interval happens to be less than some nonstandard epsilon. :)

(Note that describing them in this way is a little misleading, since it makes them sound uncomputable. Modal UDT in particular is quite computable, if the decision problem has the right form and if we are happy to assume that PA is consistent.)

I'll be curious to see how well this analogy holds up. Will COEDT have fundamentally new behavior in some sense?

More thoughts to follow in Part 2.

14 comments

Comments sorted by top scores.

comment by Sniffnoy · 2018-10-10T07:34:53.576Z · LW(p) · GW(p)

So, I haven't really read this in any detail, but -- I am very, very wary of the use of hyperreal and/or surreal numbers here. While as I said I haven't taken a thorough look at this, to me these look like "well we need infinitesimals and this is what I've heard of" rather than having any real reason to pick one of these two. I seriously doubt that either is a good choice.

Hyperreals require picking a free ultrafilter; they're not even uniquely defined. Surreal numbers (pretty much) completely break limits. (Hyperreals kind of break limits too, due to being of uncountable cofinality, but not nearly as extensively as surreal numbers do, which are of proper-class cofinality.) If you're picking a number system, you need to consider what you're actually going to do with it. If you're going to do any sort of limits or integration with it -- and what else is probability for, if not integration? -- you probably don't want surreal numbers, because limits are not going to work there. (Some things that are normally done with limits can be recovered for surreals by other means, e.g. there's a surreal exponential, but you don't define it as a limit of partial sums, because that doesn't work. So, maybe you can develop the necessary theory based on something other than limits, but I'm pretty sure it's not something that already exists which you can just pick up and use.)

Again: Pick number systems for what they do. Hyperreals have a specific point, which is the transfer principle. If you're not going to be using the transfer principle, you probably don't want hyperreals. And as I already said, if you're going to be taking any sort of limit, you probably don't want surreals.

Consider asking whether you need a system of numbers at all. You mention sequences of real numbers; perhaps that's simply what you want? Sequences of real numbers, not modulo a free ultrafilter? You don't need to use an existing system of numbers, you can purpose-build one; and you don't need to use a system of numbers at all, you can just use appropriate objects, whatever they may be. (Oftentime it makes more sense to represent "orders of infinity" by functions of different growth rates -- or, I guess here, sequences of different growth rates.)

(Honestly if infinitesimal probabilities or utilities are coming up, I'd consider that a flag that something has likely gone wrong -- we have good reasons to use real numbers for these, which I'm sure you're already familiar with (but here's a link [LW · GW] for everyone else :P ) -- but I'll admit that I haven't read this thing in any detail and you are going beyond that sort of classical context so, hey, who knows.)

comment by jessicata (jessica.liu.taylor) · 2018-10-04T06:29:51.754Z · LW(p) · GW(p)

COEDT can be thought of as "learning" from an infinite sequence of agents who explore less and less.

Interestingly, the issue COEDT has with sequential decision problems [AF(p) · GW(p)] looks suspiciously similar to folk theorems in iterated game theory (which also imply that completely-aligned agents can get a very bad outcome because they will each maximally punish anyone who doesn't play the grim trigger strategy). There might be some kind of folk theorem for COEDT, though there's a complication in that, conditioning on yourself taking a probability-0 action, you get both worlds where you are being punished and ones where you are punishing someone else, which might mean counterfactual punishments can't be maximal for everyone at once (yay?).

COEDT ensures that conditionals on each action exist at all, but it doesn't ensure that agents behave even remotely sanely in these conditionals, as it's still conditioning on a very rare event, and the relevant rationality conditions permit agents to behave insanely with very small probability. What would be really nice is to get some set of conditional beliefs under which:

no one takes any strictly dominated actions with nonzero probability (i.e. an action such that all possible worlds where the agent takes this action are worse than all possible worlds where the agent doesn't)
conditional on any subset of the agents taking non-strictly-dominated actions, no agent takes any strictly dominated action with nonzero probability

(I suspect this is easier for common-payoff problems; for non-common-payoff problems, agents might take strictly dominated actions as a form of extortion)

COEDT doesn't get this but perhaps a similar construction (maybe using the hyperreals?) does.

comment by jessicata (jessica.liu.taylor) · 2018-10-04T05:54:28.188Z · LW(p) · GW(p)

#4 (implementability): I think of this as the shakiest assumption; it is easy to set up decision problems which violate it. However, I tend to think such setups get the causal structure wrong. Other parents of the action should instead be thought of as children of the action. Furthermore, if an agent is learning about the structure of a situation by repeated exposure to that situation, implementability seems necessary for the agent to come to understand the situation it is in: parents of the action will look like children if you try to perform experiments to see what happens when you do different things.

This assumption seems sketchy to me. In particular, what if you make 2 copies of a deterministic agent, move them physically far from each other, give them the same information, and ask each to select an action? Clearly, if a rational agent is uncertain about either agent's action, then they will believe the two agents' actions to be (perfectly) correlated. The two actions can't each be children of each other...

Replies from: abramdemski

↑ comment by abramdemski · 2018-10-04T14:38:38.057Z · LW(p) · GW(p)

I maybe should have clarified that when I say CDT I'm referring to a steel-man CDT which would use some notion of logical causality. I don't think the physical counterfactuals are a live hypothesis in our circles, but several people advocate reasoning which looks like logical causality.

Implementability asserts that you should think of yourself as logico-causally controlling your clone when it is a perfect copy.

Replies from: jessica.liu.taylor

↑ comment by jessicata (jessica.liu.taylor) · 2018-10-04T18:48:59.440Z · LW(p) · GW(p)

If your decision logico-causally controls your clone's decision and vice versa, doesn't that imply a non-causal model (since it has a cycle)?

In the case of an exact clone this is less of an issue since there's only one relevant logical fact. But in cases where something like correlated equilibrium is being emulated on logical uncertainty (as in this post), the decisions could be logically correlated without being identical.

[EDIT: in the case of correlated equilibrium specifically, there actually is a signal (which action you are told to take), and your action is conditionally independent of everything else given this signal, so there isn't a problem. However, in COEDT, each agent knows the oracle distribution but not the oracle itself, which means they consider their own action to be correlated with other agents' actions.]

comment by Chris_Leong · 2018-10-07T01:18:39.404Z · LW(p) · GW(p)

Surreal numbers are generally better than hyperreals for many use cases as hyperreals aren't unique and require you to choose a principle ultrafilter or something.

Replies from: Sniffnoy

↑ comment by Sniffnoy · 2018-10-10T07:39:42.000Z · LW(p) · GW(p)

I've already mentioned this in a separate comment, but surreals come with a lot of problems of their own (basically, limits don't work). I don't like to say this, but your comment gives off the same "oh well we need infinitesimals and this is what I've heard of" impression as above. Pick systems of numbers based on what they do. Surreals probably don't do whatever's necessary here -- how are you going to do any sort of integration?

(Also, you mean a free ultrafilter, not a principal one.)

Replies from: Chris_Leong

↑ comment by Chris_Leong · 2018-10-10T10:03:49.349Z · LW(p) · GW(p)

They solve a lot more problems than people realise. I'll be making a post on this soon. And what's the issue re: limits not working?

Replies from: Sniffnoy

↑ comment by Sniffnoy · 2018-10-11T01:28:19.919Z · LW(p) · GW(p)

Later edits: various edits for clarity; also the "transfinite sequences suffice" thing is easy to verify, it doesn't require some exotic theorem
Yet later edit: Added another example
Two weeks later edit: Added the part about sign-sequence limits

So, to a large extent this is a problem with non-Archimedean ordered fields in general; the surreals just exacerbate it. So let's go through this in stages.

===Stage 1: Infinitesimals break limits===

Let's start with an example. In the real numbers, the limit as n goes to infinity of 1/n is 0. (Here n is a natural number, to be clear.)

If we introduce infinitesimals -- even just as minimally as, say, passing to R(ω) -- that's not so, because if you have some infinitesimal ε, the sequence will not get within ε of 0.

Of course, that's not necessarily a problem; I mean, that's just restating that our ordered field is no longer Archimedean, right? Of course 1/n is no longer going to go to 0, but is 1/n really the right thing to be looking at? How about, say, 1/x, as x goes to infinity, where x takes values in this field of ours? That still goes to 0. So it may seem like things are fine, like we just need to get these sequences out of our head and make sure we're always taking limits of functions, not sequences.

But that's not always so easy to do. What if we look at x^n, where |x|<1? If x isn't infinitesimal, that's no longer going to go to 0. It may still go to 0 in some cases -- like, in R(ω), certainly 1/ω^n will still go to 0 -- but 1/2^n sure won't. And what do we replace that with? 1/2^x? How do we define that? In certain settings we may be able to -- hell, there's a theory of the surreal exponential, so in the surreals we can -- but not in general. And doing that requires first inventing the surreal exponential, which -- well, I'll talk more about that later, but, hey, let's talk about that a bit right now. How are we going to define the exponential? Normally we define exp(x) to be the limit of 1, 1+x, 1+x+x^2/2... but that's not going to work anymore. If we try to take exp(1), expecting an answer of e, what we get is that the sequence doesn't converge due to the cloud of infinitesimals surrounding it; it'll never get within 1/ω of e. For some values maybe it'll converge, but not enough to do what we want.

Now the exponential is nice, so maybe we can find another definition (and, as mentioned, in the case of the surreals indeed we can, while obviously in the case of the hyperreals we can do it componentwise). But other cases can be much worse. Introducing infinitesimals doesn't break limits entirely -- but it likely breaks the limits that you're counting on, and that can be fatal on its own.

===Stage 2: Uncountable cofinality breaks limits harder===

Stage 2 is really just a slight elaboration of stage 1. Once your field is large enough to have uncountable cofinality -- like, say, the hyperreals -- no sequence (with domain the whole numbers) will converge (unless it's eventually constant). If you want to take limits, you'll need transfinite sequences of uncountable length, or you simply will not get convergence.

Again, when you can rephrase things from sequences (with domain the natural numbers) to functions (with domain your field), things are fine. Because obviously your field's cofinality is equal to itself. But you can't always do that, or at least not so easily. Again: It would be nice if, for |x|<1, we had x^n approaching 0, and once we hit uncountable cofinality, that is simply not going to happen for any nonzero x.

(A note: In general in topology, not even transfinite sequences are good enough for general limits, and you need nets/filters. But for ordered fields, transfinite sequences (of length equal to the field's cofinality) are sufficient. Hence the focus on transfinite sequences rather than being ultra-general and using nets.)

Note that of course the hyperreals are used for nonstandard analysis, but nonstandard analysis doesn't involve taking limits in the hyperreals -- that's the point; limits in the reals correspond to non-limit-based things in the hyperreals.

===Stage 3: The surreals break limits as hard as is possible===

So now we have the surreals, which take uncountable cofinality to the extreme. Our cofinality is no longer merely uncountable, it's not even an actual ordinal! The "cofinality" of the surreals is the "ordinal" represented by the class of all ordinals (or the "cardinal" of the class of all sets, if you prefer to think of cofinalities as cardinals). We have proper-class cofinality.

Limits of sequences are gone. Limits of ordinary transfinite sequences are gone. All that remains working are limits of sequences whose domain consists of the entire class of all ordinals. Or, again, other things with proper-class cofinality; 1/x still goes to 0 as x goes to infinity (again, letting x range over all surreals -- note that that that's a very strong notion of "goes to infinity"!) You still have limits of surreal functions of a surreal variable. But as I keep pointing out, that's not always good enough.

I mean, really -- in terms of ordered fields, the real numbers are the best possible setting for limits, because of the existence of suprema. Every set that's bounded above has a least upper bound. By contrast, in the surreals, no set that's bounded above has a least upper bound! That's kind of their defining property; if you have a set S and an upper bound b then, oops, {S|b} sneaks right inbetween. Proper classes can have suprema, yes, but, as I keep pointing out, you don't always have a proper class to work with; oftentimes you just have a plain old countably infinite set. As such, in contrast to the reals, the surreal numbers are the worst possible setting for limits.

The result is that doing things with surreals beyond addition and multiplication typically requires basically reinventing those things. Now, of course, the surreal numbers have something that vaguely resemble limits, namely, {left stuff|right stuff} -- the "simplest in an interval" construction. I mean, if you want, say, √2, you can just put {x∈Q, x>0, x^2<2 | x∈Q, x>0, x^2>2}, and, hey, you've got √2! Looks almost like a limit, doesn't it? Or a Dedekind cut? Sure, there's a huge cloud of infinitesimals surrounding √2 that will thwart attempts at limits, but the simplest-in-an-interval construction cuts right through that and snaps to the simplest thing there, which is of course √2 itself, not √2+1/ω or something.

Added later: Similarly, if you want, say, ω^ω, you just take {ω,ω^2,ω^3,...|}, and you get ω^ω. Once again, it gets you what a limit "ought" to get you -- what it would get you in the ordinals -- even though an actual limit wouldn't work in this setting.

But the problem is, despite these suggestive examples showing that snapping-to-the-simplest looks like a limit in some cases, it's obviously the wrong thing in others; it's not some general drop-in substitute. For instance, in the real numbers you define exp(x) as the limit of the sequence 1, 1+x, 1+x+x^2/2, etc. In the surreals we already know that won't work, but if you make the novice mistake in fixing it of instead trying to define exp(x) as {1,1+x,1+x+x^2/2,...|}, you will get not exp(1)=e but rather exp(1)=3. Oops. We didn't want to snap to something quite that simple. And that's hard to prevent.

You can do it -- there is a theory of the surreal exponential -- but it requires care. And it requires basically reinventing whatever theory it is that you're trying to port over to the surreal numbers, it's not a nice straight port like so many other things in mathematics. It's been done for a number of things! But not, I think, for the things you need here.

Martin Kruskal tried to develop a theory of surreal integration back in the 70s; he ultimately failed, and I'm pretty sure nobody has succeeded since. And note that this was for surreal functions of a single surreal variable. For surreal utilities and real probabilities you'd need surreal functions on a measure space, which I imagine would be harder, basically for cofinality reasons. And for this thing, where I guess we'd have something like surreal probabilities... well, I guess the cofinality issue gets easier -- or maybe gets easier, I don't want to say that it does -- but it raises so many others. Like, if you can do that, you should at least be able to do surreal functions of a single surreal variable, right? But at the moment, as I said, nobody knows how (I'm pretty sure).

In short, while you say that the surreals solve a lot more problems than people realize, my point of view is basically the opposite: From the point of view of applications, the surreal numbers are basically an attractive nuisance. People are drawn to them for obvious reasons -- surreals are cool! Surreals are fun! They include, informally speaking, all the infinities and infitesimals! But they can be a huge pain to work with, and -- much more importantly -- whatever it is you need them to do, they probably don't do it. "Includes all the infinities and infinitesimals" is probably not actually on your list of requirements; while if you're trying to do any sort of decision theory, some sort of theory of integration is.

You have basically no idea how many times I've had to write the same "no, you really don't want to use surreal utilities" comment here on LW. In fact years ago -- basically due to constant abuse of surreals (or cardinals, if people really didn't know what they were talking about) -- I wrote this article [LW · GW] here on LW, and (while it's not like people are likely to happen across that anyway) I wish I'd included more of a warning against using the surreals.

Basically, I would say, go where the math tells you to; build your system to the requirements, don't just go pulling something off the shelf unless it meets those requirements. And note that what you build might not be a system of numbers at all. I think people are often too quick to jump to the use of numbers in the first place. Real numbers get a lot of this, because people are familiar with them. I suspect that's the real historical reason why utility functions were initially defined as real-valued; we're lucky that they turned out to actually be appropriate!

(Added later: There is one other thing you can do in the surreals that kind of resembles a limit, and this is to take a limit of sign sequences. This at least doesn't have the cofinality problem; you can take a sign-sequence limit of a sequence. But this is not any sort of drop-in replacement for usual limits either, and my impression (not an expert here) is that it doesn't really work very well at all in the first place. My impression is that, while {left|right} can be a bit too oblivious to the details of the the inputs (if you're not careful), limits of sign sequences are a bit too finicky. For instance, defining e to be the sign-sequence limit of the partial sums 1, 2, 5/2, 8/3, 65/24... will work, but defining exp(x) analogously won't, because what if x is (as a real number) the logarithm of a dyadic rational? Instead of getting exp(log(2))=2, you'll get exp(log(2))=2-1/ω. (I'm pretty sure that's right.) There goes multiplicativity! Worse yet, exp(-log(2)) won't "converge" at all. Again, I can't rule out that, like {left|right}, it can be made to work with some care, but it's definitely not a drop-in replacement, and my non-expert impression is that it's overall worse than {left|right}. In any case, once again, the better choice is almost certainly not to use surreals.)

Replies from: Chris_Leong

↑ comment by Chris_Leong · 2018-10-12T16:08:33.517Z · LW(p) · GW(p)

That's an exceptionally informative comment!

Do you know where I could find proofs of the following?

"Normally we define exp(x) to be the limit of 1, 1+x, 1+x+x^2/2, it'll never get within 1/ω of e."

"If you make the novice mistake in fixing it of instead trying to define exp(x) as {1,1+x,1+x+x^2/2,...|}, you will get not exp(1)=e but rather exp(1)=3."

I still need to read more about surreal numbers, but the thing I like about them is that you can always reduce the resolution if you can't solve the equation in the surreals. In some ways I view them as the ultimate reality and if we don't know the answer to something or only know the answer to a certain fineness, I think it's better to be honest about, rather than just fall back to an equivalence class over the surreals where we do know the answer. Actually, maybe that wasn't quite clear, I'm fine with falling back, but after its clear that we can't solve it to the finest degree.

Replies from: Sniffnoy

↑ comment by Sniffnoy · 2018-10-13T08:04:42.382Z · LW(p) · GW(p)

(Note: I've edited some things in to be clearer on some points.)

Do you know where I could find proofs of the following?

"Normally we define exp(x) to be the limit of 1, 1+x, 1+x+x^2/2, it'll never get within 1/ω of e."

"If you make the novice mistake in fixing it of instead trying to define exp(x) as {1,1+x,1+x+x^2/2,...|}, you will get not exp(1)=e but rather exp(1)=3."

These are both pretty straightforward. For the first, say we're working in a non-Archimedean ordered field which contains the reals, we take the partial sums of the series 1+1+1/2+1/6+...; these are rational numbers, in particular they're real numbers. So if we have one of these partial sums, call it s, then e-s is a positive real number. So if you have some infinitesimal ε, it's larger than ε; that's what an infinitesimal is. The sequence will not get within ε of e.

For the second, note that 3={2|}, i.e., it's the simplest number larger than 2. So if you have {1,2,5/2,8/3,...|}, well, the simplest number larger than all of those is still 3, because you did nothing to exclude 3. 3 is a very simple number! By definition, if you want to not get 3, either your interval has to not contain 3, or it has to contain something even simpler than 3 (i.e., 2, 1, or 0). (This is easy to see if you use the sign-sequence representation -- remember that x is simpler than y iff the sign sequence of x is a proper prefix of the sign sequence of y.) The interval of surreals greater than those partial sums does contain 3, and does not contain 2, 1, or 0. So you get 3. That's all there is to it.

As for the rest of the comment... let me address this out of order, if you don't mind:

In some ways I view them as the ultimate reality

See, this is exactly the sort of thinking I'm trying to head off. How is that relevant to anything? You need to use something that actually fulfills the requirements of the problem.

On top of that, this seems... well, I don't know if you actually are making this error, but it seems rather reminiscent of the high school student's error of imagning that there's a single notion of "number" -- where every notion of "number" they know fits in C so "number" and "complex number" become identified. And this is false not just because you can go beyond C, but because there are systems of numbers that can't be fit together with C at all. (How does Q_p fit into this? Answer: It doesn't!)

(Actually, by that standard, shouldn't the surcomplexes be the "ultimate reality"? :) )

(...I actually have some thoughts on that sort of thing, but since I'm trying to point out right now that that sort of thing is not what you should be thinking about when determining what sort of space to use, I won't go into them. "Ultimate reality" is, in addition to not being correct, probably not on the list of requirements!)

Also, y'know, you don't necessarily need something that could be considered "numbers" at all, as I keep emphasizing.

Anyway, as to the mathematical part of what you were saying...

I still need to read more about surreal numbers, but the thing I like about them is that you can always reduce the resolution if you can't solve the equation in the surreals. In some ways I view them as the ultimate reality and if we don't know the answer to something or only know the answer to a certain fineness, I think it's better to be honest about, rather than just fall back to an equivalence class over the surreals where we do know the answer. Actually, maybe that wasn't quite clear, I'm fine with falling back, but after its clear that we can't solve it to the finest degree.

I have no idea what you're talking about here. Like, what? First off, what sort of equations are you talking about? Algebraic ones? Over the surreals, I guess? The surreals are a real closed field, the surcomplexes are algebraically closed. That will suffice for algebraic equations. Maybe you mean some more general sort, I don't know.

But most of this is just baffling. I have no idea what you're talking about when you speak of passing to a quotient of the surreals to solve any equation. Where is that coming from? And like -- what sort of quotient are we talking about here? "Quotient of the surreals" is already suspect because, well, it can't be a ring-theoretic quotient, as fields don't have nontrivial ideals, at all. So I guess you mean purely an additive quotient? But that's not going to mix very well with solving any equations that involve more than addition now, is it? Meanwhile what the surreals are known for is that any ordered field embeds in them, not something about quotients!

Anyway, if you want to solve algebraic equations, you want an algebraically closed field. If you want to solve algebraic equations to the greatest extent possible while still keeping things ordered, you want a real closed field. The surreals are a real closed field, but you certainly don't need them just for solving equations. If you want to be able to do limits and calculus and such, you want something with a nice topology (just how nice probably depends on just what you want), but note that you don't necessarily want a field at all! None of these things favor the surreals, and the fact that we almost certainly need integration here is a huge strike against them.

Btw, you know what's great for solving equations in, even if they aren't just algebraic equations? The real numbers. Because they're connected, so you have the intermediate value theorem. And they're the only ordered field that's connected. Again, you might be able to emulate that sort of thing to some extent in the surreals for sufficiently nice functions (mere continuity won't be enough) (certainly you can for polynomials, like I said they're real closed, but I'm guessing you can probably get more than that), I'm not super-familiar with just what's possible there, but it'll take more work. In the reals it's just, make some comparisons, they come out opposite one another, R is connected, boom, there's a solution somewhere inbetween.

But mostly I'm just wondering where like any of this is coming from. It neither seems to make much sense nor to resemble anything I know.

(Edit: And, once again, it's not at all clear that being able to solve equations is at all relevant! That just doesn't seem to be something that's required. Whereas integration is.)

Replies from: Chris_Leong

↑ comment by Chris_Leong · 2018-10-13T23:02:00.119Z · LW(p) · GW(p)

"So if we have one of these partial sums, call it s, then e-s is a positive real number. So if you have some infinitesimal ε, it's larger than ε; that's what an infinitesimal is" - Are you sure this chain of reasoning is correct? Consider 1/2x. For any finite number of terms it will be greater than ε, but as x approaches ω, it should approach 1/2ω. Why can't the partial sum get within 1/ω of e?

"But it seems rather reminiscent of the high school student's error of imagining that there's a single notion of "number"" - okay, the term "ultimate reality" is a stretch. I can't imagine all possible applications, so I can't imagine all possible numbering systems. My point is that we don't just want to use a seperate numbering system for each individual problem. We want to be philosophically consistent and so there should be broad classes of problems for which we can identify a single numbering system as sufficient. And there's a huge set of problems (which I'm not going to even attempt to spec out here) for which surreals can be justified on a philosophical level, even if it is convenient to drop down to another number system for the actual calculations. Maybe an example will help, Newtonian physics and Special Relativity embedded in General Relativity. General relativity provides consistency for physics, even though we use one of the first two for the majority of calculations.

"I have no idea what you're talking about here. Like, what?"

You're right that it won't be a nice neat quotient group. But here's an example. N_0 - N_0 can equal any integer where N_0 is a cardinal, or even +/- N_0, but in surreal numbers it works as follows. Suppose X and Y are countable infinities. Then X - Y has a unique value that we can sometimes identify. For example, if X represents the length of a sequence and Y is all the elements in the sequences except one, then X - Y = 1. We can perform the calculation in the surreals, or we can perform it in the cardinals and receive a broad range of possible answers. But for every possible answer in the cardinals, we can find pairs of surreal numbers that would provide that answer.

Replies from: Sniffnoy

↑ comment by Sniffnoy · 2018-10-14T04:09:08.608Z · LW(p) · GW(p)

(Hey, a note, you should probably learn to use the blockquote feature. I dunno where it is in the rich text editor if you're using that, but if you're using the Markdown editor you just precede the paragraph you're quoting with a ">". It will make your posts substantially more readable.)

Are you sure this chain of reasoning is correct?

Yes.

Consider 1/2x. For any finite number of terms it will be greater than ε, but as x approaches ω, it should approach 1/2ω.

What "terms"? What are you talking about? This isn't a sequence or a sum; there are no "terms" here. Yes, even in the surreals, as x goes to ω, 1/(2x) will approach 1/(2ω), as you say; as I mentioned above, limits of functions of a surreal variable will indeed still work. But that has no relevance to the case under discussion.

(And, while it's not necessary to see what's going on here, it may be helpful to remember that if we if we interpret this as occurring in the surreals, then in the case of 1/2x as x→ω, your domain has proper-class cofinality, while in the case of this infinite sum, the domain has cofinality ω. So the former can work, and the latter cannot. Again, one doesn't need this to see that -- the partial sum can't get within 1/ω of e even when the cofinality is countable -- but it may be worth remembering.)

Why can't the partial sum get within 1/ω of e?

Because the partial sum is always a rational number. A rational number -- more generally, a real number -- cannot be infinitesimally close to e without being e. (By contrast, for surreal x, 1/(2x) certainly does not need to be a real number, and so can get infinitesimally close to 1/(2ω) without being equal to it.)

You're right that it won't be a nice neat quotient group. But here's an example. N_0 - N_0 can equal any integer where N_0 is a cardinal, or even +/- N_0, but in surreal numbers it works as follows. Suppose X and Y are countable infinities. Then X - Y has a unique value that we can sometimes identify. For example, if X represents the length of a sequence and Y is all the elements in the sequences except one, then X - Y = 1. We can perform the calculation in the surreals, or we can perform it in the cardinals and receive a broad range of possible answers. But for every possible answer in the cardinals, we can find pairs of surreal numbers that would provide that answer.

What??

OK. Look. I could spend my time attempting to pick this apart. But, let me be blunt, the point I am trying to get across here is that you are talking nonsense. This is babble. You are way out of your depth, dude. You don't know what you are talking about. You need to go back and relearn this from the beginning. I don't even know what mistake you're making, because it's not a common one I recognize.

Just in the hopes it might be somewhat helpful, I will quickly go over the things I can maybe address quickly:

N_0 - N_0 can equal any integer where N_0 is a cardinal, or even +/- N_0, but in surreal numbers it works as follows.

I have no idea what this sentence is talking about.

Suppose X and Y are countable infinities.

What's an "infinity"? An ordinal? A cardinal? (There's only one countably infinite cardinal...) A surreal or something else entirely [LW · GW]? You said "countable", so it has to be something to which the notion of countability applies!

This mistake, at least, I think I can identify. Maybe you should, in fact, look over that "quick guide to the infinite" I wrote, because this is myth #0 I discussed there. There's no such thing as a unified notion of "infinities". There are different systems of numbers, some of them contain numbers/objects that are infinite (i.e.: larger in magnitude than any whole number), there is not some greater unified system they are all a part of.

Then X - Y has a unique value that we can sometimes identify.

What is X-Y? I don't even know what system of numbers you're using, so I don't know what this means.

If X and Y are surreals, then, sure, there's quite definitely a unique surreal X-Y. This is true more generally if you're thinking of X and Y as living in some sort of ordered field or ring.

If X and Y are cardinals, then X-Y may not be well-defined. Trivially so if Y>X (no possible values), but let's ignore that case. Even ignoring that, if X and Y are infinite, X-Y may fail to be well-defined due to having multiple possible values.

If X and Y are ordinals, we have to ask what sort of addition we're using. If we're using natural addition, then X-Y certainly has a unique value in the surreals, but it may or may not be an ordinal, so it's not necessarily well-defined within the ordinals.

If we're using ordinary addition, we have to distinguish between X-Y and -Y+X. (The latter just being a way of denoting "subtracting on the left"; it should not be interpreted as actually negating Y and adding to X.) -Y+X will have a unique value so long as Y≤X, but X-Y is a different story; even restricting to Y≤X, if X is infinite, then X-Y may have multiple possible values or none.

For example, if X represents the length of a sequence and Y is all the elements in the sequences except one, then X - Y = 1.

Yeah, not going to try to pick this apart, in short though this is nonsense.

I'm starting to think though that maybe you meant that X and Y were infinite sets, rather than some sort of numbers? With X-Y being the set difference? But that is not what you said. Simply put you seem very confused about all this.

We can perform the calculation in the surreals, or we can perform it in the cardinals and receive a broad range of possible answers.

Are X and Y surreals or are they cardinals? Surreals and cardinals don't mix, dude! It can't be both, not unless they're just whole numbers! You are performing the calculation in whatever number system these things live in.
You just said above you get a well-defined answer, and, moreover, that it's 1! Now you're telling me that you can get a broad range of possible answers??
If X is representing the length of a sequence, it should probably be an ordinal. As for Y... yeah, OK, not going to try to make sense of the thing I already said I wouldn't attempt to pick through.
And if X and Y are sets rather than numbers... oh, to hell with it, I'm just going to move on.

But for every possible answer in the cardinals, we can find pairs of surreal numbers that would provide that answer.

There is, I think, a correct idea here that is rescuable. It also seems pretty clear you don't know enough to perform that rescue yourself and rephrase this as something that makes sense. (A hint, though: The fixed version probably should not involve surreals.)

(Do surreal numbers even have cardinalities, in a meaningful sense? Yes obviously if you pick a particular way of representing surreals as sets, e.g. by representing them as sign sequences, the resulting representations will have cardinalities; obviously, that's not what I'm talking about. Although, who knows, maybe that's a workable notion -- define the cardinality of a surreal to be the cardinality of its birthday. No idea if that's actually relevant to anything, though.)

Even charitably interpreted, none of this matches up with your comments above about equivalence classes. It relates, sure, but it doesn't match. What you said above was that you could solve more equations by passing to equivalence classes. What you're saying now seems to be... not that.

Long story short: I really, really, do not think you have much idea what you are talking about. You really need to relearn this from scratch, and not starting with surreals. I definitely do not think you are prepared to go instructing others on their uses; at this point I'm not convinced you could clearly articulate what ordinals and cardinals are for, you've gotten everything so mixed up in your comment above. I wouldn't recommend trying to expand this into a post.

I think I should probably stop arguing here. If you reply to this with more babble I'm not going to waste my time replying to it further.

Replies from: Chris_Leong

↑ comment by Chris_Leong · 2018-10-14T16:55:19.193Z · LW(p) · GW(p)

I really appreciate the time you've put into writing these long responses and I'll admit that there are some gaps in my understanding, but I don't think you've understood what I was saying at all. I suspect that this is a hazard with producing informal overviews of ideas + illusion of transparency. One example, when I said equivalence classes, I really meant something like equivalence classes. Anyway, in regards to all the points you've raised it'd take a lot of space to respond to them all, so I think I'll just add a link to my post when I get time to write it.

A Rationality Condition for CDT Is That It Equal EDT (Part 1)

Contents

Hyperreal Probability

Hyperreal Bayes Nets & CDT=EDT

Are We Really Eliminating Exploration?

Ways of Taking Counterfactuals are Somewhat Interchangeable

Exploration is Always Necessary for Learning Guarantees

You Still Explore in Logical Time

14 comments