Posts

Identifiability Problem for Superrational Decision Theories 2021-04-09T20:33:33.374Z
Phylactery Decision Theory 2021-04-02T20:55:45.786Z
Learning Russian Roulette 2021-04-02T18:56:26.582Z
Fisherian Runaway as a decision-theoretic problem 2021-03-20T16:34:42.649Z
A non-logarithmic argument for Kelly 2021-03-04T16:21:13.192Z
Learning Normativity: Language 2021-02-05T22:26:15.199Z
Limiting Causality by Complexity Class 2021-01-30T12:18:37.723Z
What is the interpretation of the do() operator? 2020-08-26T21:54:07.675Z
Towards a Formalisation of Logical Counterfactuals 2020-08-08T22:14:28.122Z

Comments

Comment by Bunthut on Momentum of Light in Glass · 2024-10-17T07:54:49.583Z · LW · GW

What are real numbers then? On the standard account, real numbers are equivalence classes of Cauchy sequences of rationals, the finite diagonals being one such sequence. I mean, "Real numbers don't exist" is one way to avoid the diagonal argument, but I don't think that's what cubefox is going for.

Comment by Bunthut on How to Give in to Threats (without incentivizing them) · 2024-09-29T22:01:35.202Z · LW · GW

The society’s stance towards crime- preventing it via the threat of punishment- is not what would work on smarter people

This is one of two claims here that I'm not convinced by. Informal disproof: if you are a smart individual in today's society, you shouldn't ignore threats of punishment, because it is in the state's interest to follow through anyway, pour encourager les autres. If crime prevention is in people's interest, intelligence monotonicity implies that a smart population should be able to make punishment work at least this well. Now I don't trust intelligence monotonicity, but I don't trust its negation either.

The second one is:

You can already foresee the part where you're going to be asked to play this game for longer, until fewer offers get rejected, as people learn to converge on a shared idea of what is fair.

Should you update your idea of fairness if you get rejected often? It's not clear to me that that doesn't make you exploitable again. And I think this is very important to your claim about not burning utility: in the case of the ultimatum game, Eliezer's strategy burns very little over a reasonable-seeming range of fairness ideals, but in the complex, high-dimensional action spaces of the real world, it could easily be almost as bad as never giving in, if there's no updating.

Comment by Bunthut on Superrational Agents Kelly Bet Influence! · 2024-09-19T22:53:19.957Z · LW · GW

Maybe I'm missing something, but it seems to me that all of this is straightforwardly justified through simple selfish Pareto improvements.

Take a look at Critch's cake-splitting example in section 3.5. Now imagine varying the utility of splitting. How high does it need to get before [red->Alice;green->Bob] is no longer a Pareto improvement over [(split)] from both players' selfish perspective before the observation? It's 27, and that's also exactly where the decision flips when weighing Alice 0.9 and Bob 0.1 in red, and Alice 0.1 and Bob 0.9 in green.

Intuitively, I would say that the reason you don't bet influence all-or-nothing, or with some other strategy, is precisely because influence is not money. Influence can already be all-or-nothing all by itself, if one player never cares that much more than the other. The influence the "losing" bettor retains in the world where he lost is not some kind of direct benefit to him, the way money would be: it functions instead as a reminder of how bad a treatment he was willing to risk in the unlikely world, and that is of course proportional to how unlikely he thought it was.

So I think all this complicated strategizing you envision in influence betting actually just comes out exactly to Critch's results. It's true that there are many situations where this leads to influence bets that don't matter to the outcome, but they also don't hurt. The theorem only says that actions must be describable as following a certain policy; it doesn't exclude that they can be described by other policies as well.

Comment by Bunthut on Physical Therapy Sucks (but have you tried hiding it in some peanut butter?) · 2024-09-10T23:09:37.836Z · LW · GW

The timescale for improvement is dreadfully long and the day-to-day changes are imperceptible.

This sounded wrong, but I guess it's technically true? I had great in-session improvements as I warmed up the area and got into it, and the difference between a session where I missed the previous day and one where I didn't is absolutely perceptible. Now after that initial boost, it's true that I couldn't tell if the "high point" was improving day to day, but that was never a concern - the above was enough to give me confidence. Plus, with your external rotations, was there not perceptible strength improvement week to week?

Comment by Bunthut on FixDT · 2024-09-10T22:50:24.069Z · LW · GW

So I've reread your section on this, and I think I follow it, but it's arguing a different claim. In the post, you argue that a trader that correctly identifies a fixed point, but doesn't have enough weight to get it played, might not profit from this knowledge. That I agree with.

But now you're saying that even if you do play the new fixed point, that trader still won't gain?

I'm not really calling this a proof because it's so basic that something else must have gone wrong, but:

One trader's map has a fixed point at p, and the other's doesn't, so the second map sends p somewhere other than p. Then if you decide to play p, the second trader predicts something other than p, which is wrong, and gets punished. By continuity, this is also true in some neighborhood around p. So if you've explored your way close enough, you win.
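
A minimal numeric sketch of what I mean, where the environment response, the fixed point at 0.7, and the two predictors f and g are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: the outcome frequency the world produces when you act on belief p.
# It has a (made-up) fixed point at p* = 0.7, since env(0.7) = 0.7.
def env(p):
    return 0.5 + (2 / 7) * p

def f(p):   # a predictor that has the fixed point: f(0.7) = 0.7
    return env(p)

def g(p):   # a predictor without it: g(0.7) = 0.5 != 0.7
    return 0.5

p_star = 0.7                                   # decide to play the fixed point
outcomes = rng.random(100_000) < env(p_star)   # what actually happens

brier = lambda pred: np.mean((pred - outcomes) ** 2)
print("loss of f:", brier(f(p_star)))   # ~0.21: predicted 0.7, which was right
print("loss of g:", brier(g(p_star)))   # ~0.25: predicted 0.5, and gets punished
```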

Comment by Bunthut on FixDT · 2024-09-09T20:49:50.395Z · LW · GW

On reflection, I didn't quite understand this exploration business, but I think I can save a lot of it.

>You can do exploration, but the problem is that (unless you explore into non-fixed-point regions, violating epistemic constraints) your exploration can never confirm the existence of a fixed point which you didn't previously believe in.

I think the key here is in the word "confirm". It's true that unless you believe p is a fixed point, you can't just try out p and see the result. However, you can change your beliefs about p based on your results from exploring things other than p. (This is why I call the thing I'm objecting to Humean trolling.) And there is good reason to think that the available fixed points are usually pretty dense in the space. For example, outside of the rule that binarizes our actions, there should usually be at least one fixed point for every possible action. Plus, as you explore, your beliefs change, creating new believed-fixed-points for you to explore.

>I think your idea for how to find repulsive fixed-points could work if there's a trader who can guess the location of the repulsive point exactly rather than approximately

I don't think that's needed. If my net beliefs have a closed surface in probability space on which they push outward, then necessarily those beliefs have a repulsive fixed point somewhere within that surface. I can then explore that believed fixed point. Then if it's not a true fixed point, and I still believe in the closed surface, there's a new fixed point within that surface that I can again explore (generally more in the direction I just got pushed away from). This should converge on a true fixed point. The only thing that can go wrong is that I stop believing in the closed surface, and it seems like I should leave open that possibility - and even then, I might believe in it again after I do some checking along the outside.

>However, the wealth of that trader will act like a martingale; there's no reliable profit to be made (even on average) by enforcing this fixed point. 

This I don't understand at all. If you're in a certain fixed point, shouldn't the traders that believe in it profit from the ones that don't?

Comment by Bunthut on FixDT · 2024-08-30T18:24:13.561Z · LW · GW

I don't think the learnability issues are really a problem. I mean, if doing a handstand with a burning 100 riyal bill between your toes under the full moon is an exception to all physical laws and actually creates utopia immediately, I'll never find out either. Assuming you agree that that's not a problem, why is the scenario you illustrate one? In both cases, it's not like you can't find out, you just don't, because you stick to what you believe is the optimal action.

I don't think this would be a significant problem in practice, any more than other kinds of Humean trolling are. It always seems much more scary in these extremely barebones toy problems, where the connection between the causes and effects we create really is kind of arbitrary. I especially don't think it will be possible to learn the counterfactuals of FDT-ish cooperation and such in these small settings, no matter the method.

Plus you can still do value-of-information exploration. The repulsive fixed points are not that hard to find if you're looking for them. If you've encircled one and found repulsion all around the edge, you know there must be one in there, and can get there with a procedure that just reverses your usual steps. Combining this with simplicity priors over a larger setting into which the problem is integrated, I don't think it's any more worrying than the handstand thing.
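
A toy one-dimensional version of that reversal procedure (the map, the step size and the starting point are all made up; the real setting is higher-dimensional, but the local logic is the same):

```python
# f has a repulsive fixed point at p = 0.6 (slope 1.8 > 1 there).
def f(p):
    return min(max(1.8 * p - 0.48, 0.0), 1.0)

alpha = 0.5
p_usual, p_reversed = 0.55, 0.55            # start near, but not on, the fixed point
for _ in range(50):
    p_usual    = p_usual    + alpha * (f(p_usual)    - p_usual)     # usual step: repelled
    p_reversed = p_reversed - alpha * (f(p_reversed) - p_reversed)  # reversed step: attracted

print(p_usual)     # driven away, ends up near the boundary at 0
print(p_reversed)  # converges to the repulsive fixed point at 0.6
```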

Comment by Bunthut on People care about each other even though they have imperfect motivational pointers? · 2022-11-22T08:57:19.817Z · LW · GW

That prediction may be true. My argument is that "I know this by introspection" (or introspection-and-generalization-to-others) is insufficient. For a concrete example, consider your 5-year-old self. I remember some pretty definite beliefs I had about my future self that turned out wrong, and if I ask myself how aligned I am with him, I don't even know how to answer; he just seems way too confused and incoherent.

I think it's also not absurd that you do have perfect caring in the sense relevant to the argument. This does not require that you don't make mistakes currently. If you can, with increasing intelligence/information, correct yourself, then the pointer is perfect in the relevant sense. "Caring about the values of person X" is relatively simple and may come out of evolution whereas "those values directly" may not.

Comment by Bunthut on People care about each other even though they have imperfect motivational pointers? · 2022-11-11T10:50:02.347Z · LW · GW

This prediction seems flatly wrong: I wouldn’t bring about an outcome like that. Why do I believe that? Because I have reasonably high-fidelity access to my own policy, via imagining myself in the relevant situations.

It seems like you're confusing two things here, because the thing you would want is not knowable by introspection. What I think you're introspecting is that if you'd noticed that the-thing-you-pursued-so-far was different from what your brother actually wants, you'd do what he actually wants. But the-thing-you-pursued-so-far doesn't play the role of "your utility function" in the Goodhart argument. All of you plays into that. If the Goodharting were to play out, your detector for differences between the-thing-you-pursued-so-far and what-your-brother-actually-wants would simply fail to warn you that it was happening, because it too can only use a proxy measure for the real thing.

Comment by Bunthut on Notes on "Can you control the past" · 2022-10-27T09:37:36.712Z · LW · GW

The idea is that we can break any decision problem down by cases (like "insofar as the predictor is accurate, ..." and "insofar as the predictor is inaccurate, ...") and that all the competing decision theories (CDT, EDT, LDT) agree about how to aggregate cases.

Doesn't this also require that all the decision theories agree that the conditioning fact is independent of your decision?

Otherwise you could break down the normal prisoner's dilemma into "insofar as the opponent makes the same move as me" and "insofar as the opponent makes the opposite move" and conclude that defect isn't the dominant strategy even there, not even under CDT.
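
Concretely, with standard payoffs assumed here just for illustration (3 for mutual cooperation, 1 for mutual defection, 5 for defecting against a cooperator, 0 for the reverse):

$$\begin{aligned} E[U \mid C,\ \text{same move}] &= 3, & E[U \mid D,\ \text{same move}] &= 1,\\ E[U \mid C,\ \text{opposite move}] &= 0, & E[U \mid D,\ \text{opposite move}] &= 5. \end{aligned}$$

Insofar as the opponent moves the same way, cooperating looks better; insofar as he moves the opposite way, defecting looks better; so any aggregation that puts enough weight on the first case recommends cooperating, and defection stops looking dominant.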

And I imagine the within-CDT perspective would reject an independent probability for the predictor's accuracy. After all, there's an independent probability that it guessed 1-box, and if I 1-box it's right with that probability, and if I 2-box it's right with 1 minus that probability.

Comment by Bunthut on Impossibility results for unbounded utilities · 2022-03-30T09:30:43.958Z · LW · GW

Would a decision theory like this count as "giving up on probabilities" in the sense in which you mean it here?

Comment by Bunthut on Challenges to Yudkowsky's Pronoun Reform Proposal · 2022-03-15T12:31:42.938Z · LW · GW

I think your assessments of what's psychologically realistic are off.

I do not know what it feels like from the inside to feel like a pronoun is attached to something in your head much more firmly than "doesn't look like an Oliver" is attached to something in your head.

I think before writing that, Yud imagined calling [unambiguously gendered friend] by either pronoun, and asked himself if it felt wrong, and found that it didn't. This seems realistic to me: I've experienced my emotional introspection becoming blank on topics I've put a lot of thinking into. This doesn't prevent doing the same automatic actions you always did, or knowing what those would be in a given situation. If something like this happened to him for gender long enough ago, he may well not be able to imagine otherwise.

But the "everyone present knew what I was doing was being a jerk" characterization seems to agree that the motivation was joking/trolling. How did everyone present know? Because it's absurd to infer a particular name from someone's appearance.

It's unreasonable, but it seems totally plausible that on one occasion you would feel like you know someone has a certain name, and continue feeling that way even after being rationally convinced you're wrong. That there are many names only means that the odds of any particular name featuring in such a situation are low, not that the class as a whole has low odds, and I don't see why the prior for that would be lower than for e.g. mistaken deja vu experiences.

Comment by Bunthut on The Case for Radical Optimism about Interpretability · 2021-12-22T22:22:05.414Z · LW · GW

I don't think the analogy to biological brains is quite as strong. For example, biological brains need to be "robust" not only to variations in the input, but also in a literal sense, to forceful impact or to parasites trying to control them. They intentionally have very bad suppressibility, and this means there needs to be a lot of redundancy, which makes "just stick an electrode in that area" work. More generally, they are under many constraints that an ML system isn't, probably too many for us to think of, and they generally prioritize safety over performance. Both lead away from the sort of maximally efficient compression that makes ML systems hard to interpret.

Analogously: imagine a programmer wrote the shortest program that does a given task. That would be terrible. It would be impossible to change anything without essentially redesigning everything, trying to understand what it does just from reading the code would be very hard, and giving a compressed explanation of how it does it would be impossible. In practice, we don't write code like that, because we face constraints like those mentioned above - but it's very easy to imagine that some optimization-based "automatic coder" would program like that. Indeed, on the occasions where we need to really optimize runtimes, we move in that direction ourselves.

So I don't think brains tell us much about the interpretability of the standard, highly optimized neural nets.

Comment by Bunthut on Rationality Quotes June 2013 · 2021-12-21T22:33:09.080Z · LW · GW

Probably way too old here, but I had multiple experiences relevant to the thread.

Once I had a dream and then, in the dream, I remembered I had dreamt this exact thing before, and wondered if I was dreaming now, and everything looked so real and vivid that I concluded I was not.

I can create a kind of half-dream, where I see random images and moving sequences at most 3 seconds or so long, in succession. I am really dimmed but not sleeping, and I am aware in the back of my head that they are only schematic and vague.

I would say the backstories in dreams are different in that they can be clearly nonsensical. E.g. I hold and look at a glass relief; there is no movement at all, and I know it to be a movie. I know nothing of its content, and I don't believe the image of the relief to be in the movie.

Comment by Bunthut on My Current Take on Counterfactuals · 2021-05-25T20:48:50.183Z · LW · GW

I think it's still possible to have a scenario like this. Let's say each trader would buy or sell a certain amount when the price is below/above what they think it should be, but with the transition being very steep instead of instant. Then you could still have long price intervals where the amounts bought and sold remain constant, and then every point in there could be the market price.
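
A throwaway numeric sketch of that (the beliefs, the one-unit sizes and the steepness are all made up):

```python
import numpy as np

# Two traders; each buys one unit below its belief and sells one unit above it,
# with a very steep but continuous transition.
beliefs = [0.4, 0.7]
steepness = 1e-3

def net_demand(price):
    return sum(np.tanh((b - price) / steepness) for b in beliefs)

prices = np.linspace(0, 1, 10_001)
clearing = prices[np.abs([net_demand(p) for p in prices]) < 1e-6]
print(clearing.min(), clearing.max())   # roughly the whole interval (0.4, 0.7) clears
```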

I'm not sure if this is significant. I see no reason to set the traders up this way other than the result in the particular scenario that kicked this off, and adding traders who don't follow this pattern breaks it. Still, it's a bit worrying that trading strategies seem to matter in addition to beliefs, because what do they represent? A trader's initial wealth is supposed to be our confidence in its heuristics - but if a trader is a mathematical heuristic and a trading strategy packaged together, then what does confidence in the trading strategy mean epistemically? Two things to think about:

Is it possible to consistently define the set of traders with the same beliefs as trader X?

It seems that logical induction is using a trick, where it avoids inconsistent discrete traders, but includes an infinite sequence of continuous traders with ever steeper transitions to get some of the effects. This could lead to unexpected differences between behaviour "at all finite steps" vs "at the limit". What can we say about logical induction if trading strategies need to be Lipschitz-continuous with a shared upper limit on the Lipschitz constant?

Comment by Bunthut on Hufflepuff Cynicism on Hypocrisy · 2021-05-25T20:31:56.824Z · LW · GW

So I'm not sure what's going on with my mental sim. Maybe I just have a super-broad 'crypto-moral detector' that goes off way more often than yours (w/o explicitly labeling things as crypto-moral for me).

Maybe. How were your intuitions before you encountered LW? If you already had a hypocrisy intuition, then trying to internalize the rationalist perspective might have led it to ignore the morality-distinction.

Comment by Bunthut on Hufflepuff Cynicism on Hypocrisy · 2021-05-20T20:38:23.964Z · LW · GW

My father playing golf with me today, telling me to lean down more to stop them going out left so much.

Comment by Bunthut on Hufflepuff Cynicism on Hypocrisy · 2021-05-19T14:16:12.368Z · LW · GW

I don't strongly relate to any of these descriptions. I can say that I don't feel like I have to pretend advice from equals is more helpful than it is, which I suppose means it's not face. The most common way to reject advice is a comment like "eh, whatever" and ignoring it. Some nerds get really mad at this and seem to demand intellectual debate. This is not well received. Most people give advice with the expectation of intellectual debate only on crypto-moral topics (this is also not well received generally, but the speaker seems to accept that as an "identity cost"), or not at all.

Comment by Bunthut on Hufflepuff Cynicism on Hypocrisy · 2021-05-19T13:46:47.592Z · LW · GW

You mean advice to diet, or "technical" advice once it's established that the person wants to diet? I don't have experience with either, but the first is definitely crypto-moral.

Comment by Bunthut on My Current Take on Counterfactuals · 2021-05-19T13:26:36.738Z · LW · GW

This excludes worlds which the deductive process has ruled out, so for example if A∨B has been proved, all worlds will have either A or B. So if you had a bet which would pay $10 on A, and a bet which would pay $2 on B, you're treated as if you have $2 to spend.

I agree you can arbitrage inconsistencies this way, but it seems very questionable. For one, it means the market maker needs to interpret the output of the deductive process semantically. And it makes him go bankrupt if that logic is inconsistent. And there could be a case where a proposition is undecidable, and a meta-proposition about it is undecidable, and a meta-meta-proposition about it is undecidable, all the way up, and then something bad happens, though I'm not sure what concretely.

Comment by Bunthut on My Current Take on Counterfactuals · 2021-05-19T13:08:01.738Z · LW · GW

Why is the price of the un-actualized bet constant? My argument in the OP was to suppose that PCH is the dominant hypothesis, so, mostly controls market prices.

Thinking about this in detail, it seems like what influence traders have on the market price depends on a lot more of their inner workings than just their beliefs. I was imagining a setup where each trader only had one price for the bet, below which they bought and above which they sold, no matter how many units they traded (this might contradict "continuous trading strategies" because of finite wealth), in which case there would be a range of prices that could be the "market" price, and it could stay constant even with one end of that range shifting. But there could also be an outcome like yours, if the agents demand better and better prices to trade one more unit of the bet.

Comment by Bunthut on My Current Take on Counterfactuals · 2021-05-19T12:40:33.842Z · LW · GW

But now, you seem to be complaining that a method that explicitly avoids Troll Bridge would be too restrictive?

No, I think finding such a no-learning-needed method would be great. It just means your learning-based approach wouldn't be needed.

You seem to be arguing that being susceptible to Troll Bridge should be judged as a necessary/positive trait of a decision theory.

No. I'm saying if our "good" reasoning can't tell us where in Troll Bridge the mistake is, then something that learns to make "good" inferences would have to fall for it.

But there are decision theories which don't have this property, such as regular CDT, or TDT (depending on the logical-causality graph). Are you saying that those are all necessarily wrong, due to this?

A CDT is only worth as much as its method of generating counterfactuals. We generally consider regular CDT (which I interpret as "getting its counterfactuals from something-like-epsilon-exploration") to miss important logical connections. "TDT" doesn't have such a method. There is a (logical) causality graph that makes you do the intuitively right thing on Troll Bridge, but how to find it formally?

A strong candidate from my perspective is the inference from  to  

Isn't this just a rephrasing of your idea that the agent should act based on C(A|B) instead of B->A? I don't see any occurrence of ~(A&B) in the Troll Bridge argument. Now, it is equivalent to B->~A, so perhaps you think one of the propositions that occur as implications in Troll Bridge should be parsed this way? My modified Troll Bridge parses them all as counterfactual implication.

For example, I could have a lot of prior mass on "crossing gives me +10, not crossing gives me 0". Then my +10 hypothesis would only be confirmed by experience. I could reason using counterfactuals

I've said why I don't think "using counterfactuals", absent further specification, is a solution. For the simple "crossing is +10" belief... you're right, it succeeds, and insofar as you just wanted to show that it's rationally possible to cross, I suppose it does.

This... really didn't fit into my intuitions about learning. Consider that there is also the alternative agent who believes that crossing is -10, and sticks to that. And the reason he sticks to that isn't that he's too afraid and VOI isn't worth it: while it's true that he never empirically confirms it, he is right, and the bridge would blow up if he were to cross it. That method works because it ignores the information in the problem description, and has us insert the relevant takeaway, without any of the confusing stuff, directly into its prior. Are you really willing to say: yup, that's basically the solution to counterfactuals, just a bit of formalism left to work out?

Comment by Bunthut on My Current Take on Counterfactuals · 2021-05-18T08:03:47.749Z · LW · GW

So I don't see how we can be sure that PCH loses out overall. LCH has to exploit PCH -- but if LCH tries it, then we're seemingly in a situation where LCH has to sell for PCH's prices, in which case it suffers the loss I described in the OP.

So I've reread the logical induction paper for this, and I'm not sure I understand exploitation. Under 3.5, it says:

On each day, the reasoner receives 50¢ from T, but after day t, the reasoner must pay $1 every day thereafter.

So this sounds like before day t, T buys a share every day, and those shares never pay out - otherwise T would receive $t on day t in addition to everything mentioned here. Why?

In the version that I have in my head, there's a market with PCH and LCH in it that assigns a constant price to the unactualised bet, so neither of them gains or loses anything with their trades on it, and LCH exploits PCH on the actualised ones.

But the special bundled contract doesn't go to zero like this, because the conditional contract only really pays out when the condition is satisfied or refuted.

So if I'm understanding this correctly: The conditional contract on (a|b) pays if a&b is proved, if a&~b is proved, and if ~a&~b is proved.

Now I have another question: how does logical induction arbitrage against contradiction? The bet on a pays $1 if a is proved. The bet on ~a pays $1 if not-a is proved. But the bet on ~a isn't "settled" when a is proved - why can't the market just go on believing it's .7? (Likely this is related to my confusion with the paper.)

My proposal is essentially similar to that, except I am trying to respect logic in most of the system, simply reducing its impact on action selection. But within my proposed system, I think the wrong 'prior' (ie distribution of wealth for traders) can make it susceptible again.

I'm not blocking Troll Bridge problems, I'm making the definition of rational agent broad enough that crossing is permissible. But if I think the Troll Bridge proof is actively irrational, I should be able to actually rule it out. IE, specify an X which is inconsistent with PA.

What makes you think that there's a "right" prior? You want a "good" learning mechanism for counterfactuals. To be good, such a mechanism would have to learn to make the inferences we consider good, at least with the "right" prior. But we can't pinpoint any wrong inference in Troll Bridge. It doesn't seem like what's stopping us from pinpointing the mistake in Troll Bridge is a lack of empirical data. So, a good mechanism would have to learn to be susceptible to Troll Bridge, especially with the "right" prior. I just don't see what would be a good reason for thinking there's a "right" prior that avoids Troll Bridge (other than "there just has to be some way of avoiding it") that wouldn't also let us tell directly how to think about Troll Bridge, no learning needed.

Comment by Bunthut on Hufflepuff Cynicism on Hypocrisy · 2021-05-18T06:20:47.691Z · LW · GW

So your social experience is different in this respect?

I've never experienced this example in particular, but I would not expect such a backlash. Can you think of another scenario with non-moral advice that I have likely experienced?

Comment by Bunthut on Hufflepuff Cynicism on Hypocrisy · 2021-05-17T16:16:03.652Z · LW · GW

It seems to me that this habit is universal in American culture, and I'd be surprised (and intrigued!) to hear about any culture where it isn't.

I live in Austria. I would say we do have norms against hypocrisy, but your example with the driver's license seems absurd to me. I would be surprised (and intrigued!) if agreement with this one in particular is actually universal in American culture. In my experience, hypocrisy norms are for moral and crypto-moral topics.

For normies, morality is an imposition. Telling them of new moral requirements increases how many things that suck they have to do to stay a good person. From this perspective, the anti-hypocrisy norm is great. If people don't follow the new requirement they suggest (or can be construed as not doing so), you get to ignore it without further litigation.

Now the classical LW position is that that's wrong, and that you ought to be glad someone informed you how much good you can do by giving away large fractions of your income, because this gave you the opportunity to do so. Because you ought to believe morality in your heart. From this perspective, the anti-hypocrisy norm is useless. It might also be why the driver's license example is a natural fit for you: it's all "advice" to you. If you've failed to Chesterton's-fence somewhere, I think it's at this previous step.

Comment by Bunthut on My Current Take on Counterfactuals · 2021-05-17T15:43:35.896Z · LW · GW

The payoff for 2-boxing is dependent on beliefs after 1-boxing because all share prices update every market day and the "payout" for a share is essentially what you can sell it for.

If a sentence is undecidable, then you could have two traders who disagree on its value indefinitely: one would have a highest price to buy that's below the other's lowest price to sell. But then anything between those two prices could be the "market price", in the classical supply and demand sense. If you say that the "payout" of a share is what you can sell it for... well, the "physical causation" trader is also buying shares on the counterfactual option that won't happen. And if he had to sell those, he couldn't sell them at a price close to where he bought them - he could only sell them at how much the "logical causation" trader values them, and so both would be losing "payout" on their trades with the unrealized option. That's one interpretation of "sell". If there's a "market maker" in addition to both traders, it depends on what prices he makes - and as outlined above, there is a wide range of prices that would be consistent for him to offer as a market maker, including ways which are very close to the logical trader's valuations - in which case, the logical trader is gaining on the physical one.

Trying to communicate a vague intuition here: there is a set of methods which rely on there being a time when "everything is done", to then look back from there and do credit assignment for everything that happened before. They characteristically use backwards induction to prove things. I think markets fall into this: the argument for why ideal markets don't have bubbles is that eventually the real value will be revealed, so the bubble has to pop, and then someone holds the bag, and you don't want to be that someone, and people predicting this and trying to avoid it will make the bubble pop earlier, in the idealised case instantly. I also think these methods aren't going to work well with embedding. They essentially use "after the world" as a substitute for "outside the world".

My claim is: eventually, if you observe enough cases of "crossing" in similar circumstances, your expectation for "cross" should be consistent with the empirical history

My question was more "how should this roughly work" rather than "what conditions should it fulfill", because I think thinking about this illuminates my next point.

The hope is that we can block the troll argument completely if proving B->A does not imply cf(A|B)=1

This doesn't help against what I'm imagining; I'm not touching the indicative B->A. So, standard Troll Bridge:

  • Reasoning within PA (ie, the logic of the agent):
    • Suppose the agent crosses.
      • Further suppose that the agent proves that crossing implies U=-10.
        • Examining the source code of the agent, because we're assuming the agent crosses, either PA proved that crossing implies U=+10, or it proved that crossing implies U=0.
        • So, either way, PA is inconsistent -- by way of 0=-10 or +10=-10.
        • So the troll actually blows up the bridge, and really, U=-10.
      • Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.
      • By Löb's theorem, crossing really implies U=-10.
      • So (since we're still under the assumption that the agent crosses), U=-10.
    • So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.
  • Since we proved all of this in PA, the agent proves it, and proves no better utility in addition (unless PA is truly inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.

But now, say the agent's counterfactual reasoning comes not from PA, but from system X. Then the argument fails: "suppose the agent proves crossing->U=-10 in PA" doesn't go any further, because examining the source code of the agent doesn't say anything about PA anymore, and "suppose the agent proves crossing->U=-10 in X" doesn't show that PA is inconsistent, so the bridge isn't blown up. But let's have a troll that blows up the bridge if X is inconsistent. Then we can make an argument like this:

  • Reasoning within X (ie, the logic of counterfactuals):
    • Suppose the agent crosses.
      • Further suppose that the agent proves in X that crossing implies U=-10.
        • Examining the source code of the agent, because we're assuming the agent crosses, either X proved that crossing implies U=+10, or it proved that crossing implies U=0.
        • So, either way, X is inconsistent -- by way of 0=-10 or +10=-10.
        • So the troll actually blows up the bridge, and really, U=-10.
      • Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.
      • By Löb's theorem, crossing really implies U=-10.
      • So (since we're still under the assumption that the agent crosses), U=-10.
    • So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.
  • Since we proved all of this in X, the agent proves it, and proves no better utility in addition (unless X is truly inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.

Now, this argument relies on X and counterfactual reasoning having a lot of the properties of PA and normal reasoning. But even a system that doesn't run on proofs per se could still end up implementing something a lot like logic, and then it would have a property that's a lot like inconsistency, and then the troll could blow up the bridge conditionally on that. Basically, it still seems reasonable to me that counterfactual worlds should be closed under inference, up to our ability to infer. And I don't see which of the rules for manipulating logical implications wouldn't be valid for counterfactual implications in their own closed system, if you formally separate them. If you want your X to avoid this argument, then it needs to not-do something PA does. "Formal separation" between the systems isn't enough, because the results of counterfactual reasoning still really do affect your actions, and if the counterfactual reasoning system can understand this, Troll Bridge returns. And if there were such a something, we could just use a logic that doesn't do this in the first place, no need for the two-layer approach.

a convincing optimality result could

I'm also sceptical of optimality results. When you're doing subjective probability, any method you come up with will be proven optimal relative to its own prior - the difference between different subjective methods is only in their ontology, and the optimality results don't protect you against mistakes there. Also, when you're doing subjectivism, and it turns out the methods required to reach some optimality condition aren't subjectively optimal, you say "Don't be a stupid frequentist and do the subjectively optimal thing instead". So, your bottom line is written. If the optimality condition does come out in your favour, you can't be more sure because of it - that holds even under the radical version of expected evidence conservation. I also suspect that as subjectivism gets more "radical", there will be fewer optimality results besides the one relative to prior.

Comment by Bunthut on My Current Take on Counterfactuals · 2021-04-24T11:37:54.477Z · LW · GW

Because we have a “basic counterfactual” proposition for what would happen if we 1-box and what would happen if we 2-box, and both of those propositions stick around, LCH’s bets about what happens in either case both matter. This is unlike conditional bets, where if we 1-box, then bets conditional on 2-boxing disappear, refunded, as if they were never made in the first place.

I don't understand this part. Your explanation of PCDT, at least, didn't prepare me for it; it doesn't mention betting. And why is the payoff for the counterfactual 2-boxing determined by the beliefs of the agent after 1-boxing?

And what I think is mostly independent of that confusion: I don't think things are as settled. 

I'm more worried about the embedding problems with the trader in Dutch book arguments, so the one against CDT isn't as decisive for me.

In the Troll Bridge hypothetical, we prove that [cross]->[U=-10]. This will make the conditional expectations poor. But this doesn’t have to change the counterfactuals.

But how is the counterfactual reasoning supposed to actually work? I don't think just having the agent unrevisably believe that crossing is counterfactually +10 is a reasonable answer, even if it doesn't have any instrumental problems in this case. I think it ought to be possible to get something like "whether to cross in Troll Bridge depends only on what you otherwise think about PA's consistency" with some logical method. But even short of that, there needs to be some method to adjust your counterfactuals if they fail to really match your conditionals. And if we had an actual concrete model of counterfactual reasoning instead of a list of desiderata, it might be possible to make a troll based on the consistency of whatever is inside this model, as opposed to PA.

I also think there is a good chance the answer to the Cartesian boundary problem won't be "here's how to calculate where your boundary is", but something else of which boundaries are an approximation, and then something similar would go for counterfactuals, and then there won't be a counterfactual theory which respects embedding.

These latter two considerations suggest the leftover work isn't just formalisation.

Comment by Bunthut on Identifiability Problem for Superrational Decision Theories · 2021-04-24T09:50:17.612Z · LW · GW

are the two players physically precisely the same (including environment), at least insofar as the players can tell?

In the examples I gave, yes. Because that's the case where we have a guarantee of equal policy, from which people try to generalize. If we say players can see their number, then the twins in the prisoner's dilemma needn't play the same way either.

But this is one reason why correlated equilibria are, usually, a better abstraction than Nash equilibria.

The "signals" players receive for correlated equilibria are already semantic. So I'm suspicious that they are better by calling on our intuition more to be used, with the implied risks. For example I remember reading about a result to the effect that correlated equilibria are easier to learn. This is not something we would expect from your explanation of the differences: If we explicitly added something (like the signals) into the game, it would generally get more complicated.

Comment by Bunthut on Identifiability Problem for Superrational Decision Theories · 2021-04-12T21:32:04.907Z · LW · GW

Hum, then I'm not sure I understand in what way classical game theory is neater here?

Changing the labels doesn't make a difference classically.

As long as the probabilistic coin flips are independent on both sides

Yes.

Do you have examples of problems with copies that I could look at and that you think would be useful to study?

No; I think you should take the problems of distributed computing and translate them into decision problems that you then have a solution to.

Comment by Bunthut on Identifiability Problem for Superrational Decision Theories · 2021-04-12T11:12:35.670Z · LW · GW

Well, if I understand the post correctly, you're saying that these two problems are fundamentally the same problem

No. I think:

...the reasoning presented is correct in both cases, and the lesson here is for our expectations of rationality...

As outlined in the last paragraph of the post. I want to convince people that TDT-like decision theories won't give a "neat" game theory, by giving an example where they're even less neat than classical game theory.

Actually it could. 

I think you're thinking about a realistic case (same algorithm, similar environment) rather than the perfect symmetry used in the argument. A communication channel is of no use there because you could just ask yourself what you would send, if you had one, and then you know you would have just gotten that message from the copy as well.

I can use my knowledge of distributed computing to look at the sort of decision problems where you play with copies

I'd be interested. I think even just more solved examples of the reasoning we want are useful currently.

Comment by Bunthut on Phylactery Decision Theory · 2021-04-12T08:44:02.035Z · LW · GW

The link would have been to better illustrate how the proposed system works, not about motivation. So, it seems that you understood the proposal, and wouldn't have needed it.

I don't exactly want to learn the Cartesian boundary. A Cartesian agent believes that its input set fully screens off any other influence on its thinking, and the outputs screen off any influence of the thinking on the world. It's very hard to find things that actually fulfill this. I explain how PDT can learn Cartesian boundaries, if there are any, as a sanity/conservative-extension check. But it can also learn that it controls copies or predictions of itself, for example.

Comment by Bunthut on Identifiability Problem for Superrational Decision Theories · 2021-04-12T07:35:36.341Z · LW · GW

The apparent difference is based on the incoherent counterfactual "what if I say heads and my copy says tails"

I don't need counterfactuals like that to describe the game, only implications. If you say heads and your copy says tails, you will get one util, just like how, if 1+1=3, the circle can be squared.

The interesting thing here is that superrationality breaks up an equivalence class relative to classical game theory, and people's intuitions don't seem to have incorporated this.

Comment by Bunthut on Identifiability Problem for Superrational Decision Theories · 2021-04-12T07:27:25.937Z · LW · GW

"The same" in what sense? Are you saying that what I described in the context of game theory is not surprising, or outlining a way to explain it in retrospect? 

Communication won't make a difference if you're playing with a copy.

Comment by Bunthut on Identifiability Problem for Superrational Decision Theories · 2021-04-12T07:16:47.096Z · LW · GW

What is and isn't an isomorphism depends on what you want to be preserved under isomorphism. If you want everything that's game-theoretically relevant to be preserved, then of course those games won't turn out equivalent. But that doesn't explain anything. If my argument had been that the correct action in the prisoner's dilemma depends on sunspot activity, you could have written your comment just as well.

Comment by Bunthut on Identifiability Problem for Superrational Decision Theories · 2021-04-12T06:59:34.612Z · LW · GW

...the one outlined in the post?

Comment by Bunthut on Phylactery Decision Theory · 2021-04-09T19:35:17.784Z · LW · GW

Right, but then, are all other variables unchanged? Or are they influenced somehow? The obvious proposal is EDT -- assume influence goes with correlation.

I'm not sure why you think there would be a decision theory in that as well. Obviously when BDT decides its output, it will have some theory about how its output nodes propagate. But the hypothesis as a whole doesn't think about influence. It's just a total probability distribution, and it includes that some things inside it are distributed according to BDT. It doesn't have beliefs about "if the output of BDT were different". If BDT implements a mixed strategy, it will have beliefs about what each option being enacted correlates with, but I don't see a problem if this doesn't track "real influence" (indeed, in the situations where this stuff is relevant it almost certainly won't) - it's not used in this role.

Comment by Bunthut on Learning Russian Roulette · 2021-04-09T19:26:07.612Z · LW · GW

Adding other hypotheses doesn't fix the problem. For every hypothesis you can think of, there's a version of it with "but I survive for sure" tacked on. This hypothesis can never lose evidence relative to the base version, but it can gain evidence anthropically. Eventually, these will get you. Yes, there are all sorts of considerations that are more relevant in a realistic scenario, but that's not the point.
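
To put a number on "eventually", here is a toy sketch where you just keep conditioning on your own survival; the prior on the tacked-on hypothesis is made up and tiny:

```python
from fractions import Fraction

# H_normal: the gun kills you with probability 1/6 each round.
# H_magic: the same hypothesis with "but I survive for sure" tacked on.
prior_magic = Fraction(1, 10**6)
p_survive_normal = Fraction(5, 6)

odds = prior_magic / (1 - prior_magic)   # odds of H_magic : H_normal
for round_number in range(1, 101):
    odds /= p_survive_normal             # each survival multiplies the odds by 6/5
    if odds > 1:
        print(f"after {round_number} survivals, H_magic overtakes H_normal")
        break
```

With this made-up prior that happens after 76 survivals; the point is just that some finite number always suffices, and shrinking the prior only delays it.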

Comment by Bunthut on Learning Russian Roulette · 2021-04-09T19:20:47.832Z · LW · GW

The problem, as I understand it, is that there seem to be magical hypotheses you can't update against through ordinary observation, because by construction the only time they make a difference is in your odds of survival. So you can't update against them from observation, and anthropics can only update in their favour, so eventually you end up believing one and then you die.

Comment by Bunthut on Learning Russian Roulette · 2021-04-09T08:53:36.090Z · LW · GW

Maybe the disagreement is about what we take the alternative hypothesis to be? I'm not imagining a broken gun - you could examine your gun and notice it isn't broken, or just shoot into the air a few times and see it fire. But even after you eliminate all of those, there's still the hypothesis "I'm special for no discernible reason" (or is there?) that can only be tested anthropically, if at all. And this seems worrying.

Maybe here's a stronger way to formulate it: consider all the copies of yourself across the multiverse. They will sometimes face situations where they could die. And they will always remember having survived all previous ones. So eventually, all the ones still alive will believe they're protected by fate or something, and then do something suicidal. Now you can make the same argument about how there are a few actual immortals, but still... "A rational agent that survives long enough will kill itself unless it's literally impossible for it to do so" doesn't inspire confidence, does it? And it happens even in very "easy" worlds. There is no world where you have a limited chance of dying before you "learn the ropes" and are safe - it's impossible to have a chance of eventual death other than 0 or 1, without the laws of nature changing over time.

just have a probabilistic model of the world and then condition on the existence of yourself (with all your memories, faculties, etc). 

I interpret that as conditioning on the existence of at least one thing with the "inner" properties of yourself.

Comment by Bunthut on Learning Russian Roulette · 2021-04-08T18:46:20.990Z · LW · GW

To clarify, do you think I was wrong to say UDT would play the game? I've read the two posts you linked. I think I understand Wei's, and I think the UDT described there would play. I don't quite understand yours.

Comment by Bunthut on Phylactery Decision Theory · 2021-04-08T18:19:50.981Z · LW · GW

Another problem with this is that it isn't clear how to form the hypothesis "I have control over X".

You don't. I sometimes use talk of control to describe what the agent is doing from the outside, but the hypotheses it believes all have a form like "the variables such-and-such will be as if they were set by BDT given such-and-such inputs".

One problem with this is that it doesn't actually rank hypotheses by which is best (in expected utility terms), just how much control is implied.

For the first setup, where it's trying to learn what it has control over, that's true. But you can use any ordering of hypotheses for the descent, so we can just take "how good that world is" as our ordering. This is very fragile, of course. If there are uncountably many great but unachievable worlds, we fail, and in any case we are paying for all this with performance on "ordinary learning". If this were running in a non-episodic environment, we would have to find a balance between having the probability of hypotheses decline according to goodness, and avoiding the "optimistic Humean troll" hypothesis by considering complexity as well. It really seems like I ought to take "the active ingredient" of this method out, if I knew how.

Comment by Bunthut on Reflective Bayesianism · 2021-04-07T22:11:23.217Z · LW · GW

From my perspective, Radical Probabilism is a gateway drug.

This post seemed to be praising the virtue of returning to the lower-assumption state. So I argued that in the example given, it took more than knocking out assumptions to get the benefit.

So, while I agree, I really don't think it's cruxy. 

It wasn't meant to be. I agree that logical inductors seem to de facto implement a Virtuous Epistemic Process, with attendant properties, whether or not they understand that. I just tend to bring up any interesting-seeming thoughts that are triggered during conversation, and could perhaps do better at indicating that. Whether it's fine to set it aside provisionally depends on where you want to go from here.

Comment by Bunthut on Reflective Bayesianism · 2021-04-07T18:35:07.716Z · LW · GW

Either way, we've made assumptions which tell us which Dutch Books are valid. We can then check what follows.

Ok. I suppose my point could then be made as "#2-type approaches aren't very useful, because they assume something that's no easier than what they provide".

I think this understates the importance of the Dutch-book idea to the actual construction of the logical induction algorithm. 

Well, you certainly know more about that than me. Where did the criterion come from in your view?

This part seems entirely addressed by logical induction, to me.

Quite possibly. I wanted to separate what work is done by radicalizing probabilism in general, vs logical induction specifically. That said, I'm not sure logical inductors properly have beliefs about their own (in the de dicto sense) future beliefs. An inductor doesn't know "its" source code (though it knows that such code is a possible program), or even that it is being run, with the full intuitive meaning of that, so it has no way of doing that. Rather, it would at some point think about the source code that we know is its own, and come to believe that that program gives reliable results - but only in the same way in which it comes to trust other logical inductors. It seems like a version of this in the logical setting.

By "knowing where they are", I mean strategies that avoid getting dutch-booked without doing anything that looks like "looking for dutch books against me". One example of that would be The Process That Believes Everything Is Independent And Therefore Never Updates, but thats a trivial stupidity.

Comment by Bunthut on I'm from a parallel Earth with much higher coordination: AMA · 2021-04-07T17:33:17.208Z · LW · GW

One of the most important things I learned, being very into nutrition-research, is that most people can't recognize malnutrition when they see it, and so there's a widespread narrative that it doesn't exist. But if you actually know what you're looking for, and you walk down an urban downtown and look at the beggars, you will see the damage it has wrought... and it is extensive.

Can someone recommend a way of learning to recognize this without having to spend effort on nutrition-in-general?

Comment by Bunthut on Don't Sell Your Soul · 2021-04-07T16:11:36.649Z · LW · GW

I think giving reasons made this post less effective. Reasons make the naive!rationalist more likely to yield on this particular topic, but that's no longer a live concern, and it probably inhibits learning the general lesson.

Comment by Bunthut on Reflective Bayesianism · 2021-04-07T11:24:12.529Z · LW · GW

What is actually left of Bayesianism after Radical Probabilism? Your original post on it was partially explaining logical induction, and introduced assumptions from that in much the same way as you describe here. But without that, there doesn't seem to be a whole lot there. The idea is that all that matters is resistance to Dutch books, and for a Dutch book to be fair, the bookie must not have an epistemic advantage over the agent. Said that way, it depends on some notion of "what the agent could have known at the time", and giving a coherent account of this would require solving epistemology in general. So we avoid this problem by instead taking "what the agent actually knew (believed) at the time", which is a subset and so also fair. But this doesn't do any work; it just offloads it to agent design.

For example, with logical induction we know that it can't be Dutch-booked by any polynomial-time trader. Why do we think that criterion is important? Because we think it's realistic for an agent to, in the limit, know anything you can figure out in polynomial time. And we think that because we have an algorithm that does it. Ok, but what intellectual progress does the Dutch book argument make here? We had to first find out what one can realistically know, and got logical induction, from which we could make the poly-time criterion. So now we know it's fair to judge agents by that criterion, so we should find one, which fortunately we already have. But we could also just not have thought about Dutch books at all, and just tried to figure out what one could realistically know, and what would we have lost? Making the Dutch book here seems like a spandrel in thinking style.

As a side note, I reread Radical Probabilism for this, and everything in the "Other Rationality Properties" section seems pretty shaky to me. The proofs of both convergence and calibration as written depend on logical induction - or else on the assumption that the agent would know if it's not convergent/calibrated, in which case could orthodoxy not achieve the same? You acknowledge this for convergence in a comment but also hint at another proof. But if radical probabilism is a generalization of orthodox Bayesianism, then how can it have guarantees that the latter doesn't?

For the conservation of expected evidence, note that the proof here involves a bet on what the agent's future beliefs will be. This is a fragile construction: you need to make sure the agent can't troll the bookie, without assuming the accessibility of the structures you want to establish. It also assumes the agent has models of itself in its hypothesis space. And even in the weaker forms, the result seems unrealistic. There is the problem with psychedelics that the "virtuous epistemic process" is supposed to address, but this is something that the formalism allows for with a free parameter, not something it solves. The radical probabilist trusts the sequence of future belief states, but the formalism doesn't say anything about where they come from. You can now assert that it can't be identified with particular physical processes, but that just leaves a big question mark for bridging laws. If you want to check if there are Dutch books against your virtuous epistemic process, you have to be able to identify its future members. Now I can't exclude that some process could avoid all Dutch books against it without knowing where they are (and without being some trivial stupidity), but it seems like a pretty heavy demand.

Comment by Bunthut on Learning Russian Roulette · 2021-04-06T23:42:30.386Z · LW · GW

Definition (?). A non-anthropic update is one based on an observation E that has no (or a negligible) bearing on how many observers in your reference class there are.

Not what I meant. I would say anthropic information tells you where in the world you are, and normal information tells you what the world is like. An anthropic update, then, reasons about where you would be, if the world were a certain way, to update on world-level probabilities from anthropic information. So Sleeping Beauty with N outsiders is a purely anthropic update by my count. Big worlds generally tend to make updates more anthropic.

What you said about leading to UDT sounds interesting but I wasn't able to follow the connection you were making.

One way to interpret the SSA criterion is to have beliefs such that, in as many worlds as possible (weighted by your prior), you would be as right as possible in the position of an average member of your reference class. If you "control" the beliefs of members of your reference class, then we could also say: believe in such a way as to make them as right as possible in as many worlds as possible. "Agents which are born with my prior" (and maybe "and using this epistemology", or some stronger kind of identicalness) is a class whose beliefs are arguably controlled by you in the timeless sense. So if you use it, you will be doing UDT-like optimizing. (Of course, it will be a UDT that believes in SSA.)

And about using all possible observers as your reference class for SSA, that would be anathema to SSAers :)

Maybe, but if there is a general form that can produce many kinds of anthropics based on how its free parameter is set, then calling the result of one particular value of the parameter SIA and the results of all others SSA does not seem to cleave reality at the joints.

Comment by Bunthut on Learning Russian Roulette · 2021-04-05T19:05:20.728Z · LW · GW

I had thought about this before posting, and I'm not sure I really believe in the infinite multiverse. I'm not even sure if I believe in the possibility of being an individual exception for some other sort of possibility. But I don't think just asserting that without some deeper explanation is really a solution either. We can't just assign zero probability willy-nilly.

Comment by Bunthut on Learning Russian Roulette · 2021-04-05T17:22:02.794Z · LW · GW

That link also provides a relatively simple illustration of such an update, which we can use as an example:

I didn't consider that illustrative of my question, because "I'm in the Sleeping Beauty problem" shouldn't lead to a "normal" update anyway. That said, I haven't read Anthropic Bias, so if you say it really is supposed to be the anthropic update only, then I guess it is. The definition in terms of "all else equal" wasn't very informative for me here.

To fix this issue we would need to include in your reference class whoever has the same background knowledge as you

But background knowledge changes over time, and a change in reference class could again lead to absurdities like this. So it seems to me like the sensible version of this would be to have your reference class always be "agents born with the same prior as me", or identical in an even stronger sense, which would lead to something like UDT.

Now that I think of it, SSA can reproduce SIA, using the reference class of "all possible observers" and considering existence a contingent property of those observers.

Comment by Bunthut on Learning Russian Roulette · 2021-04-03T21:42:34.653Z · LW · GW

In most of the discussion from the above link, those fractions are 100% on either A or B, resulting, according to SSA, in your posterior credences being the same as your priors.

For the anthropic update, yes, but isn't there still a normal update? Where you just update on the gun not firing, as an event, rather than on your existence? Your link doesn't have examples where that would be relevant either way. But if we didn't do this normal updating, then it seems like you could only learn from an observation if some people in your reference class make the opposite observation in different worlds. So if you use the trivial reference class, you will give everything the same probability as your prior, except for eliminating worlds where no one has your epistemic state and renormalizing. You will expect to violate Bayes' law even in normal situations that don't involve any birth or death. I don't think that's how it's meant to work.
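
To spell out the contrast I mean between the two updates (notation ad hoc):

$$P_{\text{normal}}(H \mid E) \;\propto\; P(E \mid H)\,P(H),$$

versus, with the trivial reference class, only

$$P_{\text{anthropic}}(H) \;\propto\; P(H)\cdot \mathbf{1}[\text{someone with my epistemic state exists in } H],$$

which is just the prior restricted to worlds that contain someone in my epistemic state, renormalized.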