Comment by bunthut on Great minds might not think alike · 2020-12-28T11:13:51.800Z · LW · GW

I'm not sure if "translation" is a good word for what you're talking about. For example, it's not clear what a Shor-to-Constance translation would look like. You can transmit the results of statistical analysis to non-technical people, but the sharing of results wasn't the problem here. The Constance-to-Shor translator described Constance's reasons in such a way that Shor can process them, and what could an inverse of this be? Constance's beliefs are based on practical experience, and Shor simply hasn't had that, whereas Constance did get "data" in a broad sense. Now we could invent anecdotes that would make Constance conclude the same thing as the analysis, but we can do that no matter who is actually right.

"The same idea" can also bring very different capabilities depending on the way of thinking it is used in. As we learn formal thinking, we generally also incorporate some of its results into more analogical styles. This develops mathematical intuition, which lets us navigate the formal system vastly faster than brute force. On the other hand, it is only when our everyday heuristic that getting more money is good is embedded into more systematic theories that we can think its strange that useless metal is so valuable, realize that money needed to have an origen, and that gold-mining isn't productive (caveats for technical uses apply now), etc.

"Translations" in your sense also require not only familiarity with the source and target style, but often also significant work thinking in those styles. Expressing something analytically can be hard even if you are good at analytic thinking. Finding cruxes is something that you need to sit down and do for a while. I've seen a few examples where the way an analytic expression is found is that someone develops it for reasons internal to analysis, and only afterward is it realized that this could be the reality that drove intuition xyz. In the other direction, "How to actually act on the things you claim to believe" has been a significant and constant topic of this site.

Comment by bunthut on Dissolving the Problem of Induction · 2020-12-28T09:54:23.404Z · LW · GW

We don't have to justify using "similarity", "resemblance"

I think you still do. In terms of induction, you still have the problem of grue and bleen. In terms of Occam's Razor, it's the problem of which language a description needs to be simple in.

Comment by bunthut on Normativity · 2020-11-24T16:25:43.341Z · LW · GW

It's sort of like the difference between a programmable computer vs an arbitrary blob of matter. 

This is close to what I meant: my neurons keep doing something like reinforcement learning, whether or not I theoretically believe that's valid. "I in fact can not think outside this" does address the worry about a merely rational constraint.

On the other hand, we do want AI to eventually consider other hardware, and that might even be necessary for normal embedded agency, since we don't fully trust our hardware even when we don't want to normal-sense-change it.

To sum up, meaning in this view is broadly more inferentialist and less correspondence-based: the meaning of a thing is more closely tied with the inferences around that thing, than with how that thing corresponds to a territory. 

I broadly agree with inferentialism, but I don't think that entirely addresses it. The mark of confused, rather than merely wrong, beliefs is that they don't really have a coherent use. So for example it might be that there's a path through possible scenarios leading back to the starting point, where if at every step I adjust my reaction in a way that seems appropriate to me, I end up with a different reaction when I'm back at the start. If you tried to describe my practices here, you would just explicitly account for the framing dependence. But then it wouldn't be confused! That framing-dependent concept you described also exists, but it seems quite different from the confused one. For the confused concept it's essential that I consider it not dependent in this way. But if you try to include that in your description, by also describing the practices around my meta-beliefs about the concept, and the meta-meta-beliefs, and so on, then you'd end up also describing the process by which I recognized it as confused and revised it. And then we're back in the position of already having recognized that it's bullshit.

When you were only going up individual meta-levels, the propositions logical induction worked with could be meaningful even if they were wrong, because they were part of processes outside the logical induction process, and those were sufficient to give them truth-conditions. Now you want to determine both what to believe and how those beliefs are to be used in one go, and it's undermining that, because the "how beliefs are to be used" is what foundationalism kept fixed, and which gave them their truth conditions.

I'm not seeing that implication at all!

Well, this is a bit analogy-y, but I'll try to explain. I think there's a semantic issue with anthropics (indeed, under inferentialism all confusion can be expressed as a semantic issue). Things like "the probability that I will have existed if I do X now" are unclear. For example, a descriptivist way of understanding conditional probabilities is something like: "The token C means conditional probability iff whenever you believe xCy = p, then you would believe P(y) = p if you came to believe x". But that assumes not only that you are logically perfect but that you are there to have beliefs and answer for them. Now most of the time it's not a problem if you're not actually there, because we can just ask what would happen if you were there (and you somehow got oxygen and glucose despite not touching anything, and you could see without blocking photons, etc., but let's ignore that for now), even if you aren't actually. But this can be a problem in anthropic situations. Normally when a hypothetical involves you, you can just imagine it from your perspective, and when it doesn't involve you, you can imagine you were there. But if you're trying to imagine a scenario that involves you but you can't imagine it from your perspective, because you come into existence in it, or you have a mental defect in it, or something, then you have to imagine it from the third person. So you're not really thinking about yourself, you're thinking about a copy, which may be in quite a different epistemic situation. So if you can conceptually explain how to have semantics that account for my making mistakes, then I think that would probably be able to account for my not being there as well (in both cases, it's just the virtuous epistemic process missing). And that would tell us how to have anthropic beliefs, and that would unknot the area.

Comment by bunthut on Normativity · 2020-11-21T00:25:42.277Z · LW · GW

we have an updating process which can change its mind about any particular thing; and that updating process itself is not the ground truth, but rather has beliefs (which can change) about what makes an updating process legitimate.

This should still be a strong formal theory, but one which requires weaker assumptions than usual

There seems to be a bit of a tension here. What you're outlining for most of the post still requires a formal system with assumptions within which to take the fixed point, but then that would mean that it can't change its mind about any particular thing. Indeed it's not clear how such a totally self-revising system could ever be a fixed point of constraints of rationality: since it can revise anything, it could only be limited by the physically possible.

This is most directly an approach for solving meta-philosophy.

Related to the last point, your project would seem to imply some interesting semantics. Usually our semantics are based on the correspondence theory: we start thinking about what the world can be like, and then expressions get their meaning from their relations to these ways the world can be like, particularly through our use. (We can already see how this would take us down a descriptive path.) This leads to problems where you can't explain the content of confused beliefs. For example, most children believe for some time that their language's words for things are the real words for them. If you're human you probably understand what I was talking about, but if some alien was puzzled about what the "real" there was supposed to mean, I don't think I could explain it. Basically, once you've unconfused yourself, you become unable to say what you believed.

Now if we're foundationalists, we say that that's because you didn't actually believe anything, and that it was just a linguistic token passed around your head but failing to be meaningful, because you didn't implement The Laws correctly. But if we want to have a theory like yours, it treats this cognitively, and so such beliefs must be meaningful in some sense. I'm very curious what this would look like.

More generally this is a massively ambitious undertaking. If you succeeded it would solve a bunch of other issues, not even from running it but just from the conceptual understanding of how it would work. For example in your last post on signalling you mentioned:

First of all, it might be difficult to define the hypothetical scenario in which all interests are aligned, so that communication is honest. Taking an extreme example, how would we then assign meaning to statements such as "our interests are not aligned"?

I think a large part of embedded agency has a similar problem, where we try to build our semantics on "If I was there, I would think", and apply this to scenarios where we are essentially not there, because we're thinking about our non-existence, or about bugs that would make us not think that way, or some such. So if you solved this it would probably just solve anthropics as well. On the one hand this is exciting, on the other it's a reason to be sceptical. And all of this eerily reminds me of German Idealism. In any case I think this is very good as a post.

Comment by bunthut on Signalling & Simulacra Level 3 · 2020-11-16T22:29:24.188Z · LW · GW

Where do these crisp ontologies come from, if (under the signalling theory of meaning) symbols only have probabilistic meanings?

There are two things here which are at least potentially distinct: the meaning of symbols in thinking, and their meaning in communication. I'd expect these mechanisms to have a fair bit in common, but specifically the problem of alignment of the speakers which is addressed here would not seem to apply to the former. So I don't think we need to wonder here where those crisp ontologies came from.

This is the type of thinking that can't tell the difference between "a implies b" and "a, and also b" -- because people almost always endorse both "a" and "b" when they say "a implies b". 

One way to eliminate this particular problem is to focus on whether the speaker agrees with a sentence if asked, rather than on spontaneous assertions. This fails when the speaker is systematically wrong about something, or when Cartesian boundaries are broken, but other than that it seems to take out a lot of the "association" problems.

None of this is literally said, but a cloud of conversational implicature surrounds the literal text. The signalling analysis can't distinguish this cloud from the literal meaning.

Even what we would consider literal speech can depend on implicature. Consider: "Why don't we have bacon?" "The cat stole it". Which cat "the cat" is requires Gricean reasoning, and the phrase isn't compositional, either.

To hint at my opinion, I think it relates to learning normativity.

I think one criterion of adequacy for explanations of level 1 is to explain why it is sometimes rational to interpret people literally. Why would you throw away all that associated information? Your proposal in that post is quite abstract; could you outline how it would address this?

Interestingly, I did think of norms when you drew up the problem, but in a different way, related to enforcement. We hold each other responsible for our assertions, and this means we need an idea of when a sentence is properly said. Now such norms can't require speakers to be faithful to all the probabilistic associations of a sentence. That would leave us with too few sentences to describe all situations, and if the norms are to be responsive to changing expectations, it could never reach equilibrium. So we have to pick some subset of the associations to enforce, and that would then be the "literal meaning". We can see why it would be useful for this to incorporate some compositionality: assertions are much more useful when you can combine multiple, possibly from different sources, into one chain of reasoning.

Comment by bunthut on Weird Things About Money · 2020-11-07T21:28:49.696Z · LW · GW

However, how I assign value to divergent sums is subjective -- it cannot be determined precisely from how I assign value to each of the elements of the sum, because I'm not trying to assume anything like countable additivity.

This implies that you believe in the existence of countably infinite bets but not countably infinite Dutch booking processes. That seems like a strange/unphysical position to be in; if that were the best treatment of infinity possible, I think infinity is better abandoned. I'm not even sure the framework in your linked post can really be said to contain infinite bets: the only way a bet ever gets evaluated is in a bookie strategy, and no single bookie strategy can be guaranteed to fully evaluate an infinite bet. Is there a single bookie strategy that differentiates the St. Petersburg bet from any finite bet? Because if not, then the agent at least can't distinguish them, which is very close to not existing at all here.
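For concreteness, here is a toy calculation of my own (not from the linked post) showing why no finite truncation of the St. Petersburg bet can stand in for the full bet: the expectation of the bet truncated after n terms is exactly n, so the truncations diverge, while any single bookie strategy only ever looks at finitely many of those terms.

```python
# St. Petersburg bet: pays 2^k with probability 2^-k, for k = 1, 2, 3, ...
# Each term contributes 2^-k * 2^k = 1 to the expectation, so truncating
# after n terms gives an expectation of exactly n; the full sum diverges.

def truncated_expectation(n):
    return sum((0.5 ** k) * (2 ** k) for k in range(1, n + 1))

assert truncated_expectation(10) == 10.0     # each term contributes exactly 1
assert truncated_expectation(1000) == 1000.0 # grows without bound in n
```

So any bookie strategy that has evaluated the bet up to some depth n has, so far, seen something indistinguishable from a finite bet worth n.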

In a case like the St Petersburg Lottery, I believe I'm required to have some infinite expectation. 

Why? I haven't found any finite Dutch books against not doing so.

Perhaps you can try to problematize this example for me given what I've written above -- not sure if I've already addressed your essential worry here or not.

I don't think you have. That example doesn't involve any uncertainty or infinite sums. The problem is that for any finite n, waiting n+1 is better than waiting n, but waiting indefinitely is worse than any of them. Formally, the problem is that I have a complete and transitive preference between actions, but no unique best action, just a series that keeps getting better.

Note that you talk about something related in your linked post:

I'm representing preferences on sets only so that I can argue that this reduces to binary preference.

But the proof for that reduction only goes one way: for any preference relation on sets, there's a binary one. My problem is that the converse does not hold.

Comment by bunthut on Weird Things About Money · 2020-10-28T22:56:59.343Z · LW · GW

I'm generally OK with dropping continuity-type axioms, though, in which case you can have hyperreal/surreal utility to deal with expectations which would otherwise be problematic (the divergent sums which unbounded utility allows).

Have you worked this out somewhere? I'd be interested to see it, but I think there are some divergences it can't address. There is for one the Pasadena paradox, which is also a divergent sum, but one which doesn't stably lead anywhere, not even to infinity. The second is an apparently circular dominance relation: imagine you are linear in monetary consumption. You start with $1, which you can either spend or leave in the bank, which doubles it every year even after accounting for your time preference/uncertainty/other finite discounting. Now for every n, leaving it in the bank for n+1 years dominates leaving it for n years, but leaving it in the bank forever gets 0 utility. Note that if we replace money with energy here, this could actually happen in universes not too different from ours.
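A minimal sketch of that second example, with my own toy numbers (the bank doubles wealth each year net of discounting, and utility is linear in whatever is eventually consumed):

```python
# "Wait n years, then consume everything" has utility 2^n, so every
# finite policy is strictly dominated by the next one; yet the limit
# policy ("wait forever") never consumes and gets utility 0.

def utility_of_waiting(n, start=1.0):
    return start * 2 ** n  # consume everything after n years

# each policy is beaten by waiting one more year:
assert all(utility_of_waiting(n + 1) > utility_of_waiting(n) for n in range(100))
# ...so there is no best action: the utilities grow without bound, while
# the pointwise limit of the policies has utility 0.
```

The circularity is that "dominates" points ever further along the sequence, while the sequence's limit is worse than every element.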

What is the expectation of the self-referential quantity "one greater than your expectation for this value"?

What is the expectation of the self-referential quantity "one greater than your expectation for this value, except when that would go over the maximum, in which case it's one lower than expectation instead"? Insofar as there is an answer it would have to be "one less than maximum", but that would seem to require uncertainty about what your expectations are.
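To illustrate that last point with a toy formalisation of my own (assuming a maximum value of 10): if your expectation had to be a single definite number, no value is consistent with the quantity it induces, which is why some uncertainty about your own expectation seems forced.

```python
M = 10  # assumed maximum value of the quantity

def realized(expectation):
    # "one greater than your expectation, except one lower when that
    # would go over the maximum"
    return expectation + 1 if expectation + 1 <= M else expectation - 1

# no definite expectation equals the value it induces:
consistent = [e for e in range(M + 1) if realized(e) == e]
assert consistent == []
```

Mixing uncertainty between expecting M-1 and M is the only way to get the books to (approximately) balance, which matches the "one less than maximum" intuition.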

Comment by bunthut on Weird Things About Money · 2020-10-28T11:57:37.479Z · LW · GW

But I still think it's important to point out that the behavioral recommendations of Kelly do not violate the VNM axioms in any way, so the incompatibility is not as great as it may seem.

I think the interesting question is what to do when you expect many more, but only finitely many, rounds. It seems like Kelly should somehow gradually transition, until it recommends normal utility maximization in the case of only a single round ever happening. Log utility doesn't do this. I'm not sure I have anything that does, so maybe it's unfair to ask it from you, but still it seems like a core part of the idea (that the Kelly strategy comes from the compounding) is lost.

And yet it doesn't violate VNM, which means the classic argument for maximizing expected utility goes through. How can this paradox be resolved? By noting that utility is just whatever quantity expectation maximization does go through for, "by definition".

This is the sort of argument you want to be very suspicious of if you're confused, as I suspect we are. For example, you can now just apply all the arguments that made Kelly seem compelling again, but this time with respect to the new, logarithmic utility function. Do they actually seem less compelling now? A little bit, yes, because I think we really are sublinear in money, and the intuitions related to that went away. But no matter what the utility function, we can always construct bets that are compounding in utility, and then bettors which are Kelly with respect to that utility function will come to dominate the market. So if you do this reverse inference of utility, the utility function of Kelly bettors will seem to change based on the bets offered.

I'm curious if you're taking a side, here, wrt which limit one should take.

Not really, I think we're too confused to say yet. I do think I understand decisions with bounded utility (all the classical foundations imply bounded utilities, including VNM; this doesn't seem to be well known here). Bounded utility makes maximization a lot more Kelly: it means that the maximizers can no longer have the arbitrarily high pay-offs that are needed to balance the near-certainty of elimination. I also think it should make it not matter which limit you take first, but I don't think that leads to Kelly either, because the betting structure that leads to Kelly assumes unbounded utility. Perhaps it would end up as a local approximation somewhere.

Now I also think that bounded decision theory is inadequate. I think a decision theory should be able to implement a paperclip maximizer, and it should work in worlds that last infinitely long. But I don't have something that fulfills that. I think there's a good chance the solution doesn't look like utility at all: a theorem that needs its problem to be finite probably won't do well in embedded problems.

Comment by bunthut on Weird Things About Money · 2020-10-04T21:30:08.736Z · LW · GW

Money wants to be linear, but wants even more to be algorithmic

I think this is mixing up two things. First, a diminishing marginal utility in consumption measured in money. This can lead to risk averse behaviour, but it could be any sublinear function, not just logarithmic, and I have seen no reason to think it's logarithmic in actually existing humans.

if you have risk-averse behavior, other agents can exploit you by selling you insurance.

I wouldn't call it "exploit". It's not a money pump that can be repeated arbitrarily often; it's simply a price you pay for stability.

This "money" acts very much like utility, suggesting that utility is supposed to be linear in money.

Only the utility of the agent in question is supposed to be linear in this "money", and that can always be achieved by a monotone transformation. This is quite different from suggesting there's a resource everyone should be linear in under the same scaling.

The second thing is the Kelly criterion. The Kelly criterion exists because money can compound. This is also why it produces specifically a logarithmic structure. Kelly theory recommends you use the criterion regardless of the shape of your utility in consumption, if you expect many more games after this one; it is much more like a convergent instrumental goal. So this:

Kelly betting is fully compatible with expected utility maximization, since we can maximize the expectation of the logarithm of money.

is just wrong AFAICT. This is compatible from the side of utility maximization, but not from the side of Kelly as theory. Of course you can always construct a utility function that will behave in a specific way - this isn't saying much.

This means the previous counterpoint was wrong: expected-money bettors profit in expectation from selling insurance to Kelly bettors, but the Kelly bettors eventually dominate the market

Depends on how you define "dominate the market". In most worlds, most (by headcount) of the bettors still around will be Kelly bettors. I even think that, weighing by money, in most worlds Kelly bettors would outweigh expectation maximizers. But weighing by money across all worlds, the expectation maximizers win, by definition. The Kelly criterion "almost surely" beats any other strategy when played sufficiently long, but it only wins by some amount in the cases where it wins, and it's infinitely behind in the infinitely unlikely case that it doesn't win.

Kelly betting really is incompatible with expectation maximization. It deliberately takes a lower average. The conflict is essentially over two conflicting infinities: Kelly notes that for any sample size, if there's a long enough duration, Kelly wins. And maximization notes that for any duration, if there's a big enough sample size, maximization wins.
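A quick simulation of that conflict, with toy numbers of my own: an even-odds bet won with probability 0.6, so the Kelly fraction is 2p - 1 = 0.2, while the expectation maximizer stakes everything every round.

```python
import random

def simulate(fraction, rounds=10, trials=2000, p=0.6, seed=0):
    """Repeatedly stake `fraction` of wealth on an even-odds bet won with
    probability p; return (sample mean, sample median) of final wealth."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        w = 1.0
        for _ in range(rounds):
            stake = fraction * w
            w = w + stake if rng.random() < p else w - stake
        finals.append(w)
    finals.sort()
    return sum(finals) / trials, finals[trials // 2]

kelly_mean, kelly_median = simulate(0.2)  # Kelly fraction 2p - 1
allin_mean, allin_median = simulate(1.0)  # expectation maximizer

# The all-in bettor's median is 0 (a single loss wipes them out), while
# the Kelly bettor's median grows; the all-in strategy's expectation
# advantage lives almost entirely in rare, exponentially large branches.
```

Which strategy "wins" depends on whether you look at the typical world (median) or average over all worlds (mean), which is exactly the two-infinities conflict above.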

Money wants to go negative, but can't.

A lot of what you say here goes into monetary economics, and you should ask someone in the field, or at least read up on it, before relying on any of this. Probably you shouldn't rely on it even then, if at all avoidable.

Comment by bunthut on What is the interpretation of the do() operator? · 2020-08-27T13:39:18.562Z · LW · GW

But in a newly born child or blank AI system, how does it acquire causal models?

I see no problem assuming that you start out with a prior over causal models; we do the same for probabilistic models, after all. The question is how the updating works, and whether, assuming the world has a causal structure, this way of updating can identify it.

I myself think (but I haven't given it enough thought) that there might be a bridge from data to causal models though falsification. Take a list of possible causal models for a given problem and search through your data. You might not be able to prove your assumptions, but you might be able to rule causal models out, if they suppose there is a causal relation between two variables that show no correlation at all.

This can never distinguish between different causal models that predict the same probability distribution; all the advantage this would have over purely probabilistic updating would already be included in the prior.

To update in a way that distinguishes between causal models, you need to update on information that is true for some event. Now in this case you could allow each causal model to decide when that is true, for the purposes of its own updating, so you are now allowed to define it in causal terms. This would still need some work beyond what I wrote in the question: you can't really change something independently of its causal antecedents, at least not when we're talking about the whole world, which includes you, but perhaps some notion of independence would suffice. And then you would have to show that this really does converge on the true causal structure, if there is one.

Comment by bunthut on What is the interpretation of the do() operator? · 2020-08-27T09:30:29.495Z · LW · GW

If Markov models are simple explanations of our observations, then what's the problem with using them?

To be clear, by total probability distribution I mean a distribution over all possible conjunctions of events. A Markov model also creates a total probability distribution, but there are multiple Markov models with the same probability distribution. Believing in a Markov model is more specific, and so if we could do the same work with just probability distributions, then Occam would seem to demand we do.

The surface-level answer to your question would be to talk about how to interconvert between causal graphs and probabilities... But you can google this or find it in Pearl's book Causality.

My understanding is that you can't infer a causal graph from just a probability distribution. You need either causal assumptions or experiments to do that, and experimenting involves do()ing, so I'm asking if it can be explained what do()ing is in non-causal terms.
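To make the underdetermination concrete, here is the standard two-variable example with numbers I picked myself: two causal models that induce the same joint distribution over binary X and Y, but disagree about what do(X = 1) does.

```python
# Model A: X -> Y.  P(X=1) = 0.5, and Y copies X with probability 0.9.
# Model B: Y -> X.  P(Y=1) = 0.5, and X copies Y with probability 0.9.

def joint_A(x, y):
    return 0.5 * (0.9 if y == x else 0.1)

def joint_B(x, y):
    return 0.5 * (0.9 if x == y else 0.1)

# Observationally indistinguishable: the joints agree on every outcome.
assert all(joint_A(x, y) == joint_B(x, y) for x in (0, 1) for y in (0, 1))

# Under do(X=1), model A keeps the mechanism P(Y | X), giving
# P(Y=1 | do(X=1)) = 0.9; in model B the intervention cannot reach Y,
# so P(Y=1 | do(X=1)) = P(Y=1) = 0.5.
p_do_A = 0.9
p_do_B = 0.5
assert p_do_A != p_do_B
```

So any purely distributional update treats A and B identically, and only an intervention (or a causal assumption) separates them.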

I'd just like you to think more about what you want from an "explanation." What is it you want to know that would make things feel explained?

If there were a way to infer causal structure from just probability distributions, that would be an explanation. Inferring it from something else might also work, but it depends on what the something is, and I don't think I can give you a list of viable options in advance.

Alternatively, you might say that causality can't be reduced to something else. In that case, I would like to know how I come to have beliefs about causality, and why this gives true answers. I have something like that for probability distributions: I have a prior and a rule to update it (how I come to believe it) and a theorem saying that if I do that, in the limit I'll always do at least as well as my best hypothesis with probability ≠ 0 in the prior (why it works).

Comment by bunthut on Towards a Formalisation of Logical Counterfactuals · 2020-08-15T23:15:47.204Z · LW · GW

Hello Darmani,

I'm indeed talking about a kind of counterfactual conditional, one that could apply to logical rather than causal dependencies.

Avoiding multiple antecedents isn't just a matter of what my conceptual toolkit can handle; I could well have done two different types of nodes, which would have represented it just fine. However, restricting inferences this way makes a lot of things easier. For example, it means that all inferences to any point "after" come only from propositions that have come through . If an inference could have multiple antecedents, then there would be inferences that combine a premise derived from with a premise in , and it's not clear if the separation can be kept here.

From the paper you linked... Their first definition of the counterfactual (the one where the consequent can only be a simple formula) describes the causal counterfactual (well, the indeterministic-but-not-probabilistic protocols throw things off a bit, but we can ignore those), and the whole "tremble" analysis closely resembles causal decision theory. Now their concept of knowledge is interesting because it's defined with respect to the ecosystem, and so is implicitly knowledge of the other agents' strategies, but the subsequent definition of belief rather kneecaps this. The problem that belief is supposed to solve is very similar to the rederivation problem I'm trying to get around, but it's formulated in terms of model theory. This seems like a bad way to formulate it, because having a model is a holistic property of the formal system, and our counterfactual surgery is basically trying to break a few things in that system without destroying the whole. And indeed the way belief is defined is basically to always assume that no further deviation from the known strategy will occur, so it's impossible to use the counterfactuals based on it to evaluate different strategies. Or, applying it to "strategies" of logical derivation: if you try to evaluate "what if X wasn't derived by the system", it'll do normal logical derivations, then the first time it would derive X it derives non-X instead, and then it continues doing correct derivations and soon derives X, and therefore a contradiction.

PM is sent.

Comment by bunthut on Criticism of some popular LW articles · 2020-07-19T22:10:18.095Z · LW · GW

Tallest pygmy effects are fragile, especially when they are reliant on self-fulfilling prophecies or network effects. If everyone suddenly thought the Euro was the most stable currency, the resulting switch would destabilize the dollar and hurt both its value and the US economy as a whole.
This is begging the question. If everyone suddenly thought the Euro was the most stable currency, something dramatic would have had to have happened to shift the stock market's assessment of the fundamentals of the US vs. EU economies and governments. Economies are neither fragile nor passive, and these kinds of mass shifts in opinion on economic matters don't blow with the wind. Furthermore, people are likely to hedge their bets. If the US and EU currencies are similar in perceived stability, serious investors are likely to diversify.

Which question? That of whether the stability of currencies is in part caused by self-fulfilling prophecies? You seem to be saying that self-fulfilling prophecies don't happen with competent predictors. Do you assert this as a possibility not disproven, or as a fact?

Comment by bunthut on The Presumptuous Philosopher, self-locating information, and Solomonoff induction · 2020-06-03T13:48:17.049Z · LW · GW

I've thought about something very similar before, and the conclusion I came to was that the number of copies in a world makes no difference to its likelihood. As far as I can tell, the disagreement is here:

"But I'm still confused. Because it still requires information to say that I'm person #1 or person #10^10. Even if we assume that it's equally easy to specify where a person is in both theories, it just plain old takes more bits to say 10^10 than it does to say 1."
"The amount of bits it takes to just say the number of the person adds log(n) bits to the length of the hypothesis. And so when I evaluate how likely that hypothesis is, which decreases exponentially as you add more bits, it turns out to actually be n times less likely. If we treat T1 and T2 as two different collections of numbered hypotheses, and add up all the probability in the collections, they don't add linearly, just because of the bits it takes to do the numbering. Instead, they add like the (discrete) integral of 1/n, which brings us back to log(n) scaling!"

In T1, there is only one copy of you, so you don't need any bits to specify which one you are. In T2, there are a trillion copies of you, so you need log2(a trillion) ~= 40 bits to specify which one you are. This makes each of those hypotheses 2^40 (about a trillion) times less likely, and since there are a trillion of them, it cancels almost exactly.

Whereas you seem to think that saying you are #1/1 000 000 000 000 takes fewer bits than saying you are #538 984 236 587/1 000 000 000 000. I can see how you get that idea when you imagine physically writing them out, but the thing that allows you to skip initial 0s in physical writing is that you have non-digit signs that tell you when the number encoding is over. You would need at least one such sign as a terminal character if you wanted to represent numbers like that in the computer, so you would actually need three signs instead of two. I'm pretty sure that comes out worse overall in terms of information theory than taking a fixed 40 bits.
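A back-of-the-envelope check of my own, showing how the fixed-length indexing cancels against the number of copies:

```python
import math

N = 10 ** 12                     # a trillion copies in T2
bits = math.ceil(math.log2(N))   # fixed-length index costs 40 bits
penalty_per_copy = 2.0 ** -bits  # each indexed hypothesis pays 2^-40
total = N * penalty_per_copy     # summed over all N indexed hypotheses

assert bits == 40
# The per-copy penalty and the number of copies cancel, up to the small
# slack from rounding log2(N) up to a whole number of bits:
assert 0.5 < total <= 1.0
```

The same calculation goes through for any N, since the index always costs ceil(log2(N)) bits no matter which index it is.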

Or perhaps an easier way to see the problem: If we take your number encoding seriously, it implies that "T2 and I'm person #1" is more likely than "T2 and I'm person #10^10", since it would have fewer bits. But the order in which we number the people is arbitrary. Clearly something has gone wrong upstream from that.
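To make the cancellation concrete, here's a minimal sketch; the 100 base bits are an arbitrary stand-in for the complexity of the theory itself:

```python
import math

def total_prior_mass(base_bits, n_copies):
    """Prior mass of a theory, summed over the n_copies indexical hypotheses
    "theory + I am copy #i", using a fixed-width index of log2(n) bits."""
    index_bits = math.log2(n_copies)            # bits needed to name one copy
    per_hypothesis = 2 ** -(base_bits + index_bits)
    return n_copies * per_hypothesis            # the n copies sum back up

# T1: one copy of you; T2: a trillion copies. Same base complexity assumed.
t1 = total_prior_mass(base_bits=100, n_copies=1)
t2 = total_prior_mass(base_bits=100, n_copies=10**12)
assert math.isclose(t1, t2)   # the index cost cancels the copy count exactly
```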

Comment by bunthut on A non-mystical explanation of "no-self" (three characteristics series) · 2020-05-23T10:11:27.825Z · LW · GW
Sorry for the late response

No problem.

Hmm, is there a difference? In that if you think that you are the feeling of tension, then logically you are also at the location of the tension.

Yes, there is a difference between the location of the tension and the location of the feeling of the tension. The location of the tension is behind my eyes; the location of the feeling is... good question. Somewhere in my head; ask neuroscience. The tag (= the dot) can only be added to the feeling, since only that is mental, so that would have to be the "map". By analogy: if I put the map into my bag, the location of the red dot moves, but the location it indicates doesn't. If I turn my head, the location of the feeling (presumably) moves, but the location of the tension only coincidentally moves because it's physically connected. If it were a tension in my hand instead, it wouldn't move.

These kinds of inconsistencies suggest that a part of the experience of the self, is actually an interpretation that is constructed on the fly, rather than being fundamental in the sense that intuition might otherwise suggest.

I don't disagree with this, but more because I think almost everything is like that. Or do you mean something that wouldn't be true of e.g. detecting objects? My point was that when feeling a thing in the hand, the method would locate your "self" in the hand. But no one believes their self is in their hand, not even intuitively. Therefore those sensations are not a sense of self; it only seemed that way because the visual version made sense by coincidence.

If you have the experience of seeing yourself staring at the thing from a third-person perspective, then a question that might be interesting to investigate is "where are you looking at the third-person image from?".

On my left side, 1.5-2m from me, at the same height as my head. But I don't think that's helpful; it's again just the focal point of that visual field, and it's an imagined picture anyway, so that point isn't in regular space-time.

Good catch! I think it's basically the same, despite sounding different; I briefly say a few words about that at the end of a later post.

I think I understand. The self-tag applies to experiences, so identifying with the plane would mean tagging your model of the plane. But in the Truly Enlightened state you should be aware of the tagging, and so only identify with experiences?

It's possible to get into states where you have this to at least some extent, but there's also some goal-directed action going on; and you are identifying with a process which is observing that goal-directed action, rather than getting pulled into it.

How would that process plan without self-markers? Maybe they could be self-markers but not "yours", but there'd need to be some more explanation of that.

Comment by bunthut on A non-mystical explanation of "no-self" (three characteristics series) · 2020-05-11T16:26:21.024Z · LW · GW

My experience with "structural" introspection is that either I just try to look and find nothing, or I look for something specific, in which case I can almost always find it with sufficient effort. I've tried meditation a few times and quickly stopped after not finding a way to avoid this. So naturally, I'm sceptical of this.

When I do this kind of exercise, a result that I may get is that there is the sight of the object, and then a pattern of tension behind my eyes. Something about the pattern of tension feels like “me” - when I feel that “I am looking at a plant in front of me”, this could be broken down to “there is a tension in my consciousness, it feels like the tension is what’s looking at the plant, and that tension feels like me”.
But suppose that you now get a little confused. Rather than taking the spot with red ink as indicating your location in your physical world, you take the red spot on the map to be your physical location. That is, you think that you are the “YOU ARE HERE” tag, looking at the rest of the map from the red ink itself.
But a particular tag in the sense data is not actually where they are looking at it from; for one, the visual cortex is located in the back of the head, rather than right behind the eyes. Furthermore, any visual information is in principle just a piece of data that has been fed into a program running in the brain. If we think of cognitive programs as analogous to computer programs, then a computer program that is fed a piece of data isn't really "looking at" the data "from" any spatial direction.

I tried the exercise. I didn't know what you expected, but my idea of "noticing myself looking" is a model, so I found something like seeing myself staring at the thing from a third-person perspective. I think I could reproduce your result, but I'm writing this the day after, and now that I'm no longer tired I have to create the tension on purpose.

I'm not sure I understand. If you thought you were at the red dot rather than at the location in the world it marks, wouldn't that be analogous to thinking you are the feeling of tension, rather than to thinking you are at the location that feeling indicates?

There is also a sense in which you are looking at the world from behind your eyes: your visual image is a projection with the focal point behind your eyes. If you try the same exercise with holding something in your hand and feeling it rather than looking at something, how does that work out? I tried to do "the same thing" I did to reproduce the tension behind the eyes, and the sensation was just below my skin. I don't know if that's the "right" answer, but if it is, the fact that it's not in the head might suggest the previous result is an artifact.

The outcome seems to be that rather than identifying with the sensations of the supposed observer, one’s identity shifts to the entire field of consciousness itself (in line with the thing about a program reading a file not having any location that would be defined in terms of the file):

There are two quotes after that. The first seems congruent with what you said, but the second sounds like identifying with all the contents of consciousness rather than with the field they are in (or is that distinction not real?).

On the other hand, some situations just trigger the self-related planning machinery very strongly. In vipassana/mindfulness-style approaches, one frequently ends up creating a sense of being an observer who is detached from their thoughts and emotions.

This is what I understood "identifying with the field of consciousness" to mean; is that right? I think I can do that, but it seems it's not compatible with goal-directed action, which would require its own self-markers as described.

Once one gets to this kind of a state, the subsystem trained to do this can continue to further investigate the contents of the mind in fine detail… either looking at other characteristics like impermanence or unsatisfactoriness, or turning its focus on itself, to deepen the no-self realization by seeing that the observer self that it is projecting is also something that can be dis-identified with.

That's supposed to happen? Usually what happens in observer mode is that the "normal" conscious content runs out quickly, because as per above I can't do anything else meanwhile (or at least, I can't keep doing it on purpose). And then I just hear myself saying "..and I feel X in my hand". But that doesn't lead anywhere special, the observer just starts to have some more complicated thoughts until it takes over all the "space", and then I'm back to normal cognition.

Comment by bunthut on Analyticity Depends On Definitions · 2020-03-09T10:35:33.134Z · LW · GW

Quine is disputing the idea of definitions just as much as analyticity. Perhaps this is a good way to think about his argument: how would you find out whether a given belief is a definition or not? You could of course ask, but what if you don't originally share a language? He then argues that there is no way to distinguish a word for "definition" from any other word designating a set of logically independent beliefs people hold, without making assumptions about how people use "definitions", assumptions we usually consider to be results of empirical psychology.

Comment by bunthut on Suspiciously balanced evidence · 2020-02-13T19:13:06.418Z · LW · GW

One more good explanation: numbers are hard. Think of a few things that you would definitely give >99%. I can just about rank them, but I have no idea how many nines are supposed to be there.
And one more not-so-good one: we aren't actually trying to figure it out. We just want to participate in a discussion game, and this one involves numbers called probability, and they are usually between .1 and .9. A good sign that this is happening (and probably in part a reason for the range restriction) is when the "winning criterion" is whether the probability is above or below 50%.

Comment by bunthut on Philosophical self-ratification · 2020-02-05T10:09:28.880Z · LW · GW

I think you're dismissing the "tautological" cases too easily. If you don't believe in a philosophy, its standards will often seem artificially constructed to validate themselves. For example, a simple argument that pops up from time to time:

Fallibilist: You can never be totally certain that something is true.

Absolutist: Do you think that's true?

F: Yes.

A: See, you've just contradicted yourself.

Obviously F is unimpressed by this, but if he argues that you can believe things without being certain of them, that's not that different from Beth saying she wrote the book by responding to stimuli, to someone not already believing their theory.

Comment by bunthut on Circling as Cousin to Rationality · 2020-01-20T10:33:49.563Z · LW · GW
If I'm inferring correctly,

That seems mostly correct.

To the extent it's possible, I think it's good for people to have the option of Circling with strangers, in order to minimize worries in this vein; I think this is one of the other things that makes the possibility of Circling online neat.

I think doing it with strangers you never see again dissolves the worries I'm talking about for many people, though not quite for me (and it raises new problems about being intimate with strangers).

The stuff above is too vague to really do much with, so I'm looking forward to that post of yours. I will say though that I didn't imagine literal forgetting agreements, even if it were possible to keep them (and while we're at it, how do you imagine keeping a confidentiality agreement without keeping a forgetting agreement? Clearly your reaction can give a lot of information about what went on, even if you never Tell anyone), because that would sort of defeat the point, no? But clearly there is some expectation that people react differently than they normally would, or else how the hell is it a good idea for you to act differently?

Comment by bunthut on Circling as Cousin to Rationality · 2020-01-10T09:15:47.019Z · LW · GW

I remain sceptical of how you use internal/external. To give an example: let's say a higher-up does something that makes me angry. Then I might want to scream at him but find myself unable to. If however he sensed this and offered to let me scream without sanction (and let's say this is credible), I wouldn't want that. That's because what I wanted was never about more decibels per se, but the significance this has under normal circumstances, and he has altered the significance. Now is the remaining barrier to "really expressing" myself internal or external? Keep in mind that we could repeat the above for any behaviour that doesn't directly harm anyone (the harm clause is not there because it is specifically anger we are talking about; declarations of love could similarly be robbed of their meaning).

Like, if I have a desire to be understood on a narrow technical point, the more Circling move is to go into what it's like to want to convey the point, but the thing the emotion wants is to just explain the thing already; if it could pick its expression it would pick a lecture.

This is going in the right direction.

Also, after leaving this in the back of my head for the last few days, I think I have an inroad to explaining the problem in a less emotion-focused way. To start off: What effects can and should circling have on the social reality while not circling?

Comment by bunthut on Circling as Cousin to Rationality · 2020-01-09T17:10:59.353Z · LW · GW
Would this feel different if people screamed when they wanted to scream, during Circling?

It could mean that the problem is gone, but it probably means you're setting the cut later. This might make people marginally more accepting or it might not; I'm not sure on the distribution in individual psychology. For me I'd just feel like a clown in addition to the other stuff.

What I'm hearing here (and am repeating back to see if I got it right) is the suggestion is heard as being about how you should organize your internal experience, in a way that doesn't allow for the way that you are organized, and so can't possibly allow for intimacy with the you that actually exists.

Partially, but I also think that you believe that [something] can be changed independently of the internal experience, and I don't. I'm not sure what [something] is yet, but it lives somewhere in "social action and expression". That might mean that I have a different mental makeup than you, or it might mean that the concept of "emotion" I consider important is different from yours.

Comment by bunthut on Circling as Cousin to Rationality · 2020-01-08T11:24:27.533Z · LW · GW
But also I think I run into this alternative impression a lot, and so something is off about how I or others are communicating about it. I'd be interested in hearing why it seems like Circling would push towards 'letting betrayal slide' or 'lowering boundaries' or similar things.
[I have some hypotheses, which are mostly of the form "oh, I was assuming prereq X." For example, I think there's a thing that happens sometimes where people don't feel licensed to have their own boundaries or preferences, or aren't practiced at doing so, and so when you say something like "operate based on your boundaries or preferences, rather than rules of type X" the person goes "but... I can't? And you're taking away my only means of protecting myself?". The hope is it's like pushing a kid into a pool with a lifeguard right there, and so it generally works out fine, but presumably there's some way to make the process more clear, or to figure out in what cases you need a totally different approach.]

I very much don't hesitate to insist on my boundaries and preferences, and I'm still averse to these I-formulations. The following is my attempt to communicate the feeling, but it's mostly going to be evocative and I'll probably walk back on most of it when pushed, hopefully in a productive way:

The whole thing just reeks of valium. I'm sure you'd say there's a lot of emotionality in Circling and that you felt some sort of deep connection or something. This is quite possibly true, but it seems there's an important part of it that's missing. I would describe this as part of their connection to reality. It's like milking a fantasy: sure, you get something out of play-pretend, but it's a lesser version, and there remains the nagging in the back of your head that's just kind of chewing on the real thing, too timid to take a bite. This is what it's like for the more positive emotions, anyway (really, consider feeling your love for your wife in such a way. Does there not seem something wrong with it?). For the anger or betrayal, it's much more noticeable: much like an impotent rage, but subverted at a stage even before "I can't actually scream at the guy", sort of more dull and eating into you.

I also wanted to say something like "because my anger is mine", but I saw you already mentioned "owning" your emotions in a way quite different from my intent. Yours sounds more like acknowledging your emotions, or taking responsibility for them (possibly only to yourself) ("own it!"), which I'd have to take an unhealthily separated stance to even do. I intended something more like control. My anger is mine, its form is mine, and its destruction is mine. Restricting my expression of it is prima facie bad, if sometimes necessary. Restricting its form in my head, under the guise of intimacy no less, is the work of the devil.

Comment by bunthut on Transparent Newcomb's Problem and the limitations of the Erasure framing · 2019-11-29T09:49:10.480Z · LW · GW

That's... exactly what my last sentence meant. Are you repeating on purpose, or was my explanation so unclear?

Comment by bunthut on Transparent Newcomb's Problem and the limitations of the Erasure framing · 2019-11-28T22:37:36.267Z · LW · GW
There may also be some people [who] have doubts about whether a perfect predictor is possible even in theory.

While perfect predictors are possible, perfect predictors who give you some information about their prediction are often impossible. Since you learn of their prediction, you really can just do the opposite. This is not a problem here, because Omega doesn't care if he leaves the box empty and you one-box anyway, but it's not something to forget about in general.
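A minimal sketch of that diagonalization, with a made-up agent who is told the prediction before acting:

```python
def agent(announced_prediction):
    """An agent that, told what was predicted about it, does the opposite."""
    return "two-box" if announced_prediction == "one-box" else "one-box"

# No announcement can be correct: whatever the predictor says it will do,
# the agent contradicts it, so a prediction-revealing predictor must fail.
for prediction in ("one-box", "two-box"):
    assert agent(prediction) != prediction
```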

Comment by bunthut on Goal-thinking vs desire-thinking · 2019-11-11T09:44:51.475Z · LW · GW
Some of my preferences (e.g., more people living and fewer dying) are about the external world. Some (e.g., having enjoyable eating-experiences) are about my internal state. Some are a mixture of both. You can't just swap one for the other.

A way you might distinguish these experimentally: if you are correct about your preferences, you will sometimes want to acquire new desires. If for example you didn't currently enjoy any kind of food, but prefer having enjoyable eating experiences, you will try to start enjoying some. The desire-agent wouldn't.

Comment by bunthut on Building Intuitions On Non-Empirical Arguments In Science · 2019-11-07T14:41:56.010Z · LW · GW

Seems like you're right. I don't think it affects my argument though, just the name.

Comment by bunthut on Building Intuitions On Non-Empirical Arguments In Science · 2019-11-07T11:05:28.025Z · LW · GW

There might be more to "naive Popperianism" than you're making out. The testability criteria are not only intended to demarcate science, but also meaning. Of course, if it turns out that two theories which at first glance seem quite different do in fact mean the same thing, discussing "which one" is true is a category error. This idea is well-expressed by Wittgenstein:

Scepticism is not irrefutable, but palpably senseless, if it would doubt where a question cannot be asked. For doubt can only exist where there is a question; a question only where there is an answer, and this only where something can be said.

Now, there are problems with the falsification criterion, but the more general idea that our beliefs should pay rent is a valuable one. We might think then, that the meaning of a theory is determined by the rent it can pay, and that of unobservables by their relation to the observables. A natural way to formalize this is the Ramsey-sentence:

Say you have terms for observables, O_1, …, O_n, and for theoretical entities that aren't observable, T_1, …, T_m. By repeated conjunction, you can combine all claims of the theory into a single sentence, Θ(O_1, …, O_n, T_1, …, T_m).

The Ramsey-sentence then is the claim that ∃x_1 … ∃x_m: Θ(O_1, …, O_n, x_1, …, x_m).

If this general idea of meaning is correct, then the Ramsey-sentence fully encompasses the meaning of the theory. If this sounds sensible so far, then the competing theories discussed here do indeed have the same meaning. The Ramsey-sentence is logically equivalent to the observable consequences of the theory being true. This is because according to the extensional characterisation of relations defined on a domain of individuals, every relation is identified with some set of subsets of the domain. The power set axiom entails the existence of every such subset and hence every such relation.
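A toy finite-domain sketch of that last point; the domain, the theory Θ, and the predicate extensions are invented for illustration. Because every subset of the domain exists as a candidate extension for the theoretical term, the Ramsey-sentence holds exactly when its purely observable consequence does:

```python
from itertools import chain, combinations

domain = {0, 1, 2}
O = {0, 1}  # extension of the observable predicate

def subsets(s):
    """All subsets of s, i.e. all candidate extensions for a unary relation."""
    s = list(s)
    return (set(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1)))

def theory(O, T):
    # A toy theory Θ(O, T): every T is an O, and some T exists.
    return T <= O and len(T) > 0

# Ramsey-sentence: there exists some T ⊆ domain such that Θ(O, T).
ramsey = any(theory(O, T) for T in subsets(domain))

# Its only observable consequence here: some O exists (witness T = {o}).
assert ramsey == (len(O) > 0)
```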

Of course, Satan and the Atlanteans are in fact motte-and-bailey arguments, and the person advocating them is likely to make claims about them on subjects other than paleontology or the pyramids that do imply different observations than the scientific consensus, and as such render their theory false straightforwardly.

Comment by bunthut on Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann · 2019-11-06T17:35:41.161Z · LW · GW

When we decompose the sequence of events E into laws L and initial conditions C, the laws don't just calculate E from C. Rather, L is a function from events to events, and the sequence E contains many input-output pairs of L.

By contrast, when we decompose a policy π into a planner P and a reward R, P is a function from rewards to policies. With the setup of the problem as-is, we have data on many (state, action) pairs (behaviour), so we can infer π with high accuracy. But we only get to see one policy, and we never get to explicitly see rewards. In such a case, indeed we will get the degenerate pair: the empty reward, and the planner that outputs π regardless of its reward argument. To correctly infer R and P, we would have to see our P applied to some other rewards, and the policies resulting from that.
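A minimal illustration of that degeneracy; the states, actions, and rewards are made up. Both (planner, reward) pairs reproduce the observed policy exactly, so the data cannot distinguish them:

```python
# The only thing we ever observe: the policy (a state -> action map).
policy = {"s1": "a", "s2": "b"}

def honest_planner(reward):
    """Picks the highest-reward action in each state."""
    return {s: max(acts, key=acts.get) for s, acts in reward.items()}

honest_reward = {"s1": {"a": 1.0, "b": 0.0}, "s2": {"a": 0.0, "b": 1.0}}

def degenerate_planner(reward):
    """Ignores the reward entirely and outputs the policy anyway."""
    return policy

empty_reward = {"s1": {"a": 0.0, "b": 0.0}, "s2": {"a": 0.0, "b": 0.0}}

# Both decompositions fit the observed behaviour perfectly.
assert honest_planner(honest_reward) == policy
assert degenerate_planner(empty_reward) == policy
```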

Comment by bunthut on Multiple Moralities · 2019-11-04T11:31:18.998Z · LW · GW
we use something called the "Rawlsian veil", which avoids temporary imbalances in power from skewing the outcomes
Power imbalances are common in human society.

But many of the power imbalances which are found in human society are not at all temporary. For instance, if the player deciding didn't vary randomly, but instead triangles always got to decide over squares, then while there might still develop a rule of law internally, it's not clear what interest the triangles have in rectifying the inter-gonal situation. But we still (claim to) regard it as moral for that to happen. It seems the babyeaters are indeed in such a situation: any adult eating the babies will never be a baby again. Further, they are almost certain to succeed in eating them, after which the babies will not grow big and maybe become a threat some day.

Comment by bunthut on Is requires ought · 2019-10-30T11:42:46.044Z · LW · GW

Is this a fair summary of your argument:

1. We already agree that conditional oughts of the form "If you want X, you should do Y" exist.
2. There are true claims of the form "If you want accurate beliefs, you should do Y".
3. Therefore, all possible minds that want accurate beliefs should do Y.

Or maybe:

1. We already agree that conditional oughts of the form "If you want X, you should do Y" exist.
2. There are true claims of the form "If you want accurate beliefs, you should do Y".
3. For some Y, these apply very strongly, such that it's very unlikely to have accurate beliefs if you don't do them.
4. For some of these Y, it's unlikely you do them if you shouldn't.
5. Therefore for these Y, if you have accurate beliefs you should probably do them.

The first one seems to be correct, if maybe a bit of a platitude. If we take Cuneo's analogy to moral realism seriously, it would be

1. We already agree that conditional oughts of the form "If you want X, you should do Y" exist.
2. There are true claims of the form "If you want to be good, you should do Y".
3. Therefore, all possible minds that want to be good should do Y.

But to make that argument, you have to define "good" first. Of course we already knew that a purely physical property could describe the good.

As for the second one, it's correct as well, but it's still not clear what you would do with it. It's only probably true, so it's not clear why it's more philosophically interesting than "If you have accurate beliefs, you probably have glasses".

Comment by bunthut on The Dualist Predict-O-Matic ($100 prize) · 2019-10-17T11:39:20.844Z · LW · GW
One possibility is that it's able to find a useful outside view model such as "the Predict-O-Matic has a history of making negative self-fulfilling prophecies". This could lead to the Predict-O-Matic making a negative prophecy ("the Predict-O-Matic will continue to make negative prophecies which result in terrible outcomes"), but this prophecy wouldn't be selected for being self-fulfilling. And we might usefully ask the Predict-O-Matic whether the terrible self-fulfilling prophecies will continue conditional on us taking Action A.

Maybe I misunderstood what you mean by dualism, but I don't think that's true. Say the Predict-O-Matic has an outside view model (of itself) like "The metal box on your desk (the Predict-O-Matic) will make a self-fulfilling prophecy that maximizes the number of paperclips". Then you ask it how likely it is that your digital records will survive for 100 years. It notices that that depends significantly on how much effort you make to secure them. It notices that that significantly depends on what the metal box on your desk tells you. It uses its low-resolution model of what the box says. To work that out, it checks which outputs would be self-fulfilling, and then which of these leads to the most paperclips. The more insecure your digital records are, the more you will invest in paper, and the more paperclips you will need. Therefore the metal box will tell you the lowest self-fulfilling probability for your question. Since that number is *self-fulfilling*, it is in fact the correct answer, and the Predict-O-Matic will answer with it.
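Here's a toy sketch of that selection step; the response curve is invented. A low forecast makes you shift effort to paper and abandon the digital archive, so pessimism is self-confirming. Among the self-fulfilling answers, the box reports the one leading to the most paperclips:

```python
def outcome_probability(announced):
    """Chance the records survive, given what the box announces: a low
    forecast makes you give up on the digital archive (toy response curve)."""
    return 0.2 if announced < 0.5 else 0.8

# Self-fulfilling answers are fixed points: announcing p makes p come true.
candidates = [i / 100 for i in range(101)]
fixed_points = [p for p in candidates if outcome_probability(p) == p]
assert fixed_points == [0.2, 0.8]

# A paperclip-flavoured box reports the lowest fixed point, and since the
# answer is self-fulfilling, it is still a *correct* prediction.
answer = min(fixed_points)
assert answer == 0.2
```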

I think this avoids your argument that

I contend that Predict-O-Matic doesn't know it will predict P = A at the relevant time. It would require time travel -- to know whether it will predict P = A, it will have to have made a prediction already, and but it's still formulating its prediction as it thinks about what it will predict.

because it doesn't have to simulate itself in detail to know what the metal box (it) will do. The low-resolution model provides a shortcut around that, but it will be accurate despite the low resolution, because by believing it is simple, it becomes simple.

Can you usefully ask for conditionals? Maybe. The answer to the conditional depends on what worlds you are likely to take Action A in. It might be that in most worlds where you do A, you do it because of a prediction from the metal box, and since we know those maximize paperclips, there's a good chance the action will fail to prevent it in those circumstances. But that need not be the case: if, for example, it's certain you won't ask the box any more questions between this one and the event it tries to predict, the conditional answer can be useful.

It might be possible to avoid any problems of this sort by only ever asking questions of the type "Will X happen if I do Y now (with no time to receive new info between hearing the prediction and doing the action)?", because by backwards induction the correct answer will not depend on what you actually do. This doesn't avoid the scenarios in the original post where multiple people act on their Predict-O-Matics, but I suspect those aren't solvable without coordination.

Comment by bunthut on A Critique of Functional Decision Theory · 2019-09-16T07:45:00.862Z · LW · GW

Here's an attempt to explain it using only causal subjunctives:

Say you build an agent, and you know in advance that it will face a certain decision problem. What choice should you have it make, to achieve as much utility as possible? Take Newcomb's problem. Your choice to make the agent one-box will cause the agent to one-box, getting however much utility is in the box. It will also cause the simulator to predict that the agent will one-box, meaning that the box will be filled. Thus CDT recommends building a one-boxing agent.

In the Sandwich problem, no one is trying to predict the agent. Therefore your choice of how to build it will cause only its actions and their consequences, and so you want the agent to switch to hummus after it has learned that they are better.

FDT generalizes this approach into a criterion of rightness. It says that the right action in a given decision problem is the one that would be taken by the agent CDT recommends you build.

Now the point where the logical uncertainty does come in is the idea of "what you knew at the time". While that doesn't avoid the issue, it puts it into a form that's more acceptable to academic philosophy. Clearly some version of "what you knew at the time" is needed to do decision theory at all, because we want to say that if you get unlucky in a gamble with positive expected value, you acted rationally.
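The builder's calculation for Newcomb's problem can be sketched as follows, using the standard payoffs and the assumption that a perfect predictor's forecast mirrors the design:

```python
def newcomb_payoff(design):
    """Utility of building an agent with the given design, when the
    predictor's forecast matches the design (perfect prediction)."""
    box_filled = (design == "one-box")          # predictor reads the design
    big_box = 1_000_000 if box_filled else 0
    small_box = 1_000 if design == "two-box" else 0
    return big_box + small_box

# CDT, applied to the choice of which agent to *build*, picks one-boxing:
best = max(["one-box", "two-box"], key=newcomb_payoff)
assert best == "one-box"
```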

Comment by bunthut on Two senses of “optimizer” · 2019-09-01T18:43:00.914Z · LW · GW

Yes, obviously it's going to change the world in some way to be run. But not any change anywhere makes it an optimizer_2. As defined, an optimizer_2 optimizes its environment. Changing something inside the computer does not make it an optimizer_2, and changing something outside without optimizing it doesn't either. Yes, the computer will inevitably cause some changes in its environment just by running, but what makes something an optimizer_2 is to systematically bring about a certain state of the environment. And this offers a potential solution other than hardware engineering: if the feedback is coming from inside the computer, then what is incentivized are only states within the computer, and so only the computer is getting optimized.

Comment by bunthut on Two senses of “optimizer” · 2019-08-26T22:30:41.183Z · LW · GW

I think only if it gets its feedback from the real world. If you have gradient descent, then the true answers for its samples are stored somewhere "outside" the intended demarcation, and it might try to reach them. But how is a hillclimber that is given a graph and solves the traveling salesman problem for it an optimizer_2?

Comment by bunthut on Two senses of “optimizer” · 2019-08-26T22:17:21.677Z · LW · GW
(or that it has access to the world model of the overall system, etc)

It doesn't need to. The "inner" program could also use its hardware as quasi-sense organs and figure out a world model of its own.

Of course this does depend on the design of the system. In the example described, you could, rather than optimizing for speed itself, have a fixed function that estimates speed (like what we do in complexity theory) and then optimize for *that*, and that would get rid of the leak in question.

The point I think Bostrom is making is that, contrary to intuition, just building the epistemic part of an AI and not telling it to enact the solution it found doesn't guarantee you don't get an optimizer_2.

Comment by bunthut on Two senses of “optimizer” · 2019-08-21T22:15:11.020Z · LW · GW

Well, one thing a powerful optimizer might do at some point is ask itself "what program should I run that will figure out such-and-such for me?". This is what Bostrom is describing in the quote: an optimizer optimizing its own search process. Now, if the AI then searches through the space of possible programs, predicts which one will give it the answer quickest, and then implements it, here's a thing that might happen: there might be a program that, when run, affects the outside world in such a way as to speed up the process of answering.

For example, it might lead electricity to run through the computer in such a way as to cause it to emit electromagnetic waves, through which it sends a message to a nearby WLAN router, then uses the internet to hack a bank account to buy extra hardware and have it delivered to and plugged into itself, and then it runs a program calculating the answer on this much more powerful hardware, and in this way ends up having the answer faster than if it had just started calculating away on the weaker hardware.

And if the optimizer works as described above, it will implement that program, and thereby optimize its environment. Notably, it will optimize for solving the original optimisation problem faster/better, not try to implement the solution it has found to it.

I don't think this makes your distinction useless, as there are genuine optimizer_1s, even relatively powerful ones, but the Cartesian boundary is an issue once we talk about self-improving AI.

Comment by bunthut on Problems in AI Alignment that philosophers could potentially contribute to · 2019-08-18T21:42:56.578Z · LW · GW
How should two AIs that want to merge with each other aggregate their preferences?

Is this question intended to include a normative aspect, or is it purely game-theoretic?

Comment by bunthut on Distance Functions are Hard · 2019-08-14T18:55:31.770Z · LW · GW

It's sort of true that the correct distance function depends on your values. A better way to say it is that different distance functions are appropriate for different tasks, and they will be "better" or "worse" depending on how much you care about those tasks. But I don't think asking for the "best" metric in this sense is helpful, because you don't have to use the same metric for all tasks involving a certain space. Sometimes you want air distance, sometimes travel times. Maybe you have to decide because you're computationally limited, but that's not philosophically relevant.

With that in mind, here are my attempts at two of your examples. The adversarial examples first, because it's the clearest question: I think the problem is that you are thinking too abstractly. I don't think there is a meaningful sense of "concept similarity" that's purely logical, i.e. independent of the actual world. The intuitive sense of similarity you're trying to use here is probably something like this: over the space of images, you want the probability measure of encountering them. Then you get a metric where two subsets of image-space which are isomorphic under the metric always have the same measure. That is your similarity measure.

Counterfactuals usually involve some sort of probability distribution, which is then "updated" on the condition of the counterfactual being true, and then the consequent is judged under that distribution. What the initial distribution is depends on what you're doing. In the case of Lincoln, it's probably reasonable expectations of the future from before the assassination. But for something like "What if conservation of energy weren't true?", it's probably our current distribution over physics theories. Basically: what's the most likely alternative? The mathematical example is a bit different. There are lots of ways to conclude a contradiction from 0=1, but it's very hard to deduce a contradiction from denying the modularity theorem. If you were to just randomly perform logical inferences from "the modularity theorem is wrong", then there is a subset of propositions, containing no claim that is a direct negation of another in it, that your deductions are unlikely to lead you out of (it matters, of course, in what way it is random, but it evidently works for "human mathematician who hasn't seen the proof yet").
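The update-then-judge scheme can be sketched mechanically (a toy of my own, with invented worlds and weights): put a prior over worlds, condition on the counterfactual's antecedent, and evaluate the consequent under the conditioned distribution.

```python
from fractions import Fraction

# Toy worlds: (conservation_of_energy, perpetual_motion) with an illustrative
# prior; these numbers are my own invention, not anything from the discussion.
prior = {
    ("holds", "impossible"): Fraction(90, 100),
    ("fails", "possible"):   Fraction(7, 100),
    ("fails", "impossible"): Fraction(3, 100),
}

def condition(dist, pred):
    """Restrict the distribution to worlds satisfying pred, then renormalize."""
    total = sum(p for w, p in dist.items() if pred(w))
    return {w: p / total for w, p in dist.items() if pred(w)}

# "What if conservation of energy weren't true?"
updated = condition(prior, lambda w: w[0] == "fails")
p_perpetual = sum(p for w, p in updated.items() if w[1] == "possible")
```

The choice of `prior` is where all the disagreement about a counterfactual lives; the conditioning step itself is mechanical.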

Comment by bunthut on Distance Functions are Hard · 2019-08-14T17:39:20.448Z · LW · GW
Represent each utility function by an AGI. Almost all of them should be able to agree on a metric such that each could adopt that metric in its thinking losing only negligible value.

This implies a measure over utility functions. It's probably true under the Solomonoff measure, but abstract though they are, these are values.

Comment by bunthut on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-04-26T12:47:36.360Z · LW · GW

Not only of who would win, but also about the costs it would have. I think the difficulty in establishing common knowledge about this is in part due to people trying to deceive each other. It's not clear that the ability to see through deception improves faster than the ability to deceive as intelligence increases.

Comment by bunthut on The Cacophony Hypothesis: Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence · 2019-04-26T11:57:26.597Z · LW · GW
It would be a O(1) cost to start the proof by translating the axioms into a more convenient format. Much as Kolmogorov complexity is "language dependent" but not asymptotically because any particular universal turing machine can be simulated in any other for a constant cost.

And the thing that isn't O(1) is to apply the transition rule until you reach the relevant time step, right? I think I understand it now: the calculations involved in applying the transition rule count towards the computation length, and the simulation should be able to answer multiple questions about the thing it simulates. So if object A simulates object B, we make a model X of A, prove it equivalent to the one in our theory of physics, then prove it equivalent to your physics model of B, then calculate forward in X, then translate the result back into B with the equivalence. And then we count the steps all this took. Before I ask any more questions, am I getting that right?

Comment by bunthut on The Principle of Predicted Improvement · 2019-04-25T11:16:46.496Z · LW · GW

Good shows that for every utility function for every situation, the EV of utility increases or stays the same when you gain information.

If we can construct a utility function whose utility EV always equals the EV of the probability assigned to the correct hypothesis, we could transfer the conclusion. That was my idea when I made the comment.

Here is that utility function: first, the agent mentally assigns a positive real number $q_i$ to every hypothesis $H_i$, such that $\sum_i q_i = 1$. It prefers any world where it does this to any where it doesn't. Its utility function, when $H_t$ is the true hypothesis, is:

$$U = 2q_t - \sum_i q_i^2$$

This is the quadratic scoring rule, so its expectation is maximized by reporting $q_i = p_i$. Then its expected utility is:

$$\mathbb{E}[U] = \sum_t p_t \left(2p_t - \sum_i p_i^2\right)$$

And since $\sum_t p_t = 1$, this is:

$$2\sum_t p_t^2 - \sum_i p_i^2 = \sum_t p_t^2$$

Which is just the expected probability assigned to the correct hypothesis.
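A quick numerical sanity check of that claim (my own sketch, arbitrary distribution): under an honest report $q = p$, the expected quadratic score equals the expected probability assigned to whichever hypothesis turns out true.

```python
import random

def expected_quadratic_score(p):
    """E[U] under p with honest report q = p: sum_t p_t * (2*p_t - sum_i p_i^2)."""
    s = sum(pi ** 2 for pi in p)
    return sum(pt * (2 * pt - s) for pt in p)

def expected_correct_probability(p):
    """Expected probability assigned to the hypothesis that turns out true."""
    return sum(pt * pt for pt in p)

# An arbitrary normalized distribution over 5 hypotheses.
rng = random.Random(0)
raw = [rng.random() for _ in range(5)]
p = [x / sum(raw) for x in raw]
```

The two quantities agree (up to floating-point error) for any distribution, since $2\sum_t p_t^2 - \sum_i p_i^2 = \sum_t p_t^2$.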

Comment by bunthut on The Principle of Predicted Improvement · 2019-04-24T14:32:46.572Z · LW · GW

Does your principle follow from Good's? It would seem that it does. Perhaps a good way to generalise the idea would be that the EV aggregates the distribution linearly and isn't expected to change, but other aggregations, like the log, get on average closer to their value at hypothetical certainty. For example, the variance of a real parameter is expected to go down.
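A toy check of that last claim (my own numbers, exact arithmetic): for a binary parameter with a noisy binary observation, the posterior variance averaged over possible observations is strictly below the prior variance.

```python
from fractions import Fraction

# Toy setup (my own choice of numbers): theta is 0 or 1, uniform prior;
# we observe X in {0, 1} with P(X=1 | theta=1) = 3/4 and P(X=1 | theta=0) = 1/4.
prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_x1_given = {0: Fraction(1, 4), 1: Fraction(3, 4)}

def variance(dist):
    """Variance of theta under a distribution {theta: probability}."""
    mean = sum(t * p for t, p in dist.items())
    return sum(p * (t - mean) ** 2 for t, p in dist.items())

def posterior(x):
    """Bayes update on observing X = x."""
    unnorm = {t: prior[t] * (p_x1_given[t] if x == 1 else 1 - p_x1_given[t])
              for t in prior}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

p_x1 = sum(prior[t] * p_x1_given[t] for t in prior)
expected_posterior_variance = (p_x1 * variance(posterior(1))
                               + (1 - p_x1) * variance(posterior(0)))
```

Here the prior variance is 1/4 and the expected posterior variance is 3/16; the drop is an instance of the law of total variance.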

Comment by bunthut on The Cacophony Hypothesis: Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence · 2019-04-24T12:40:50.854Z · LW · GW

In that case, the target format problem shows up in the formalisation of the physical system.

A specific formalization of this idea would be that a proof system equipped with an oracle (axiom schema) describing the states of the physical system which allegedly computed these facts, as well as its transition rule, should be able to find proofs for those logical facts in less steps than one without such axioms.
Such proofs will involve first coming up with a mapping (such as interpreting certain electrical junctions as nand gates), proving them valid using the transition rules, then using induction to jump to "the physical state at timestep t is X therefore Alice's favourite ice cream colour is Y".

How do you "interpret" certain electrical junctions as nand gates? Either you already have

a proof system equipped with an axiom schema describing the states of the physical system, as well as its transition rule

or this is a not fully formal step. Odds are you already have one (your theory of physics). But then you are measuring proof shortness relative to that system. And you could be using one of countless other formal systems which always make the same predictions, but relative to which different proofs are short and long. To steal someone else's explanation:

Let us imagine a white surface with irregular black spots on it. We then say that whatever kind of picture these make, I can always approximate as closely as I wish to the description of it by covering the surface with a sufficiently fine square mesh, and then saying of every square whether it is black or white. In this way I shall have imposed a unified form on the description of the surface. The form is optional, since I could have achieved the same result by using a net with a triangular or hexagonal mesh. Possibly the use of a triangular mesh would have made the description simpler: that is to say, it might be that we could describe the surface more accurately with a coarse triangular mesh than with a fine square mesh (or conversely), and so on.

And which of these empirically indistinguishable formalisations you use is of course a fact about the map. In your example:

A bit like how it's more efficient to convince your friend that 637265729567*37265974 = 23748328109134853258 by punching the numbers into a calculator and saying "see?" than by handing over a paper with a complete long multiplication derivation (assuming you are familiar with the calculator and can convince your friend that it calculates correctly).

The assumption of familiarity (including that it takes in and puts out Arabic numerals, uses "*" as the multiplication command, that buttons must be pressed,... and all the other things you need to actually use it) includes exactly that.

Comment by bunthut on The Cacophony Hypothesis: Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence · 2019-04-17T21:39:52.973Z · LW · GW

I think you've given a good analysis of "simulation", but it doesn't get around the problem the OP presents.

The only way to pin down the mapping---such that you could, for instance, explicitly write it down, or take the pebble's state and map it to an answer about Alice's favourite ice cream---is to already have carried out the actual simulation, separately, and already know these things about Alice.

It's also possible to do those calculations during the interpretation/translation. You may have meant that, I can't tell.

Your idea that the computation needs to happen somewhere is good, but in order to make it work you need to specify a "target format" in which the predictions are made. "1" doesn't really simulate Alice, because you can't read the predictions it makes, even when they are technically "there" in a mathematical sense, and the translation into such a format involves what we consider the actual simulation.
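A toy of my own to make this vivid: a "simulation" that just outputs the constant 1, and a "decoder" into a readable target format. Since the constant carries no information, the decoder has to run the whole computation itself, which is why calling "1" the simulation is empty.

```python
# The "simulation" outputs a constant; the decoder maps that output to a
# readable prediction. All the real work happens in the decoder.

def fake_simulation(_initial_state):
    return 1  # carries no information about the system

def transition(state):
    return (3 * state + 1) % 17  # stand-in transition rule, arbitrary

def decoder(sim_output, initial_state):
    # To produce a prediction in the target format, the decoder must
    # actually simulate; sim_output contributes nothing.
    state = initial_state
    for _ in range(10):
        state = transition(state)
    return state

def honest_simulation(initial_state):
    state = initial_state
    for _ in range(10):
        state = transition(state)
    return state
```

The fake pipeline and the honest simulation give identical predictions, but the computational cost has simply moved into the "interpretation" step.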

This means, though, that whether something is a simulation is only in the map, and not in the territory. It depends on what that "target format" is. For example, a description in Chinese is in a sense not a real description to me, because I can't process it efficiently. Someone else, however, may, and to them it is a real description. Similarly, one could write a simulation in a programming language we don't know, and if they don't leave us a compiler or docs, we would have a hard time noticing. So whether something is a simulation can depend on the observer.

If we want to say that simulations are conscious and ethically relevant, this seems like something that needs to be addressed.