Comment by chris_leong on Would an option to publish to AF users only be a useful feature? · 2019-05-21T05:27:41.496Z · score: 6 (3 votes) · LW · GW

That wouldn't stay secret. I'm pretty confident that someone would leak all this information at some point. But beyond this, it creates difficulties around who gets access to the Alignment Forum, as it would then not just be about having sufficient knowledge to comment on these issues, but also about trust.

Comment by chris_leong on "UDT2" and "against UD+ASSA" · 2019-05-21T05:17:21.014Z · score: 2 (1 votes) · LW · GW

One aspect of UD+ASSA that is weird is that the UD is itself uncomputable. This seems to contradict the assumption that everything is computable, although maybe there is a special justification that can be given for this?

I don't think the 0.91 probability is necessarily incorrect. You just have to remember that as long as you care about your family, and not your experience of knowing your family is looked after, you only get paid out once in the universe, not once per copy.

Comment by chris_leong on mAIry's room: AI reasoning to solve philosophical problems · 2019-05-21T04:25:41.545Z · score: 4 (2 votes) · LW · GW

This post clearly helped a lot of other people, but it follows a pattern that many other posts on Less Wrong also follow, which I consider negative. The valuable contribution here is not the formalisation, but the generator behind the formalisation. The core idea appears to be the following:

"Human brains contain two forms of knowledge. Explicit knowledge and weights that are used in implicit knowledge (admittedly the former is hacked on top of the later, but that isn't relevant here). Mary doesn't gain any extra explicit knowledge from seeing blue, but her brain changes some of her implicit weights so that when a blue object activates in her vision a sub-neural network can connect this to the label "blue"."

Unfortunately, there is a wall of maths that you have to wade through before this is explained to you. I feel it is much better to provide your readers with a conceptual understanding of what is happening first, and only then include the formal details.

Comment by chris_leong on Offer of collaboration and/or mentorship · 2019-05-19T02:57:55.447Z · score: 2 (1 votes) · LW · GW

I mean that there isn't a property of logical counterfactuals in the universe itself. However, once we've created a model (/map) of the universe, we can then define logical counterfactuals as asking a particular question about this model. We just need to figure out what that question is.

Comment by chris_leong on Offer of collaboration and/or mentorship · 2019-05-17T22:24:28.977Z · score: 2 (1 votes) · LW · GW

You've explained the system. But what's the motivation behind this?

Even though I only have a high level understanding of what you're doing, I generally disagree with this kind of approach on a philosophical level. It seems like you're reifying logical counterfactuals, when I see them more like an analogy, ie. positing a logical counterfactual is an operation that takes place on the level of the map, not the territory.

Comment by chris_leong on Offer of collaboration and/or mentorship · 2019-05-16T22:46:52.643Z · score: 4 (2 votes) · LW · GW

Can you tell me more about your ideas related to logical counterfactuals? They're an area I've been working on as well.

Comment by chris_leong on Feature Request: Self-imposed Time Restrictions · 2019-05-16T10:16:55.312Z · score: 2 (1 votes) · LW · GW

I'm hugely in favour of this. There have been quite reasonable questions raised about how much Less Wrong improves us and how much it sucks up our time.

Comment by chris_leong on Coherent decisions imply consistent utilities · 2019-05-14T18:06:29.991Z · score: 4 (2 votes) · LW · GW

Okay, so there is an additional assumption that these strings are all encoded as infinite sequences. Instead, they could be encoded with a system that starts by listing the number of digits, or -1 if the sequence is infinite, and then provides those digits. That's a pretty key property not to mention (then again, I can't criticise too much, as I was too lazy to read the PDF). Thanks for the explanation!
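To make the alternative concrete, here's a toy sketch (my own illustration; the `encode` functions and their conventions are hypothetical, not from the PDF):

```python
from itertools import count, islice

def encode(digits):
    """Length-prefix encoding for a finite digit sequence:
    the first element is the number of digits, then the digits."""
    digits = list(digits)
    return [len(digits)] + digits

def encode_infinite(digit_stream):
    """An infinite sequence is marked with -1, then digits follow lazily."""
    yield -1
    yield from digit_stream

print(encode([3, 1, 4]))                           # [3, 3, 1, 4]
print(list(islice(encode_infinite(count(0)), 4)))  # [-1, 0, 1, 2]
```

Under this encoding a machine can read the length up front for finite sequences, which is exactly the information that the all-infinite-sequences encoding withholds.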

Comment by chris_leong on Physical linguistics · 2019-05-14T02:54:08.262Z · score: 5 (3 votes) · LW · GW

I agree that "Why is this rock this rock instead of that rock?" is a good place to start, even if they aren't perfectly analogous. Now, it isn't entirely clear what is being asked. The first question that we could be asking is: "Why is this rock the way that it is instead of the way that rock is?", in which case we could talk about the process of rock formation and the rock's specific history. Another question we could be asking is, "Why is this rock here at this time instead of that rock?" and again we'd be talking about history and how people or events moved it. We could even make anthropic arguments, "This rock isn't a million degrees because if it were that hot it would not longer be a rock" or "This rock isn't a diamond and this is unsurprising as they are rare". Here we'd be asking, "Given a random rock, why are we most likely to be observing certain characteristics?"

One difference with the human example is that the human is asking the question, "Why am I me instead of someone else?" So you can also reason about your likely properties on the basis of being the kind of being who is asking that question. Here the question is interpreted as, "Why is the entity asking this question this entity instead of another entity?".

Another issue which becomes clearer is the symmetry. Barack Obama might ask, "Why am I me instead of the Pope?" whilst at the same time the Pope asks, "Why am I me instead of Barack Obama?". So even if you had been someone else, you might very well have been asking the same question. I think this ties well into the notion of surprise. Let's suppose a million people receive a social security number and you receive 235,104. You might argue, "How surprising, there was only a one in a million chance of receiving this number!". However, you could have said this regardless of which number you'd been given, so it isn't that surprising after all.
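The arithmetic behind that last point can be made explicit (a toy calculation of my own):

```python
from fractions import Fraction

n = 1_000_000
# Probability of receiving the specific number 235,104:
p_specific = Fraction(1, n)
# But every one of the n possible numbers licenses the same
# "only one in a million!" exclamation, so the probability that
# you end up able to make the claim at all is:
p_can_claim = n * p_specific
print(p_specific, p_can_claim)  # 1/1000000 1
```

Any particular number is improbable, yet the event "I received a number I could call improbable" is certain, which is why the surprise dissolves.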

Another question that could be asked is, "Why is my consciousness receiving the qualia (subjective experience) from this physical body?" In this case, the answer depends on your metaphysics. Materialists would say this is a mistaken question as qualia don't exist. Christianity might say it's because God chose to attach this soul to this body. Other spiritual theories might have souls floating around which inhabit any body which is free (although this raises questions such as: what if no soul chooses to inhabit a body, and which soul gets to inhabit which body?). Lastly, there are theories like property dualism where consciousness is a result of the mental properties of particles, so that the consciousness corresponding to any one particular body couldn't be attached to anyone else without breaking the laws of the universe. So as described in my post Natural Structures and Definitions, this last interpretation is one of those questions that is conditionally meaningful to ask.

Comment by chris_leong on Coherent decisions imply consistent utilities · 2019-05-14T02:09:04.395Z · score: 2 (1 votes) · LW · GW

Hmm, I'm still not following. Limits are uncomputable in general, but I just need one computable function where I know the limit at one point, and then I can set it to p+1 there instead. Why wouldn't this function still be computable? Maybe "computable function" is being defined differently than I would expect.

Comment by chris_leong on Coherent decisions imply consistent utilities · 2019-05-13T04:29:45.463Z · score: 4 (4 votes) · LW · GW

My understanding of the arguments against using a utility maximiser is that proponents accept that this will lead to sub-optimal or dominated outcomes, but they are happy to accept this because they believe that these AIs will be easier to align. This seems like a completely reasonable trade-off to me. For example, imagine that choosing option A is worth 1 utility. Option B is worth 1.1 utility if 100 mathematical statements are all correct, but -1000 otherwise (we are ignoring the costs of reading through and thinking about all 100 mathematical statements). Even if each of the statements seems obviously correct, there is a decent chance that you messed up on at least 1 of them, so you'll most likely want to take the outside view and pick option A. So I don't think it's necessarily an issue if the AI is doing things that are obviously stupid from an inside view.
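To put rough numbers on this trade-off (my own illustration; the 0.99 per-statement reliability is an assumption, not something from the post):

```python
# Option A: a guaranteed 1 utility.
# Option B: 1.1 utility if all 100 statements hold, -1000 otherwise.
p_each = 0.99                 # assumed chance each statement is correct
p_all = p_each ** 100         # chance all 100 hold, roughly 0.366
ev_a = 1.0
ev_b = p_all * 1.1 + (1 - p_all) * (-1000)
print(f"P(all correct) = {p_all:.3f}, EV(B) = {ev_b:.1f}")
```

Even with each statement 99% certain, option B's expected utility comes out deeply negative, so the outside-view choice of A looks entirely reasonable.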

Comment by chris_leong on Coherent decisions imply consistent utilities · 2019-05-13T04:19:36.345Z · score: 4 (2 votes) · LW · GW

"Because all computable functions are continuous" - how does this make any sense? Why can't I just pick a value x=1 and if it's left limit and right limit are p, set the function to p+1 at x=1.

Comment by chris_leong on Coherent decisions imply consistent utilities · 2019-05-13T04:15:54.303Z · score: 2 (1 votes) · LW · GW

"Is a fleeting emotional sense of certainty over 1 minute, worth automatically discarding the potential $5-million outcome?" - I know it's mostly outside of what is being modelled here, but suspect that someone who takes the 90% bet and wins nothing might experience much more than just a fleeting sense of disappointment, much more than someone who takes the 45% chance and doesn't win.

Comment by chris_leong on Narcissism vs. social signalling · 2019-05-12T09:23:13.608Z · score: 2 (1 votes) · LW · GW

Do you have any empirical evidence?

Comment by chris_leong on Mixed Reference: The Great Reductionist Project · 2019-05-12T09:19:25.278Z · score: 2 (1 votes) · LW · GW

This is a really good point, I'm disappointed that he didn't respond to it.

Comment by chris_leong on Probability interpretations: Examples · 2019-05-12T08:42:33.358Z · score: 5 (3 votes) · LW · GW

"The propensity and frequentist views regard as nonsense the notion that we could talk about the probability of a mathematical fact" - couldn't a frequentist define a reference class using all the digits of Pi? And then assume that the person knows nothing about Pi so that they throw away the place of the digit?

Comment by chris_leong on Narcissism vs. social signalling · 2019-05-12T04:23:00.893Z · score: 2 (1 votes) · LW · GW

What did he believe changed?

Narcissism vs. social signalling

2019-05-12T03:26:31.552Z · score: 15 (7 votes)
Comment by chris_leong on Natural Structures and Definitions · 2019-05-01T10:29:11.993Z · score: 2 (1 votes) · LW · GW

Sure, you have to be using the word in some way, but there's no guarantee that a meaningful concept can be extracted from it; the term may just be used in ways that are hopelessly confused.

Comment by chris_leong on Natural Structures and Definitions · 2019-05-01T09:25:36.206Z · score: 2 (1 votes) · LW · GW

"You don't need to guess what someone means, or what level of discussion they're looking for" - yes, that is part of the point of providing possible interpretations - to help with the clarification

Comment by chris_leong on Natural Structures and Definitions · 2019-05-01T09:24:26.548Z · score: 3 (2 votes) · LW · GW

"It is the same if you ask "What is consciousness?" You must already being using that word to refer to some thing or phenomenon that you are familiar with, and what you are asking about is what that thing is made of, or how that thing works, questions about the world, not about words" - philosophy discussions ask this all the time without presuming that the definition is already known.

Natural Structures and Definitions

2019-05-01T00:05:35.698Z · score: 21 (8 votes)
Comment by chris_leong on Buying Value, not Price · 2019-04-30T06:48:01.304Z · score: 3 (2 votes) · LW · GW

The appropriate calculation is whether the marginal value of the upgrade is greater than the value of $500, where the value of $500 is the marginal value of having that additional money.

Comment by chris_leong on Liar Paradox Revisited · 2019-04-23T09:53:28.103Z · score: 2 (1 votes) · LW · GW

Yep, it can be assigned that if you use the fixed point definition of truth.

Comment by chris_leong on Liar Paradox Revisited · 2019-04-17T23:03:01.120Z · score: 2 (1 votes) · LW · GW

Didn't work, just showed the triple ticks

Liar Paradox Revisited

2019-04-17T23:02:45.875Z · score: 11 (3 votes)
Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-17T22:32:36.276Z · score: 2 (1 votes) · LW · GW

""The box" that's correlated with our output subjectively is a box which is chosen differently in cases where our output is different; and, the choice-of-box contains a copy of us. So the example works" - that's a good point and if you examine the source code, you'll know it was choosing between two boxes. Maybe we need an extra layer of indirection. There's a Truth Tester who can verify that the Predictor is accurate by examining its source code and you only get to examine the Truth Tester's code, so you never end up seeing the code within the predictor that handles the case where the box doesn't have the same output as you. As far as you are subjectively concerned, that doesn't happen.

Comment by chris_leong on Liar Paradox Revisited · 2019-04-17T10:20:50.526Z · score: 4 (2 votes) · LW · GW

I can't figure out how to indent my code

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-17T04:17:47.383Z · score: 2 (1 votes) · LW · GW

"How do you propose to reliably put an agent into the described situation?" - Why do we have to be able to reliably put an agent in that situation? Isn't it enough that an agent may end up in that situation?

But in terms of how the agent can know the predictor is accurate, perhaps the agent gets to examine its source code after it has run, and it's implemented in hardware rather than software, so that the agent knows that it wasn't modified?

But I don't know why you're asking so I don't know if this answers the relevant difficulty.

(Also, just wanted to check whether you've read the formal problem description in Logical Counterfactuals and the Co-operation Game)

Comment by chris_leong on Open Problems in Archipelago · 2019-04-17T03:08:02.760Z · score: 3 (2 votes) · LW · GW

Policing is only one aspect. Listing rules sets norms and the effect of selecting for people with more than just a casual interest in a topic helps as well.

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-17T03:04:47.262Z · score: 2 (1 votes) · LW · GW

"The bullet I want to bite is the weaker claim that anything subjunctively linked to me has me somewhere in its computation (including its past)" - That doesn't describe this example. You are subjunctively linked to the dumb boxes, but they don't have you in their past. The thing that has you in its past is the predictor.

Comment by chris_leong on Open Problems in Archipelago · 2019-04-16T23:30:35.872Z · score: 4 (0 votes) · LW · GW

I'm very optimistic about sub-reddits - there are many examples such as AskPhilosophy, ChangeMyView and Slatestarcodex that demonstrate how powerful they can be. One major advantage of LW vs. Reddit is that it draws users from a different demographic. LW users are much less likely to stir up trouble or post low-quality comments, so there'll probably be minimal work policing boundaries.

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-16T02:41:32.972Z · score: 2 (1 votes) · LW · GW

"OK, subjunctive statements are linked to subjective states of knowledge. Where does that speak against the naive functionalist position?" - Actually, what I said about relativism isn't necessarily true. You could assert that any process that is subjunctively linked to what is generally accepted to be a consciousness from any possible reference frame is cognitively identical and hence experiences the same consciousness. But that would include a ridiculous number of things.

By telling you that a box will give the same output as you, we can subjunctively link it to you, even if it is only either a dumb box that immediately outputs true or a dumb box that immediately outputs false. Further, there is no reason why we can't subjunctively link someone else facing a completely different situation to the same black box, since the box doesn't actually need to receive the same input as you to be subjunctively linked (this idea is new, I didn't actually realise that before). So the box would be having the experiences of two people at the same time. This feels like a worse bullet than the one you already want to bite.

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-16T00:49:50.260Z · score: 2 (1 votes) · LW · GW

"Namely, that we should collapse apparently distinct notions if we can't give any cognitive difference between them" - I don't necessarily agree that being subjunctively linked to you (such that it gives the same result) is the same as being cognitively identical, so this argument doesn't get off the ground for me. If adopt a functionalist theory, it seems quite plausible that the degree of complexity is important too (although perhaps you'd say that isn't pure functionalism?)

It might be helpful to relate this to the argument I made in Logical Counterfactuals and the Cooperation Game. The point I make there is that which processes are subjunctively linked to you is more a matter of your state of knowledge than of any intrinsic property of the object itself. So if you adopt the position that things that are subjunctively linked to you are cognitively, and hence consciously, the same, you end up with a highly relativistic viewpoint.

I'm curious, how much do people at MIRI lean towards naive functionalism? I'm mainly asking because I'm trying to figure out whether there's a need to write a post arguing against this.

Comment by chris_leong on The Happy Dance Problem · 2019-04-13T10:45:41.241Z · score: 2 (1 votes) · LW · GW

I'm confused how we can assign probabilities to what the agent will do as above and also act as though the agent is an updateless agent, as the updateless agent will presumably never do the Happy Dance. You've argued against this in the Smoking Lesion, so why can we do it here?

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-11T02:29:02.936Z · score: 2 (1 votes) · LW · GW
I suspect you've been thinking of me as wanting to open up the set of anthropic instances much wider than you would want. But, my view is equally amenable to narrowing down the scope of counterfactual dependence, instead. I suspect I'm much more open to narrowing down counterfactual dependence than you might think.

Oh, I completely missed this. That said, I would be highly surprised if these notions were to coincide since they seem like different types. Something for me to think about.

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-11T01:36:51.791Z · score: 2 (1 votes) · LW · GW
Both of us are thinking about how to write a decision theory library.

That makes your position a lot clearer. I admit that the Abstraction Approach makes things more complicated and that this might affect what you can accomplish either theoretically or practically by using the Reductive Approach, so I could see some value in exploring this path. For Stuart Armstrong's paper in particular, the Abstraction Approach wouldn't really add much in the way of complications and it would make it much clearer what was going on. But maybe there are other things you are looking into where it wouldn't be anywhere near this easy. But in any case, I'd prefer people to use the Abstraction Approach in the cases where it is easy to do so.

An argument in favor of naive functionalism makes applying the abstraction approach less appealing

True, and I can imagine a level of likelihood below which adopting the Abstraction Approach would be adding needless complexity and mostly be a waste of time.

I think it is worth making a distinction between complexity in the practical sense and complexity in the hypothetical sense. In the practical sense, using the Abstraction Approach with Naive Functionalism is more complex than the Reductive Approach. In the hypothetical sense, they are equally complex in terms of explaining how anthropics works given Naive Functionalism, as we haven't postulated anything additional within this particular domain (you may say that we've postulated consciousness, but within this assumption it's just a renaming of a term, rather than the introduction of an extra entity). I believe that Occam's Razor should be concerned with the latter type of complexity, which is why I wouldn't consider it a good argument for the Reductive Approach.

But that you strongly prefer to abstract in this case

I'm very negative on Naive Functionalism. I've still got some skepticism about functionalism itself (property dualism isn't implausible in my mind), but if I had to choose between Functionalist theories, that certainly isn't what I'd pick.

Comment by chris_leong on Excerpts from a larger discussion about simulacra · 2019-04-11T00:57:05.860Z · score: 24 (9 votes) · LW · GW

Baudrillard's language seems quite religious, so I almost feel that a religious example might relate directly to his claims better. I haven't really read Baudrillard, but here's how I'd explain my current understanding:

Stage 1: People pray faithfully in public because they believe in God and follow a religion. Those who witness this prayer experience a window into the transcendent.

Stage 2: People realise that they can gain social status by praying in public, so they pretend to believe. Many people are aware of this, so witnessing an apparently sincere prayer ceases to be the same experience, as you don't know whether it is genuine or not. It still represents the transcendent to some degree, but the experience of witnessing it just isn't the same.

Stage 3: Enough people have started praying insincerely that almost everyone starts jumping on the bandwagon. Public prayer has ceased to be an indicator of religiosity or faith, but some particularly naive people still haven't realised the pretence. People still gain status for speaking sufficiently elegantly. People can't be too obviously fake though, or they'll be punished either by the few still naive enough to buy into it or by those who want to keep up the pretence.

Stage 4: Praying is now seen purely as a social move which operates according to certain rules. It's no longer necessary in and of itself to convince people that you are real, but part of the game may include punishments for making certain moves. For example, if you swear during your prayer, that might be punished as inappropriate, even though no-one cares about religion any more, because that's seen as cheating or breaking the rules of the game. However, you can be obviously fake in ways that don't violate these rules, as the spirit of the rules has been forgotten. Maybe people pray for vain things like becoming wealthy. Or they go to church one day, then post pictures of themselves getting smashed the next day on Facebook, which all their church friends see, but none of them care. The naive are too few to matter, and if they say anything, people will make fun of them.

I'll admit that I've added something of my own interpretation here, especially in terms of how strongly you have to pretend to be real at the various stages.

Comment by chris_leong on Deconfusing Logical Counterfactuals · 2019-04-10T23:20:03.789Z · score: 2 (1 votes) · LW · GW
The interpretation issue of a decision problem should be mostly gone when we formally specify it

In order to formally specify a problem, you will have already explicitly or implicitly expressed an interpretation of what decision theory problems are. But this doesn't make the question "Is this interpretation valid?" disappear. If we take my approach, we will need to provide a philosophical justification for the forgetting; if we take yours, we'll need to provide a philosophical justification that we care about the results of these kinds of paraconsistent situations. Either way, there will be further work beyond the formalisation.

The decision algorithm considers each output from a given set... It's a property of the formalism, but it doesn't seem like a particularly concerning one

This ties into the point I'll discuss later about how I think being able to ask an external observer to evaluate whether an actual real agent took the optimal decision is the core problem in tying real world decision theory problems to the more abstract theoretical decision theory problems. Further down you write:

The agent already considers what it considers (just like it already does what it does)

But I'm trying to find a way of evaluating an agent from the external perspective. Here, it is valid to criticise an agent for not selecting an action that it didn't consider. Further, it isn't always clear which actions are "considered", as not all agents have a loop over all actions; they may use shortcuts to avoid explicitly evaluating a certain action.

I feel like I'm over-stating my position a bit in the following, but: this doesn't seem any different from saying that if we provide a logical counterfactual, we solve decision theory for free

"Forgetting" has a large number of free parameters, but so does "deontology" or "virtue ethics". I've provided some examples and key details about how this would proceed, but I don't think you can expect too much more in this very preliminary stage. When I said that a forgetting criteria would solve the problem of logical counterfactuals for free, this was a slight exaggeration. We would still have to justify why we care about raw counterfactuals, but, actually being consistent, this would seem to be a much easier task than arguing that we should care about what happens in the kind of inconsistent situations generated by paraconsistent approaches.

I disagree with your foundations foundations post in so far as it describes what I'm interested in as not being agent foundations foundations

I actually included the Smoking Lesion Steelman as Foundations Foundations research. And CDT=EDT is pretty far along in this direction as well, although in my conception of what Foundations Foundations research should look like, more attention would have been paid to the possibility of the EDT graph being inconsistent while the CDT graph was consistent.

Your version of the 5&10 problem... The agent takes some action, since it is fully defined, and the problem is that the decision theorist doesn't know how to judge the agent's decision.

That's exactly how I'd put it. Except I would say I'm interested in the problem from the external perspective and the reflective perspective. I just see the external perspective as easier to understand first.

From the agent's perspective, the 5&10 problem does not necessarily look like a problem of how to think about inconsistent actions

Sure. But the agent is thinking about inconsistent actions beneath the surface, which is why we have to worry about spurious counterfactuals. And this is important for having a way of determining whether it is doing what it should be doing. (This becomes more important in edge cases like Troll Bridge.)

My interest is in how to construct them from scratch

Consider the following types of situations:

1) A complete description of a world, with an agent identified

2) A theoretical decision theory problem viewed by an external observer

3) A theoretical decision theory problem viewed reflectively

I'm trying to get from 1->2, while you are trying to get from 2->3. Whatever formalisations we use need to ultimately relate to the real world in some way, which is why I believe that we need to understand the connection from 1->2. We could also try connecting 1->3 directly, although that seems much more challenging. If we ignore the link from 1->2 and focus solely on a link from 2->3, then we will end up implicitly assuming a link from 1->2 which could involve assumptions that we don't actually want.

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-10T04:58:39.527Z · score: 2 (1 votes) · LW · GW
Making a theory of counterfactuals take an arbitrary theory of consciousness as an argument seems to cement this free-floating idea of consciousness, as an arbitrary property which a lump of matter can freely have or not have

The argument that you're making isn't that the Abstraction Approach is wrong, it's that by supporting other theories of consciousness, it increases the chance that people will mistakenly fail to choose Naive Functionalism. Wrong theories do tend to attract a certain number of people believing in them, but I would like to think that the best theory is likely to win out over time on Less Wrong.

And there's a cost to this. If we remove the assumption of a particular theory of consciousness, then more people will be able to embrace the theories of anthropics that are produced. And partial agreement is generally better than none.

My whole point is that it is simpler to select the theory of consciousness which requires no extra ontology beyond what decision theory already needs for other reasons

This is an argument for Naive Functionalism vs other theories of consciousness. It isn't an argument for the Abstracting Approach over the Reductive approach. The Abstracting Approach is more complicated, but it also seeks to do more. In order to fairly compare them, you have to compare both on the same domain. And given the assumption of Naive Functionalism, the Abstracting Approach reduces to the Reductive Approach.

What is the claimed inconsistency?

I provided reasons why I believe that Naive Functionalism is implausible in an earlier comment. I'll admit that inconsistency is too strong of a word. My point is just that you need an independent reason to bite the bullet other than simplicity. Like simplicity combined with reasons why the bullets sound worse than they actually are.

When you described your abstraction approach, you said that we could well choose naive functionalism as our theory of consciousness.

Yes. It works with any theory of consciousness, even clearly absurd ones.

Comment by chris_leong on Agent Foundation Foundations and the Rocket Alignment Problem · 2019-04-10T02:41:04.585Z · score: 2 (1 votes) · LW · GW

I think the former is very important, but I'm quite skeptical of the latter. What would be the best post of yours for a skeptic to read?

Comment by chris_leong on Agent Foundation Foundations and the Rocket Alignment Problem · 2019-04-09T22:10:36.290Z · score: 6 (3 votes) · LW · GW

I'm not saying that you can't doodle in maths. It's just that when you stumble upon a mathematical model, it's very easy to fall into confirmation bias instead of really deeply considering whether what you're doing makes sense from first principles. And I'm worried that this is what is happening in Agent Foundations research.

Comment by chris_leong on Agent Foundation Foundations and the Rocket Alignment Problem · 2019-04-09T22:03:47.017Z · score: 2 (1 votes) · LW · GW

Interesting. Is the phenomenological work to try to figure out what kind of agents are conscious and therefore worthy of concern or do you expect insights into how AI could work?

Agent Foundation Foundations and the Rocket Alignment Problem

2019-04-09T11:33:46.925Z · score: 13 (5 votes)
Comment by chris_leong on Deconfusing Logical Counterfactuals · 2019-04-09T09:38:20.121Z · score: 0 (2 votes) · LW · GW

(This comment was written before reading EDT=CDT. I think some of my views might update based on that when I have more time to think about it)

In your post, you say that before erasing information, a problem where what you do is determined is trivial, in that you only have the one option. That's the position I'm disagreeing with.

It will be convenient for me to make a slightly different claim than the one I made above. Instead of claiming that the problem is trivial in completely determined situations, I will claim that it is trivial given the most straightforward interpretation of a problem* (the set of possible actions for an agent are all those which are consistent with the problem statement and the action which is chosen is selected from this set of possible actions). In so far as both of us want to talk about decision problems where multiple possible options are considered, we need to provide a different interpretation of what decision problems are. Your approach is to allow the selection of inconsistent actions, while I suggest erasing information to provide a consistent situation.

My response is to argue, as per my previous comment, that there don't seem to be any criteria for determining which inconsistent actions are considered and which ones aren't. I suppose you could respond that I haven't provided criteria for determining what information should be erased, but my approach has the benefit that if you do provide such criteria, logical counterfactuals are solved for free, while it's much less clear how to approach this problem in the allowing-inconsistency approach (although there has been some progress with things like playing chicken with the universe).

*excluding unprovability issues

The way you're describing it, it sounds like erasing information isn't something agents themselves are supposed to ever have to do

You're at the stage of trying to figure out how agents should make decisions. I'm at the stage of trying to understand what making a good decision even means. Once there is a clearer understanding of what a decision is, we can then write an algorithm to make good decisions, or we may discover that the concept dissolves, in which case we will have to specify the problem more precisely. Right now, I'd be perfectly happy just to have clear criteria by which an external evaluator could say whether an agent made a good decision or not, as that would constitute substantial progress.

I'm somewhat confused about what you're saying in this paragraph and what assumptions you might be making

My point was that there aren't any criteria for determining which inconsistent actions are considered and which ones aren't if you are just thrown a complete description of a universe and an agent. Transparent Newcomb's already comes with the options and counterfactuals attached. My interest is in how to construct them from scratch.

My impression is that some philosophers hold a decision theory like CDT or EDT responsible for the advice it offers in a particular situation, even if it would be impossible to put agents in that situation

I think it is important to use very precise language here. The agent isn't being rated on what it would do in such a situation, it is being rated on whether or not it can be put into that situation at all.

I suspect that sometimes when an agent can't be put into a situation it is because the problem has been badly formulated (or falls outside the scope of problems where its decision theory is defined), while in other cases this is a reason for or against utilising a specific decision theory algorithm. Holding an agent responsible for all situations it can't be in seems like the wrong move, as it feels like there is some more fundamental confusion that needs to be cleared up.

I take the motto "decisions are for making bad outcomes inconsistent"

I'm not a fan of reasoning via motto when discussing these kinds of philosophical problems which turn on very precise reasoning.

So it is not meaningless to talk about what happens if you take an action which is inconsistent with what you know!... I don't know that you disagree with any of this... We can set up a sort of reverse transparent Newcomb, where you should take the action which makes the situation impossible

There's something of a tension between what I've said in this post about only being able to take decisions that are consistent and what I said in Counterfactuals for Perfect Predictors, where I noted a way of doing something analogous to acting to make your situation inconsistent. This can be cleared up by noting that erasing information in many decision theory problems provides a problem statement where input-output maps can define all the relevant information about an agent. So I'm proposing that this technique be used in combination with erasure, rather than separately. 

Comment by chris_leong on Deconfusing Logical Counterfactuals · 2019-04-09T02:00:10.703Z · score: 5 (2 votes) · LW · GW

I should note: This is my own idiosyncratic take on Logical Counterfactuals, with many of the links referring to my own posts and I don't know if I've convinced anyone else of the merits of this approach yet.

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-08T22:54:38.316Z · score: 2 (1 votes) · LW · GW

This seems like a motte and bailey. There's a weak sense of "must have you inside its computation" which you've defined here and a strong sense as in "should be treated as containing a consciousness".

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-08T22:48:26.956Z · score: 2 (1 votes) · LW · GW

You wrote:

decision theory only takes logical control (so the relevant question is not whether a model is detailed enough to be conscious, but rather, whether it is detailed enough to create a logical dependence on your behavior).

Which I interpreted to be you talking about avoiding the issue of consciousness by acting as though any process logically dependent on you automatically "could be you" for the purpose of anthropics. I'll call this the Reductive Approach.

However, when I said:

Firstly, I think it is cleaner to separate issues about whether simulations have consciousness or not from questions of decision theory

I was thinking about separating these issues, not by using the Reductive Approach, but by using what I'll call the Abstracting Approach. In this approach, you construct a theory of anthropics that is just handed a criteria of which beings are conscious and it is expected to be able to handle any such criteria.

Taking as input a list of conscious entities seems like a rather large point against a decision theory, since it makes it dependent on a theory of consciousness

Part of the confusion here is that we are using the word "depends" in different ways. When I said that the Abstracting Approach avoided creating a dependency on a theory of consciousness, I meant that if you follow this approach, you end up with a decision theory which can have any theory of consciousness just substituted in. It doesn't depend on these theories, as if you discover your theory of consciousness is wrong, you just throw in a new one and everything works.

When you talk about "depends" and say that this is a disadvantage, you mean that in order to obtain a complete theory of anthropics, you need to select a theory of consciousness to be combined with your decision theory. I think that this is actually unfair, because in the Reductive Approach, you do implicitly select a theory of consciousness, which I'll call Naive Functionalism. I'm not using this name to be pejorative, it's the best descriptor I can think of for the version of functionalism which you are using that ignores any concerns that high-level predictors might not deserve to be labelled as a consciousness.

With the Abstracting Approach I still maintain the option of assuming Naive Functionalism, in which case it collapses down to the Reductive Approach. So given these assumptions, both approaches end up being equally simple. In contrast, given any other theory of consciousness, the Reductive Approach complains that you are outside its assumptions, while the Abstracting Approach works just fine. The mistake here was attempting to compare the simplicity of two different theories directly without adjusting for them having different scopes.
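The collapse claim above can be made concrete with a small sketch. This is purely illustrative (the function and entity names are mine, not from the original discussion): the Abstracting Approach is a theory parameterised by whatever consciousness criterion it is handed, and substituting Naive Functionalism ("any logically dependent process counts") reduces it to the Reductive Approach.

```python
# Hypothetical sketch: an anthropic theory that just consumes whatever
# consciousness criterion it is handed (the Abstracting Approach).

def anthropic_locations(entities, is_conscious):
    """Return the names of entities the agent 'could be', per the criterion."""
    return [e["name"] for e in entities if is_conscious(e)]

def naive_functionalism(entity):
    # Naive Functionalism: any process logically dependent on you counts.
    return entity["logically_dependent"]

entities = [
    {"name": "you", "logically_dependent": True},
    {"name": "coarse predictor model", "logically_dependent": True},
    {"name": "rock", "logically_dependent": False},
]

# Under Naive Functionalism, even a coarse high-level model counts as a
# place you "could be"; swapping in a stricter criterion changes only the
# argument, not the theory.
print(anthropic_locations(entities, naive_functionalism))
```

The point of the sketch is that the criterion is an argument, not a baked-in assumption: replacing `naive_functionalism` with any other predicate leaves the surrounding theory untouched.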

"I suspect we don't actually disagree that much about whether simplicity should be a major consideration" - I'm not objecting to simplicity as a consideration. My argument is that Occam's razor is about accepting the simplest theory that is consistent with the situation. In my mind it seems like you are allowing simplicity to let you ignore the fact that your theory is inconsistent with the situation, which is not how I believe Occam's razor is supposed to work. So it's not just about the cost, but about whether this is even a sensible way of reasoning.

Comment by chris_leong on Deconfusing Logical Counterfactuals · 2019-04-08T15:52:51.933Z · score: 2 (1 votes) · LW · GW

I kind of agree with it, but in a way that makes it trivially true. Once you have erased information to provide multiple possible raw counterfactuals, you have the choice to frame the decision problem as either choosing the best outcome or avoiding sub-optimal outcomes. But of course, this doesn't really make a difference.

It seems rather strange to talk about making an outcome inconsistent when it was already inconsistent. Why is this considered an option that was available for you to choose, rather than one that was never available? Consider a situation where the world and agent have both been precisely defined. Determinism means there is only one possible option, but decision problems have multiple possible options. It is not clear which decisions that are inconsistent with what actually happened count as "could have been chosen" and which count as "were never possible".

Actually, this relates to my post on Counterfactuals for Perfect Predictors. Talking about making your current situation inconsistent doesn't make sense literally, only analogically. After all, if you're in a situation, it has to be consistent. The way that I get around this in my post is by replacing talk of decisions given a situation with talk of decisions given an input representing a situation. While you can't make your current situation inconsistent, it is sometimes possible to write a program such that it cannot be put in the situation represented by an input, as its output would be inconsistent with that situation. And that lets us define what we wanted to define, without having to fudge philosophically.
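The input/situation distinction above can be sketched in a few lines. This is a hypothetical toy (the names and setup are illustrative, not from the original post): model the agent as a pure function from an input describing its situation to an action, and note that a perfect predictor can only realise inputs consistent with the agent's actual output on them.

```python
def agent(predicted_action):
    """A toy agent that acts to contradict whatever prediction it is shown."""
    return "two-box" if predicted_action == "one-box" else "one-box"

# An input is realisable only if the prediction it states matches what the
# agent actually does when given that input.
realisable = [p for p in ("one-box", "two-box") if agent(p) == p]

print(realisable)  # empty: no consistent situation of this form exists
```

Here the agent never literally "makes its situation inconsistent"; rather, there is simply no input of this form that the predictor could ever truthfully present it with.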

Comment by chris_leong on Deconfusing Logical Counterfactuals · 2019-04-08T14:04:01.895Z · score: 2 (1 votes) · LW · GW

UDT* provides a decision theory given a decision tree and a method of determining subjunctive links between choices. I'm investigating how to determine these subjunctive links, which requires understanding what kind of thing a counterfactual is and what kind of thing a decision is. The idea is that any solution should naturally integrate with UDT.

Firstly, even if this technique were limited to hand analysis, I'd be quite pleased if this turned out to be a unifying theory behind our current intuitions about how logical counterfactuals should work. Because if it were able to cover all or even just most of the cases, we'd at least know what assumptions we were implicitly making and it would provide a target for criticism. Different subtypes of forgetting might be identified; it wouldn't surprise me if it turns out that the concept of a decision actually needs to be dissolved.

Secondly, even if there doesn't turn out to be a good way to figure out what information should be forgotten, I expect that figuring out different approaches would prove insightful, as would discovering why there isn't a good way to determine what to forget, if this is indeed the case.

But, to be honest, I've not spent much time thinking about how to determine what information should be forgotten. I'm still currently in the stage of trying to figure out whether this might be a useful research direction.

*Perhaps there are other updateless approaches, but I don't know of any except TDT, which is generally considered inferior

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-08T13:31:42.901Z · score: 2 (1 votes) · LW · GW

For the first point, I meant that in order to consider this purely as a decision theory problem without creating a dependency on a particular theory of consciousness, you would ideally want a general theory that can deal with any criteria of consciousness (including just being handed a list of entities that count as conscious).

Regarding the second, when you update your decision algorithm, you have to update everything subjunctively dependent on you regardless of whether it is an agent or not, but that is distinct from "you could be that object".

On the third, biting the bullet isn't necessary to turn this into a decision theory problem as I mention in my response to the first point. But further, elegance alone doesn't seem to be a good reason to accept a theory. I feel I might be misunderstanding your reasoning for biting this bullet.

I haven't had time to read Jessica Taylor's theorem yet, so I have no comment on the fourth.

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-07T22:09:58.160Z · score: 2 (1 votes) · LW · GW

I'm confused. This comment is saying that there isn't a strict divide between decision theory and anthropics, but I don't see how that has any relevance to the point that I raised in the comment it is responding to (that a perfect predictor need not utilise a simulation that is conscious in any sense of the word).

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-07T02:29:51.530Z · score: 4 (2 votes) · LW · GW

Anyway, this answers one of my key questions: whether it is worth working on anthropics or not. I put some time into reading about it (hopefully I get time to pick up Bostrom's book again at some point), but I got discouraged when I started wondering whether the work on logical counterfactuals would make this all irrelevant. Thanks for clarifying this. Anyway, why do you think the second approach is more promising?

Comment by chris_leong on Would solving logical counterfactuals solve anthropics? · 2019-04-07T02:18:52.329Z · score: 2 (1 votes) · LW · GW

"The idea is that you have to be skeptical of whether you're in a simulation" - I'm not a big fan of that framing, though I suppose it's okay if you're clear that it is an analogy. Firstly, I think it is cleaner to separate issues about whether simulations have consciousness or not from questions of decision theory, given that functionalism is quite a controversial philosophical assumption (even though it might be taken for granted at MIRI). Secondly, it seems as though you might be able to perfectly predict someone from high-level properties without simulating them in enough detail to instantiate a consciousness. Thirdly, there isn't necessarily a one-to-one relationship between "real" world runs and simulations. We only need to simulate an agent once in order to predict the result of any number of identical runs. So if the predictor only ever makes one prediction, but there are a million clones all playing Newcomb's, the chance that someone in that subjective situation is the single simulation is vanishingly small.

So insofar as we talk about "you could be in a simulation", I'd prefer to see this as a pretense, trick, or analogy.
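The third point lends itself to a rough numerical sketch (the numbers are illustrative, not from the original comment): however many identical real runs there are, the predictor only needs to run its computation once.

```python
clones = 1_000_000   # real agents in the same subjective situation
simulations = 1      # the predictor simulates the agent only once

# Naive self-location probability of being the single simulation rather
# than one of the real clones:
p_simulation = simulations / (clones + simulations)

print(p_simulation)  # vanishingly small
```

So on a naive counting of subjectively identical instances, "you might be the simulation" carries almost no probability mass here, which is why the framing seems better treated as an analogy than taken literally.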

Would solving logical counterfactuals solve anthropics?

2019-04-05T11:08:19.834Z · score: 23 (-2 votes)

Is there a difference between uncertainty over your utility function and uncertainty over outcomes?

2019-03-18T18:41:38.246Z · score: 15 (4 votes)

Deconfusing Logical Counterfactuals

2019-01-30T15:13:41.436Z · score: 24 (8 votes)

Is Agent Simulates Predictor a "fair" problem?

2019-01-24T13:18:13.745Z · score: 22 (6 votes)

Debate AI and the Decision to Release an AI

2019-01-17T14:36:53.512Z · score: 8 (3 votes)

Which approach is most promising for aligned AGI?

2019-01-08T02:19:50.278Z · score: 6 (2 votes)

On Abstract Systems

2019-01-06T23:41:52.563Z · score: 14 (8 votes)

On Disingenuity

2018-12-26T17:08:47.138Z · score: 34 (15 votes)

Best arguments against worrying about AI risk?

2018-12-23T14:57:09.905Z · score: 15 (7 votes)

What are some concrete problems about logical counterfactuals?

2018-12-16T10:20:26.618Z · score: 26 (6 votes)

An Extensive Categorisation of Infinite Paradoxes

2018-12-13T18:36:53.972Z · score: 0 (22 votes)

No option to report spam

2018-12-03T13:40:58.514Z · score: 38 (14 votes)

Summary: Surreal Decisions

2018-11-27T14:15:07.342Z · score: 27 (6 votes)

Suggestion: New material shouldn't be released too fast

2018-11-21T16:39:19.495Z · score: 24 (8 votes)

The Inspection Paradox is Everywhere

2018-11-15T10:55:43.654Z · score: 26 (7 votes)

One Doubt About Timeless Decision Theories

2018-10-22T01:39:57.302Z · score: 15 (7 votes)

Formal vs. Effective Pre-Commitment

2018-08-27T12:04:53.268Z · score: 9 (4 votes)

Decision Theory with F@#!ed-Up Reference Classes

2018-08-22T10:10:52.170Z · score: 10 (3 votes)

Logical Counterfactuals & the Cooperation Game

2018-08-14T14:00:34.032Z · score: 17 (7 votes)

A Short Note on UDT

2018-08-08T13:27:12.349Z · score: 11 (4 votes)

Counterfactuals for Perfect Predictors

2018-08-06T12:24:49.624Z · score: 13 (5 votes)

Anthropics: A Short Note on the Fission Riddle

2018-07-28T04:14:44.737Z · score: 12 (5 votes)

The Evil Genie Puzzle

2018-07-25T06:12:53.598Z · score: 21 (8 votes)

Let's Discuss Functional Decision Theory

2018-07-23T07:24:47.559Z · score: 27 (12 votes)

The Psychology Of Resolute Agents

2018-07-20T05:42:09.427Z · score: 11 (4 votes)

Newcomb's Problem In One Paragraph

2018-07-10T07:10:17.321Z · score: 8 (4 votes)

The Prediction Problem: A Variant on Newcomb's

2018-07-04T07:40:21.872Z · score: 28 (8 votes)

What is the threshold for "Hide Low Karma"?

2018-07-01T00:24:40.838Z · score: 8 (2 votes)

The Beauty and the Prince

2018-06-26T13:10:29.889Z · score: 9 (3 votes)

Anthropics: Where does Less Wrong lie?

2018-06-22T10:27:16.592Z · score: 17 (4 votes)

Sleeping Beauty Not Resolved

2018-06-19T04:46:29.204Z · score: 18 (7 votes)

In Defense of Ambiguous Problems

2018-06-17T07:40:58.551Z · score: 8 (6 votes)

Merging accounts

2018-06-16T00:45:00.460Z · score: 6 (1 votes)

Resolving the Dr Evil Problem

2018-06-10T11:56:09.549Z · score: 11 (4 votes)

Principled vs. Pragmatic Morality

2018-05-29T04:31:04.620Z · score: 22 (4 votes)

Decoupling vs Contextualising Norms

2018-05-14T22:44:51.705Z · score: 129 (39 votes)

Hypotheticals: The Direct Application Fallacy

2018-05-09T14:23:14.808Z · score: 45 (14 votes)

Rationality and Spirituality - Summary and Open Thread

2018-04-21T02:37:29.679Z · score: 41 (10 votes)

Raven Paradox Revisited

2018-04-15T00:08:01.907Z · score: 18 (4 votes)

Have you considered either a Kickstarter or a Patreon?

2018-04-13T01:09:41.401Z · score: 23 (6 votes)

On Dualities

2018-03-15T02:10:47.612Z · score: 8 (4 votes)

Welcome to Effective Altruism Sydney

2018-03-14T23:32:31.443Z · score: 11 (2 votes)

Using accounts as "group accounts"

2018-03-09T03:44:42.322Z · score: 21 (4 votes)

Monthly Meta: Common Knowledge

2018-03-03T00:05:09.488Z · score: 47 (12 votes)

Experimental Open Threads

2018-02-26T03:13:16.999Z · score: 66 (15 votes)

Clarifying the Postmodernism Debate With Skeptical Modernism

2018-02-16T09:40:30.757Z · score: 29 (13 votes)