What would you think of , much bigger, sadder, and blander?
Anyway the point of the repugnant conclusion is that any world , no matter how ideal, has a corresponding .
Synthesising divergent preferences: an example in population ethics
2019-01-18T14:29:18.805Z · score: 13 (3 votes)
The Very Repugnant Conclusion
2019-01-18T14:26:08.083Z · score: 15 (7 votes)
and different types of SIA could be used to answer different questions.
Yep. ^_^
none of the useful information comes from guessing the size of the universe, or whether we are in a simulation,
The reason I assume those is so that only the "standard" updating remains; I'm deliberately removing the anthropically weird cases.
Again, we have to be clear about the question. But if it's "what proportion of versions of me are likely to be in a large universe", then the answer is close to 1 (which is the SIA odds). Then you update on your birth rank, notice, to your great surprise, that it is sufficiently low to exist in both large and small universes, so update towards small and end up at 50:50.
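As a rough numerical sketch of that two-step update (the population sizes and the even prior are illustrative assumptions, not figures from the discussion): weight a small and a large universe by their populations, as SIA does, then condition on having one particular birth rank that is low enough to exist in both.

```python
from fractions import Fraction


def sia_then_birth_rank(prior_small, n_small, n_large):
    """Return (SIA odds, posterior after seeing a birth rank <= n_small).

    All inputs are illustrative assumptions: prior_small is the
    non-anthropic prior on the small universe; n_small and n_large are
    the total populations of the two universes.
    """
    prior = {"small": Fraction(prior_small), "large": 1 - Fraction(prior_small)}
    size = {"small": n_small, "large": n_large}

    # SIA step: weight each universe by its number of observers, renormalise.
    weighted = {u: prior[u] * size[u] for u in prior}
    total = sum(weighted.values())
    sia = {u: w / total for u, w in weighted.items()}

    # Update step: the chance of having one specific birth rank that exists
    # in the universe is 1 / population.
    updated = {u: sia[u] * Fraction(1, size[u]) for u in sia}
    total = sum(updated.values())
    posterior = {u: w / total for u, w in updated.items()}
    return sia, posterior


sia, posterior = sia_then_birth_rank(Fraction(1, 2), 10**3, 10**9)
print(float(sia["large"]))  # close to 1
print(posterior["large"])   # 1/2
```

The population factor introduced by SIA is exactly cancelled by the birth-rank likelihood, which is why the result returns to the prior.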
So even with meta-preferences, likely there are multiple ways
Yes, almost certainly. That's why I want to preserve all meta-preferences, at least to some degree.
No, the event "we survived" is "we (the actual people now considering the anthropic argument and past x-risks) survived".
Over enough draws, you have .
So we update the lottery odds based on whether we win or not; we update the danger odds based on whether we live. If we die, we alas don't get to do much updating (though note that we can consider hypotheticals with bets that pay out to surviving relatives, or have a chance of reviving the human race, or whatever, to get the updates we think would be correct in the worlds where we don't exist).
Another way of seeing SIA + update on yourself: weigh each universe by the expected number of exact (subjective) copies of you in it, then renormalise.
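A minimal sketch of that recipe, with hypothetical universe names, priors and copy counts chosen purely for illustration:

```python
from fractions import Fraction


def weigh_by_copies(prior, expected_copies):
    """Weigh each universe by the expected number of exact copies of you
    in it, then renormalise. Both dictionaries are illustrative inputs."""
    weights = {u: prior[u] * expected_copies[u] for u in prior}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no universe contains a copy of you")
    return {u: w / total for u, w in weights.items()}


# Hypothetical example: three universes with different numbers of copies.
prior = {"A": Fraction(1, 2), "B": Fraction(3, 10), "C": Fraction(1, 5)}
expected_copies = {"A": 1, "B": 2, "C": 0}
posterior = weigh_by_copies(prior, expected_copies)
```

Universe C drops out entirely because it contains no copies, and B overtakes A despite its lower prior because it contains twice as many.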
The DA, in its SSA form (where it is rigorous), comes as a posterior adjustment to all probabilities computed in the way above: it's not an argument that doom is likely, just that doom is more likely than objective odds would imply, in a precise way that depends on future (and past) population size.
However, my post shows that the SSA form does not apply to the question that people generally ask, so the DA is wrong.
Oh, what definition were you using? Anything interesting? (or do you mean before updating on your own experiences?)
There are two versions of the DA; the first is "we should roughly be in the middle", and the second is "our birth rank is less likely if there were many more humans in the future".
I was more thinking of the second case, but I've changed the post slightly to make it more compatible with the first.
Anthropics is pretty normal
2019-01-17T13:26:22.929Z · score: 24 (8 votes)
Maybe: larger reference classes make the universes more likely, but make it less likely that you would be a specific member of that reference class, so when you update on who you are in the class, the two effects cancel out.
More conceptually: in SIA, the definition of reference class commutes with restrictions on that reference class. So it doesn't matter if you take the reference class of all humans, then specialise to the ones alive today, then specialise to you; or take the reference class of all humans alive today, then specialise to you; or just take the reference class of you. SIA is, in a sense, sensible with respect to updating.
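A small sketch of that cancellation, under assumed numbers: two universes, three nested reference classes ("all humans", "humans alive today", "just you"), and a fixed count of exact copies of you in each universe. The sizes and names are hypothetical.

```python
from fractions import Fraction


def sia_posterior(prior, class_size, copies_of_you):
    """SIA with an explicit reference class, then update on being you."""
    weights = {}
    for u in prior:
        sia_boost = class_size[u]  # bigger class makes the universe more likely
        p_you = Fraction(copies_of_you[u], class_size[u])  # chance of being you
        weights[u] = prior[u] * sia_boost * p_you
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}


prior = {"U1": Fraction(1, 2), "U2": Fraction(1, 2)}
copies_of_you = {"U1": 1, "U2": 3}

# Three nested reference classes (sizes are made up for illustration).
all_humans = {"U1": 10**6, "U2": 10**10}
humans_alive_today = {"U1": 10**3, "U2": 10**5}
just_you = copies_of_you

results = [
    sia_posterior(prior, ref_class, copies_of_you)
    for ref_class in (all_humans, humans_alive_today, just_you)
]
```

All three entries of `results` coincide, since the class size multiplies in at the SIA step and divides out again at the update step.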
Does that help?
Solving the Doomsday argument
2019-01-17T12:32:23.104Z · score: 9 (4 votes)
The questions and classes of SSA
2019-01-17T11:50:50.828Z · score: 10 (2 votes)
You are correct, I dropped a in the proof, thanks! Put it back in, and the proof is now shorter.
In SIA, reference classes (almost) don't matter
2019-01-17T11:29:26.131Z · score: 17 (6 votes)
Where does the R(Ui) go missing? It's there in the subsequent equation.
SSA is not reference class independent. If it uses , then the SSA prob is (rather than ), which is , which is not independent of (consider doubling the size of in one world only; that makes that world less likely relative to all the others).
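The doubling thought experiment can be sketched numerically. The formula used here, prior times (copies of you / reference class size), is my reading of the SSA update being discussed, and the worlds and sizes are hypothetical.

```python
from fractions import Fraction


def ssa_posterior(prior, class_size, copies_of_you):
    """SSA-style update: no boost for large worlds, only the chance of
    being you within each world's reference class (assumed formula)."""
    weights = {
        u: prior[u] * Fraction(copies_of_you[u], class_size[u]) for u in prior
    }
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}


prior = {"W1": Fraction(1, 2), "W2": Fraction(1, 2)}
copies_of_you = {"W1": 1, "W2": 1}

before = ssa_posterior(prior, {"W1": 100, "W2": 100}, copies_of_you)
# Double the reference class in W2 only.
after = ssa_posterior(prior, {"W1": 100, "W2": 200}, copies_of_you)
```

Here `before` is an even split, while `after` shifts to 2/3 for W1 and 1/3 for W2: enlarging the reference class in one world makes that world less likely relative to the others.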
This is not the standard sleeping beauty paradox, as the information you are certain to get does not involve any amnesia or duplication before you get it.
Anthropic probabilities: answering different questions
2019-01-14T18:50:56.086Z · score: 17 (6 votes)
Anthropics: Full Non-indexical Conditioning (FNC) is inconsistent
2019-01-14T15:03:04.288Z · score: 22 (5 votes)
Hierarchical system preferences and subagent preferences
2019-01-11T18:47:08.860Z · score: 19 (3 votes)
Or a warning, rather than an error.
I was mainly wondering, since the webpage must run the LaTeX code and throw an error if it decides an expression is malformed (and then not show the output), whether it would be easy to see the fact that there was an error?
I just use plain markdown, so it's not always clear whether the expression works or not.
Latex rendering
2019-01-09T22:32:52.881Z · score: 10 (2 votes)
No surjection onto function space for manifold X
2019-01-09T18:07:26.157Z · score: 19 (5 votes)
I like your phrasing.
What emotions would AIs need to feel?
2019-01-08T15:09:32.424Z · score: 15 (5 votes)
The "subsequent post" has been delayed for a long time because of other research avenues I need to catch up with :(
I currently agree with this view. But I'd add that a theory of human values is a direct way to solve some of the critical considerations.
Ah yes, I misread "SSA Adam and Eve" as "SSA-like ADT Adam and Eve (hence average utilitarian)".
Cheers!
And of course, one shouldn't forget that, by their own standards, SSA Adam and Eve are making a mistake.
Nope, they are making the correct decision if they value their own pleasure in an average utilitarian way, for some reason.
Anthropic probabilities and cost functions
2018-12-21T17:54:20.921Z · score: 16 (5 votes)
Moral preferences are a specific subtype of preferences.
but our decision theory can just output betting outcomes instead of probabilities.
Indeed. And ADT outputs betting outcomes without any problems. It's when you interpret them as probabilities that you start having problems, because in order to go from betting odds to probabilities, you have to sort out how much you value two copies of you getting a reward, versus one copy.
I see it as necessary, because I don't see Anthropic probabilities as actually meaning anything.
Standard probabilities are informally "what do I expect to see", and this can be formalised as a cost function for making the wrong predictions.
In Anthropic situations, the "I" in that question is not clear: you, or you and your copies, or you and those similar to you? When you formalise this as a cost function, you have to decide how to spread the cost amongst your different copies: do you spread it as a total cost, or an average one? In the first case, SIA emerges; in the second, SSA.
So you can't talk about anthropic "probabilities" without including how much you care about the cost to your copies.
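To make the total-versus-average point concrete, here is a sketch in a Sleeping Beauty style setup. The setup (one copy on heads, two on tails, a fair coin) and the quadratic cost for wrong predictions are my illustrative assumptions rather than anything fixed by the argument above.

```python
from fractions import Fraction

# Assumed setup: a fair coin; heads creates one copy of you, tails two.
PRIOR = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
COPIES = {"heads": 1, "tails": 2}


def expected_cost(p_heads, spread):
    """Expected quadratic cost of every copy announcing P(heads) = p_heads.

    spread="total" sums the cost over the copies in a world;
    spread="average" averages it over them.
    """
    cost = Fraction(0)
    for outcome, n_copies in COPIES.items():
        truth = 1 if outcome == "heads" else 0
        per_copy = (truth - p_heads) ** 2
        world_cost = n_copies * per_copy if spread == "total" else per_copy
        cost += PRIOR[outcome] * world_cost
    return cost


# Exact search over a grid that contains both 1/3 and 1/2.
GRID = [Fraction(i, 3000) for i in range(3001)]
best_total = min(GRID, key=lambda p: expected_cost(p, "total"))
best_average = min(GRID, key=lambda p: expected_cost(p, "average"))
print(best_total, best_average)  # 1/3 1/2
```

Minimising the total cost yields 1/3 for heads (the SIA answer), while minimising the average cost yields 1/2 (the SSA answer), with the same information in both cases.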
Would you like to be a co-author when (if) the whole thing gets published? You developed UDT, and this way there would be a publication partially on the subject.
It will be mentioned! ADT is basically UDT in anthropic situations (and I'm willing to say that publicly). I haven't updated the paper in a long time, as I keep on wanting to do it properly/get it published, and never have the time.
What's the best reference for UDT?
Anthropic paradoxes transposed into Anthropic Decision Theory
2018-12-19T18:07:42.251Z · score: 19 (9 votes)
Yes, that's the almost fully general counterargument: punt all the problems to the wiser versions of ourselves.
But some of these problems are issues that I specifically came up with. I don't trust that idealised non-mes would necessarily have realised these problems even if put in that idealised situation. Or they might have come up with them too late, after they had already altered themselves.
I also don't think that I'm particularly special, so other people can and will think up problems with the system that hadn't occurred to me or anyone else.
This suggests that we'd need to include a huge number of different idealised humans in the scheme. Which, in turn, increases the chance of the scheme failing due to social dynamics, unless we design it carefully ahead of time.
So I think it is highly valuable to get a lot of people thinking about the potential flaws and improvements for the system before implementing it.
That's why I think that "punting to the wiser versions of ourselves" is useful, but not a sufficient answer. The better we can solve the key questions ("what are these 'wiser' versions?", "how is the whole setup designed?", "what questions exactly is it trying to answer?"), the better the wiser ourselves will be at their tasks.
That said, this does seem to be the value learning approach I am most optimistic about right now.
Thanks! I'm not sure I fully get all your concerns, but I'll try and answer to the best of my understanding.
1-4 (and a little bit of 6): this is why I started looking at semantics vs syntax. Consider the small model "If someone is drowning, I should help them (if it's an easy thing to do)". Then "someone", "drowning", "I", and "help them" are vague labels for complex categories (as are most of the rest of the terms, really). The semantics of these categories need to be established before the AI can do anything. And the central examples of the categories will be clearer than the fuzzy edges. Therefore the AI can model me as having strong preferences in the central examples of the categories, which become much weaker as we move to the edges (the meta-preferences will start to become very relevant in the edge cases). I expect that "I should help them" further decomposes into "they should be helped" and "I should get the credit for helping them".
Therefore, it seems to me, that an AI should be able to establish that if someone is drowning, it should try and enable me to save them, and if it can't do that, then it should save them itself (using nanotechnology or anything else). It doesn't seem that it would be seeing the issue from my narrow perspective, because I don't see the issue just from my narrow perspective.
5: I am pretty sure that we could use neuroscience to establish that, for example, people are truthful when they say that they see the anchoring bias as a bias. But I might have been a bit glib when mentioning neuroscience; that is mainly the "science fiction superpowers" end of the spectrum for the moment.
What I'm hoping, with this technique, is that if we end up using indirect normativity or stated preferences, then by keeping in mind this model of what proto-preferences are, we can better automate the limitations of these techniques (eg when we expect lying), rather than putting them in by hand.
6: Currently I don't see reflexes as embodying values at all. However, people's attitudes towards their own reflexes are valid meta-preferences.
Indirect normativity has specific failure modes, eg siren worlds, or social pressures going bad, or humans getting very twisted in that ideal environment in ways that we can't yet predict. More to the point, these failure modes are ones that we can talk about from outside: we can say things like "these precautions should prevent the humans from getting too twisted, but we can't fully guarantee it".
That means that we can't use indirect normativity as a definition of human values, as we already know how it could fail. A better understanding of what values are could result in being able to automate the checking as to whether it failed or not, which would mean that we could include that in the definition.
EDIT: This idea is wrong: https://www.lesswrong.com/posts/eqi83c2nNSX7TFSfW/nosurjectionontofunctionspaceformanifoldx
Ok, here's an idea for constructing such a map, with a few key details left unproven; let me know if people see any immediate flaws in the approach, before I spend time filling in the holes.
Let be a countable collection of open intervals (eg ), given the usual topology. Let be the closed unit interval, and the set of continuous functions from to . Give the compact-open topology.
By the properties of the compact-open topology, since is (Tychonoff), then so is . I'm hoping that the proof can be extended, at least in this case, to show that is (normal Hausdorff).
It seems clear that is second-countable: let consist of all functions that map into , where is the intersection of with a closed interval with rational endpoints, and is an open interval with rational endpoints. The set of all such is countable, and forms a subbasis of . A countable subbasis means a countable basis, as the set of finite subsets of a countable set is itself countable.
If is and second-countable, then it is homeomorphic to a subset of the Hilbert Cube. To simplify notation, we will identify with its image in the Hilbert Cube.
Take the closure of within the Hilbert Cube. This closure is compact and second-countable (since the Hilbert Cube itself is both). It seems clear that is connected and locally connected; connectedness will extend to the closure, but we'll need to prove that local connectedness does as well.
Then we can apply the Hahn-Mazurkiewicz theorem:
 A non-empty Hausdorff topological space is a continuous image of the unit interval if and only if it is a compact, connected, locally connected, second-countable space.
So there is a continuous surjection . Pull back , defining . If is open in (with the subspace topology), then is open in . Even if is not open, we can hope that, at worst, it consists of a countable collection of points, open intervals, half-closed intervals, and closed intervals (this is not a general property of subsets of the interval, cf the Cantor set, but it feels very likely that it will apply here).
In that case, there is a continuous surjection from to , mapping each open interval to one of the points or intervals ("folding" over the ends when mapping to those with closed endpoints).
Then is the continuous surjection we are looking for.
Note: I'm thinking now that might not be connected, but this would not be a problem as long as it has a countable number of connected components.
Oh, I don't claim to have a full definition yet, but I believe it's better than pre-formal. Here would be my current definition:
 Humans are partially model-based agents. We often generate models (or at least partial models) of situations (real or hypothetical), and, within those models, label certain actions/outcomes/possibilities as better or worse than others (or sometimes just generically "good" or "bad"). This model, along with the label, is what I'd call a proto-preference (or pre-preference).
That's why neuroscience is relevant, for identifying the mental models humans use. The "previous Alice post" I mentioned is here, and was a toy version of this, in the case of an algorithm rather than a human. The reason these get around the No Free Lunch theorem is that they look inside the algorithm (so different algorithms with the same policy can be seen to have different preferences, which breaks NFL), and make the "normative assumption" that these modelled proto-preferences correspond (modulo preference synthesis) to the agent's actual preferences.
Note that that definition puts preferences and meta-preferences into the same type, the only difference being the sort of model being considered.
Then you're going to run into a problem: proto-preferences aren't identifiable.
I interpreted you as trying to fix this problem by looking at how humans infer each other's preferences...
The proto-preferences are a definition of the components that make up preferences. Methods of figuring them out (be they stated preferences, revealed preferences, fMRI machines, how other people infer each other's preferences...) are just methods. The advantage of having a definition is that this guides us explicitly as to when a specific method for figuring them out ceases to be applicable.
And I'd argue that proto-preferences are identifiable. We're talking about figuring out how humans model their own situations, and the better-worse judgements they assign in their internal models. This is not unidentifiable, and neuroscience already has some things to say on it. The previous Alice post showed how you could do it in a toy model (with my posts on semantics and symbol grounding being relevant to applying this approach to humans).
That second sentence of mine is somewhat poorly phrased, but I agree that "extracting the normative assumptions humans make is no easier than extracting proto-preferences"; I just don't see that second one as being insoluble.
Agree that it's useful to disentangle them, but it's also useful to realise that they can't be fully disentangled... yet.
We're getting close to something important here, so I'll try and sort things out carefully.
In my current approach, I'm doing two things:

1. Finding some components of preferences or proto-preferences within the human brain.
2. Synthesising them together in a way that also respects (proto-)meta-preferences.
The first step is needed because of the No Free Lunch in preference learning result. We need to have some definition of preferences that isn't behavioural. And the stated-values-after-reflection approach has some specific problems that I listed here.
Then I took an initial stab at how one could synthesise the preferences in this post.
If I'm reading you correctly, your main fear is that by focusing on the proto-preferences of the moment, we might end up in a terrible place, foreclosing moral improvements. I share that fear! That's why the process synthesises values in accordance with both meta-preferences and "far" preferences ("I want everyone to live happy worthwhile lives" is a perfectly valid proto-preference).
Where we might differ the most is that I'm very reluctant to throw away any proto-preference, even if our meta-preferences would typically overrule it. I would prefer to keep it around, with a very low weight. Once we get in the habit of ditching proto-preferences, there's no telling where that process might end up.
In https://www.lesswrong.com/posts/rcXaY3FgoobMkH2jc/figuringoutwhatalicewantspartii , I give examples of two algorithms with the same outputs but where we would attribute different preferences to them. This sidesteps the impossibility result, since it allows us to consider extra information, namely the internal structure of the algorithm, in a way relevant to value-computing.
I think it depends on the individual. Certainly, before realising the points above, I would occasionally do the "assume human values solved" move in my mind, in an unrigorous and misleading way.
What do you mean by "you actually have Y values"? What are you defining values to be?
Because once we have these parameters, we can learn the values of any given human. In contrast, if we learn the values of a given human, we don't get to learn the values of any other one.
I'd argue further: these parameters form part of a definition of human values. We can't just "learn human values", as these don't exist in the world. Whereas "learn what humans model each other's values (and rationality) to be" is something that makes sense in the world.
If we want to apply it to humans, we need something much more complicated than that, which uses some measure of how complex humans see actions as being, and takes into account how and when we search for alternate solutions. There's a reason most models don't use bounded rationality; it ain't simple.
Corrected, thanks!
A hundred Shakespeares
2018-12-11T23:11:48.668Z · score: 31 (12 votes)
I agree it's part of the story, but only a part. And real humans don't act as if there was a set of actions of size n, and they could consider all of them with equal ease. Sometimes humans have much smaller action sets, sometimes they can produce completely unexpected actions, and most of the time we have a pretty small set of obvious actions and a much larger set of potential actions we might be able to think up at the cost of some effort.
Bounded rationality abounds in models, not explicitly defined
2018-12-11T19:34:17.476Z · score: 12 (6 votes)
Figuring out what Alice wants: non-human Alice
2018-12-11T19:31:13.830Z · score: 12 (4 votes)
Assuming we've solved X, could we do Y...
2018-12-11T18:13:56.021Z · score: 34 (14 votes)
Why we need a *theory* of human values
2018-12-05T16:00:13.711Z · score: 53 (17 votes)
Thanks for introducing me to the box topology; seeing it defined so explicitly, and seeing what properties it fails, cleared up a few of my intuitions.
A chess tree search algorithm would never hit upon killing other processes. An evolutionary chess-playing algorithm might learn to do that. It's not clear whether goal-directed is relevant to that distinction.
Hum, should be compact by Tychonoff's theorem (see also the Hilbert Cube, which is homeomorphic to ).
For your proof, I think that is not open in the product topology. The product topology is the coarsest topology where all the projection maps are continuous.
To make all the projection maps continuous we need all sets in to be open, where we define iff there exists an , such that is open in and .
Let be the set of finite intersections of these sets. For any , there exists a finite set such that if and for , then as well.
If we take to be the arbitrary union of , this condition will be preserved. Thus is not contained in the arbitrary unions and finite intersections of , so it seems it is not an open set.
Also, is second-countable. From the Wikipedia article on second-countable:
Any countable product of a second-countable space is second-countable
Note that this starts from the assumption of goal-directed behavior and derives that the AI will be an EU maximizer along with the other convergent instrumental subgoals.
The result is actually stronger than that, I think: if the AI is goal-directed at least in part, then that part will (tend to) purge the non-goal-directed behaviours and then follow the EU path.
I wonder if we could get theorems as to what kinds of minimal goal-directed behaviour will result in the agent becoming a completely goal-directed agent.
But still? A hundred Shakespeares?
I'd wager there are thousands of Shakespeare-equivalents around today. The issue is that Shakespeare was not only talented, he was successful: wildly popular, and able to live off his writing. He was a superstar of theatre. And we can only have a limited number of superstars, no matter how large the population grows. So if we took only his first few plays (before he got the fame feedback loop and money), and gave them to someone who had, somehow, never heard of Shakespeare, I'd wager they would find many other authors at least as good.
This is a mild point in favour of explanation 1, but it's not that the number of devoted researchers is limited, it's that the slots at the top of the research ladder are limited. In this view, any very talented individual who was also a superstar will produce a huge amount of research. The number of very talented individuals has gone up, but the number of superstar slots has not.
But I also sense a privileging of a particular worldview, namely a human one, that may artificially limit the sorts of useful categories we are willing to consider.
This is deliberate: a lot of what I'm trying to do is figure out human values, so the human worldviews and interpretations will generally be the most relevant.
Humans can be assigned any values whatsoever…
2018-11-05T14:26:41.337Z · score: 43 (12 votes)
politicians like being able to actually change public behavior
But the ways in which they want to/can change it are strongly influenced by moral preferences among voters, donors, and civil servants. Why did they shift recycling or bring in clean air/water acts, rather than bringing in any of a million other policy changes they could have?