Comment by stuart_armstrong on The Very Repugnant Conclusion · 2019-01-18T15:50:06.961Z · score: 5 (2 votes) · LW · GW

What would you think of a world that is much bigger, sadder, and blander?

Anyway, the point of the repugnant conclusion is that any world, no matter how ideal, has a corresponding repugnant world.

Synthesising divergent preferences: an example in population ethics

2019-01-18T14:29:18.805Z · score: 13 (3 votes)

The Very Repugnant Conclusion

2019-01-18T14:26:08.083Z · score: 15 (7 votes)
Comment by stuart_armstrong on In SIA, reference classes (almost) don't matter · 2019-01-18T13:29:47.971Z · score: 3 (2 votes) · LW · GW

and different types of SIA could be used to answer different questions.

Yep. ^_^

Comment by stuart_armstrong on Anthropics is pretty normal · 2019-01-18T08:21:28.589Z · score: 2 (1 votes) · LW · GW

none of the useful information comes from guessing the size of the universe, or whether we are in a simulation,

The reason I assume those is so that only the "standard" updating remains - I'm deliberately removing the anthropically weird cases.

Comment by stuart_armstrong on Solving the Doomsday argument · 2019-01-17T21:04:50.009Z · score: 2 (1 votes) · LW · GW

Again, we have to be clear about the question. But if it's "what proportion of versions of me are likely to be in a large universe", then the answer is close to 1 (which is the SIA odds). Then you update on your birth rank, notice, to your great surprise, that it is low enough to exist in both large and small universes, so you update towards small and end up at 50:50.
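A minimal numeric sketch of that two-step update (the universe sizes and the 50:50 objective prior are illustrative assumptions, not from the post):

```python
# Toy model: two equally likely universes, one small and one large.
# SIA weights each universe by the number of copies of "me" in it;
# updating on a low birth rank then pulls the odds back to 50:50.

def normalise(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

prior = {"small": 0.5, "large": 0.5}             # objective prior over universes
population = {"small": 10**9, "large": 10**12}   # assumed population sizes

# Step 1: SIA update - weight each universe by its number of observers.
sia = normalise({u: prior[u] * population[u] for u in prior})
print(sia)        # {'small': ~0.001, 'large': ~0.999} - "close to 1" for large

# Step 2: update on a low birth rank r, one that exists in both universes:
# P(my rank is exactly r | universe) = 1 / population of that universe.
posterior = normalise({u: sia[u] / population[u] for u in sia})
print(posterior)  # {'small': 0.5, 'large': 0.5} - back to the objective 50:50
```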

Comment by stuart_armstrong on Hierarchical system preferences and subagent preferences · 2019-01-17T18:27:54.642Z · score: 2 (1 votes) · LW · GW

So even with meta-preferences, likely there are multiple ways

Yes, almost certainly. That's why I want to preserve all meta-preferences, at least to some degree.

Comment by stuart_armstrong on Anthropics is pretty normal · 2019-01-17T18:23:39.356Z · score: 5 (2 votes) · LW · GW

No, the event "we survived" is "we (the actual people now considering the anthropic argument and past xrisks) survived".

Over enough draws, you have .

So we update the lottery odds based on whether we win or not; we update the danger odds based on whether we live. If we die, we alas don't get to do much updating (though note that we can consider hypotheticals with bets that pay out to surviving relatives, or have a chance of reviving the human race, or whatever, to get the updates we think would be correct in the worlds where we don't exist).
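As a sketch of the non-anthropic core of that update (the hypotheses and numbers below are made up for illustration): surviving is simply evidence for the safer hypothesis, just as winning is evidence about the lottery odds.

```python
# Toy Bayes update on survival: two hypotheses about how dangerous the past was.

prior = {"safe": 0.5, "dangerous": 0.5}
p_survive = {"safe": 0.99, "dangerous": 0.20}   # assumed survival chance under each

joint = {h: prior[h] * p_survive[h] for h in prior}
total = sum(joint.values())
posterior = {h: joint[h] / total for h in joint}

print(posterior)  # {'safe': ~0.83, 'dangerous': ~0.17}: surviving favours "safe"
```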

Comment by stuart_armstrong on In SIA, reference classes (almost) don't matter · 2019-01-17T18:18:57.278Z · score: 2 (1 votes) · LW · GW

Another way of seeing SIA + update on yourself: weigh each universe by the expected number of exact (subjective) copies of you in it, then renormalise.

Comment by stuart_armstrong on Solving the Doomsday argument · 2019-01-17T16:36:46.295Z · score: 2 (1 votes) · LW · GW

The DA, in its SSA form (where it is rigorous), comes as a posterior adjustment to all probabilities computed in the way above - it's not an argument that doom is likely, just that doom is more likely than objective odds would imply, in a precise way that depends on future (and past) population size.

However, my post shows that the SSA form does not apply to the question that people generally ask, so the DA is wrong.

Comment by stuart_armstrong on In SIA, reference classes (almost) don't matter · 2019-01-17T16:20:44.676Z · score: 2 (1 votes) · LW · GW

Oh, what definition were you using? Anything interesting? (or do you mean before updating on your own experiences?)

Comment by stuart_armstrong on Solving the Doomsday argument · 2019-01-17T13:45:52.686Z · score: 3 (2 votes) · LW · GW

There are two versions of the DA; the first is "we should roughly be in the middle", and the second is "our birth rank is less likely if there were many more humans in the future".

I was more thinking of the second case, but I've changed the post slightly to make it more compatible with the first.

Anthropics is pretty normal

2019-01-17T13:26:22.929Z · score: 24 (8 votes)
Comment by stuart_armstrong on In SIA, reference classes (almost) don't matter · 2019-01-17T12:44:57.457Z · score: 5 (3 votes) · LW · GW

Maybe: larger reference classes make the universes more likely, but make it less likely that you would be a specific member of that reference class, so when you update on who you are in the class, the two effects cancel out.

More conceptually: in SIA, the definition of the reference class commutes with restrictions on that reference class. So it doesn't matter if you take the reference class of all humans, then specialise to the ones alive today, then specialise to you; or take the reference class of all humans alive today, then specialise to you; or just take the reference class of you. SIA is, in a sense, sensible with respect to updating.
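Spelled out (in my notation, not necessarily that of the post): with prior $p_i$ for universe $U_i$, reference class $R$, and $n_i$ exact copies of you in $U_i$,

$$P_{\text{SIA}}(U_i \mid \text{you}) \;\propto\; \underbrace{p_i \, |R(U_i)|}_{\text{weight by reference class size}} \times \underbrace{\frac{n_i}{|R(U_i)|}}_{\text{update on being you within } R} \;=\; p_i \, n_i,$$

so the $|R(U_i)|$ factors cancel and the result does not depend on the choice of $R$ (as long as $R$ contains your exact copies).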

Does that help?

Solving the Doomsday argument

2019-01-17T12:32:23.104Z · score: 9 (4 votes)

The questions and classes of SSA

2019-01-17T11:50:50.828Z · score: 10 (2 votes)
Comment by stuart_armstrong on In SIA, reference classes (almost) don't matter · 2019-01-17T11:30:29.808Z · score: 4 (2 votes) · LW · GW

You are correct, I dropped a term in the proof, thanks! I've put it back in, and the proof is now shorter.

In SIA, reference classes (almost) don't matter

2019-01-17T11:29:26.131Z · score: 17 (6 votes)
Comment by stuart_armstrong on In SIA, reference classes (almost) don't matter · 2019-01-16T12:56:45.572Z · score: 2 (1 votes) · LW · GW

Where does the R(Ui) go missing? It's there in the subsequent equation.

Comment by stuart_armstrong on In SIA, reference classes (almost) don't matter · 2019-01-15T06:09:49.401Z · score: 2 (1 votes) · LW · GW

SSA is not reference class independent. If it uses the reference class $R$, then the SSA probability of universe $U_i$ is proportional to $P(U_i)/|R(U_i)|$ (rather than $P(U_i)$), which is not independent of $R$ (consider doubling the size of $R(U_i)$ in one world only - that makes that world less likely relative to all the others).
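A quick worked example (my numbers): take two equally likely worlds $U_1$ and $U_2$, each containing exactly one copy of you, with $|R(U_1)| = 10$ and $|R(U_2)| = 20$. Then

$$P_{\text{SSA}}(U_1) : P_{\text{SSA}}(U_2) \;=\; \frac{1/2}{10} : \frac{1/2}{20} \;=\; 2 : 1,$$

while doubling the reference class in $U_1$ alone (to $|R(U_1)| = 20$) changes this to $1 : 1$ - the answer moves with the choice of $R$.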

Comment by stuart_armstrong on Anthropics: Full Non-indexical Conditioning (FNC) is inconsistent · 2019-01-15T05:58:31.097Z · score: 2 (1 votes) · LW · GW

This is not the standard sleeping beauty paradox, as the information you are certain to get does not involve any amnesia or duplication before you get it.

Anthropic probabilities: answering different questions

2019-01-14T18:50:56.086Z · score: 17 (6 votes)

Anthropics: Full Non-indexical Conditioning (FNC) is inconsistent

2019-01-14T15:03:04.288Z · score: 22 (5 votes)

Hierarchical system preferences and subagent preferences

2019-01-11T18:47:08.860Z · score: 19 (3 votes)
Comment by stuart_armstrong on Latex rendering · 2019-01-10T18:36:20.863Z · score: 2 (1 votes) · LW · GW

Or a warning, rather than an error.

Comment by stuart_armstrong on Latex rendering · 2019-01-10T15:29:05.739Z · score: 2 (1 votes) · LW · GW

I was mainly wondering: since the webpage must run the LaTeX code and throw an error if it decides an expression is malformed (and then not show the output), would it be easy to see that there was an error?

Comment by stuart_armstrong on Latex rendering · 2019-01-10T12:53:00.422Z · score: 2 (1 votes) · LW · GW

I just use plain markdown, so it's not always clear whether the expression works or not.

Latex rendering

2019-01-09T22:32:52.881Z · score: 10 (2 votes)

No surjection onto function space for manifold X

2019-01-09T18:07:26.157Z · score: 19 (5 votes)
Comment by stuart_armstrong on What emotions would AIs need to feel? · 2019-01-09T14:36:22.099Z · score: 2 (1 votes) · LW · GW

I like your phrasing.

What emotions would AIs need to feel?

2019-01-08T15:09:32.424Z · score: 15 (5 votes)
Comment by stuart_armstrong on Bridging syntax and semantics, empirically · 2018-12-31T17:02:53.014Z · score: 2 (1 votes) · LW · GW

The "subsequent post" has been delayed for a long time because of other research avenues I need to catch up with :-(

Comment by stuart_armstrong on Why we need a *theory* of human values · 2018-12-31T16:58:48.743Z · score: 3 (2 votes) · LW · GW

I currently agree with this view. But I'd add that a theory of human values is a direct way to solve some of the critical considerations.

Comment by stuart_armstrong on Anthropic paradoxes transposed into Anthropic Decision Theory · 2018-12-26T20:14:30.307Z · score: 2 (1 votes) · LW · GW

Ah yes, I misread "SSA Adam and Eve" as "SSA-like ADT Adam and Eve (hence average utilitarian)".

Comment by stuart_armstrong on Anthropic paradoxes transposed into Anthropic Decision Theory · 2018-12-23T14:09:49.293Z · score: 2 (1 votes) · LW · GW

Cheers!

And of course, one shouldn't forget that, by their own standards, SSA Adam and Eve are making a mistake.

Nope, they are making the correct decision if they value their own pleasure in an average utilitarian way, for some reason.

Anthropic probabilities and cost functions

2018-12-21T17:54:20.921Z · score: 16 (5 votes)
Comment by stuart_armstrong on Anthropic paradoxes transposed into Anthropic Decision Theory · 2018-12-21T17:28:11.763Z · score: 4 (2 votes) · LW · GW

Moral preferences are a specific subtype of preferences.

Comment by stuart_armstrong on Anthropic paradoxes transposed into Anthropic Decision Theory · 2018-12-21T12:43:43.033Z · score: 4 (2 votes) · LW · GW

but our decision theory can just output betting outcomes instead of probabilities.

Indeed. And ADT outputs betting outcomes without any problems. It's when you interpret them as probabilities that you start having problems, because in order to go from betting odds to probabilities, you have to sort out how much you value two copies of you getting a reward, versus one copy.

Comment by stuart_armstrong on Anthropic paradoxes transposed into Anthropic Decision Theory · 2018-12-21T12:41:10.906Z · score: 4 (2 votes) · LW · GW

I see it as necessary, because I don't see Anthropic probabilities as actually meaning anything.

Standard probabilities are informally "what do I expect to see", and this can be formalised as a cost function for making the wrong predictions.

In anthropic situations, the "I" in that question is not clear - you, or you and your copies, or you and those similar to you? When you formalise this as a cost function, you have to decide how to spread the cost amongst your different copies - do you spread it as a total cost, or an average one? In the first case, SIA emerges; in the second, SSA.

So you can't talk about anthropic "probabilities" without including how much you care about the cost to your copies.
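Here's a minimal numeric sketch of that claim in the Sleeping Beauty setup (one copy on heads, two on tails, quadratic loss; the specific setup and loss function are my choices for illustration): minimising the expected total cost across copies gives the SIA answer, while minimising the expected average cost per copy gives the SSA answer.

```python
# Each copy reports a probability p for "heads" and pays a quadratic (Brier) loss.
# Heads world: 1 copy exists; tails world: 2 copies exist. Both worlds have prior 1/2.

def expected_cost(p, spread):
    loss_heads = (1 - p) ** 2   # loss of one copy if the coin was heads
    loss_tails = p ** 2         # loss of one copy if the coin was tails
    if spread == "total":       # sum the losses over all existing copies
        return 0.5 * (1 * loss_heads) + 0.5 * (2 * loss_tails)
    else:                       # "average": average the loss per existing copy
        return 0.5 * loss_heads + 0.5 * loss_tails

def best_p(spread):
    grid = [i / 1000 for i in range(1001)]
    return min(grid, key=lambda p: expected_cost(p, spread))

print(best_p("total"))    # ~0.333: the SIA ("thirder") probability
print(best_p("average"))  # ~0.5:   the SSA ("halfer") probability
```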

Comment by stuart_armstrong on Anthropic paradoxes transposed into Anthropic Decision Theory · 2018-12-21T12:27:40.504Z · score: 2 (1 votes) · LW · GW

Would you like to be a co-author when (if) the whole thing gets published? You developed UDT, and this way there would be a publication partially on the subject.

Comment by stuart_armstrong on Anthropic paradoxes transposed into Anthropic Decision Theory · 2018-12-20T15:37:48.647Z · score: 2 (1 votes) · LW · GW

It will be mentioned! ADT is basically UDT in anthropic situations (and I'm willing to say that publicly). I haven't updated the paper in a long time, as I keep on wanting to do it properly/get it published, and never have the time.

What's the best reference for UDT?

Anthropic paradoxes transposed into Anthropic Decision Theory

2018-12-19T18:07:42.251Z · score: 19 (9 votes)
Comment by stuart_armstrong on Why we need a *theory* of human values · 2018-12-18T12:19:02.233Z · score: 4 (2 votes) · LW · GW

Yes, that's the almost fully general counterargument: punt all the problems to the wiser versions of ourselves.

But some of these problems are issues that I specifically came up with. I don't trust that idealised non-mes would necessarily have realised these problems even if put in that idealised situation. Or they might have come up with them too late, after they had already altered themselves.

I also don't think that I'm particularly special, so other people can and will think up problems with the system that hadn't occurred to me or anyone else.

This suggests that we'd need to include a huge amount of different idealised humans in the scheme. Which, in turn, increases the chance of the scheme failing due to social dynamics, unless we design it carefully ahead of time.

So I think it is highly valuable to get a lot of people thinking about the potential flaws and improvements for the system before implementing it.

That's why I think that "punting to the wiser versions of ourselves" is useful, but not a sufficient answer. The better we can solve the key questions ("what are these 'wiser' versions?", "how is the whole setup designed?", "what questions exactly is it trying to answer?"), the better those wiser versions of ourselves will be at their tasks.

Comment by stuart_armstrong on Figuring out what Alice wants: non-human Alice · 2018-12-14T15:22:32.744Z · score: 2 (1 votes) · LW · GW

That said, this does seem to be the value learning approach I am most optimistic about right now.

Thanks! I'm not sure I fully get all your concerns, but I'll try and answer to the best of my understanding.

1-4 (and a little bit of 6): this is why I started looking at semantics vs syntax. Consider the small model "If someone is drowning, I should help them (if it's an easy thing to do)". Then "someone", "drowning", "I", and "help them" are vague labels for complex categories (as are most of the rest of the terms, really). The semantics of these categories need to be established before the AI can do anything. And the central examples of the categories will be clearer than the fuzzy edges. Therefore the AI can model me as having strong preferences in the central examples of the categories, which become much weaker as we move to the edges (the meta-preferences will start to become very relevant in the edge cases). I expect that "I should help them" further decomposes into "they should be helped" and "I should get the credit for helping them".

Therefore, it seems to me that an AI should be able to establish that if someone is drowning, it should try and enable me to save them, and if it can't do that, then it should save them itself (using nanotechnology or anything else). It doesn't seem that it would be seeing the issue from my narrow perspective, because I don't see the issue just from my narrow perspective.

5: I am pretty sure that we could use neuroscience to establish that, for example, people are truthful when they say that they see the anchoring bias as a bias. But I might have been a bit glib when mentioning neuroscience; that is mainly the "science fiction superpowers" end of the spectrum for the moment.

What I'm hoping, with this technique, is that if we end up using indirect normativity or stated preferences, then by keeping in mind this model of what proto-preferences are, we can better automate handling the limitations of these techniques (eg when we expect lying), rather than putting those limitations in by hand.

6: Currently I don't see reflexes as embodying values at all. However, people's attitudes towards their own reflexes are valid meta-preferences.

Comment by stuart_armstrong on Why we need a *theory* of human values · 2018-12-14T13:51:00.618Z · score: 10 (3 votes) · LW · GW

Indirect normativity has specific failure modes - eg siren worlds, or social pressures going bad, or humans getting very twisted in that ideal environment in ways that we can't yet predict. More to the point, these failure modes are ones that we can talk about from outside - we can say things like "these precautions should prevent the humans from getting too twisted, but we can't fully guarantee it".

That means that we can't use indirect normativity as a definition of human values, as we already know how it could fail. A better understanding of what values are could result in being able to automate the checking as to whether it failed or not, which would mean that we could include that in the definition.

Comment by stuart_armstrong on Formal Open Problem in Decision Theory · 2018-12-13T13:59:50.989Z · score: 2 (1 votes) · LW · GW

EDIT: This idea is wrong: https://www.lesswrong.com/posts/eqi83c2nNSX7TFSfW/no-surjection-onto-function-space-for-manifold-x

Ok, here's an idea for constructing such a map, with a few key details left unproven; let me know if people see any immediate flaws in the approach, before I spend time filling in the holes.

Let $X$ be a countable collection of open intervals (eg the disjoint union of the intervals $(n, n+1)$ for $n \in \mathbb{N}$), given the usual topology. Let $I$ be the closed unit interval, and $C(X, I)$ the set of continuous functions from $X$ to $I$. Give $C(X, I)$ the compact-open topology.

By the properties of the compact-open topology, since $I$ is $T_{3.5}$ (Tychonoff), then so is $C(X, I)$. I'm hoping that the proof can be extended, at least in this case, to show that $C(X, I)$ is $T_4$ (normal Hausdorff).

It seems clear that $C(X, I)$ is second-countable: let $V(K, U)$ consist of all functions that map $K$ into $U$, where $K$ is the intersection of $X$ with a closed interval with rational endpoints, and $U$ is an open interval with rational endpoints. The set of all such $V(K, U)$ is countable, and forms a subbasis of $C(X, I)$. A countable subbasis means a countable basis, as the set of finite subsets of a countable set is itself countable.

If $C(X, I)$ is $T_4$ and second countable, then it is homeomorphic to a subset of the Hilbert Cube. To simplify notation, we will identify $C(X, I)$ with its image in the Hilbert Cube.

Take the closure $\overline{C(X, I)}$ of $C(X, I)$ within the Hilbert Cube. This closure is compact and second countable (since the Hilbert Cube itself is both). It seems clear that $C(X, I)$ is connected and locally connected; connectedness will extend to the closure, but we'll need to prove that local connectedness does as well.

Then we can apply the Hahn-Mazurkiewicz theorem:

  • A non-empty Hausdorff topological space is a continuous image of the unit interval if and only if it is a compact, connected, locally connected second-countable space.

So there is a continuous surjection $f : [0,1] \to \overline{C(X, I)}$. Pull back $C(X, I)$, defining $Y = f^{-1}(C(X, I))$. If $C(X, I)$ is open in $\overline{C(X, I)}$ (with the subspace topology), then $Y$ is open in $[0,1]$. Even if $Y$ is not open, we can hope that, at worst, it consists of a countable collection of points, open intervals, half-closed intervals, and closed intervals (this is not a general property of subsets of the interval, cf the Cantor set, but it feels very likely that it will apply here).

In that case, there is a continuous surjection from $X$ to $Y$, mapping each open interval of $X$ to one of the points or intervals ("folding" over the ends when mapping to those with closed end-points).

Then the composition $X \to Y \to C(X, I)$ is the continuous surjection we are looking for.

Note: I'm thinking now that $C(X, I)$ might not be connected, but this would not be a problem as long as it has a countable number of connected components.
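In summary (using the notation filled in above, which is my choice rather than the original), the hoped-for surjection is the composite

$$X \;\twoheadrightarrow\; Y = f^{-1}\big(C(X, I)\big) \subseteq [0,1] \;\xrightarrow{\;f\;}\; C(X, I),$$

where $f : [0,1] \to \overline{C(X, I)}$ is the Hahn-Mazurkiewicz surjection onto the closure.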

Comment by stuart_armstrong on Figuring out what Alice wants: non-human Alice · 2018-12-13T13:06:02.026Z · score: 8 (4 votes) · LW · GW

Oh, I don't claim to have a full definition yet, but I believe it's better than pre-formal. Here would be my current definition:

  • Humans are partially model-based agents. We often generate models (or at least partial models) of situations (real or hypothetical), and, within those models, label certain actions/outcomes/possibilities as better or worse than others (or sometimes just generically "good" or "bad"). This model, along with the label, is what I'd call a proto-preference (or pre-preference).

That's why neuroscience is relevant, for identifying the mental models humans use. The "previous Alice post" I mentioned is here, and was a toy version of this, in the case of an algorithm rather than a human. The reason these get around the No Free Lunch theorem is that they look inside the algorithm (so different algorithms with the same policy can be seen to have different preferences, which breaks NFL), and make the "normative assumption" that these modelled proto-preferences correspond (modulo preference synthesis) to the agent's actual preferences.

Note that that definition puts preferences and meta-preferences into the same type, the only difference being the sort of model being considered.
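A toy sketch of that shared type, just to make the point concrete (the class and field names are mine, not from the post):

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class ProtoPreference:
    """An internal (partial) model of a situation, plus better/worse labels within it."""
    model: Any                # the human's (partial) model of a real or hypothetical situation
    labels: Dict[str, float]  # options in that model, scored better (+) or worse (-)

# A preference and a meta-preference have the same type; only the model differs.
drowning = ProtoPreference(model="someone is drowning nearby",
                           labels={"help them": +1.0, "walk past": -1.0})
meta = ProtoPreference(model="my own object-level preferences",
                       labels={"preferences endorsed on reflection": +1.0,
                               "anchoring-bias-driven preferences": -1.0})
```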

Comment by stuart_armstrong on Figuring out what Alice wants: non-human Alice · 2018-12-13T06:39:24.263Z · score: 2 (1 votes) · LW · GW

Then you're going to run into a problem: proto-preferences aren't identifiable.

I interpreted you as trying to fix this problem by looking at how humans infer each other's preferences...

The proto-preferences are a definition of the components that make up preferences. Methods of figuring them out - be they stated preferences, revealed preferences, FMRI machines, how other people infer each other's preferences... - are just methods. The advantage of having a definition is that this guides us explicitly as to when a specific method for figuring them out, ceases to be applicable.

And I'd argue that proto-preferences are identifiable. We're talking about figuring out how humans model their own situations, and the better-worse judgements they assign in their internal models. This is not unidentifiable, and neuroscience already has some things to say on it. The previous Alice post showed how you could do it in a toy model (with my posts on semantics and symbol grounding being relevant to applying this approach to humans).

That second sentence of mine is somewhat poorly phrased, but I agree that "extracting the normative assumptions humans make is no easier than extracting proto-preferences" - I just don't see that second one as being insoluble.

Comment by stuart_armstrong on Assuming we've solved X, could we do Y... · 2018-12-12T20:55:13.737Z · score: 2 (1 votes) · LW · GW

Agree that it's useful to disentangle them, but it's also useful to realise that they can't be fully disentangled... yet.

Comment by stuart_armstrong on Figuring out what Alice wants: non-human Alice · 2018-12-12T20:48:10.460Z · score: 3 (2 votes) · LW · GW

We're getting close to something important here, so I'll try and sort things out carefully.

In my current approach, I'm doing two things:

  1. Finding some components of preferences or proto-preferences within the human brain.

  2. Synthesising them together in a way that also respects (proto-)meta-preferences.

The first step is needed because of the No Free Lunch in preference learning result. We need to have some definition of preferences that isn't behavioural. And the stated-values-after-reflection approach has some specific problems that I listed here.

I then took an initial stab at how one could synthesise the preferences in this post.

If I'm reading you correctly, your main fear is that by focusing on the proto-preferences of the moment, we might end up in a terrible place, foreclosing moral improvements. I share that fear! That's why the process synthesises values in accordance both with meta-preferences and with "far" preferences ("I want everyone to live happy worthwhile lives" is a perfectly valid proto-preference).

Where we might differ the most is that I'm very reluctant to throw away any proto-preferences, even if our meta-preferences would typically overrule them. I would prefer to keep them around, with a very low weight. Once we get in the habit of ditching proto-preferences, there's no telling where that process might end up.

Comment by stuart_armstrong on Figuring out what Alice wants: non-human Alice · 2018-12-12T14:16:47.435Z · score: 3 (2 votes) · LW · GW

In the example in https://www.lesswrong.com/posts/rcXaY3FgoobMkH2jc/figuring-out-what-alice-wants-part-ii , I give examples of two algorithms with the same outputs but where we would attribute different preferences to them. This sidesteps the impossibility result, since it allows us to consider extra information, namely the internal structure of the algorithm, in a way relevant to value-computing.

Comment by stuart_armstrong on Assuming we've solved X, could we do Y... · 2018-12-12T13:48:20.533Z · score: 5 (3 votes) · LW · GW

I think it depends on the individual. Certainly, before realising the points above, I would occasionally do the "assume human values are solved" move in my mind, in an unrigorous and misleading way.

Comment by stuart_armstrong on Figuring out what Alice wants: non-human Alice · 2018-12-12T00:41:38.688Z · score: 2 (1 votes) · LW · GW

What do you mean by "you actually have Y values"? What are you defining values to be?

Comment by stuart_armstrong on Figuring out what Alice wants: non-human Alice · 2018-12-12T00:19:07.135Z · score: 2 (1 votes) · LW · GW

Because once we have these parameters, we can learn the values of any given human. In contrast, if we learn the values of a given human, we don't get to learn the values of any other one.

I'd argue further: these parameters form part of a definition of human values. We can't just "learn human values", as these don't exist in the world. Whereas "learn what humans model each other's values (and rationality) to be" is something that makes sense in the world.

Comment by stuart_armstrong on Bounded rationality abounds in models, not explicitly defined · 2018-12-12T00:07:36.274Z · score: 3 (2 votes) · LW · GW

If we want to apply it to humans, we'd need something much more complicated than that: something that uses some measure of how complex actions seem to humans, and takes into account how and when we search for alternate solutions. There's a reason most models don't use bounded rationality; it ain't simple.

Comment by stuart_armstrong on A hundred Shakespeares · 2018-12-11T23:13:25.465Z · score: 3 (2 votes) · LW · GW

Corrected, thanks!

A hundred Shakespeares

2018-12-11T23:11:48.668Z · score: 31 (12 votes)
Comment by stuart_armstrong on Bounded rationality abounds in models, not explicitly defined · 2018-12-11T22:42:13.442Z · score: 5 (3 votes) · LW · GW

I agree it's part of the story, but only a part. And real humans don't act as if there were a set of actions of size n that they could consider with equal ease. Sometimes humans have much smaller action sets, sometimes they can produce completely unexpected actions, and most of the time we have a pretty small set of obvious actions and a much larger set of potential actions we might be able to think up at the cost of some effort.

Bounded rationality abounds in models, not explicitly defined

2018-12-11T19:34:17.476Z · score: 12 (6 votes)

Figuring out what Alice wants: non-human Alice

2018-12-11T19:31:13.830Z · score: 12 (4 votes)

Assuming we've solved X, could we do Y...

2018-12-11T18:13:56.021Z · score: 34 (14 votes)

Why we need a *theory* of human values

2018-12-05T16:00:13.711Z · score: 53 (17 votes)
Comment by stuart_armstrong on Formal Open Problem in Decision Theory · 2018-12-05T15:56:17.583Z · score: 2 (1 votes) · LW · GW

Thanks for introducing me to the box topology - seeing it defined so explicitly, and seeing what properties it fails, cleared up a few of my intuitions.

Comment by stuart_armstrong on Coherence arguments do not imply goal-directed behavior · 2018-12-04T14:54:49.848Z · score: 2 (3 votes) · LW · GW

A chess tree search algorithm would never hit upon killing other processes. An evolutionary chess-playing algorithm might learn to do that. It's not clear whether goal-directed is relevant to that distinction.

Comment by stuart_armstrong on Formal Open Problem in Decision Theory · 2018-12-03T20:56:36.238Z · score: 7 (3 votes) · LW · GW

Hum, $[0,1]^{\mathbb{N}}$ should be compact by Tychonoff's theorem (see also the Hilbert Cube, which is homeomorphic to it).

For your proof, I think that the set you define is not open in the product topology. The product topology is the coarsest topology where all the projection maps $\pi_n$ are continuous.

To make all the projection maps continuous we need all sets in $S$ to be open, where we define $U \in S$ iff there exists an $n$ such that $\pi_n(U)$ is open in $[0,1]$ and $U = \pi_n^{-1}(\pi_n(U))$.

Let $B$ be the set of finite intersections of these sets. For any $V \in B$, there exists a finite set $N \subset \mathbb{N}$ such that if $x \in V$ and $x_n = y_n$ for $n \in N$, then $y \in V$ as well.

If we take $W$ to be an arbitrary union of elements of $B$, this condition will be preserved. Thus your set is not contained in the arbitrary unions and finite intersections of elements of $S$, so it seems it is not an open set.
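For a concrete illustration of that finite-coordinate condition (my example, not from the thread): the set

$$A = \{\, x \in [0,1]^{\mathbb{N}} : x_n < \tfrac{1}{2} \text{ for all } n \,\}$$

is not open in the product topology, since membership in any basic open set is decided by finitely many coordinates, while membership in $A$ constrains them all.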

Also, $[0,1]^{\mathbb{N}}$ is second-countable. From the wikipedia article on second-countable:

Any countable product of a second-countable space is second-countable

Comment by stuart_armstrong on Coherence arguments do not imply goal-directed behavior · 2018-12-03T20:43:06.508Z · score: 12 (6 votes) · LW · GW

Note that this starts from the assumption of goal-directed behavior and derives that the AI will be an EU maximizer along with the other convergent instrumental subgoals.

The result is actually stronger than that, I think: if the AI is goal-directed at least in part, then that part will (tend to) purge the non-goal directed behaviours and then follow the EU path.

I wonder if we could get theorems as to what kinds of minimal goal directed behaviour will result in the agent becoming a completely goal-directed agent.

Comment by stuart_armstrong on Is Science Slowing Down? · 2018-12-01T18:00:51.085Z · score: 21 (7 votes) · LW · GW

But still? A hundred Shakespeares?

I'd wager there are thousands of Shakespeare-equivalents around today. The issue is that Shakespeare was not only talented, he was successful - wildly popular, and able to live off his writing. He was a superstar of theatre. And we can only have a limited number of superstars, no matter how large the population grows. So if we took only his first few plays (before he got the fame feedback loop and money), and gave them to someone who had, somehow, never heard of Shakespeare, I'd wager they would find many other authors at least as good.

This is a mild point in favour of explanation 1, but it's not that the number of devoted researchers is limited, it's that the slots at the top of the research ladder are limited. In this view, any very talented individual who is also a superstar will produce a huge amount of research. The number of very talented individuals has gone up, but the number of superstar slots has not.

Comment by stuart_armstrong on Web of connotations: Bleggs, Rubes, thermostats and beliefs · 2018-11-28T11:45:08.075Z · score: 2 (1 votes) · LW · GW

But I also sense a privileging of a particular worldview, namely a human one, that may artificially limit the sorts of useful categories we are willing to consider.

This is deliberate - a lot of what I'm trying to do is figure out human values, so the human worldviews and interpretations will generally be the most relevant.

Humans can be assigned any values whatsoever…

2018-11-05T14:26:41.337Z · score: 43 (12 votes)
Comment by stuart_armstrong on Policy Beats Morality · 2018-10-17T20:42:03.865Z · score: 4 (3 votes) · LW · GW

politicians like being able to actually change public behavior

But the ways in which they want to (or can) change it are strongly influenced by the moral preferences of voters, donors, and civil servants. Why did they push recycling or bring in clean air/water acts, rather than any of a million other policy changes they could have made?

Standard ML Oracles vs Counterfactual ones

2018-10-10T20:01:13.765Z · score: 15 (5 votes)

Wireheading as a potential problem with the new impact measure

2018-09-25T14:15:37.911Z · score: 25 (8 votes)

Bridging syntax and semantics with Quine's Gavagai

2018-09-24T14:39:55.981Z · score: 20 (7 votes)

Bridging syntax and semantics, empirically

2018-09-19T16:48:32.436Z · score: 18 (5 votes)

Web of connotations: Bleggs, Rubes, thermostats and beliefs

2018-09-19T16:47:39.673Z · score: 20 (9 votes)

Are you in a Boltzmann simulation?

2018-09-13T12:56:08.283Z · score: 19 (10 votes)

Petrov corrigibility

2018-09-11T13:50:51.167Z · score: 21 (9 votes)

Boltzmann brain decision theory

2018-09-11T13:24:30.016Z · score: 10 (5 votes)

Disagreement with Paul: alignment induction

2018-09-10T13:54:09.844Z · score: 33 (12 votes)

Corrigibility doesn't always have a good action to take

2018-08-28T20:30:12.302Z · score: 19 (6 votes)

Using expected utility for Good(hart)

2018-08-27T03:32:51.059Z · score: 39 (15 votes)

Figuring out what Alice wants, part II

2018-07-17T13:59:40.722Z · score: 15 (5 votes)

Figuring out what Alice wants, part I

2018-07-17T13:59:35.395Z · score: 16 (5 votes)

Intertheoretic utility comparison

2018-07-03T13:44:18.498Z · score: 23 (8 votes)

Anthropics and Fermi

2018-06-20T13:04:59.080Z · score: 28 (7 votes)

Duplication versus probability

2018-06-20T12:18:29.459Z · score: 29 (11 votes)

Paradoxes in all anthropic probabilities

2018-06-19T15:31:17.177Z · score: 20 (8 votes)

Anthropics made easy?

2018-06-14T00:56:50.555Z · score: 32 (16 votes)

Poker example: (not) deducing someone's preferences

2018-06-08T03:19:38.144Z · score: 18 (6 votes)

Rigging is a form of wireheading

2018-05-03T12:50:50.220Z · score: 26 (8 votes)

Reward function learning: the value function

2018-04-24T16:29:32.971Z · score: 30 (7 votes)

The limits of corrigibility

2018-04-10T10:49:11.579Z · score: 39 (12 votes)

Resolving human values, completely and adequately

2018-03-30T03:35:03.502Z · score: 50 (11 votes)

Problems with Amplification/Distillation

2018-03-27T11:12:36.749Z · score: 64 (17 votes)

Using lying to detect human values

2018-03-15T11:37:05.408Z · score: 48 (19 votes)

Why we want unbiased learning processes

2018-02-20T14:48:48.885Z · score: 37 (9 votes)

The different types (not sizes!) of infinity

2018-01-28T11:14:00.000Z · score: 104 (39 votes)

Have you felt exiert yet?

2018-01-05T17:03:51.029Z · score: 66 (20 votes)

What would convince you you'd won the lottery?

2017-10-10T13:45:44.996Z · score: 55 (26 votes)

Toy model of the AI control problem: animated version

2017-10-10T11:06:41.518Z · score: 44 (18 votes)