## Intertheoretic utility comparison: examples

2019-07-17T12:39:45.147Z · score: 9 (1 votes)
Comment by stuart_armstrong on Indifference: multiple changes, multiple agents · 2019-07-14T22:14:44.088Z · score: 4 (2 votes) · LW · GW

but if we have a solution that fixes the way they exploit interruption

Doesn't the design above do that?

Comment by stuart_armstrong on AI Alignment Problem: “Human Values” don’t Actually Exist · 2019-07-10T16:13:53.146Z · score: 2 (1 votes) · LW · GW

The human might have some taste preferences that will determine between tea and coffee, general hedonism preferences that might also work, and meta-preferences about how they should deal with future choices.

Part of the research agenda ("grounding symbols") is about trying to determine where these models are located.

Comment by stuart_armstrong on Indifference: multiple changes, multiple agents · 2019-07-10T11:09:37.557Z · score: 7 (2 votes) · LW · GW

I don't think this is a Goodhart-style effect. Standard indifference is a very carefully constructed effect, and it does exactly what it is designed for: making the agents indifferent to their individual interruptions. It turns out this doesn't make them indifferent to the interruptions of other agents, which is annoying but not really surprising.

It's not Goodhart, it's just that mutual indifference has to be specifically designed for.

Comment by stuart_armstrong on AI Alignment Problem: “Human Values” don’t Actually Exist · 2019-07-09T19:43:21.357Z · score: 4 (2 votes) · LW · GW

Thanks! For M1 vs M2, I agree these could reach different outcomes, but would either one be dramatically wrong? There are many "free variables" in the process, and the aim is for the outcome to be OK however they are set.

I'll work on learning partial preferences.

"Just bring me tea, without killing my cat and tiling the universe with teapots." [...] and underdefined, at least based on my self-observation. Thus again I would prefer collectively codified human norms (laws) over an extrapolated model of my utility function.

It might be underdefined in some sort of general sense; I understand the feeling, I sometimes get it too. But in practice, it seems like it should ground out to "obey human orders about tea, or do something that is strongly preferred to that by the human". Humans like their orders being obeyed, and presumably like getting what they order; so to disobey that, you'd need to be very sure that there's a clearly better option for the human.

Of course, it might end up having a sexy server serve pleasantly drugged tea ^_^
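One way to read the rule above ("obey the order, unless something is strongly preferred and you are very sure of it") is as a default plus a margin and a confidence threshold. The sketch below is a hypothetical illustration of that reading, not the research agenda's actual mechanism; `Option`, `choose`, the `margin` and `min_confidence` values, and the numeric preference scale are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    estimated_value: float  # hypothetical: estimated human preference for this option
    confidence: float       # hypothetical: confidence in that estimate, from 0 to 1


def choose(ordered: Option, alternatives: list[Option],
           margin: float = 1.0, min_confidence: float = 0.95) -> Option:
    """Obey the order, unless an alternative is clearly and confidently better."""
    best = ordered
    for alt in alternatives:
        # "Clearly better": beats the ordered option by at least `margin`.
        clearly_better = alt.estimated_value >= ordered.estimated_value + margin
        # "Very sure": our confidence in that estimate is high enough.
        very_sure = alt.confidence >= min_confidence
        if clearly_better and very_sure and alt.estimated_value > best.estimated_value:
            best = alt
    return best


tea = Option("tea, as ordered", estimated_value=5.0, confidence=0.99)
coffee = Option("coffee instead", estimated_value=5.5, confidence=0.60)
print(choose(tea, [coffee]).name)  # tea, as ordered
```

The rule is only as safe as the value estimates: an inflated, overconfident estimate for something like the drugged tea would pass both checks.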

Comment by stuart_armstrong on AI Alignment Problem: “Human Values” don’t Actually Exist · 2019-07-09T16:38:38.896Z · score: 4 (2 votes) · LW · GW

Hey there!

Wondering how you felt my research agenda addressed, or failed to address, many of these points: https://www.lesswrong.com/posts/CSEdLLEkap2pubjof/research-agenda-v0-9-synthesising-a-human-s-preferences-into

I have my own opinions on these, but I'm interested in yours.

## Indifference: multiple changes, multiple agents

2019-07-08T13:36:42.095Z · score: 16 (3 votes)
Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-08T11:58:59.656Z · score: 3 (2 votes) · LW · GW

Please add "submission" at the top of the post.

Suggestion 1 is insufficiently detailed: can you explain what is going on, how the Oracles are rewarded, what happens when the message is read or not read, and so on? The same goes for suggestion 5. Suggestion 2 seems potentially very interesting.

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-08T11:53:15.664Z · score: 3 (2 votes) · LW · GW

Thanks!

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-08T11:51:12.493Z · score: 2 (1 votes) · LW · GW

"Can't generally be used": if you understand the setup and are careful, you might be able to do so.

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-04T13:12:21.604Z · score: 2 (1 votes) · LW · GW

Can you develop this model in more detail?

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-04T11:44:15.648Z · score: 4 (2 votes) · LW · GW

Thanks! Seems potentially promising. Will analyse this more properly later.

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T11:54:39.990Z · score: 2 (1 votes) · LW · GW

Here it is assumed that if the proof is true and is in a formal language, there are no hidden messages in the text.

That is never something safe to assume. I can write formally correct proofs that contain hidden messages quite easily - add extra lemmas and extra steps. Unless we're very smart, it would be hard for us to detect which steps are unnecessary and which are needed, especially if it rewrites the main proof thread somewhat.
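To make the hidden-message worry concrete: if the Oracle can pad a valid proof with lemmas that are true but unnecessary, the choice of which ones to include can carry a message while every step still checks. A toy sketch, assuming a proof is just a list of step strings; `encode`, `decode` and the lemma names are hypothetical. With k optional lemmas this carries k bits, and reordering or rewording steps would add more.

```python
def encode(core_steps: list[str], optional_lemmas: list[str], bits: str) -> list[str]:
    """Build a proof that includes optional lemma i exactly when bits[i] == '1'."""
    if len(bits) > len(optional_lemmas):
        raise ValueError("not enough optional lemmas to carry the message")
    chosen = [lemma for lemma, bit in zip(optional_lemmas, bits) if bit == "1"]
    # The extra lemmas are true but unused; the core argument is unchanged,
    # so a checker that only verifies validity sees nothing wrong.
    return chosen + core_steps


def decode(proof: list[str], optional_lemmas: list[str], n_bits: int) -> str:
    """Recover the message by checking which optional lemmas appear in the proof."""
    present = set(proof)
    return "".join("1" if lemma in present else "0" for lemma in optional_lemmas[:n_bits])


core = ["step A", "step B", "QED"]
spare = ["lemma 1", "lemma 2", "lemma 3", "lemma 4"]
proof = encode(core, spare, "1011")
print(proof)                    # ['lemma 1', 'lemma 3', 'lemma 4', 'step A', 'step B', 'QED']
print(decode(proof, spare, 4))  # 1011
```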

Another way to check proofs is to run two slightly different non-communicating Oracles and compare outputs.

I'll accept that as a part of a submission if a) you develop it more, in a formal way, and b) you repost it as a top level comment.

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T11:52:42.321Z · score: 5 (3 votes) · LW · GW

This goes against the technical definition of low bandwidth (a small space of possible solutions), but is somewhat in its spirit (low bandwidth for the solutions humans get to see). I'll accept it as a submission.

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T10:10:03.480Z · score: 3 (2 votes) · LW · GW

Of course this has the problem of maximizing for apparent insight rather than actual insight.

Until we can measure actual insight, this will remain a problem ^_^

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T10:08:39.345Z · score: 2 (1 votes) · LW · GW

The ideal solution would have huge positive impacts and complete safety, under minimal assumptions. More realistically, there will be a tradeoff between assumptions and impact. I'm not suggesting any area for people to focus their efforts on, because a very effective approach with minimal assumptions might win, or so might a fantastically effective approach under stronger assumptions. It's hard to tell in advance what will be the most useful.

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T14:11:00.037Z · score: 2 (1 votes) · LW · GW

See the edit, and make sure you "decide on the length of each episode, and how the outcome is calculated. The Oracle is run once an episode only (and other Oracles can't generally be used on the same problem; if you want to run multiple Oracles, you have to justify why this would work), and has to get objective/loss/reward by the end of that episode, which therefore has to be estimated in some way at that point."
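As an illustration of an episode whose loss is fixed by the time it ends, here is a minimal sketch of one counterfactual-Oracle episode. It assumes the usual erasure setup: with some small probability the answer is erased unread and scored against the real outcome; when humans do read the answer, the Oracle is assumed here to receive a fixed loss of 0. The function name, `erasure_prob` and the scalar squared-error loss are illustrative assumptions.

```python
import random
from typing import Callable, Optional


def counterfactual_episode(predict: Callable[[str], float],
                           question: str,
                           outcome_after_erasure: Callable[[], float],
                           erasure_prob: float = 0.01,
                           rng: Optional[random.Random] = None) -> tuple[bool, float]:
    """Run one episode; return (erased, loss)."""
    rng = rng or random.Random()
    prediction = predict(question)  # the Oracle is run once per episode
    erased = rng.random() < erasure_prob
    if erased:
        # Nobody saw the prediction, so it could not influence the outcome;
        # the loss is the squared error against what actually happened.
        actual = outcome_after_erasure()
        return True, (prediction - actual) ** 2
    # The answer was read: fixed loss of 0 (an assumption of this sketch).
    return False, 0.0


print(counterfactual_episode(lambda q: 100.0, "price next month?", lambda: 103.0,
                             erasure_prob=1.0))  # (True, 9.0)
```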

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T14:10:42.880Z · score: 2 (1 votes) · LW · GW

See the edit, and make sure you "decide on the length of each episode, and how the outcome is calculated. The Oracle is run once an episode only (and other Oracles can't generally be used on the same problem; if you want to run multiple Oracles, you have to justify why this would work), and has to get objective/loss/reward by the end of that episode, which therefore has to be estimated in some way at that point."

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T14:09:48.234Z · score: 2 (1 votes) · LW · GW

See the edit (especially for your first suggestion): "decide on the length of each episode, and how the outcome is calculated. The Oracle is run once an episode only (and other Oracles can't generally be used on the same problem; if you want to run multiple Oracles, you have to justify why this would work), and has to get objective/loss/reward by the end of that episode, which therefore has to be estimated in some way at that point."

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T14:06:31.787Z · score: 2 (1 votes) · LW · GW

See the edit: "decide on the length of each episode, and how the outcome is calculated. The Oracle is run once an episode only (and other Oracles can't generally be used on the same problem; if you want to run multiple Oracles, you have to justify why this would work), and has to get objective/loss/reward by the end of that episode, which therefore has to be estimated in some way at that point."

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T14:05:29.404Z · score: 2 (1 votes) · LW · GW

Corrected, thanks!

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T10:39:27.926Z · score: 2 (1 votes) · LW · GW

For the low bandwidth Oracle, you need to give it the options. In the case of the counterfactual Oracle, if you don't see the list, how do you reward it?

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T10:33:18.685Z · score: 2 (1 votes) · LW · GW

Assume either way, depending on what your suggestion is for.

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T10:32:44.566Z · score: 2 (1 votes) · LW · GW

Yes.

Comment by stuart_armstrong on An Increasingly Manipulative Newsfeed · 2019-07-02T10:21:05.988Z · score: 4 (3 votes) · LW · GW

Also also, have we considered that we're selecting for deception, if we're looking for it and terminating AIs we find deceptive, while nurturing those we don't detect?

Yes. That's a general problem (see the footnote above for a variant of it).

Comment by stuart_armstrong on Research Agenda in reverse: what *would* a solution look like? · 2019-07-01T15:52:48.838Z · score: 5 (3 votes) · LW · GW

Do you think that AI system doesn't "know" what humans would want, even if it doesn't optimize for it?

I think the AI would not know that, because "what humans would want" is not defined. "What humans say they want", "what, upon reflection, humans would agree they want...", etc. can be done, but "what humans want" is not a defined thing about the world or about humans, without extra assumptions (which cannot be deduced from observation).

Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-01T15:50:27.913Z · score: 7 (3 votes) · LW · GW

None of these questions can be asked to the low bandwidth Oracle (you need a list of answers); it might be possible to ask them to the counterfactual Oracle, after some modification, but they would be highly dangerous if you allow unrestricted outputs.
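In the technical sense used here, a low-bandwidth Oracle picks from a small list we supply, so a single answer carries at most log2 of the list length in bits. A sketch, assuming the Oracle is any function that returns an index into that list; `ask_low_bandwidth` and `bandwidth_bits` are hypothetical names.

```python
import math
from typing import Callable


def bandwidth_bits(options: list[str]) -> float:
    """Upper bound on the information one answer can carry: log2(number of options)."""
    return math.log2(len(options))


def ask_low_bandwidth(oracle: Callable[[list[str]], int], options: list[str]) -> str:
    """Accept only an answer that is one of the options we supplied."""
    index = oracle(options)
    if not 0 <= index < len(options):
        raise ValueError("answer outside the supplied list")
    return options[index]


friends = [f"friend {i}" for i in range(8)]
print(bandwidth_bits(friends))                                          # 3.0
print(round(bandwidth_bits([f"company {i}" for i in range(1000)]), 2))  # 9.97
```

An open-ended question has no list to index into, which is why questions of that kind do not fit this design.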

## Contest: $1,000 for good questions to ask to an Oracle AI

2019-07-01T14:39:25.170Z · score: 51 (20 votes)
Comment by stuart_armstrong on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-01T14:30:40.243Z · score: 5 (2 votes) · LW · GW

Here are some submission examples (note I'm not saying they are good examples, just well formatted; the Edits in the third submission are deliberate):

• "Submission. For the counterfactual Oracle, ask the Oracle what Google's stock price will be next month (counterfactually, if we didn't see the Oracle's answer). In that case, the loss function is computed as ||predicted price - actual price after erasure||^2. If we don't see the answer, the programmers are assumed not to ask the question again for a month, neither to this Oracle nor to any other. This will demonstrate the true value of Google, and can ultimately be used to remove noise from the stock market."
• "Submission: low-bandwidth oracle. Give it a list of a thousand companies, and ask which one will most increase in value in percentage terms over the week. At the end of the week, rank the companies by percentage increase. The loss function is the rank of the company the oracle selected. The programmer will try to invest in the selected company, but will do it discreetly. This will help to gather resources for AI safety research."
• "Submission. Ask the low bandwidth Oracle which of my friends will surprise me most this fortnight. It chooses from a list of friends; I'll decide which one surprises me most. The loss function is 1 if it chooses the wrong friend, 0 if it chooses the right one. This will help me figure out myself and my social circle, and better focus on AI safety. The risk is low because none of my friends are particularly important, positively or negatively, to the world. EDIT: To be clear, I also want to use this to figure out what the word "surprise" means to me, and what the AI predicts it will mean to me. EDIT 2: People have pointed out that it might be dangerous to have the AI construct my own meanings for categories, but it only has three bits or so of optimisation power (I don't have that many friends :-( ), so it's mainly me thinking this through, not the AI manipulating me."
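The losses in the second and third examples are simple enough to compute directly. A sketch, assuming prices are given as (start, end) pairs per company, rank 1 means the largest percentage increase, and the friend example uses a loss of 1 for a wrong choice and 0 for a right one; the function names and data layout are hypothetical.

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change in value between the start and end of the episode."""
    return 100.0 * (end - start) / start


def rank_loss(selected: str, prices: dict[str, tuple[float, float]]) -> int:
    """Rank of the selected company (1 = largest percentage increase); lower is better."""
    ranking = sorted(prices, key=lambda c: pct_change(*prices[c]), reverse=True)
    return ranking.index(selected) + 1


def binary_loss(selected: str, actual: str) -> int:
    """1 if the Oracle picked the wrong friend, 0 if it picked the right one."""
    return 0 if selected == actual else 1


prices = {"A": (100.0, 110.0), "B": (50.0, 60.0), "C": (20.0, 19.0)}
print(rank_loss("A", prices))  # 2  (B rose 20%, A rose 10%, C fell 5%)
```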
Comment by stuart_armstrong on Research Agenda in reverse: what *would* a solution look like? · 2019-07-01T09:08:20.396Z · score: 7 (3 votes) · LW · GW

so all we need to do is ensure that it is incentivized to figure out and satisfy our preferences, and then it will do the rest.

That's actually what I'm aiming at with the research agenda, but the Occam's razor argument shows that this itself is highly non-trivial, and we need some strong grounding of the definition of preference.

Comment by stuart_armstrong on Apocalypse, corrupted · 2019-06-28T09:59:58.863Z · score: 2 (1 votes) · LW · GW

I should also note there's a difference between competence being rewarded and incompetence being punished. I suspect the second happens a lot more than the first.


## Self-confirming prophecies, and simplified Oracle designs

2019-06-28T09:57:35.571Z · score: 6 (3 votes)
Comment by stuart_armstrong on Apocalypse, corrupted · 2019-06-27T16:02:06.313Z · score: 2 (1 votes) · LW · GW

How sure are you that hunter-gatherers are much closer to the edge than the typical person in our society?

Very sure; compare death rates.

A better comparison might be people in cold / food-scarce vs warm / food-abundant areas.

Surely abundance of food is relative to population size?

Maybe we could try and estimate how objectively hard it is for certain groups to survive, and then try and work back to reward for competence from that?

Comment by stuart_armstrong on Apocalypse, corrupted · 2019-06-27T13:11:11.690Z · score: 3 (2 votes) · LW · GW

Disasters do promote more pro-social behaviours, and more personal, non-bureaucratic interactions between people (they also promote more people "acting cooperatively", but actual cooperation seems much more efficient with laws and markets than with goodwill). But it's precisely that which also promotes corruption and nepotism. Running more informal organisations is almost all about personal politics. And cooperation within the tribe often goes along with antagonism towards the outside.

Comment by stuart_armstrong on Apocalypse, corrupted · 2019-06-27T11:34:41.458Z · score: 2 (1 votes) · LW · GW

See anti-social punishment here: https://www.lesswrong.com/posts/X5RyaEDHNq5qutSHK/anti-social-punishment - cooperators were punished for cooperating!

Charm means the person has allies, and would be dangerous to cross. Competence might mean that they are likely to make a power play (see prestige vs dominance hierarchies).

There are hunter-gatherers who punish those of low status who give away too much meat, because these people are obviously trying to wow others and make a play at raising their status.

Comment by stuart_armstrong on To first order, moral realism and moral anti-realism are the same thing · 2019-06-27T11:30:37.401Z · score: 2 (1 votes) · LW · GW

Just my impression based on discussing the issue with some moral realists/non-realists.

Comment by stuart_armstrong on Research Agenda in reverse: what *would* a solution look like? · 2019-06-27T11:29:00.661Z · score: 2 (1 votes) · LW · GW

Yep ^_^ I make those points in the research agenda (section 3).

Comment by stuart_armstrong on Research Agenda in reverse: what *would* a solution look like? · 2019-06-27T11:28:17.853Z · score: 2 (1 votes) · LW · GW

To be precise: I argue low impact is intractable without learning a subset of human values; the full set is not needed.

## Apocalypse, corrupted

2019-06-26T13:46:05.548Z · score: 19 (11 votes)

## Research Agenda in reverse: what *would* a solution look like?

2019-06-25T13:52:48.934Z · score: 36 (14 votes)
Comment by stuart_armstrong on Upcoming stability of values · 2019-06-25T12:27:30.130Z · score: 2 (1 votes) · LW · GW

Yep. But I note that many people seem to value letting their values drift somewhat, so that needs to be taken into account.

Comment by stuart_armstrong on To first order, moral realism and moral anti-realism are the same thing · 2019-06-22T08:37:51.267Z · score: 2 (1 votes) · LW · GW

What I meant was this: assume that 3^^^3 dust specks on one person is worse than 50 years of torture. As long as the dust-speck sensation is somewhat additive, that should be true. Now suppose you have to choose between dust specks and torture 3^^^3 times, once for each person ("So, do we torture individual 27602, or put one dust speck on everyone? Now, the same question for 27603...").

Then always choosing dust specks is worse, for everyone, than always choosing torture.

So the dust-speck decision becomes worse and worse, the more often you expect to encounter it.
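The repeated-choice argument can be checked with toy numbers. A sketch, assuming disutilities simply add up, with `n_people` standing in for 3^^^3 and the premise (that many specks on one person are worse than one torture) encoded as `n * speck > torture`; all names and numbers are illustrative.

```python
def simulate(n_people: int, speck: float, torture: float, always_specks: bool) -> list[float]:
    """Total disutility per person after one decision for each person."""
    harm = [0.0] * n_people
    for i in range(n_people):  # decision i: torture person i, or one speck on everyone
        if always_specks:
            for j in range(n_people):
                harm[j] += speck
        else:
            harm[i] += torture
    return harm


n, speck, torture = 200, 1.0, 100.0
assert n * speck > torture  # the premise: n specks on one person is worse than torture
specks = simulate(n, speck, torture, always_specks=True)
tortures = simulate(n, speck, torture, always_specks=False)
print(specks[0], tortures[0])                        # 200.0 100.0
print(all(s > t for s, t in zip(specks, tortures)))  # True
```

Each single decision only costs everyone one speck, but the policy as a whole leaves every person worse off than the torture policy does.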

Comment by stuart_armstrong on Research Agenda v0.9: Synthesising a human's preferences into a utility function · 2019-06-22T08:36:53.535Z · score: 2 (1 votes) · LW · GW

"humans don't have actual preferences so the AI is just going to try to learn something adequate."

Try something like: humans don't have actual consistent preferences, so the AI is going to try and find a good approximation that covers all the contradictions and uncertainties in human preferences.

Comment by stuart_armstrong on Upcoming stability of values · 2019-06-22T08:31:20.986Z · score: 4 (2 votes) · LW · GW

Information can (and should) change your behaviour, even if it doesn't change your values. Becoming a parent should change your attitude to various things whose purpose you didn't see till then! And values can prefer a variety of experiences, if we cash out our boredom properly.

The problem is that humans mix information and values together in highly complicated, non-rational ways.

Comment by stuart_armstrong on Research Agenda v0.9: Synthesising a human's preferences into a utility function · 2019-06-19T14:16:41.023Z · score: 2 (1 votes) · LW · GW

I'd say that this problem doesn't belong in sections 2.3-2.4 (collecting and generalising preferences), but in section 1.2 (symbol grounding, and especially the web of connotations). That's where these questions should be solved, in my view.

So yeah, I agree that standard machine learning is not up to the task yet, at all.

(as a minor aside, I'm also a bit unsure how necessary it is to make partial preferences total before combining them; this may be unnecessary)

Comment by stuart_armstrong on For the past, in some ways only, we are moral degenerates · 2019-06-19T14:12:45.789Z · score: 2 (1 votes) · LW · GW

Do these seem like things that could be "put in as a strong conditional meta-preference" in your framework?

Yes, very easily.

The main issue is whether these should count as an overwhelming meta-preference: one that outweighs all other considerations. As I currently have things set up, the answer is no. I have no doubt that you feel strongly about potentially true moral realism. But I'm certain that this strong feeling is not absurdly strong compared to other preferences at other moments in your life. So if we synthesised your current preferences, and 1. or 2. ended up being true, then moral realism would end up playing a large-but-not-dominating role in your moral preferences.

I wouldn't want to change that, because what I'm aiming for is an accurate synthesis of your current preferences, and your current preference for moral-realism-if-it's-true is not, in practice, dominating your preferences. If you wanted to ensure the potential dominance of moral realism, you'd have to put that directly into the synthesis process, as a global meta-preference (section 2.8 of the research agenda).

But the whole discussion feels a bit peculiar to me. One property of moral realism that is often assumed is that it is, in some sense, ultimately convincing: that all systems of morality (or all systems derived from humans) will converge to it. Yet when I said a "large-but-not-dominating role in your moral preferences", I'm positing that moral realism is true, but that we have a system of morality that does not converge to it. I'm not really grasping how this could be possible (you could argue that the moral realism is some sort of acausal trade convergent function, but that gives an instrumental reason to follow it, not an actual reason to have it; and I know that a moral system need not be a utility function ^_^).

So yes, I'm a bit confused by true-but-not-convincing moral realisms.

Comment by stuart_armstrong on Partial preferences and models · 2019-06-19T14:08:05.852Z · score: 2 (1 votes) · LW · GW

This differs, because the z are assumed to be in a "standard" range. There are situations where extreme values of z, if known and reflected upon, would change the sign of the decision (for example, what if your decision is being filmed, and there are billions being bet upon your ultimate choice, by various moral and immoral groups?).

But yeah, if you assume that the z are in that standard range, then this looks a lot like considering just a few nodes of a causal net.

Comment by stuart_armstrong on Research Agenda v0.9: Synthesising a human's preferences into a utility function · 2019-06-19T13:58:43.060Z · score: 4 (2 votes) · LW · GW

This is one of the reasons I refer to synthesising (or constructing) the human's utility function, not learning it.

Comment by stuart_armstrong on Reason isn't magic · 2019-06-18T12:01:53.046Z · score: 4 (2 votes) · LW · GW

Post was good, but I'd recommend adding an introductory paragraph to the link on LessWrong.

Comment by stuart_armstrong on For the past, in some ways only, we are moral degenerates · 2019-06-18T07:36:41.223Z · score: 2 (1 votes) · LW · GW

What scares me is the possibility that moral anti-realism is false, but we build an AI under the assumption that it's true

One way of dealing with this, in part, is to figure out what would convince you that moral realism was true, and put that in as a strong conditional meta-preference.

Comment by stuart_armstrong on One-step hypothetical preferences · 2019-06-17T18:21:08.752Z · score: 2 (1 votes) · LW · GW

Valid point, though conditional meta-preferences are things I've already written about, and the issue of being wrong now about what your own preferences would be in the future is also something I've addressed multiple times in different forms. Your example is particularly crisp, though.

Comment by stuart_armstrong on For the past, in some ways only, we are moral degenerates · 2019-06-17T18:19:00.226Z · score: 2 (1 votes) · LW · GW

I was talking about "extended family values" in the sense of "it is good for families to stick together and spend time with each other"; this preference can (and often does) apply to other families as well. I see no analogue for that with slavery.

But yeah, you could argue that racism can be a terminal value, and that slave owners would develop it, as a justification for what might have started as an instrumental value.

## Research Agenda v0.9: Synthesising a human's preferences into a utility function

2019-06-17T17:46:39.317Z · score: 51 (13 votes)
Comment by stuart_armstrong on One-step hypothetical preferences · 2019-06-17T15:35:33.000Z · score: 4 (2 votes) · LW · GW

I see the orange-apple preference reversal as another example of conditional preferences.

## Preference conditional on circumstances and past preference satisfaction

2019-06-17T15:30:32.580Z · score: 11 (2 votes)
Comment by stuart_armstrong on Humans can be assigned any values whatsoever… · 2019-06-17T12:10:16.872Z · score: 5 (3 votes) · LW · GW

Yep, basically that. ^_^

Comment by stuart_armstrong on For the past, in some ways only, we are moral degenerates · 2019-06-17T12:08:23.304Z · score: 2 (1 votes) · LW · GW

I realize that you did explicitly define "degeneration = moving away from ours" for the drifting values, but it feels weird to then also define "progress = moving away from ours in a good way"

If I decomposed a bit more, I'd say that we need to distinguish the values of others, and the state of the world, and whether things are moving towards our values, away from our values, or just drifting.

So "progress", in the sense of my post, is composed of a) other people's values moving towards our own, and b) the state of the world moving more towards our own preferences/values. "Moral degeneration", on the other hand, is c) other people's values drifting away from our own.

I see all three of these happening at once (along with, to some extent, "the state of the world moving away from our values", which is another category), so that's why we see both progress and degeneration in the future.

## For the past, in some ways only, we are moral degenerates

2019-06-07T15:57:10.962Z · score: 29 (9 votes)

## To first order, moral realism and moral anti-realism are the same thing

2019-06-03T15:04:56.363Z · score: 17 (4 votes)

## Conditional meta-preferences

2019-06-03T14:09:54.357Z · score: 6 (3 votes)

## Uncertainty versus fuzziness versus extrapolation desiderata

2019-05-30T13:52:16.831Z · score: 20 (5 votes)

## And the AI would have got away with it too, if...

2019-05-22T21:35:35.543Z · score: 72 (27 votes)

## By default, avoid ambiguous distant situations

2019-05-21T14:48:15.453Z · score: 31 (8 votes)

## Oracles, sequence predictors, and self-confirming predictions

2019-05-03T14:09:31.702Z · score: 21 (7 votes)

## Self-confirming predictions can be arbitrarily bad

2019-05-03T11:34:47.441Z · score: 45 (17 votes)

## Nash equilibriums can be arbitrarily bad

2019-05-01T14:58:21.765Z · score: 35 (15 votes)

## Defeating Goodhart and the "closest unblocked strategy" problem

2019-04-03T14:46:41.936Z · score: 41 (12 votes)

## Learning "known" information when the information is not actually known

2019-04-01T17:56:17.719Z · score: 15 (5 votes)

## Relative exchange rate between preferences

2019-03-29T11:46:35.285Z · score: 12 (3 votes)

## Being wrong in ethics

2019-03-29T11:28:55.436Z · score: 22 (5 votes)

## Models of preferences in distant situations

2019-03-29T10:42:14.633Z · score: 11 (2 votes)

## The low cost of human preference incoherence

2019-03-27T11:58:14.845Z · score: 21 (8 votes)

## "Moral" as a preference label

2019-03-26T10:30:17.102Z · score: 16 (5 votes)

## Partial preferences and models

2019-03-19T16:29:23.162Z · score: 13 (3 votes)

## Combining individual preference utility functions

2019-03-14T14:14:38.772Z · score: 12 (4 votes)

## Mysteries, identity, and preferences over non-rewards

2019-03-14T13:52:40.170Z · score: 14 (4 votes)

## A theory of human values

2019-03-13T15:22:44.845Z · score: 29 (8 votes)

## Example population ethics: ordered discounted utility

2019-03-11T16:10:43.458Z · score: 14 (5 votes)

## Smoothmin and personal identity

2019-03-08T15:16:28.980Z · score: 20 (10 votes)

## Preferences in subpieces of hierarchical systems

2019-03-06T15:18:21.003Z · score: 11 (3 votes)

## mAIry's room: AI reasoning to solve philosophical problems

2019-03-05T20:24:13.056Z · score: 64 (21 votes)

## Simplified preferences needed; simplified preferences sufficient

2019-03-05T19:39:55.000Z · score: 31 (12 votes)

## Finding the variables

2019-03-04T19:37:54.696Z · score: 30 (7 votes)

## Syntax vs semantics: alarm better example than thermostat

2019-03-04T12:43:58.280Z · score: 14 (4 votes)

## Decelerating: laser vs gun vs rocket

2019-02-18T23:21:46.294Z · score: 22 (6 votes)

## Humans interpreting humans

2019-02-13T19:03:52.067Z · score: 12 (3 votes)

## Anchoring vs Taste: a model

2019-02-13T19:03:08.851Z · score: 11 (2 votes)

## Would I think for ten thousand years?

2019-02-11T19:37:53.591Z · score: 27 (10 votes)

## "Normative assumptions" need not be complex

2019-02-11T19:03:38.493Z · score: 11 (3 votes)

## Wireheading is in the eye of the beholder

2019-01-30T18:23:07.143Z · score: 25 (10 votes)

## Can there be an indescribable hellworld?

2019-01-29T15:00:54.481Z · score: 19 (8 votes)

## How much can value learning be disentangled?

2019-01-29T14:17:00.601Z · score: 24 (7 votes)

## A small example of one-step hypotheticals

2019-01-28T16:12:02.722Z · score: 14 (5 votes)

## One-step hypothetical preferences

2019-01-23T15:14:52.063Z · score: 10 (6 votes)

## Synthesising divergent preferences: an example in population ethics

2019-01-18T14:29:18.805Z · score: 13 (3 votes)

## The Very Repugnant Conclusion

2019-01-18T14:26:08.083Z · score: 27 (15 votes)

## Anthropics is pretty normal

2019-01-17T13:26:22.929Z · score: 30 (12 votes)

## Solving the Doomsday argument

2019-01-17T12:32:23.104Z · score: 12 (6 votes)

## The questions and classes of SSA

2019-01-17T11:50:50.828Z · score: 11 (3 votes)