Posts

Moral Anti-Epistemology 2015-04-24T03:30:27.972Z · score: 2 (8 votes)
Arguments Against Speciesism 2013-07-28T18:24:58.354Z · score: 34 (58 votes)

Comments

Comment by lukas_gloor on Appeal to Consequence, Value Tensions, And Robust Organizations · 2019-07-21T09:05:43.789Z · score: 1 (1 votes) · LW · GW
It seems to me that rationality is extremely fragile and vulnerable, such that even though rationality might serve other goals, you have to be very uncompromising with regards to rationality, especially core things like hiding information from yourself (I was lightly opposed to the negative karma hiding myself) even if that has apparent costs.

I agree with that. But people can have very different psychologies. Most people are prone to overconfidence, but some people are underconfident and beat themselves up too much over negative feedback. If the site offers an optional feature that is very useful for people of the latter type, it's at least worth considering whether that's an overall improvement. I wasn't even annoyed that people didn't like the feature; it was more about the way in which the person argued. Generally, I'd be pleased to see more awareness that people have different psychologies. :)

Comment by lukas_gloor on Appeal to Consequence, Value Tensions, And Robust Organizations · 2019-07-21T09:00:20.138Z · score: 3 (2 votes) · LW · GW
There are a bunch of conversations going on about the topic right now (some in semi-private which might be public soonish).

Cool! And I appreciate the difficulty of the task at hand. :)

When I model these conversations, one failure mode I'm worried about is that the "more civility" position gets lumped together with other things that Lesswrong is probably right to be scared of.

So, the following is to delineate my own views from things I'm not saying:

I could imagine being fine with Bridgewater culture in many (but not all) contexts. I hate that in "today's climate" it is difficult to talk about certain topics. I think it's often the case that people complaining about tone or about not feeling welcome shouldn't expect to have their needs accommodated.

And yet I still find some features of what I perceive to be "rationalist culture" very off-putting.

I don't think I phrased it as well in my first comment, but I can fully get behind what Raemon said elsewhere in this thread:

Some of the language about "holding truth sacred" [...] has come across to me with a tone of single-minded focus that feels like not being willing to put an upper bound on a heart transplant, rather than earnestly asking the question "how do we get the most valuable truthseeking the most effective way?"

So it's not that I'm saying that I'd prefer a culture where truth-seeking is occasionally completely abandoned because of some other consideration. Just that the side that superficially looks more virtuous when it comes to truth-seeking (for instance because they boldly proclaim the importance of not being bothered by tone/tact, downvote notifications, etc.) isn't automatically what's best in the long run.

Comment by lukas_gloor on Appeal to Consequence, Value Tensions, And Robust Organizations · 2019-07-20T21:13:18.966Z · score: 5 (3 votes) · LW · GW
Can you clarify which bit was off-putting? The fact that any norms were being promoted or the specific norms being promoted?

Only the latter. And also the vehemence with which these viewpoints seemed to be held and defended. I got the impression that statements of the sort "yay truth as the only sacred value" received strong support; personally I find that off-putting in many contexts.

Edit: The reason I find it off-putting isn't that I disagree with the position as site policy. More that sometimes the appropriate thing in a situation isn't just to respond with some tirade about why it's good to have an unempathetic site policy.

To give some more context: Only the first instance of this had to do with explicit calls for forum policy. This was probably the same example that inspired the dialogue between Jill and John above.

The second example was a comment on the question of making downvotes less salient. While I agree that the idea has drawbacks, I was a bit perplexed that a comment arguing against it got strongly upvoted despite including claims that felt to me like problematic "rationality for rationality's sake": Instead of allowing people to only look at demotivating information at specific times, we declare it antithetical to the "core of rationality" to hide information whether or not it overall makes people accomplish their goals better.

The third instance was an exchange you had about conversational tone and (lack of) charity. Toward the end you said that you didn't like the way you phrased your initial criticism, but my quick impression (I probably only skimmed the lengthy exchange and don't remember the details) was that your points seemed pretty defensible, and the way your conversation partner commented would also have thrown me off. "Tone and degree of charity are very important too" is a perspective I'd like to see represented more among LW users. (But if I'm in the minority, that's fine and I don't object to communities keeping their defining features if the majority feels that they are benefitting.)

That doesn't feel true to me.

Maybe I expressed it poorly, but what I meant was just that rationality is not an end in itself. If I complain that some piece of advice is not working for me because it makes me (all-things-considered, long-term) less productive (towards the things that are most important to me) and less happy, and my conversation partner makes some unqualified statement to the effect of "but it's rational to follow this type of advice", I will start to suspect that they are misunderstanding what rationality is for.

Comment by lukas_gloor on Appeal to Consequence, Value Tensions, And Robust Organizations · 2019-07-20T11:01:46.396Z · score: 7 (4 votes) · LW · GW

I liked this post a lot and loved the additional comment about "Feeling and truth-seeking norms" you wrote here.

As a small data point: there have been at least three instances in the past ~three months where I was explicitly noticing certain norm-promoting behavior in the rationalist community (and Lesswrong in particular) that I found off-putting, and "truth-seeking over everything else" captures it really well.

Treating things as sacred can lead to infectiousness where items in the vicinity of the thing are treated as sacred too, even in cases where the link to it becomes increasingly indirect.

For instance, in the discussion about whether downvote notifications should be shown to users as often as upvote notifications, I saw the sentiment expressed that it would be against the "core of rationality" to ever "hide" (by which people really just meant make less salient) certain types of useful information. Maybe this was just an expression of a visceral sentiment and not something the person would 100% endorse, but just in case it was the latter: It is misguided to think of rationality in that way. "It is rational to do x regardless of how it affects people's quality of life and productivity" should never be an argument. Most people's life goals aren't solely about truth-seeking nor about always mastering unhelpful emotions.

I think I'm on board with locking in some core epistemic virtues related to truth-seeking "as though it were sacred". I think some version of that is going to be best overall for people's life goals. But it's an open question how large that core should be. The cluster of things I associate with "epistemic virtue" is large and fuzzy. I am pretty confident that it's good to treat the core of that cluster as sacred. (For instance, that might include principles like "don't lie, present arguments rather than persuade, engage productively and listen to others, be completely transparent about moderation decisions such as banning policies," etc.) I'm less confident it's good for things that are a bit less central to the cluster. I'm very confident we shouldn't treat some things in the outer layers as sacred (and doing that would kind of trigger me if I'm being honest).

I guess one could object to my stance by asking: Is it possible to treat only the clearest instances of the truth-seeking virtue cluster as sacred without slipping down the slope of losing all the benefits of having something be treated as sacred at all?

I'm not completely sure, but here are some reasons why I think it ought to be possible:

  • People seem to be intuitively good at dealing with fuzzy concepts. If Jill (in the OP) is transparent about conversations she's having like the one shown with John, I am optimistic that the vast majority of the audience could come to conclude that Jill is acting in the realm of what is reasonable, even if they would sometimes draw boundaries in slightly different places.
  • I feel like tradeoffs are often overstated. In cases where truth-seeking norms conflict with other very important things, the best solution is rarely to have a foundational discussion about which is more important and then kick out one of the two. Rather, I have hope that usually one can come up with some alternative solution (such as moving discussions about veganism to a separate thread, and asking Jill to link to that separate thread with a short and discreet comment, as opposed to Jill riding her hobbyhorse on all the threads she wants to "derail").
  • Personally, I think there's just as much to lose from cultivating an overly large cluster of sacredness as from an overly small one. Goodharting "rationality for rationality's sake" and evaporative cooling where people put off by certain community features start contributing less and less both seem like very real risks to me.
Comment by lukas_gloor on Matt Goldenberg's Short Form Feed · 2019-07-20T09:04:36.981Z · score: 3 (6 votes) · LW · GW

Excellent comment!

I know there's a strong idea around norms in the rationality community to go full courage (expressing your true beliefs) and have other people mind themselves and ignore the consequences (decoupling norms).

"Have other people mind themselves and ignore the consequences" comes in various degrees and flavors. In the discussions about decoupling norms I have seen (mostly in the context of Sam Harris), it appeared me that they (decoupling norms) were treated as the opposite of "being responsible for people uncharitably misunderstanding what you are saying." So I worry that presenting it as though courage = decoupling norms makes it harder to get your point across, out of worry that people might lump your sophisticated feedback/criticism together with some of the often not-so-sophisticated criticism directed at people like Sam Harris. No matter what one might think of Harris, to me at least he seems to come across as a lot more empathetic and circumspect and less "truth over everything else" than the rationalists whose attitude about truth-seeking's relation to other virtues I find off-putting.

Having made this caveat, I think you're actually right that "decoupling norms" can go too far, and that there's a gradual spectrum from "not feeling responsible for people uncharitably misunderstanding what you are saying" to "not feeling responsible about other people's feelings ever, unless maybe if a perfect utilitarian robot in their place would also have well-justified instrumental reasons to turn on facial expressions for being hurt or upset". I just wanted to make clear that it's compatible to think that decoupling norms are generally good as long as considerateness and tact also come into play. (Hopefully this would mitigate worries that the rationalist community would lose something important by trying to reward considerateness a bit more.)

Comment by lukas_gloor on Let's Read: Superhuman AI for multiplayer poker · 2019-07-14T20:26:02.606Z · score: 14 (4 votes) · LW · GW

Thanks for this summary!

In 2017 I commented on the two-player version here.

... if the player bets in [a winning] situation only when holding the best possible hand, then the opponents would know to always fold in response. To cope with this, Pluribus keeps track of the probability it would have reached the current situation with each possible hand according to its strategy. Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand, being careful to balance its strategy across all the hands so as to remain unpredictable to the opponent. Once this balanced strategy across all hands is computed, Pluribus then executes an action for the hand it is actually holding.

Human professional players are trying to approximate this level of balancedness as well, using computer programs ("solvers"). See this youtube video for an example of a hand with solver analysis. In order to get the solver analysis started, one needs to specify input hand ranges one expects people to have in the specific situations, as well as bet sizes for the solver to consider (more than just 2-3 bet sizes would be too much for the solver to handle). To specify those parameters, professionals can make guesses (sometimes based on data) about how other players play. Because the input parameters depend on human learned wisdom rather than worked out game theory, solvers can't quite be said to have solved poker.

So, like the computer, human players try to simplify the game tree in order to be able to approximate balanced play. However, this is much easier for computers. Pluribus knows its own counterfactuals perfectly, and it can make sure it always covers all the options for cards to have (in order to represent different board textures) and has the right number of bluffs paired with good hands for every state of the game given past actions.
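
To make the idea of playing a balanced range concrete, here is a toy Python sketch. The hand categories, reach probabilities, and betting frequencies are all invented for illustration – this is not Pluribus's actual strategy or any real solver output.

```python
import random

# Toy sketch of range-balanced play: fix a betting frequency for every
# hand we could plausibly hold in this spot, and only then act on the
# hand we actually hold. All numbers are invented for illustration.
RANGE = {
    # hand type: (prob. of reaching this spot with it, betting frequency)
    "nuts":   (0.15, 0.95),
    "medium": (0.55, 0.25),
    "air":    (0.30, 0.55),  # some bluffs, so the betting range isn't only strong hands
}

def act(actual_hand: str) -> str:
    """Sample an action for the hand actually held, consistent with the
    strategy computed for the whole range."""
    bet_freq = RANGE[actual_hand][1]
    return "bet" if random.random() < bet_freq else "check"

# What the opponent observes is the betting frequency of the whole range,
# not which hand we happen to hold right now.
overall_bet_freq = sum(reach * freq for reach, freq in RANGE.values())
print(f"overall betting frequency: {overall_bet_freq:.2f}")
print("action holding 'air':", act("air"))
```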

It almost seems kind of easy to beat humans in this way, except that knowing how to simplify and then model the situations in the first place seemed to have been the bottleneck up until 2017.

Donk betting: some kind of uncommon play that's usually considered dumb (like a donkey). I didn't figure out what it actually means.

"Donk betting" has a bad reputation because it's a typical mistake amateur players make, doing it in the wrong type of situations with the wrong types of hands. You can only donk bet in some betting round if you're first to act, and a general weakness amateur players have is that they don't understand the value of being last to act (having more information). To at least somewhat mitigate the awfulness of being first to act, good players try to give out as little information as possible. If you played the previous street passively and your opponent displayed strength, you generally want to check because your opponent already expects you to be weaker, and so will do the betting for you often enough because they're still telling their story of having a stronger hand. If you donk bet when a new card improved you, you telegraph information and your opponent can play perfectly against that, folding their weak hands and continuing only with strong hands. If you check instead, you get more value from your opponent's bluffs, and you almost always still get to put in your raise after they bet for you, reopening the betting round for you.

However, there are instances where donk betting is clearly good: When a new card is much more likely to improve your range of hands compared to your opponent's. In certain situations a new card is terrible for one player and good for the other player. In those instances, you can expect thinking opponents to check after you even with most of their strong hands, because they became apprehensive of your range of hands having improved a lot. In that case, you sometimes want to bet out right away (both in some of the cases where you hit, as well as with bluffs).

However, Pluribus disagrees with the folk wisdom that “donk betting” (starting a round by betting when one ended the previous betting round with a call) is a mistake; Pluribus does this far more often than professional humans do.

It might just be that professional humans decide to keep the game tree simple by not developing donk bet strategies for situations where this is complicated to balance and only produces small benefits if done perfectly. But it could be that Pluribus found a more interesting reason to occasionally use donk bets in situations where professional players would struggle to see the immediate use. Unfortunately I couldn't find any discussion of hand histories illustrating the concept.

Comment by lukas_gloor on Discourse Norms: Justify or Retract Accusations · 2019-05-22T05:30:39.781Z · score: 1 (1 votes) · LW · GW

For in-person conversations (I know this was meant as a norm for public discourse): Personally I tend to have a hard time digging into my memories for "data points" when I have a negative or positive impression of some person. It's kind of the same thing with people asking you "What have you been working on the past week?" – I basically never remember anything immediately (even though I do work on stuff). This creates asymmetric incentives where it's easier to make negative judgments seem unjustified or at least costly to bring up, which can contribute to a culture where justified critical opinions almost never reach enough of a consensus to change something. I definitely think there should be norms similar to the one described in the post, but I also think that there are situations (e.g., if a person has a reliable track record or if they promise to write a paragraph with some bullet points later on once they've had time to introspect) where the norm should be less strict than "back the judgment up immediately or retract it." And okay, probably one can manage to say a few words even on the spot because introspection is not that slow and opaque, but my point is simply that "This sounds unconvincing" is just as cheap a thing to say as cheap criticism, and the balance should be somewhere in between. So maybe instead of "justify" the norm should say something like "gesture at the type of reasons," and that should be the bare minimum and more transparency is often preferable. (Another point is that introspecting on intuitive judgments helps refine them, so that's something that people should do occasionally even if they aren't being put on the spot to back something up.)

Needless to say, lax norms around this can be terrible in social environments where some people tend to talk too negatively about others and where the charitable voices are less frequent, so I think it's one of those things where the same type of advice can sometimes be really good, and other times can be absolutely terrible.

Comment by lukas_gloor on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-25T21:26:31.088Z · score: 19 (9 votes) · LW · GW

I’m reluctant to reply because it sounds like you’re looking for rebuttals by explicit proponents of hard takeoff who have thought a great deal about takeoff speeds, and neither description applies to me. But I could sketch some intuitions why reading the pieces by AI Impacts and by Christiano hasn't felt wholly convincing to me. (I’ve never run these intuitions past anyone and don’t know if they’re similar to cruxes held by proponents of hard takeoff who are more confident in hard takeoff than I am – therefore I hope people don't update much further against hard takeoff in case they find the sketch below unconvincing.) I found that it’s easiest for me to explain something if I can gesture towards some loosely related “themes” rather than go through a structured argument, so here are some of these themes and maybe people see underlying connections between them:

Culture overhang

Shulman and Sandberg have argued that one way to get hard takeoff is via hardware overhang: when a new algorithmic insight can be used immediately to its full potential, because much more hardware is available than one would have needed to surpass state-of-the-art performance with the new algorithms. I think there’s a similar dynamic at work with culture: If you placed an AGI into the stone age, it would be inefficient at taking over the world even with appropriately crafted output channels because stone age tools (which include stone age humans the AGI could manipulate) are neither very useful nor reliable. It would be easier for an AGI to achieve influence in 1995 when the environment contained a greater variety of increasingly far-reaching tools. But with the internet being new, particular strategies to attain power (or even just rapidly acquire knowledge) were not yet available. Today, it is arguably easier than ever for an AGI to quickly and more-or-less single-handedly transform the world.

Snapshot intelligence versus intelligence as learning potential

There’s a sense in which cavemen are similarly intelligent as modern-day humans. If we time-traveled back into the stone age, found the couples with the best predictors for having gifted children, gave these couples access to 21st century nutrition and childbearing assistance, and then took their newborns back into today’s world where they’d grow up in a loving foster family with access to high-quality personalized education, there’s a good chance some of those babies would grow up to be relatively ordinary people of close to average intelligence. Those former(?) cavemen and cavewomen would presumably be capable of dealing with many if not most aspects of contemporary life and modern technology.

However, there’s also a sense in which cavemen are very unintelligent compared to modern-day humans. Culture, education, possibly even things like the Flynn effect, etc. – these really do change the way people think and act in the world. Cavemen are incredibly uneducated and untrained concerning knowledge and skills that are useful in modern, tool-rich environments.

We can think of this difference as the difference between the snapshot of someone’s intelligence at the peak of their development and their (initial) learning potential. Cavemen and modern-day humans might be relatively close to each other in terms of the latter, but when considering their abilities at the peak of their personal development, the modern humans are much better at achieving goals in tool-rich environments. I sometimes get the impression that proponents of soft takeoffs underappreciate this difference when addressing comparisons between, for instance, early humans and chimpanzees (this is just a vague general impression which doesn’t apply to the arguments presented by AI Impacts or by Paul Christiano).

How to make use of culture: The importance of distinguishing good ideas from bad ones

Both for productive engineers and creative geniuses, it holds that they could only have developed their full potential because they picked up useful pieces of insight from other people. But some people cannot tell the difference between high-quality information and low-quality information, or might make wrong use even of high-quality information, reasoning themselves into biased conclusions. An AI system capable of absorbing the entire internet but terrible at telling good ideas from bad ideas won't make too much of a splash (at least not in terms of being able to take over the world). But what about an AI system just slightly above some cleverness threshold for adopting an increasingly efficient information diet? Couldn’t it absorb the internet in a highly systematic way rather than just soaking in everything indiscriminately, learning many essential meta-skills on its way, improving how it goes about the task of further learning?

Small differences in learning potential have compounded benefits over time

If the child in the chair next to me in fifth grade was slightly more intellectually curious, somewhat more productive, and marginally better disposed to adopt a truth-seeking approach and self-image than I am, this could initially mean they score 100%, and I score 95% on fifth-grade tests – no big difference. But as time goes on, their productivity gets them to read more books, their intellectual curiosity and good judgment get them to read more unusually useful books, and their cleverness gets them to integrate all this knowledge in better and increasingly creative ways. I’ll reach a point where I’m just sort of skimming things because I’m not motivated enough to understand complicated ideas deeply, whereas they find it rewarding to comprehend everything that gives them a better sense of where to go next on their intellectual journey. By the time we graduate university, my intellectual skills are mostly useless, while they have technical expertise in several topics, can match or even exceed my thinking in areas I specialized in, and get hired by some leading AI company. The point being: an initially small difference in dispositions becomes almost incomprehensibly vast over time.
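
As a toy numerical illustration of that compounding (all numbers made up, with "score" standing in for whatever ability you care about), a small, self-reinforcing edge in the rate of improvement opens a multiple-fold gap within a school career:

```python
# Toy model of compounding dispositions. The numbers are invented; the point
# is only that a small edge in the rate of improvement, which itself keeps
# improving, opens a large gap over time.
me, peer = 95.0, 100.0            # fifth-grade test scores
my_rate, peer_rate = 1.05, 1.06   # yearly improvement factors

for year in range(15):            # roughly fifth grade to graduation
    me *= my_rate
    peer *= peer_rate
    peer_rate += 0.01             # better habits keep improving the rate itself

print(f"me: {me:.0f}, peer: {peer:.0f}, ratio: {peer / me:.1f}x")
```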

Knowing how to learn strategically: A candidate for secret sauce??

(I realized that in this title/paragraph, the word "knowing" is meant both in the sense of "knowing how to do x" and "being capable of executing x very well." It might be useful to try to disentangle this some more.) The standard AI foom narrative sounds a bit unrealistic when discussed in terms of some AI system inspecting itself and remodeling its inner architecture in a very deliberate way driven by architectural self-understanding. But what about the framing of being good at learning how to learn? There’s at least a plausible-sounding story we can tell where such an ability might qualify as the “secret sauce" that gives rise to a discontinuity in the returns of increased AI capabilities. In humans – and admittedly this might be too anthropomorphic – I'd think about it in this way: If my 12-year-old self had been brain-uploaded to a suitable virtual reality, made copies of, and given the task of devouring the entire internet in 1,000 years of subjective time (with no aging) to acquire enough knowledge and skill to produce novel and for-the-world useful intellectual contributions, the result probably wouldn’t be much of a success. If we imagined the same with my 19-year-old self, there’s a high chance the result wouldn’t be useful either – but also some chance it would be extremely useful. Assuming, for the sake of the comparison, that a copy clan of 19-year olds can produce highly beneficial research outputs this way, and a copy clan of 12-year olds can’t, what does the landscape look like in between? I don’t find it evident that the in-between is gradual. I think it’s at least plausible that there’s a jump once the copies reach a level of intellectual maturity to make plans which are flexible enough at the meta-level and divide labor sensibly enough to stay open to reassessing their approach as time goes on and they learn new things. Maybe all of that is gradual, and there are degrees of dividing labor sensibly or of staying open to reassessing one’s approach – but that doesn’t seem evident to me. Maybe this works more as an on/off thing.

How could natural selection produce on/off abilities?

It makes sense to be somewhat suspicious about any hypotheses according to which the evolution of general intelligence made a radical jump in Homo sapiens, creating thinking that is "discontinuous" from what came before. If knowing how to learn is an on/off ability that plays a vital role in the ways I described above, how could it evolve?
We're certainly also talking culture, not just genes. And via the Baldwin effect, natural selection can move individuals closer towards picking up surprisingly complex strategies via learning from their environment. At this point, at the latest, my thinking becomes highly speculative. But here's one hypothesis: In its generalization, this effect is about learning how to learn. And maybe there is something like a "broad basin of attraction" (inspired by Christiano's broad basin of attraction for corrigibility) for robustly good reasoning / knowing how to learn. Picking up some of the right ideas early on, combined with being good at picking up things in general, produces in people an increasingly better sense of how to order and structure other ideas, and over time, the best human learners start to increasingly resemble each other, having homed in on the best general strategies.

The mediocre success of self-improvement literature

For most people, the returns of self-improvement literature (by which I mean not just productivity advice, but also information on "how to be more rational," etc.) might be somewhat useful, but rarely life-changing. People don’t tend to "go foom" from reading self-improvement advice. Why is that, and how does it square with my hypothesis above, that “knowing how to learn” could be a highly valuable skill with potentially huge compounding benefits? Maybe the answer is that the bottleneck is rarely knowledge about self-improvement, but rather the ability to make the best use of such knowledge? This would support the hypothesis mentioned above: If the critical skill is finding useful information in a massive sea of both useful and not-so-useful information, that doesn’t necessarily mean that people will get better at that skill if we gave them curated access to highly useful information (even if it's information about how to find useful information, i.e., good self-improvement advice). Maybe humans don’t tend to go foom after receiving humanity's best self-improvement advice because too much of that is too obvious for people who were already unusually gifted and then grew up in modern society where they could observe and learn from other people and their habits. However, now imagine someone who had never read any self-improvement advice, and could never observe others. For that person, we might have more reason to expect them to go foom – at least compared to their previous baseline – after reading curated advice on self-improvement (or, if it is true that self-improvement literature is often somewhat redundant, even just from joining an environment where they can observe and learn from other people and from society). And maybe that’s the situation in which the first AI system above a certain critical capabilities threshold finds itself. The threshold I mean is (something like) the ability to figure out how to learn quickly enough to then approach the information on the internet like the hypothetical 19-year olds (as opposed to the 12-year olds) from the thought experiment above.

---

Hard takeoff without a discontinuity

(This argument is separate from all the other arguments above.) Here’s something I never really understood about the framing of the hard vs. soft takeoff discussion. Let’s imagine a graph with inputs such as algorithmic insights and compute/hardware on the x-axis, and general intelligence (it doesn’t matter for my purposes whether we use learning potential or snapshot intelligence) on the y-axis. Typically, the framing is that proponents of hard takeoff believe that this graph contains a discontinuity where the growth mode changes, and suddenly the returns (for inputs such as compute) are vastly higher than the outside view would have predicted, meaning that the graph makes a jump upwards in the y-axis. But what about hard takeoff without such a discontinuity? If our graph starts to be steep enough at the point where AI systems reach human-level research capabilities and beyond, then that could in itself allow for some hard (or "quasi-hard") takeoff. After all, we are not going to be sampling points (in the sense of deploying cutting-edge AI systems) from that curve every day – that simply wouldn't work logistically even granted all the pressures to be cutting-edge competitive. If we assume that we only sample points from the curve every two months, for instance, is it possible that for whatever increase in compute and algorithmic insights we’d get in those two months, the differential on the y-axis (some measure of general intelligence) could be vast enough to allow for attaining a decisive strategic advantage (DSA) from being first? I don’t have strong intuitions about what the offense-defense balance will shift to once we are close to AGI, but it at least seems plausible that it turns more towards offense, in which case arguably a lower differential is needed for attaining a DSA. In addition, based on the classical arguments put forward by researchers such as Bostrom and Yudkowsky, it also seems at least plausible to me that we are potentially dealing with a curve that is very steep around the human level. So, if one AGI project is two months ahead of another project, and we for the sake of argument assume that there are no inherent discontinuities in the graph in question, it’s still not evident to me that this couldn’t lead to something that very much looks like hard takeoff, just without an underlying discontinuity in the graph.
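
As a toy numerical sketch of that last point (the exponential shape, growth rate, and units are assumptions chosen purely for illustration, not a forecast): even on a perfectly smooth curve, the absolute capability gain that a fixed two-month lead buys you grows rapidly once the curve gets steep.

```python
import math

# A smooth, discontinuity-free "capability" curve, sampled only at the
# cadence at which frontier systems get deployed. All numbers are invented.
def capability(t_months: float) -> float:
    return math.exp(0.4 * t_months)

for t in range(0, 25, 2):  # a new deployment every two months
    lead = capability(t + 2) - capability(t)
    print(f"month {t:2d}: absolute advantage of a 2-month head start = {lead:10.1f}")
```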

Comment by lukas_gloor on A theory of human values · 2019-03-14T00:40:05.947Z · score: 4 (2 votes) · LW · GW

Leaning on this, someone could write a post about the "infectiousness of realism" since it might be hard to reconcile openness to non-zero probabilities of realism with anti-realist frameworks? :P

For people who believe their actions matter infinitely more if realism is true, this could be modeled as an overriding meta-preference to act as though realism is true. Unfortunately if realism isn't true this could go in all kinds of directions depending on how the helpful AI system would expect to get into such a judged-to-be-wrong epistemic state.

Probably you were thinking of something like teaching AIs metaphilosophy in order to perhaps improve the procedure? This would be the main alternative I see, and it does feel more robust. I am wondering though whether we'll know by that point whether we've found the right way to do metaphilosophy (and how approaching that question is different from approaching whichever procedures philosophically sophisticated people would pick to settle open issues in something like the above proposals). It seems like there has to come a point where one has to hand off control to some in-advance specified "metaethical framework" or reflection procedure, and judged from my (historically overconfidence-prone) epistemic state it doesn't feel obvious why something like Stuart's anti-realism isn't already close to there (though I'd say there are many open questions and I'd feel extremely unsure about how to proceed regarding for instance "2. A method for synthesising such basic preferences into a single utility function or similar object," and also to some extent about the premise of squeezing a utility function out of basic preferences absent meta-preferences for doing that). Adding layers of caution sounds good though as long as they don't complicate things enough to introduce large new risks.

Comment by lukas_gloor on Why do you reject negative utilitarianism? · 2019-02-13T12:00:59.105Z · score: 25 (8 votes) · LW · GW

Ethical theories don't need to be simple. I used to have the belief that ethical theories ought to be simple/elegant/non-arbitrary for us to have a shot at them being the correct theory, a theory that intelligent civilizations with different evolutionary histories would all converge on. This made me think that NU might be that correct theory. Now I’m confident that this sort of thinking was confused: I think there is no reason to expect that intelligent civilizations with different evolutionary histories would converge on the same values, or that there is one correct set of ethics that they "should" converge on if they were approaching the matter "correctly". So, looking back, my older intuition now feels confused in a similar way to ordering the simplest food in a restaurant because that's what you anticipate others would order if they, too, thought the goal was for everyone to order the same thing. Now I just want to order the "food" that satisfies my personal criteria (and these criteria do happen to include placing value on non-arbitrariness/simplicity/elegance, but I’m a bit less single-minded about it).

Your way of unifying psychological motivations down to suffering reduction is an "externalist" account of why decisions are made, which is different from the internal story people tell themselves. Why think all people who tell different stories are mistaken about their own reasons? The point "it is a straw man argument that NUs don’t value life or positive states“ is unconvincing, as others have already pointed out. I actually share your view that a lot of things people do might in some way trace back to a motivating quality in feelings of dissatisfaction, but (1) there are exceptions to that (e.g., sometimes I do things on auto-pilot and not out of an internal sense of urgency/need, and sometimes I feel agenty and do things in the world to achieve my reflected life goals rather than tend to my own momentary well-being), and (2) that doesn’t mean that whichever parts of our minds we most identify with need to accept suffering reduction as the ultimate justification of their actions. For instance, let’s say you could prove that a true proximate cause why a person refused to enter Nozick’s experience machine was that, when they contemplated the decision, they felt really bad about the prospect of learning that their own life goals are shallower and more self-centered than they would have thought, and *therefore* they refuse the offer. Your account would say: "They made this choice driven by the avoidance of bad feelings, which just shows that ultimately they should accept the offer, or choose whichever offer reduces more suffering all-things-considered.“ Okay yeah, that's one story to tell. But the person in question tells herself the story that she made this choice because she has strong aspirations about what type of person she wants to be. Why would your externally-imported justification be more valid (for this person's life) than her own internal justification?

Comment by lukas_gloor on Arguments for moral indefinability · 2019-02-12T16:27:05.460Z · score: 5 (4 votes) · LW · GW

I think I broadly agree with all the arguments to characterize the problem and to motivate indefinability as a solution, but I have different (meta-)meta-level intuitions about how palatable indefinability would be, and as a result of that, I'd say I have been thinking about similar issues in a differently drawn framework. Whereas you seem to advocate for "salvaging the notion of 'one ethics'" while highlighting that we then need to live with indefinability, I am usually thinking of it in terms of: "Most of this is underdefined, and that's unsettling at least in some (but not necessarily all) cases, and if we want to make it less underdefined, the notion of 'one ethics' has to give." Maybe one reason why I find indefinability harder to tolerate is that in my own thinking, the problem arises forcefully at an earlier/higher-order stage already, and therefore the span of views over which "ethics" is indefinable(?) is larger and already includes questions of high practical significance. Having said that, I think there are some important pragmatic advantages to an "ethics includes indefinability" framework, and that might be reason enough to adopt it. While different frameworks tend to differ in the underlying intuitions they highlight or move into the background, I think there is more than one parsimonious framework in which people can "do moral philosophy" in a complete and unconfused way. Translation between frameworks can be difficult though (which is one reason I started to write a sequence about moral reasoning under anti-realism, to establish a starting point for disagreements, but then I got distracted – it's on hold now).

Some more unorganized comments (apologies for "lazy“ block-quote commenting): 

Moral indefinability is the term I use for the idea that there is no ethical theory which provides acceptable solutions to all moral dilemmas, and which also has the theoretical virtues (such as simplicity, precision and non-arbitrariness) that we currently desire.

This idea seems correct to me. And as you indicate later in the paragraph, we can add that it’s plausible that the "theoretical virtues“ are not well-specified either (e.g., there’s disagreement between people’s theoretical desiderata, or there’s vagueness in how to cash out a desideratum such as "non-arbitrariness"). 

My claim is that eventually we will also need to change our meta-level intuitions in important ways, because it will become clear that the only theories which match them violate key object-level intuitions.

This recommendation makes sense to me (insofar as one can still do that), but I don’t think it’s completely obvious. Because both meta-level intuitions and object-level intuitions are malleable in humans, and because there’s no(t obviously a) principled distinction between these two types of intuitions, it’s an open question to what degree people want to adjust their meta-level intuitions in order to not have to bite the largest bullets.

If the only reason people were initially tempted to bite the bullets in question (e.g., accept a counterintuitive stance like the repugnant conclusion) was because they had a cached thought that "Moral theories ought to be simple/elegant“, then it makes a lot of sense to adjust this one meta-level intuition after the realization that it seems ungrounded. However, maybe "Moral theories ought to be simple/elegant“ is more than just a cached thought for some people:

Some moral realists buy the "wager" that their actions matter infinitely more in case moral realism is true. I suspect that an underlying reason why they find this wager compelling is that they have strong meta-level intuitions about what they want morality to be like, and it feels to them that it’s pointless to settle for something other than that.

I’m not a moral realist, but I find myself having similarly strong meta-level intuitions about wanting to do something that is "non-arbitrary" and in relevant ways "simple/elegant". I’m confused about whether that’s literally the whole intuition, or whether I can break it down into another component. But motivationally it feels like this intuition is importantly connected to what makes it easy for me to go "all-in“ for my ethical/altruistic beliefs.

A second reason to believe in moral indefinability is the fact that human concepts tend to be open texture: there is often no unique "correct" way to rigorously define them.

I strongly agree with this point. I think even very high-level concepts in moral philosophy or the philosophy of reason/self-interest are "open texture“ like that. In your post you seem to start with an assumption that people have a rough, shared sense of what "ethics“ is about. But if the fuzziness is already attacking at this very high level, it calls into question whether you can find a solution that seems satisfying to different people’s (fuzzy and underdetermined) sense of what the question/problem is even about. 

For instance, there is the narrow interpretation "ethics as altruism/caring/doing good" (which I think roughly captures at least large parts of what you assume, and it also captures the parts I'm personally most interested in). There's also "ethics as cooperation or contract". And maybe the two blend into each other.

Then there’s the broader (I label it "existentialist“) sense in which ethics is about "life goals“ or "Why do I get up in the morning?“. And within this broader interpretation of it, you suddenly get narrower subdomains like "realism about rationality“ or "What makes up a person's self-interest?“ where the connection to the other narrower domains (e.g. "ethics as altruism“) are not always clear.

I think indefinability is a plausible solution (or meta-philosophical framework?) for all of these. But when the scope over which we observe indefinability becomes so broad, it illustrates why it might feel a bit frustrating for some people, because without clearly delineated concepts it can be harder to make progress, and so a framework in which indefinability plays a central role could in some cases obscure conceptual progress in subareas where one might be able to make such progress (at least at the "my personal morality“ level, though not necessarily at the level of a "consensus morality“). 

(I’m not sure I’m disagreeing with you BTW; probably I’m just adding thoughts and blowing up the scope of your post.)

I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much - for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection. My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person, and different interpretations of that term could lead to a very wide range of ethical beliefs.

I agree. The second part of my comment here tries to talk about this as well. 

And even if idealised reflection is a coherent concept, it simply passes the buck to your idealised self, who might then believe my arguments and decide to change their meta-level intuitions.

Yeah. I assume most of us are familiar with a deep sense of uncertainty about whether we found the right approach to ethical deliberation. And one can maybe avoid this uncomfortable feeling of uncertainty by deferring to idealized reflection. But it's not obvious that this lastingly solves the underlying problem: Maybe we'll always feel uncertain whenever we enter the mode of "actually making a moral judgment". If I found myself as a virtual person who is part of a moral reflection procedure such as Paul Christiano's indirect normativity, I wouldn't suddenly know and feel confident in how to resolve my uncertainties. And the extra power, and the fact that life in the reflection procedure would be very different from the world I currently know, introduce further risks and difficulties. I think there are still reasons why one might want to value particularly-open-ended moral reflection, but maybe it's important that people don't use the uncomfortable feeling of "maybe I'm doing moral philosophy wrong" as their sole reason to value particularly-open-ended moral reflection. If the reality is that this feeling never goes away, then there seems to be something wrong with the underlying intuition that valuing particularly-open-ended moral reflection is by default the "safe" or "prudent" thing to do. (And I'm not saying it's wrong for people to value particularly-open-ended moral reflection; I suspect that it depends on one's higher-order intuitions: For every perspective there's a place where the buck stops.)

From an anti-realist perspective, I claim that perpetual indefinability would be better.

It prevents fanaticism, which is a big plus. And it plausibly creates more agreement, which is also a plus in some weirder sense (there's a "non-identity problem" type thing about whether we can harm future agents by setting up the memetic environment such that they'll end up having less easily satisfiable goals, compared to an alternative where they'd find themselves in larger agreement and therefore with more easily satisfiable goals). A drawback is that it can mask underlying disagreements and maybe harm underdeveloped positions relative to the status quo.

That may be a little more difficult to swallow from a realist perspective, of course. My guess is that the core disagreement is whether moral claims are more like facts, or more like preferences or tastes

That’s a good description. I sometimes use the analogy of "morality is more like career choice than scientific inquiry“. 

I don't think that's a coincidence: psychologically, humans just aren't built to be maximisers, and so a true maximiser would be fundamentally adversarial.

This is another good instrumental/pragmatic argument why anti-realists interested in shaping the memetic environment where humans engage in moral philosophy might want to promote the framing of indefinability rather than "many different flavors of consequentialism, and (eventually) we should pick“. 

Comment by lukas_gloor on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-25T12:19:05.096Z · score: 19 (9 votes) · LW · GW
AlphaStar’s innovative league-based training process finds the approaches that are most reliable and least likely to go wrong.

"Go wrong" is still tied to the game's win condition. So while the league-based training process does find the set of agents whose gameplay is least exploitable (among all the agents they trained), it's not obvious how this relates to problems in AGI safety such as goal specification or robustness to capability gains. Maybe they're thinking of things like red teaming. But without more context I'm not sure how safety-relevant this is.

Comment by lukas_gloor on Why is so much discussion happening in private Google Docs? · 2019-01-12T11:33:59.470Z · score: 22 (10 votes) · LW · GW
2. The ability to comment on a specific line in a document, with the comment showing up in context.

Yeah, I really like how convenient that is.

Comment by lukas_gloor on Why is so much discussion happening in private Google Docs? · 2019-01-12T11:31:57.605Z · score: 9 (6 votes) · LW · GW

For me there's a huge difference between these two.

  • In gdocs I feel like it's more okay to write "unpolished" comments. I think that's mostly because the expectations are lower. Polishing my comments takes me 3-5x longer, which often takes away the motivation to comment at all.
  • In a public forum I worry more about provoking misleading impressions. For instance, in a gdoc shared with people who know me well, I'm not worried that a comment like "AIs might do [complex sequence of actions]" will get people to think that I have weirdly confident views about how the future might play out. In public conversations I'd experience a strong urge to qualify statements like that even though it feels tedious to do so.
Comment by lukas_gloor on Book Review: The Structure Of Scientific Revolutions · 2019-01-12T10:48:32.389Z · score: 1 (1 votes) · LW · GW
You need a lot of hindsight bias to say that it was clear from the get go which paradigms were going to win over the last century.

Sure. And I think Kuhn's main point as summarized by Scott really does give a huge blow to the naive view that you can just compare successful predictions to missed predictions, etc.

But to think that you cannot do better than chance at generating successful new hypotheses is obviously wrong. There would be way too many hypotheses to consider, and not enough scientists to test them. From merely observing science's success, we can conclude that there has to be some kind of skill (Yudkowsky's take on this is here and here, among other places) that good scientists employ to do better than chance at picking what to work on. And IMO it's a strange failure of curiosity to not want to get to the bottom of this when studying Kuhn or the history of science.

Comment by lukas_gloor on Book Review: The Structure Of Scientific Revolutions · 2019-01-11T13:40:09.024Z · score: 2 (2 votes) · LW · GW
When I hear scientists talk about Thomas Kuhn, he sounds very reasonable. [...] When I hear philosophers talk about Thomas Kuhn, he sounds like a madman.

Yes, this! I remember I was extremely confused by the discourse around Kuhn. I'm not sure whether for me the impression was split into scientists vs. non-scientists, but I definitely felt like there was something weird about it and there were two sides to it, one that sounded potentially reasonable, and one that sounded clearly like relativism.

When taking a course on the book, I concluded that both perspectives were appropriate. One thing that went too far into relativism was Kuhn's insistence that there is no way to tell in advance which paradigm is going to be successful. His description of this is that you pick "teams" initially for all kinds of not-truth-tracking reasons, and you only figure out many years later whether your new paradigm will be winning or not.

But I'm not sure Kuhn even was (at least in The Structure of Scientific Revolutions) explicitly saying "No, you cannot do better than chance at picking sides." Rather, the weird thing is that I remember feeling like he was not explicitly asking that question, that he was just brushing it under the carpet. Likewise the lecturer of the course, a Kuhn expert, seemed to only be asking the question "How does (human-)science proceed?", and never "How should science proceed?"

Comment by lukas_gloor on Will humans build goal-directed agents? · 2019-01-06T08:07:50.207Z · score: 3 (2 votes) · LW · GW
Suppose the agent you're trying to imitate is itself goal-directed. In order for the imitator to generalize beyond its training distribution, it seemingly has to learn to become goal-directed (i.e., perform the same sort of computations that a goal-directed agent would). I don't see how else it can predict what the goal-directed agent would do in a novel situation. If the imitator is not able to generalize, then it seems more tool-like than agent-like. On the other hand, if the imitatee is not goal-directed... I guess the agent could imitate humans and be not entirely goal-directed to the extent that humans are not entirely goal-directed. (Is this the point you're trying to make, or are you saying that an imitation of a goal-directed agent would constitute a non-goal-directed agent?)

I'm not sure these are the points Rohin was trying to make, but there seem to be at least two important points here:

  • Imitation learning applied to humans produces goal-directed behavior only insofar as humans are goal-directed.
  • Imitation learning applied to humans produces agents no more capable than humans. (I think IDA goes beyond this by adding amplification steps, which are separate. And IRL goes beyond this by trying to correct "errors" that the humans make.)

Regarding the second point, there's a safety-relevant sense in which a human-imitating agent is less goal-directed than the human. Because if you scale the human's capabilities, the human will become better at achieving its personal objectives. By contrast, if you scale the imitator's capabilities, it's only supposed to become even better at imitating the unscaled human.

Comment by lukas_gloor on What makes people intellectually active? · 2019-01-01T00:18:43.035Z · score: 30 (12 votes) · LW · GW

I believe for some people it's very important to have a moment of realization that one can get to the frontier of knowledge in a given field of interest. It feels intimidating if others are making contributions that seem decisively out of your league. Because people might intuitively underestimate how far you can get with focused reading and learning, it could be good to give tailored advice to people newer to (e.g.) AI risk for how/where they can make contributions that will feel encouraging. For illustration, a few years ago I was playing a computer game for fun for quite a while until I was by chance matched up with one of the better competitive players and I almost won against them, getting lucky. That experience showed me that I'd have a shot if I actually tried, and it encouraged me to immediately start practicing with the aim of becoming competitive at that game. It changed my mindset overnight. Similarly, I think there's a difference in mindset between "reading and talking about research topics for fun" and "reading and talking about research topics with the intent of seriously contributing".

I agree with others that a rewarding social environment and people in a similar range of competence you can bounce ideas back-and-forth with are extremely important. If you collaborate with people who are similarly driven to figure things out and discuss ideas with you, that automatically forces you to think about your ideas for much longer and in more detail. By yourself you might stop thinking about a topic once you reach a roadblock, but if every morning you wake up to new messages from a collaborator adding criticism or new bits to your thinking, you're going to keep working on the topic.

I also suspect that people are sometimes too modest (or in the wrong mindset) to develop the habit of "taking stances". Some people know about a lot of different considerations and can tell you in detail what others have written, but they don't invest effort coming up with their own opinion – presumably because they don't consider themselves to be experts. Some of the community norms about not being overconfident might contribute to this failure mode, but the two things are distinct because people can try practicing taking stances with personal "pre-Aumann opinions", which they are free to largely ignore when deferring to the experts for an all-things-considered judgment.

Speculation about personality traits conducive to generating ideas: OCD was mentioned in the comments. There's also OCPD and hyperfocus. Carl Shulman's advice for researchers mentions, among other things, having a strong emotional reaction to people being wrong on the internet (in communities you care about) – I think this might be a symptom of being very invested in the ideas, and trying to articulate fervently why something is wrong can help further clarify one's thinking. Need for closure also seems relevant to me. It has its dangers because it can lead to one-sided thinking, but in my case at least, I'm often driven by feeling deeply unsatisfied about not having answers to questions that seem strategically important. And, anecdotally, I know some people with low need for closure who I consider to be phenomenal researchers in most important respects, but these people are less creative than I would be with their skills and backgrounds, and their obsessive focus maybe goes into greater breadth of research rather than into zooming in on making progress on the "construction sites". Finally, I strongly agree with John Maxwell's point that a "temporary delusion" of thinking that one's ideas are really good is a great reinforcement mechanism (even though it often leads to embarrassment later on).

Comment by lukas_gloor on What is ambitious value learning? · 2018-12-30T01:51:23.342Z · score: 3 (2 votes) · LW · GW
I interpreted Wei's comment as saying that even your reflective life goals would be underdetermined -- presumably even now if you hear convincing moral argument A but not B, then you'd have different reflective life goals than if you hear B but not A.

Okay yeah, that also seems broadly correct to me.

I am hoping, though, that as long as I'm not subjected to outside optimization pressures that weren't crafted to be helpful, it's very rare that something I'd currently consider very important ends up either staying important or becoming completely unimportant depending merely on the order in which I encounter new arguments. And similarly, I'm hoping that my value endpoints would still cluster decisively around the things I currently consider most important – though that's where it becomes tricky to trade off goal preservation against openness to philosophical progress.

Comment by lukas_gloor on What is ambitious value learning? · 2018-12-30T01:36:46.268Z · score: 1 (1 votes) · LW · GW

Thanks! I think I understand the intent of the rephrasing now.

What I meant with "obscure" is that both "true utility function" and "utility function that encodes the optimal actions to take for the best possible universe" have normative terminology in them that I don't know how to reduce or operationalize.

For instance, imagine I am looking at action sequences and ranking them. Presumably large portions of that process would feel like difficult judgment calls where I'd feel nervous about still making some kind of mistake. Both your phrasings (to my ears) carry the connotation that there is a "best" mistake model, one which is in a relevant sense independent of our own judgment, and that we can learn things that will make us more and more confident that we're probably not making mistakes anymore, because of progress in finding the correct way of thinking about our values. That's the part that feels obscure to me, because I think we'll always be in this unsatisfying epistemic situation where we're nervous about making some kind of mistake by the lights of a standard that we cannot properly describe.

I do get the intuition for thinking in these terms, though. It feels conceivable that another discovery, similar to the discovery of cognitive biases, could improve our thinking, and I definitely agree that we want a concept for staying open to this possibility. I'm just pointing out that non-operationalized normative concepts seem obscure. (Though maybe that's fine if we're treating them the way Yudkowsky treats "magic reality fluid" – as a placeholder for whatever comes once we're less confused about "measure".)

Comment by lukas_gloor on Humans can be assigned any values whatsoever… · 2018-12-28T08:55:52.175Z · score: 3 (2 votes) · LW · GW
This post comes from a theoretical perspective that may be alien to ML researchers; in particular, it makes an argument that simplicity priors do not solve the problem pointed out here, where simplicity is based on Kolmogorov complexity (which is an instantiation of the Minimum Description Length principle). The analog in machine learning would be an argument that regularization would not work.

Out of curiosity, is there an intuitive explanation as to why these are different? Is it mainly because ambitious value learning inevitably has to deal with lots of (systematic) mistakes in the data, whereas normally you'd make sure that the training data doesn't contain (many) obvious mistakes? Or are there examples in ML where you can retroactively correct mistakes imported from a flawed training set?

(I'm not sure "training set" is the right word for the IRL context. Applied to ambitious value learning, what I mean would be the "human policy".)

Update: Ah, it seems like the next post is all about this! :) My point about errors seems like it might be vaguely related, but the explanation in the next post feels more satisfying. It's a different kind of problem because you're not actually interested in predicting observable phenomena anymore, but instead are trying to infer the "latent variable" – the underlying principle(?) behind the inputs. The next post in the sequence also gives me a better sense of why people say that ML is typically "shallow" or "surface-level reasoning".
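For what it's worth, here is a tiny sketch of how I understand the degeneracy (my own toy illustration under an assumed planner/reward decomposition, not code from the post): a "rational" decomposition and its sign-flipped "anti-rational" counterpart predict exactly the same behavior and are about equally simple, so neither more data nor a simplicity prior on the decomposition can tell them apart.

```python
import numpy as np

def boltzmann_policy(reward, beta):
    """A planner that softmax-maximizes `reward` with rationality parameter `beta`."""
    logits = beta * reward
    return np.exp(logits) / np.exp(logits).sum()

reward = np.array([0.0, 1.0, 3.0])  # hypothetical "true" reward over three actions
beta = 2.0

# Two (planner, reward) decompositions of the same observable behavior:
rational = boltzmann_policy(reward, beta)          # planner(R), rational
anti_rational = boltzmann_policy(-reward, -beta)   # planner(-R), anti-rational

# They are indistinguishable from behavior alone, and flipping two signs adds
# almost no description length -- prediction works fine, but the latent reward
# stays underdetermined.
assert np.allclose(rational, anti_rational)
print(np.round(rational, 3))
```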

Comment by lukas_gloor on What is ambitious value learning? · 2018-12-28T06:39:24.476Z · score: 1 (1 votes) · LW · GW
Of course this is all assuming that there does exist a true utility function, but I think we can replace "true utility function" with "utility function that encodes the optimal actions to take for the best possible universe" and everything still follows through.

The replacement feels just as obscure to me as the original.

Comment by lukas_gloor on What is ambitious value learning? · 2018-12-28T06:19:31.681Z · score: 4 (2 votes) · LW · GW
But more generally, if you think that a different set of life experiences means that you are a different person with different values, then that's a really good reason to assume that the whole framework of getting the true human utility function is doomed. Not just ambitious value learning, _any_ framework that involves an AI optimizing some expected utility would not work.

This statement feels pretty strong, especially given that I find it trivially true that I'd be a different person under many plausible alternative histories. This makes me think I'm probably misinterpreting something. :)

At first I read your paragraph as the strong claim that if individual human values are underdetermined at birth, then ambitious value learning looks doomed. And I'd take it as proof of "individual human values are underdetermined at birth" if, replaying history, I'd now have different values (or a different probability distribution over values) had I encountered Yudkowsky's writings before Singer's rather than vice versa. Or if I would be less single-minded about altruism had I encountered EA a couple of years later in life, after already having taken on another self-identity.

But these points (especially the second example) seem so trivially true that I'm probably talking about a different thing. In addition, they're addressed by the solution you propose in your first paragraph, namely taking current-you as the starting point.

Another concern could be that "there is almost never a stable core of an individual human's values", i.e., that "even going forward from today, the values of Lukas or Rohin or Wei are going to be heavily underdetermined". Is that the concern? This seems like it could be possible for most people, but definitely not for all people. And underdetermined values are not necessarily that bad (though I find it mildly disconcerting, personally). [Edit: Wei's comment and your reply to it sound like this might indeed be the concern. :) Good discussion there!]

The fact that I have a hard time understanding the framework behind your statement is probably because I'm thinking in terms of a different part of my brain when I talk about "my values". I identify very much with my reflective life goals, to a point that seems unusual. I don't identify much with "What Lukas's behavior, if you were to put him in different environments and then watch, would indirectly and consistently tell you about the things he appears to want – e.g., 'values' like being held in high esteem by others, having a comfortable life, romance, having either some kind of overarching purpose or enough distractions to not feel bothered by the lack of purpose, etc.". There is definitely a sense in which the code that runs me cares about all these implicit goals. But that's not how I most want to see it. I also know that in all the environments that offer the option to self-modify into a more efficient pursuer of explicitly held personal ideals, I would make substantial use of that option. And that seems relevant for the same reason that we wouldn't want to count cognitive biases as people's values.

(I should probably continue reading the sequence and then come back to this later if I still feel unclear about it.)

Comment by lukas_gloor on Cognitive Enhancers: Mechanisms And Tradeoffs · 2018-10-23T21:26:47.149Z · score: 6 (4 votes) · LW · GW
And what about the tradeoff? Is there one?

What you mention in your second possibility ("rote, robotic way") goes in a similar direction, but I'd be worried about something more specific: difficulty with big-picture prioritization when it comes to selecting what to be interested in. I envy people who find it easy to delve into all kinds of subjects and absorb a wealth of knowledge. But those same people may then fail to be curious enough when they encounter a piece of information that really would be much more relevant than the information they usually encounter. Or they might spend their time on tasks that don't produce the most impact.

Admittedly I'm looking at this with a terribly utilitarianism-tainted lens. Probably finding it easy to be interested in many things is generally a huge plus.

But I do suspect that there's a tradeoff. If reading about the Battle of Cedar Creek felt 30% as interesting to our brains as reading cognitive science or Lesswrong or Peter Singer or whatever got people here hooked on these sorts of things, then maybe fewer of us would have gotten hooked.

Comment by lukas_gloor on Hufflepuff Cynicism on Hypocrisy · 2018-03-29T22:52:53.948Z · score: 12 (4 votes) · LW · GW

I think I'm talking about a different concept than you are talking about. Here's what I take to be hypocrisy that is probably/definitely bad:

When someone's brain is really good at selective remembering and selective forgetting – remembering things so they are convenient, and forgetting things that are inconvenient. And when the person is either unconsciously or only semi-consciously acting as an amplifier of opinions, sensing where a group is likely to go and then pushing in (and often overshooting) that direction in order to be the first to score points. This is where flip-flopping gets its bad reputation. At the extreme, the person may fail to distinguish, in terms of mental motions, between what their actual opinion is vs. what opinion they expect will earn praise.

Some of these may not always come together but I think they often do, and the common theme is self-deception and little introspection. For instance, something many people do without noticing: Everyone's opinions fluctuate over time; sometimes you feel lukewarm about an idea, at other times you're an ardent supporter. If it later turns out that the idea was great, you remember mostly the times you supported it. If it turns out the idea was absolutely horrible, you're tempted to specifically remember this one 2-week window half a year before the idea fell out of fashion where you felt lukewarm about it and voiced doubts to someone (or were "almost" going to do that), and you then tell yourself and others that "you called it" even though, in reality, you totally failed to pay attention to your doubts.

Another example: You fail to understand or spot a good idea when you first hear it, then later once the context makes it more obvious that the idea was great, it occurs to you and you think it's entirely your own idea, so much so that you'd enthusiastically tell it to the person you first heard it from. (Often this is innocent, but if it happens an uncanny number of times maybe it's a reason to start paying attention.)

I think this type of hypocrisy hinders self-growth, can prevent the right people from getting credit, and amplifies group biases. So I'd say it's very bad. But norms against hypocrisy have to be careful, because it's something that everyone might have to some degree, and the costs of enforcing norms need to be kept smaller than the actual problem. Keeping score or arguing over whose memory about something is right can create an atmosphere with effects just as bad as extreme hypocrisy itself. Sometimes hypocrisy is fueled by a desire to be held in high regard, and then being accused of hypocrisy may also worsen the mechanisms at work.

Comment by lukas_gloor on The Doomsday argument in anthropic decision theory · 2017-08-31T16:14:21.089Z · score: 3 (3 votes) · LW · GW

I believe that you're right about the history, but for me at least, the explanations of UDT I came across a couple of years ago seemed too complicated for me to really grasp the implications for anthropics, and ADT (and the appendix of Brian's article here) were where things first fell into place for my thinking. I still link to ADT these days as the best short explanation for reasoning about anthropics, though I think there may be better explanations of UDT now (suggestions?). Edit: I of course agree that giving credit to UDT is good practice.

Comment by lukas_gloor on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-06-22T19:26:21.826Z · score: 0 (0 votes) · LW · GW

Or are you worried more that the question won't be answered correctly by whatever will control our civilization?

Perhaps this, in case it turns out to be highly important but difficult to get certain ingredients – e.g. priors or decision theory – exactly right. (But I have no idea, it's also plausible that suboptimal designs could patch themselves well, get rescued somehow, or just have their goals changed without much fuss.)

Comment by lukas_gloor on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-06-22T08:43:44.410Z · score: 2 (2 votes) · LW · GW

Some people at MIRI might be thinking about this under nonperson predicate. (Eliezer's view on which computations matter morally is different from the one endorsed by Brian, though.) And maybe it's important to not limit FAI options too much by preventing mindcrime at all costs – if there are benefits against other very bad failure modes (or – cooperatively – just increased controllability for the people who care a lot about utopia-type outcomes), maybe some mindcrime in the early stages to ensure goal-alignment would be the lesser evil.

Comment by lukas_gloor on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-06-22T08:26:02.226Z · score: 2 (2 votes) · LW · GW

Interesting!

Further, while being quite a good article, you can read the summary, introduction and conclusion without encountering the idea that the author believes that s-risks are much greater than x-risks, as opposed to being just yet another risk to worry about.

I'm only confident about endorsing this conclusion conditional on having values where reducing suffering matters a great deal more than promoting happiness. So we wrote the "Reducing risks of astronomical suffering" article in a deliberately 'balanced' way, pointing out the different perspectives. This is why it didn't end up making any very strong claims. I don't find the energy-efficiency point convincing at all, but for those who do, x-risks are likely (though not with very high confidence) still more important, mainly because more futures will be optimized for good outcomes than for bad outcomes, and this is where most of the value is likely to come from. The "pit" around the FAI peak is in expectation extremely bad compared to anything that exists currently, but most of it is just accidental suffering that is still comparatively unoptimized. So in the end, whether s-risks or x-risks are more important to work on, on the margin, depends on how suffering-focused or not someone's values are.

Having said that, I totally agree that more people should be concerned about s-risks and it's concerning that the article (and the one on suffering-focused AI safety) didn't manage to convey this point well.

Comment by lukas_gloor on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-06-21T03:38:59.248Z · score: 4 (4 votes) · LW · GW

I sympathize with your feeling of alienation at the comment, and thanks for offering this perspective that seems outlandish to me. I don't think I agree with you re who the 'normies' are, but I suspect that this may not be a fruitful thing to even argue about.

Side note: I'm reminded of the discussion here. (It seems tricky to find a good way to point out that other people are presenting their normative views in a way that signals an unfair consensus, without getting into (or being accused of) identity politics, or having to throw around words like "loony bin", or fighting over who the 'normies' are.)

Comment by lukas_gloor on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-06-21T03:14:49.739Z · score: 6 (5 votes) · LW · GW

Those of us who sympathize with suffering-focused ethics have an incentive to encourage others to think about their values now, at least in crude enough terms to take a stance on prioritizing the prevention of s-risks vs. making sure we get to a position where everyone can safely deliberate about their values further and then have everything fulfilled. Conversely, if one (normatively!) thinks the downsides of bad futures are unlikely to be much worse than the upsides of good futures, then one is incentivized to promote caution about taking confident stances on anything population-ethics-related, and to instead value deeper philosophical reflection. The latter also has the upside of being good from a cooperation point of view: everyone can work on the same priority (building safe AI that helps with philosophical reflection) regardless of one's inklings about how personal value extrapolation is likely to turn out.

(The situation becomes more interesting/complicated for suffering-focused altruists once we add considerations of multiverse-wide compromise via coordinated decision-making, which, in extreme versions at least, would call for being "updateless" about the direction of one's own values.)

Comment by lukas_gloor on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-06-21T02:16:04.376Z · score: 1 (1 votes) · LW · GW

A lot of people who disagree with veganism agree that factory farming is terrible. Like, more than 50% of the population I'd say.

Comment by lukas_gloor on New circumstances, new values? · 2017-06-09T13:33:52.651Z · score: 2 (2 votes) · LW · GW

But what happens when the low-intensity conversation and the brainwashing are the same thing?

That's definitely bad in cases where people explicitly care about goal preservation. But only self-proclaimed consequentialists do.

The other cases are fuzzier. Memeplexes like rationality, EA/utilitarianism, religious fundamentalism, political activism, or Ayn Rand type stuff are constantly 'radicalizing' people, turning them from something sort-of-agenty-but-not-really into self-proclaimed consequentialist agents. Whether that is in line with people's 'real' desires is to a large extent up for interpretation, though there are extreme cases where the answer seems clearly 'no.' Insofar as recruiting strategies are concerned, we can at least condemn propaganda and brainwashing because they are negative-sum (but the lines might again be blurry).

It is interesting that people don't turn into self-proclaimed consequentialists on their own, without the influence of 'aggressive' memes. This just goes to show that humans aren't agents by nature, and that an endeavor of "extrapolating your true consequentialist preferences" is at least partially about adding stuff that wasn't previously there rather than discovering something that was hidden. That might be fine, but we should be careful not to unquestioningly assume that this automatically qualifies as "doing people a favor." This, too, is up for interpretation to at least some extent. The argument for it being a favor is presented nicely here. The counterargument is that satisficers often seem pretty happy, and who are we to maneuver them into a situation where they cannot escape their own goals and always live for the future instead of the now? (Technically people can just choose whichever consequentialist goal is best fulfilled by satisficing, but I could imagine that many preference extrapolation processes are set up in a way that makes this an unlikely outcome. For me at least, learning more about philosophy automatically closed some doors.)

Comment by lukas_gloor on Net Utility and Planetary Biocide · 2017-04-10T18:42:01.806Z · score: 2 (2 votes) · LW · GW

So is this discipline basically about ethics of imposing particular choices on other people (aka the "population")? That makes it basically the ethics of power or ethics of the ruler(s).

That's an interesting way to view it, but it seems accurate. Say God created the world; then contractualist ethics or the ethics of cooperation wouldn't have applied to him, but we'd get a sense of what his stance on population ethics must have been.
No one ever gets asked whether they want to be born. This is one of the issues where there is no such thing as "not taking a stance;" how we act in our lifetimes is going to affect what sort of minds there will or won't be in the far future. We can discuss suggestions and try to come to a consensus of those currently in power, but future generations are indeed in a powerless position.

Comment by lukas_gloor on Net Utility and Planetary Biocide · 2017-04-10T17:21:04.140Z · score: 2 (2 votes) · LW · GW

I still don't understand what that means. Are you talking about believing that other people should have particular ethical views and it's bad if they don't?

I'm trying to say that other people are going to disagree with you or me about how to assess whether a given life is worth continuing or worth bringing into existence (big difference according to some views!), and on how to rank populations that differ in size and the quality of the lives in them. These are questions that the discipline of population ethics deals with, and my point is that there's no right answer (and probably also no "safe" answer where you won't end up disagreeing with others).

This^^ is all about a "morality as altruism" view, where you contemplate what it means to "make the world better for other beings." I think this part is subjective.

There is also a very prominent "morality as cooperation/contract" view, where you contemplate the implications of decision algorithms correlating with each other, and notice that it might be a bad idea to adhere to principles that lead to outcomes worse for everyone in expectation provided that other people (in sufficiently similar situations) follow the same principles. This is where people start with whatever goals/preferences they have and derive reasons to be nice and civil to others (provided they are on an equal footing) from decision theory and stuff. I wholeheartedly agree with all of this and would even say it's "objective" – but I would call it something like "pragmatics for civil society" or maybe "decision theoretic reasons for cooperation" and not "morality," which is the term I reserve for (ways of) caring about the well-being of others.

It's pretty clear that "killing everyone on earth" is not in most people's interest, and I appreciate that people are pointing this out to the OP. However, I think what the replies are missing is that there is a second dimension, namely whether we should be morally glad about the world as it currently exists, and whether e.g. we should make more worlds that are exactly like ours, for the sake of the not-yet-born inhabitants of these new worlds. This is what I compared to voting on what the universe's playlist of experience moments should be like.

But I'm starting to dislike the analogy. Let's say that existing people have aesthetic preferences about how to allocate resources (this includes things like wanting to rebuild galaxies into a huge replica of Simpsons characters because it's cool). Of these, a subset are simultaneously also moral preferences, in that they are motivated by a desire to do good for others. These moral preferences can differ in whether they count it as important to bring about new happy beings, or in how much extra happiness is needed to altruistically "compensate" (if that's even possible) for the harm of a given amount of suffering, etc. And the domain where people compare each other's moral preferences and try to see if they can get more convergence through arguments and intuition pumps – in the same sense in which someone might start to appreciate Mozart more after studying music theory or whatever – is population ethics (or "moral axiology").

Comment by lukas_gloor on Net Utility and Planetary Biocide · 2017-04-10T15:38:23.433Z · score: 1 (1 votes) · LW · GW

The survival instinct part, very probably, but the "constant misery" part doesn't look likely.

Agree, I meant to use the analogy to argue for "Natural selection made sure that even those beings in constant misery may not necessarily exhibit suicidal behavior." (I do hold the view that animals in nature suffer a lot more than they are happy, but that doesn't follow from anything I wrote in the above post.)

Are we talking about humans now? I thought the OP considered humans to be more or less fine, it's the animals that were the problem.

Right, but I thought your argument about sentient beings not committing suicide referred to humans primarily. At least with regard to humans, exploring why the appeal to low suicide rates may not show much seems more challenging. Animals not killing themselves could just be due to them lacking the relevant mental concepts.

I have no idea what this means.

It's a metaphor. Views on population ethics reflect what we want the "playlist" of all the universe's experience moments to be like, and there's no objective sense in which "net utility is positive" or not – except when you question-beggingly define "net utility" in a way that implies a conclusion, but then anyone who disagrees will just say "I don't think we should define utility that way" and you're left arguing over the same differences. That's why I called it "aesthetic", even though that feels like it doesn't do the seriousness of our moral intuitions justice.

Ah. Well then, let's kill everyone who fails our aesthetic judgment..?

(And force everyone to live against their will if they do conform to it?) No; I specifically said not to do that. Viewing morality as subjective is supposed to make people more appreciative that they cannot go around completely violating the preferences of those they disagree with without the result being worse for everyone.

Comment by lukas_gloor on Net Utility and Planetary Biocide · 2017-04-10T12:59:21.252Z · score: 5 (5 votes) · LW · GW

Most sentient creatures can commit suicide. The great majority don't. You think they are all wrong?

(I don’t think this is about right or wrong. But we can try to exchange arguments and intuition pumps and see if someone changes their mind.)

Imagine a scientist that engineered artificial beings destined to a life in constant misery but equipped with an overriding desire to stay alive and conscious. I find that such an endeavor would not only be weird or pointless, but something I’d strongly prefer not to happen. Maybe natural selection is quite like that scientist; it made sure organisms don’t kill themselves not by making it easy for everyone to be happy, but by installing instinctual drives for survival.

Further reasons (whether rational or not) to not commit suicide despite having low well-being include fear of consequences in an afterlife, impartial altruistic desires to do something good in the world, “existentialist" desires to not kill oneself without having lived a meaningful life, near-view altruistic desires to not burden one’s family or friends, fear of dying, etc. People often end up not doing things that would be good for them and their goals due to trivial inconveniences, and suicide seems more “inconvenient" than most things people get themselves to do in pursuit of their interests. Besides, depressed people are not exactly known for high willpower.

Biases with affective forecasting and distorted memories could also play a role. (My memories from high school are pretty good, even though if you traveled back and asked me how I was doing, most of the time the reply would be something like “I’m soo tired and don’t want to be here!”)

Then there’s influence from conformity: I saw a post recently about a guy in Japan who regularly goes to a suicide hotspot to prevent people from jumping. Is he doing good or being an asshole? Most people seem to have the mentality that suicide is usually (or always even) bad for the person who does it. While there are reasons to be very careful with irreversible decisions – and certainly many suicides are impulsive and therefore at high risk of bias – it seems like there is an unreasonably strong anti-suicide ideology. Not to mention the religious influences on the topic.

All things considered, it wouldn’t surprise me if some people also just talk themselves out of suicide with whatever they manage to come up with, whether or not that is rational given their reflective goals. Relatedly, another comment here advocates trying to change what you care about in order to avoid being a Debbie Downer to yourself and others: http://lesswrong.com/r/discussion/lw/ovh/net_utility_and_planetary_biocide/dqub

Also relevant: when evaluating the value of a person’s life, do we go with overall life satisfaction or with average momentary well-being? Becoming a mother in expectation helps with the former but is bad for the latter – tough choice.

Caring substantially about anything other than one’s own well-being makes suicide the opposite of a “convergent drive” – agents whose goals include facets of the outside world will want to avoid killing themselves at high costs, because that would prevent them from further pursuit of these goals. We should therefore distinguish between “Is a person’s life net positive according to the person’s goals?” and “Is a life net positive in terms of all the experience moments it adds to the universe’s playlist?” The latter is not an empirical question; it’s more of an aesthetic judgment relevant to those who want to pursue a notion of altruism that is different from just helping others go after their preferences, and instead includes concern for (a particular notion of) well-being.

This will inevitably lead to “paternalistic” judgments where you want the universe’s playlist to be a certain way, conflicting with another agent’s goals. Suppose my life is very happy but I don’t care much for staying alive – then some would claim I have an obligation to continue living, and I’d be doing harm to their preferences if I’m not sufficiently worried about personal x-risks. So the paternalism goes both ways; it’s not just something that suffering-focused views have to deal with.

Being cooperative in the pursuit of one's goals gets rid of the bad connotations of paternalism. It is sensible to think that net utility is negative according to one's preferences for the playlist of experience moments, while not concluding that this warrants strongly violating other people's preferences.

Also relevant: SSC's "How Bad Are Things?".

Comment by lukas_gloor on What can go wrong with the following protocol for AI containment? · 2016-01-13T06:55:54.164Z · score: 2 (2 votes) · LW · GW

Ethically, I think one could justify all this. It is hard to argue, for example, that we (real human beings) have been harmed by being brought into existence in a universe without a God who is listening; almost all of us would prefer to be alive rather not. The same would go for them: surely, their simulated existence, imperfect as it is, is not worse than not having been brought into the world in the first place?

At least some of them will tell you they would rather not have been born. But maybe you'll want to equip these orcs with an even stronger drive for existence, so they never choose death over life even if you torture them; would that make it more ok? I suspect not, so something about the "Do they complain about having been created?" approach seems flawed imo. Creating beings with a strong preference for existence would make it too easy to legitimize doing with them whatever you want.

How about imagining beings who at any moment are intrinsically indifferent to whether they exist or not? They won't complain as long as they don't suffer. Perhaps that's too extreme as well, but if it's only simple/elegant rules you're looking for, this one seems more acceptable to me than the torture-bots above.

Comment by lukas_gloor on The Triumph of Humanity Chart · 2015-10-27T14:22:36.002Z · score: 0 (4 votes) · LW · GW

you made that comment not because it was substantive or seriously detracted from the post, but because it was an ideological matter with which you disagreed with the author

I generally dislike it when people talk about moral views that way, even if they mention views I support. I might be less inclined to call it out in a case where I intuitively strongly agree, but I still do it some of the time. I agree it wasn't the main point of his post, I never denied that. In fact I wrote that I agree the developments are impressive. By that, I meant the graphs. Since when is it discouraged to point out minor criticism in a post? The fact that I singled out this particular post to make a comment that would maybe fit just as well elsewhere just happens to be a coincidence.

Taking it as an opportunity to try to start an ideological fight is just bad manners.

No one is even talking about arguments or intuition pumps for or against any of the moral views mentioned. I wasn't "starting an ideological fight", I was making a meta remark about the way people present moral views. If anything, I'd be starting an ideological fight about my metaethical views and what I consider to be a productive norm of value-related discourse on this site.

Comment by lukas_gloor on The Triumph of Humanity Chart · 2015-10-27T01:03:35.976Z · score: 0 (6 votes) · LW · GW

Maybe I'm wrong, but my guess is that if someone wrote "Life is neutral; some states are worse than death, and adding new happy people is nice but not important", that person would be called out, and the post would receive a large portion of downvotes. I'm not sure about the downvotes (personally I didn't even downvote the OP), but I think pointing out the somewhat controversial nature of such a blanket statement is definitely a good thing. Would you oppose this as well (similarly aggressively)?

We could talk about whether my view of what's controversial or not is biased. I would not object to someone saying "Murder is bad" without prefacing it with "Personally, I think", even though I'm sure most uncontrolled AIs will disagree with this for reasons I cannot find any faults in. But assuming that we're indeed talking about an issue where there's no consensus among EAs, it seems epistemically appropriate to me to at least hint at this lack of consensus, just like you would with a scientific hypothesis that is controversial among experts. And it makes even more sense to hint at this if some people don't even realize that there's a lack of consensus. For whatever reason, EAs who came to EA through LW care much more about preventing death than EAs who found their way to EA through e.g. Peter Singer's books. And I thought it might be interesting to LW-originating EAs that a significant fraction of EAs "from elsewhere" feel alienated by the way some issues are being discussed on LW. Whether they give a shit about it is a different question of course.

Comment by lukas_gloor on The Triumph of Humanity Chart · 2015-10-26T23:46:26.908Z · score: -1 (5 votes) · LW · GW

I at one point phrased it "comes with a doubling of the (larger) rest of the population" to make it more clear, but deleted it for a reason I have no introspective access to.

And if we got where we are in fractional terms by adding rich people without actually cutting into the number of poor people, that would be bad too, though not as bad as murdering them.

It would, obviously, if there are better alternatives. In consequentialism, everything where you have better viable alternatives is bad to some extent. What I meant is: if the only way to double the rest of the population is by also doubling the part that's in extreme poverty, then the OP's values imply that it would be a good thing. I'm not saying this view is crazy, I'm just saying that creating the impression that it's some sort of LW consensus is mistaken. And in a later point I added that it makes me, and probably also other people with different values, feel unwelcome. It's bad for an open dialogue on values.

Comment by lukas_gloor on The Triumph of Humanity Chart · 2015-10-26T18:33:22.022Z · score: 2 (14 votes) · LW · GW

I'm referring to the text, not the graph(s). The two paragraphs between the graphs imply

that doubling extreme poverty would be a good thing if it comes with a doubling of the rest of the population.

He does not preface any of it by saying "I think", he just presents it as obvious. Well, I know for a fact that there are many people who self-identify as rationalists to whom this is not obvious at all. It also alienates me that people here, according to the karma distributions, don't seem to get my point.

Comment by lukas_gloor on The Triumph of Humanity Chart · 2015-10-26T12:47:36.432Z · score: 0 (18 votes) · LW · GW

The developments you highlight are impressive indeed. But you're making it sound as though everyone should agree with your normative judgments. You imply that doubling extreme poverty would be a good thing if it comes with a doubling of the rest of the population. This view is not uncontroversial and many EAs would disagree with it. Please respect that other people will disagree with your value judgments.

Comment by lukas_gloor on Effective Altruism from XYZ perspective · 2015-07-09T09:14:13.723Z · score: 2 (2 votes) · LW · GW

I can't find the original post about the buck stopping after a bit of Googling. I'd like to keep looking into this!

The post I'm referring to is here, but I should note that EY used the phrase in a different context, and my view on terminal values does not reflect his. My critique of the idea that all human values are complex is that it presupposes too narrow an interpretation of "values". Let's talk about "goals" instead, defined as follows:

Imagine you could shape yourself and the world any way you like, unconstrained by the limits of what is considered feasible and what not, what would you do? Which changes would you make? The result describes your ideal world, it describes everything that is at all important to you. However, it does not yet describe how important these things are in relation to other things you consider important. So imagine that you had the same super-powers, but this time they are limited: You cannot make every change you had in mind, you need to prioritize some changes over others. Which changes would be most important to you? The outcome of this thought experiment approximates your goals. (This question is of course a very difficult one, and what someone says after thinking about it for five minutes might be quite different from what someone would choose if she had heard all the ethical arguments in the world and thought about the matter for a very long time. If you care about making decisions for good/informed reasons, you might want to refrain from committing too much to specific answers and instead give weight to what a better informed version of yourself would say after longer reflection.)

I took the definition from this blogpost I wrote a while back. The comment section there contains a long discussion on a similar issue where I elaborate on my view of terminal values.

Anyway, the way my definition of "goals" seems to differ from the interpretation of "values" in the phrase "human values are complex" is that "goals" allow for self-modification. If I could, I would self-modify into a utilitarian super-robot, regardless of whether it was still conscious or not. According to "human values are complex", I'd be making a mistake in doing so. What sort of mistake would I be making?

The situation is as follows: Unlike some conceivable goal-architectures we might choose for artificial intelligence, humans do not have a clearly defined goal. When you ask people on the street what their goals are in life, they usually can't tell you, and if they do tell you something, they'll likely revise it as soon as you press them with an extreme thought experiment. Many humans are not agenty. Learning about rationality and thinking about personal goals can turn people into agents. How does this transition happen? The "human values are complex" theory seems to imply that we introspect, find out that we care/have intuitions about 5+ different axes of value, and end up accepting all of them as our goals. This is probably how quite a few people are doing it, but they're victims of a gigantic typical mind fallacy if they think that's the only way to do it. Here's what happened to me personally (and incidentally, to about "20+" agents I know personally and to all the hedonistic utilitarians who are familiar with Lesswrong content and still keep their hedonistic utilitarian goals):

I started out with many things I like (friendship, love, self-actualization, non-repetitiveness, etc.) plus some moral intuitions (anti-harm, fairness). I then got interested in ethics and figuring out the best ethical theory. I turned into a moral anti-realist soon, but still wanted to find a theory that incorporates my most fundamental intuitions. I realized that I don't care intrinsically about "fairness" and became a utilitarian in terms of my other-regarding/moral values. I then had to decide to what extent I should invest in utilitarianism/altruism, and how much in values that are more about me specifically. I chose altruism, because I have a strong, OCD-like tendency to do things either fully or not at all, and I thought saving for retirement, eating healthily etc. is just as bothersome as trying to be altruistic, because I don't strongly self-identify with a 100-year-old version of me anyway, so I might as well try to make sure that all future sentience will be suffering-free. I still care a lot about my long-term happiness and survival, but much less so than if I had the goal to live forever, and as I said I would instantly press the "self-modify into utilitarian robot" button, if there was one. I'd be curious to hear whether I am being "irrational" somewhere, whether there was a step involved that was clearly mistaken. I cannot imagine how that would be the case, and the matter seems obvious to me. So every time I read the link "human values are complex", it seems like an intellectually dishonest discussion stopper to me.

Comment by lukas_gloor on Effective Altruism from XYZ perspective · 2015-07-08T11:14:07.157Z · score: 3 (3 votes) · LW · GW

I get the impression that you're not well informed about EA and the diverse stances EAs have, and that you're singling out an idiosyncratic interpretation and giving it an unfair treatment.

Effective altruism is inefficient and socially suboptimal.

The first link you cite talks about public good provision within the current economy. How do you conclude from this that e.g. the effective altruists focused on AI safety are being inefficient? And even if you're talking about e.g. donations to GiveWell's recommended charities, how does the first link establish that they're inefficient? Sick people in Africa tend not to be included in calculations about economic common goods, but EAs care about more than just their country's economy.

Effective Altruism isn’t utilitarian. It’s explicitly welfarist and given the complexity of individual value, probably undermines overall utility, including your own.

FYI, you're using highly idiosyncratic terminology here. Outside of LW, "utilitarianism" is the name for a family of consequentialist views that also include solely welfare-focused varieties like negative hedonistic utilitarianism or classical hedonistic utilitarianism.

In addition, you repeat the mantra that it's an objective fact that "human values are complex". That's misleading; what's complex is human moral intuitions. When you define your goal in life, no one forces you to incorporate every single intuition that you have. You may instead choose to regard some of your intuitions as more important than others, and thereby end up with a utility function of low complexity. Your terminal values are not discovered somewhere within you (how would that process work, exactly?); they are chosen. As EY would say, "the buck has to stop somewhere".

EA is prioritarian.

This claim is wrong; only about 5% of the EAs I know are prioritarians (I have met close to 100 EAs personally). And the link you cite doesn't support the claim that EAs are prioritarians either; it just argues that you get more QALYs from donating to AMF than from doing other things.

Comment by lukas_gloor on Moral Anti-Epistemology · 2015-05-02T23:29:35.014Z · score: 0 (0 votes) · LW · GW

The sad thing is it probably will (the rationalist's burden: aspiring to be more rational makes rationalizing harder, and you can't just tweak your moral map and your map of the just world/universe to fit your desired (self-)image).

What is it that counts: revealed preferences, stated preferences, or preferences that are somehow idealized (what the person would want if they knew more, were smarter, etc.)? I'm not sure the last option can be pinned down in a non-arbitrary way. This would leave us with revealed preferences and stated preferences, even though stated preferences are often contradictory or incomplete. It would be confused to think that one type of preference is correct whereas the others aren't. There are simply different things going on, and you may choose to focus on one or the other. Personally I don't intrinsically care about making people more agenty, but I care about it instrumentally, because it turns out that making people more agenty often increases their (revealed) concern for reducing suffering.

What does this make of the claim under discussion, that deontology could sometimes/often be a form of moral rationalizing? The point still stands, but it comes with a caveat, namely that it is only rationalizing if we are talking about (informed/complete) stated preferences. For whatever that's worth. On LW, I assume it is worth a lot to most people, but there's no mistake being made if it isn't for someone.

Comment by lukas_gloor on Moral Anti-Epistemology · 2015-05-02T13:39:33.964Z · score: 0 (0 votes) · LW · GW

I think if you read all my comments here again, you will see enough qualifications in my points that suggest that I'm aware of and agree with the point you just made. My point on top of that is simply that often, people would consider these things to be biases under reflection, after they learn more.

Comment by lukas_gloor on Moral Anti-Epistemology · 2015-05-02T10:53:52.268Z · score: 0 (0 votes) · LW · GW

Good points. My entire post assumes that people are interested in figuring out what they would want to do in every conceivable decision situation. That's what I'd call "doing ethics", but you're completely correct that many people do something very different. Now, would they keep doing what they're doing if they knew exactly what they're doing and not doing, i.e. if they were aware of the alternatives? If they were aware of concepts like agentyness? And if yes, what would this show?

I wrote down some more thoughts on this in this comment. As a general reply to your main point: just because people act as though they are interested in x rather than y doesn't mean that they wouldn't rather choose y if they were more informed. And to me, choosing something because one is not optimally informed seems like a bias, which is why I thought the comparison/the term "moral anti-epistemology" has merit. However, under a more Panglossian interpretation of ethics, you could just say that people want to do what they do, and that this is perfectly fine. It depends on how much you value ethical reflection (there is quite a rabbit hole to go down here, actually, having to do with the question whether terminal values are internal or chosen).

Comment by lukas_gloor on Moral Anti-Epistemology · 2015-05-02T10:48:02.032Z · score: 0 (0 votes) · LW · GW

I wasn't suggesting giving up on ethics, I was suggesting giving up on utilitarianism.

What I wrote concerned giving up on caring about suffering, which is very closely related with utilitarianism.

I think there are other approaches that do better than utilitarianism at its weak areas.

Maybe according to your core intuitions, but not for me as far as I know.

but it does show that your theory doesn't have any unique status as the default or only theory of de facto deontology.

But my main point was that deontology is too vague for a theory that specifies how you would want to act in every possible situation, and that it runs into big problems (and lots of "guesswork") if you try to make it less vague. Someone pointed out that I'm misunderstanding what people's ethical systems are intended to do. Maybe, but I think that's exactly my point: people don't even think about what they would want to do in every possible situation, because they're more interested in protecting certain status quos than in figuring out what it is that they actually want to accomplish. Is "protecting certain status quos" their true terminal value? Maybe, but how would they know, if this question doesn't even occur to them? This is exactly what I meant by moral anti-epistemology: you believe things and follow rules because the alternative is daunting/complicated and possibly morally demanding.

The best objection to my view is indeed that I'm putting arbitrary and unreasonable standards on what people "should" be thinking about. In the end, it is also arbitrary what you decide to call a terminal value, and which definition of terminal values you find relevant – for instance, whether it needs to be something that people reach on reflection, or whether it is simply what people tell you they care about. Are people who never engage in deep moral reasoning making a mistake? Or are they simply expressing their terminal value of wanting to avoid complicated and potentially daunting things, because they're satisficers? That's entirely up to your interpretation. I think that a lot of these people, if you were to nudge them towards thinking more about the situation, would at least in some respect be grateful for that, and this, to me, is reason to consider deontology irrational with respect to a conception of terminal values that takes into account a certain degree of reflection about goals.

Comment by lukas_gloor on The paperclip maximiser's perspective · 2015-05-01T09:32:31.293Z · score: 0 (0 votes) · LW · GW

Good point. May I ask, is "explicit utility function" standard terminology, and if yes, is there a good reference somewhere that explains it? It took me a long time until I realized the interesting difference between humans, who engage in moral philosophy and often can't tell you what their goals are, and my model of paperclippers. I also think that not understanding this difference is a big reason why people don't understand the orthogonality thesis.