Healing vs. exercise analogies for emotional work 2020-01-27T19:10:01.477Z · score: 41 (21 votes)
The two-layer model of human values, and problems with synthesizing preferences 2020-01-24T15:17:33.638Z · score: 67 (21 votes)
Under what circumstances is "don't look at existing research" good advice? 2019-12-13T13:59:52.889Z · score: 71 (21 votes)
A mechanistic model of meditation 2019-11-06T21:37:03.819Z · score: 106 (35 votes)
On Internal Family Systems and multi-agent minds: a reply to PJ Eby 2019-10-29T14:56:19.590Z · score: 38 (16 votes)
Book summary: Unlocking the Emotional Brain 2019-10-08T19:11:23.578Z · score: 183 (76 votes)
Against "System 1" and "System 2" (subagent sequence) 2019-09-25T08:39:08.011Z · score: 90 (27 votes)
Subagents, trauma and rationality 2019-08-14T13:14:46.838Z · score: 69 (36 votes)
Subagents, neural Turing machines, thought selection, and blindspots 2019-08-06T21:15:24.400Z · score: 63 (22 votes)
On pointless waiting 2019-06-10T08:58:56.018Z · score: 43 (22 votes)
Integrating disagreeing subagents 2019-05-14T14:06:55.632Z · score: 92 (26 votes)
Subagents, akrasia, and coherence in humans 2019-03-25T14:24:18.095Z · score: 93 (29 votes)
Subagents, introspective awareness, and blending 2019-03-02T12:53:47.282Z · score: 66 (25 votes)
Building up to an Internal Family Systems model 2019-01-26T12:25:11.162Z · score: 161 (61 votes)
Book Summary: Consciousness and the Brain 2019-01-16T14:43:59.202Z · score: 107 (37 votes)
Sequence introduction: non-agent and multiagent models of mind 2019-01-07T14:12:30.297Z · score: 91 (35 votes)
18-month follow-up on my self-concept work 2018-12-18T17:40:03.941Z · score: 58 (17 votes)
Tentatively considering emotional stories (IFS and “getting into Self”) 2018-11-30T07:40:02.710Z · score: 39 (11 votes)
Incorrect hypotheses point to correct observations 2018-11-20T21:10:02.867Z · score: 79 (32 votes)
Mark Eichenlaub: How to develop scientific intuition 2018-10-23T13:30:03.252Z · score: 79 (30 votes)
On insecurity as a friend 2018-10-09T18:30:03.782Z · score: 38 (20 votes)
Tradition is Smarter Than You Are 2018-09-19T17:54:32.519Z · score: 68 (24 votes)
nostalgebraist - bayes: a kinda-sorta masterpost 2018-09-04T11:08:44.170Z · score: 24 (8 votes)
New paper: Long-Term Trajectories of Human Civilization 2018-08-12T09:10:01.962Z · score: 34 (16 votes)
Finland Museum Tour 1/??: Tampere Art Museum 2018-08-03T15:00:05.749Z · score: 20 (6 votes)
What are your plans for the evening of the apocalypse? 2018-08-02T08:30:05.174Z · score: 24 (11 votes)
Anti-tribalism and positive mental health as high-value cause areas 2018-08-02T08:30:04.961Z · score: 26 (10 votes)
Fixing science via a basic income 2018-08-02T08:30:04.380Z · score: 30 (14 votes)
Study on what makes people approve or condemn mind upload technology; references LW 2018-07-10T17:14:51.753Z · score: 21 (11 votes)
Shaping economic incentives for collaborative AGI 2018-06-29T16:26:32.213Z · score: 47 (13 votes)
Against accusing people of motte and bailey 2018-06-03T21:31:24.591Z · score: 84 (29 votes)
AGI Safety Literature Review (Everitt, Lea & Hutter 2018) 2018-05-04T08:56:26.719Z · score: 37 (10 votes)
Kaj's shortform feed 2018-03-31T13:02:47.793Z · score: 13 (3 votes)
Helsinki SSC March meetup 2018-03-26T19:27:17.850Z · score: 12 (2 votes)
Is the Star Trek Federation really incapable of building AI? 2018-03-18T10:30:03.320Z · score: 30 (9 votes)
My attempt to explain Looking, insight meditation, and enlightenment in non-mysterious terms 2018-03-08T07:37:54.532Z · score: 292 (115 votes)
Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk” 2018-02-12T12:30:04.401Z · score: 68 (20 votes)
On not getting swept away by mental content 2018-01-25T20:30:03.750Z · score: 24 (8 votes)
Papers for 2017 2018-01-04T13:30:01.406Z · score: 32 (8 votes)
Paper: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering 2018-01-03T14:39:18.024Z · score: 1 (1 votes)
Paper: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering 2018-01-03T13:57:55.979Z · score: 16 (6 votes)
Fixing science via a basic income 2017-12-08T14:20:04.623Z · score: 38 (11 votes)
Book review: The Upside of Your Dark Side: Why Being Your Whole Self–Not Just Your “Good” Self–Drives Success and Fulfillment 2017-12-04T13:10:06.995Z · score: 27 (8 votes)
Meditation and mental space 2017-11-06T13:10:03.612Z · score: 26 (7 votes)
siderea: What New Atheism says 2017-10-29T10:19:57.863Z · score: 12 (3 votes)
Postmodernism for rationalists 2017-10-17T12:20:36.139Z · score: 24 (1 votes)
Anti-tribalism and positive mental health as high-value cause areas 2017-10-17T10:20:03.359Z · score: 30 (10 votes)
You can never be universally inclusive 2017-10-14T11:30:04.250Z · score: 34 (10 votes)
Meaningfulness and the scope of experience 2017-10-05T11:30:03.863Z · score: 35 (14 votes)
Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI's values) 2017-10-03T17:39:00.683Z · score: 8 (3 votes)


Comment by kaj_sotala on What will happen to supply chains in the era of COVID-19? · 2020-04-02T09:42:23.971Z · score: 8 (5 votes) · LW · GW

Finland has a National Emergency Supply Agency, which maintains its own stockpile of emergency supplies including food, fuel, and healthcare supplies; they recently released respirators from the supply in response to COVID-19. (They have previously released crop seeds in 2018 after an exceptionally bad growing season had caused a shortage, and fuel in 2005 after Hurricane Katrina caused a reduction in the availability of oil.) They also actively coordinate with the private sector. I am a little unclear about what exactly that collaboration involves, but according to the news, under normal circumstances Finland's largest grocery chains report on their supply situation to the agency twice a day.

There are also legal requirements for drug manufacturers and importers to stockpile medicine: there's e.g. a 10 month emergency supply of antibiotics, a 6 month supply of blood pressure and diabetes medication, and a 3 month supply of respiratory and cancer medicine, among others.

Comment by kaj_sotala on How special are human brains among animal brains? · 2020-04-01T14:51:13.216Z · score: 7 (5 votes) · LW · GW
Humans seem to have much higher degrees of consciousness and agency than other animals, and this may have emerged from our capacities for language. Helen Keller (who was deaf and blind since infancy, and only started learning language when she was 6) gave an autobiographical account of how she was driven by blind impetuses until she learned the meanings of the words “I” and “me”

Possible nitpick depending on how you define consciousness: this excerpt sounds like Keller was conscious in the global workspace sense, but rather lacked something like a phenomenal self-model.

Comment by kaj_sotala on How special are human brains among animal brains? · 2020-04-01T14:48:20.062Z · score: 4 (2 votes) · LW · GW
they also escape feral children, and might escape animals for mundane reasons, like their not having critical periods long enough to learn these aspects of language.

On the other hand, humans having learned language at late ages (e.g. Helen Keller having learned it at 6) suggests that learning language during the critical period isn't a necessary requirement. (The Wikipedia link claims that deaf people who learn language at a late age never master it completely, but assuming that it hasn't been edited by others, the quoted Keller excerpt gives a rather different impression.)

Comment by kaj_sotala on What is the point of College? Specifically is it worth investing time to gain knowledge? · 2020-03-24T14:29:28.724Z · score: 2 (1 votes) · LW · GW
From what I understand, even if you were to do a post-grad in Mechanical and work in a research field, you would still NOT be using 90% of your education.

I'm not sure that this is the right way of thinking about it. It's hard to know in advance which parts of your education are going to be useful. If each unit of learning only has a 10% chance of being useful, studying 10 units worth of learning rather than just 1 unit gives you much higher chances of at least 1 of those units being what you need. In that case, the "unused" 9 still weren't wasted, because they increased your odds of knowing something valuable.

You ask

Imagine if you only took courses that would help you in the job(lets say the courses where 30-50% of course content(or any other parameter that you choose) directly helps you) and then look at the time req to train.

but this assumes that you know in advance which of the courses are going to help you with your job. If you've never taken them and don't understand their contents, you might not be able to know this.
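As a rough illustration of the arithmetic in the 10% example above: the sketch below assumes that each unit of learning is useful independently of the others, which the comment does not claim, and the function name is purely illustrative.

```python
def chance_at_least_one_useful(units: int, p_useful: float) -> float:
    """Probability that at least one of `units` turns out useful.

    Assumes (hypothetically) that each unit is useful independently
    with the same probability `p_useful`.
    """
    return 1.0 - (1.0 - p_useful) ** units


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, round(chance_at_least_one_useful(n, 0.10), 3))
```

Under that independence assumption, studying 10 units gives roughly a 65% chance that at least one of them is what you need, compared to 10% for a single unit.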

Comment by kaj_sotala on Focusing · 2020-03-17T17:53:09.857Z · score: 2 (1 votes) · LW · GW

I haven't looked into him too much, but my impression is that he is reasonably respected as a psychologist, so on psychological topics I would trust his opinion no more and no less than that of any other person who seems to be reasonably respected by other psychologists.

Comment by kaj_sotala on March Coronavirus Open Thread · 2020-03-17T09:52:31.199Z · score: 5 (3 votes) · LW · GW

That probably depends on what your pre-COVID-19 model was?

If I rephrase the question as "how much should COVID-19 update expert models about risks from pandemics", then my impression is that things have proceeded roughly in line with the pre-existing models. The response procedures that are now being activated in several countries are based on plans that were originally made as a reaction to previous diseases such as SARS.

My own update is that although there has been some foot-dragging, overall the national responses have felt stronger and faster than I would have anticipated. The next time that there is a pandemic, such a response should hopefully be more routine and competent, so this makes me more optimistic about our ability to deal with future pandemics.

Comment by kaj_sotala on A game designed to beat AI? · 2020-03-17T05:41:47.934Z · score: 6 (6 votes) · LW · GW

Arimaa was an earlier attempt to do this. It was developed in 2003, and a computer beat humans at it in 2015. This site summarizes some of its anti-AI properties as

  • On average there are over 17,000 possible moves compared to about 30 for chess; this significantly limits how deep computers can think, but does not seem to affect humans.
  • Opening books are useless since the starting position is not fixed. There are over 64 million ways to start the game.
  • End game databases are not helpful since a game can end with all pieces still on the board.
  • Research papers on Arimaa suggest it is more of a strategic and positional game with less emphasis on tactics.
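
The first bullet's point about search depth can be sketched numerically. This is a simplification under stated assumptions: it models a plain exhaustive search in which every additional move multiplies the number of positions by the branching factor, ignores pruning and other real engine techniques, and uses a made-up node budget. The branching factors (about 30 and 17,000) are the ones quoted above.

```python
import math


def full_depth_reachable(branching_factor: float, node_budget: float) -> int:
    """Number of complete moves an exhaustive search can look ahead.

    Assumes the tree has branching_factor ** depth positions at a given
    depth and that the search may visit at most `node_budget` of them.
    """
    return int(math.log(node_budget) / math.log(branching_factor))


NODE_BUDGET = 1e9  # hypothetical budget, chosen only for illustration

if __name__ == "__main__":
    for game, b in (("chess", 30), ("Arimaa", 17_000)):
        print(game, full_depth_reachable(b, NODE_BUDGET))
```

With the same budget, the toy search reaches about three times as deep in chess as in Arimaa, which is the sense in which a large branching factor limits how deep computers can think.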
Comment by kaj_sotala on What is a School? · 2020-03-15T19:46:52.730Z · score: 3 (2 votes) · LW · GW
The concern that I have heard literally zero people mention is that closing the schools will prevent children from learning.

I don't know if a counterexample from Finland will count as one for your purposes, but for what it's worth, a notable Finnish politician wrote yesterday:

Jos näillä tiedoilla joutuisin päättämään, en sulkisi kouluja, koska lyhytaikainen sulkeminen ei hyödytä mitään ja pitkäaikainen vaarantaa lasten oppimisen. Puoli vuotta lyhempi peruskoulu? Ihan oikeasti?

Which roughly translates as:

If I had to make the decision given what we currently know, I wouldn't close the schools. A short-term closure wouldn't bring any benefit, and a long-term closure would endanger the children's learning. Making elementary school half a year shorter? Are you kidding me?
Comment by kaj_sotala on Rationalists, Post-Rationalists, And Rationalist-Adjacents · 2020-03-15T18:05:02.380Z · score: 8 (4 votes) · LW · GW

"Collaborative truth-seeking"?

There are rationalist-adjacents for whom collaborative truth-seeking on many topics would fail because they're not interested in zooming in so close on a belief. There are post-rationalists for whom collaborative truth-seeking would fail because they can just switch frames on the conversation any time they're feeling stuck. And to try to collaborate on truth-seeking with someone, only to have it fail in either of those ways, is an infuriating feeling for those of us who thought we could take it for granted in the community.
Comment by kaj_sotala on How effective are tulpas? · 2020-03-10T08:11:45.487Z · score: 11 (6 votes) · LW · GW

I'm not sure whether structural dissociation is the right model for tulpas; my own model has been that it is more related to the ability to model other people, in the way that if you know a friend very well you can guess roughly what they might answer to things that you would say, up to the point of starting to have conversations with them in your head. Fiction authors who put extensive effort into modeling their characters often develop spontaneous "tulpas" based on their characters, and I haven't heard of them being any worse off for it. Taylor, Hodges and Kohányi found that while these fiction writers tended to have higher-than-median scores on a test for dissociative experiences, the writers had low scores on the subscales that are particularly diagnostic for dissociative disorders:

The writers also scored higher than general population norms on the Dissociative Experiences Scale. The mean score across all 28 items on the DES in our sample of writers was 18.52 (SD = 16.07), ranging from a minimum of 1.43 to a maximum of 42.14. This mean is significantly higher from the average DES score of 7.8 found in a general population sample of 415 [27], t(48) = 8.05, p < .001. In fact, the writers' scores are closer to the average DES score for a sample of 61 schizophrenics (schizophrenic M = 17.7) [27]. Seven of the writers scored at or above 30, a commonly used cutoff for "normal scores" [29]. There was no difference between men's and women's overall DES scores in our sample, a finding consistent with results found in other studies of normal populations [26].

With these comparisons, our goal is to highlight the unusually high scores for our writers, not to suggest that they were psychologically unhealthy. Although scores of 30 or above are more common among people with dissociative disorders (such as Dissociative Identity Disorder), scoring in this range does not guarantee that the person has a dissociative disorder, nor does it constitute a diagnosis of a dissociative disorder [27,29]. Looking at the different subscales of the DES, it is clear that our writers deviated from the norm mainly on items related to the absorption and changeability factor of the DES. Average scores on this subscale (M = 26.22, SD = 14.45) were significantly different from scores on the two subscales that are particularly diagnostic for dissociative disorders: derealization and depersonalization subscale (M = 7.84, SD = 7.39) and the amnestic experiences subscale (M = 6.80, SD = 8.30), F(1,48) = 112.49, p < .001. These latter two subscales did not differ from each other, F(1, 48) = .656, p = .42. Seventeen writers scored above 30 on the absorption and changeability scale, whereas only one writer scored above 30 on the derealization and depersonalization scale and only one writer (a different participant) scored above 30 on the amnestic experiences scale.

A regression analysis using the IRI subscales (fantasy, empathic concern, perspective taking, and personal distress) and the DES subscales (absorption and changeability, amnestic experiences, and derealization and depersonalization) to predict overall IIA was run. The overall model was not significant r^2 = .22, F(7, 41) = 1.63, p = .15. However, writers who had higher IIA scores scored higher on the fantasy subscale of IRI, b = .333, t(48) = 2.04, p < .05 and marginally lower on the empathic concern subscale, b = -.351, t(48) = -1.82, p < .10 (all betas are standardized). Because not all of the items on the DES are included in one of the three subscales, we also ran a regression model predicting overall IIA from the mean score across DES items. Neither the r^2 nor the standardized beta for total DES scores was significant in this analysis.

That said, I have seen a case where someone made a tulpa with decidedly mixed results, so I agree that it can be risky.

Comment by kaj_sotala on Motte/bailey doctrine is often a byproduct of distributed argumentation · 2020-03-06T12:25:22.287Z · score: 2 (1 votes) · LW · GW

Agreed; I made a similar point here.

Comment by kaj_sotala on Feature suggestion: Could we get notifications when someone links to our posts? · 2020-03-06T07:51:17.340Z · score: 7 (4 votes) · LW · GW

Now I'm thinking that it would be a cool feature to have a page that listed all of your posts ordered by the number of pingbacks they have, and also showed all of those pingbacks on that page.

Comment by kaj_sotala on Does donating to EA make sense in light of the mere addition paradox ? · 2020-02-19T18:12:41.056Z · score: 4 (2 votes) · LW · GW
So basically, the idea here is that it actually makes intuitive moral sense for most EA donors to donate to EA causes ?

Not sure whether every EA would endorse this description, but it's how I think of it, yes.

Comment by kaj_sotala on Does donating to EA make sense in light of the mere addition paradox ? · 2020-02-19T15:59:45.846Z · score: 8 (4 votes) · LW · GW
I can see it intuitively making sense, but barring a comprehensive moral system that can argue for the value of all human life, it seems intuition is not enough. As in, it also intuitively makes sense to put 10% of your income into low-yield bonds, so in case one of your family members or friends has a horrible (deadly or severely life-quality diminishing) problem you can help them.

Utilitarianism is not the only system that becomes problematic if you try to formalize it enough; the problem is that there is no comprehensive moral system that wouldn't either run into paradoxical answers, or be so vague that you'd need to fill in the missing gaps with intuition anyway.

Any decision that you make ultimately comes down to your intuition (that is: decision-weighting systems that make use of information in your consciousness but which are not themselves consciously accessible) favoring one decision or the other. You can try to formulate explicit principles (such as utilitarianism) which explain the principles behind those intuitions, but those explicit principles are always going to only capture a part of the story, because the full decision criteria are too complex to describe.

So the answer to

So basically, I'm kinda stuck understanding under which moral presincts it actually makes sense to donate to EA charities ?

is just "the kinds where donating to EA charities makes more intuitive sense than not donating"; often people describe these kinds of moral intuitions as "utilitarian", but few people would actually endorse all of the conclusions of purely utilitarian reasoning.

Comment by kaj_sotala on Becoming Unusually Truth-Oriented · 2020-02-18T10:15:35.687Z · score: 3 (1 votes) · LW · GW

Could you just take the description of the technique and discuss it in the context of recalling non-dream-related memories? As you note yourself, exactly the same steps seem to work for e.g. recalling events from the previous day.

Comment by kaj_sotala on Becoming Unusually Truth-Oriented · 2020-02-18T10:12:50.025Z · score: 5 (2 votes) · LW · GW

FWIW, when I have done similar practice on real-life memories rather than dreams, I have sometimes checked my recollection of past events with other people who were there, and they have agreed with my account. Of course they could be influenced by my recollection, but I have sometimes recalled details which I have reason to believe that they would otherwise remember much better than me. For example, a friend showed me an episode of a TV series that she had seen several times before, but which I had not. The next day I used this kind of a technique to bring up details about the plot which I didn't remember initially, and she confirmed that I remembered them correctly.

So if the technique seems to provide accurate recall rather than confabulation in a non-dream context, it would seem like a reasonable default guess that it would provide accurate recall in a dream context as well.

Comment by kaj_sotala on [Link and commentary] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? · 2020-02-17T14:16:08.661Z · score: 4 (2 votes) · LW · GW

I thought that the discussion of various fields having different tradeoffs with regard to disclosing vulnerabilities was particularly interesting:

The framework helps to explain why the disclosure of software vulnerabilities will often be beneficial for security. Patches to software are often easy to create, and can often be made in a matter of weeks. These patches fully resolve the vulnerability. The patch can be easily propagated: for downloaded software, the software is often automatically updated over the internet; for websites, the fix can take effect immediately. In addition, counterfactual possession is likely, because it is normally easier to find a software vulnerability (of which there is a constant supply) than to make a scientific discovery (see [3]). These factors combine to make a reasonable argument in favour of public disclosure of software vulnerabilities, at least after the vendor has been given time to prepare a patch.

Contrasting other fields will further bring into relief the comparatively defence-dominant character of software vulnerability knowledge. We can focus on the tractability of defensive solutions: for certain technologies, there is no low-cost, straightforward, effective defence.

First, consider biological research that provides insight into the manufacture of pathogens, such as a novel virus. A subset of viruses are very difficult to vaccinate for (there is still no vaccination for HIV) or otherwise prepare against. This lowers the defensive benefit of publication, by blocking a main causal pathway by which publication leads to greater protection. This contrasts with the case where an effective treatment can be developed within a reasonable time period, which could weigh in favour of publication [15].
Second, consider cases of hardware based vulnerabilities, such as with kinetic attacks or physical key security. Advances in drone hardware have enabled the disruption of airports and attacks on infrastructure such as oil facilities; these attacks presently lack a cheap, effective solution [18]. This arises in part from the large attack surface of physical infrastructure: the drone’s destination can be one of many possible points on the facility, and it can arrive there via a multitude of different trajectories. This means that the path of the drone cannot simply be blocked.

Moreover, in 2003 a researcher published details about a vulnerability in physical key systems [2]. Apartment buildings, offices, hotels and other large buildings often use systems where a single master-key can open all doors. The research showed how to derive the master-key from a single non-master key. The researcher wrote that there was “no simple or completely effective countermeasure that prevents exploitation of this vulnerability short of replacing a master keyed system with a non-mastered one” ([1]; see [2] for further discussion of counter-measures). The replacement of masterkey systems is a costly solution insofar as master-key systems are useful, and changes are very difficult to propagate: physical key systems distributed across the world would need to be manually updated.

Finally, consider the policy question of whether one should have published nuclear engineering research, such as on uranium enrichment, in the 1960s. For countries like India and Pakistan, this would have increased, not decreased, their potential to destroy each others’ cities, due to the lack of defensive solutions: as with certain diseases, nuclear bombs cannot be adequately protected against. Moreover, for the minor protections against nuclear bombs that exist, these can be pursued without intricate knowledge as to how nuclear bombs are manufactured: there is low transferability of offensive into defensive knowledge. For example, a blueprint for the design of a centrifuge does not help one build a better defensive bunker. Overall, if both a potential defender and potential attacker are given knowledge that helps them build nuclear weapons, that knowledge is more useful for making an attack than protecting against an attack: the knowledge is offense-biased.

Differences across fields will shape the security value of publication, which can influence disclosure norms among security-minded scientists and policymakers. The Manhattan Project was more secretive than locksmiths and influenza researchers, who are in turn often more secretive than those finding vulnerabilities in software. Indeed, there was a culture clash between the researcher who published the flaw in the master-key system, above, who came from a computer security background, and the locksmiths who accused him of being irresponsible. The different disclosure cultures exist in the form of default practices, but also in common refrains - for example, language about the virtues of “studying” a problem, or the value of users being empowered by disclosure to “make decisions for themselves”. Such language embeds implicit answers to the framework given in this section, and therefore caution should be exercised when importing concepts and language from other fields.
Comment by kaj_sotala on The Catastrophic Convergence Conjecture · 2020-02-17T13:42:49.044Z · score: 7 (3 votes) · LW · GW
Human values are complicated and fragile

It's not clear to me whether you actually meant to suggest this as well, but this line of reasoning makes me wonder if many of our values are actually not that complicated and fragile after all, instead being connected to AU considerations. E.g. self-determination theory's basic needs of autonomy, competence and relatedness seem like different ways of increasing your AU, and the boredom example might not feel catastrophic because of some highly arbitrary "avoid boredom" bit in the utility function, but rather because looping a single experience over and over isn't going to help you maintain your ability to avoid catastrophes. (That is, our motivations and values optimize for maintaining AU among other things, even if that is not the thing that those values feel like from the inside.)

Comment by kaj_sotala on Confirmation Bias As Misfire Of Normal Bayesian Reasoning · 2020-02-13T08:47:19.144Z · score: 5 (2 votes) · LW · GW

See also Mercier & Sperber 2011 on confirmation bias:

... an absence of reasoning is to be expected when people already hold some belief on the basis of perception, memory, or intuitive inference, and do not have to argue for it. Say, I believe that my keys are in my trousers because that is where I remember putting them. Time has passed, and they could now be in my jacket, for example. However, unless I have some positive reason to think otherwise, I just assume that they are still in my trousers, and I don’t even make the inference (which, if I am right, would be valid) that they are not in my jacket or any of the other places where, in principle, they might be. In such cases, people typically draw positive rather than negative inferences from their previous beliefs. These positive inferences are generally more relevant to testing these beliefs. For instance, I am more likely to get conclusive evidence that I was right or wrong by looking for my keys in my trousers rather than in my jacket (even if they turn out not to be in my jacket, I might still be wrong in thinking that they are in my trousers). We spontaneously derive positive consequences from our intuitive beliefs. This is just a trusting use of our beliefs, not a confirmation bias (see Klayman & Ha 1987). [...]

One of the areas in which the confirmation bias has been most thoroughly studied is that of hypothesis testing, often using Wason’s rule discovery task (Wason 1960). In this task, participants are told that the experimenter has in mind a rule for generating number triples and that they have to discover it. The experimenter starts by giving participants a triple that conforms to the rule (2, 4, 6). Participants can then think of a hypothesis about the rule and test it by proposing a triple of their own choice. The experimenter says whether or not this triple conforms to the rule. Participants can repeat the procedure until they feel ready to put forward their hypothesis about the rule. The experimenter tells them whether or not their hypothesis is true. If it is not, they can try again or give up.

Participants overwhelmingly propose triples that fit with the hypothesis they have in mind. For instance, if a participant has formed the hypothesis “three even numbers in ascending order,” she might try 8, 10, 12. As argued by Klayman and Ha (1987), such an answer corresponds to a “positive test strategy” of a type that would be quite effective in most cases. This strategy is not adopted in a reflective manner, but is rather, we suggest, the intuitive way to exploit one’s intuitive hypotheses, as when we check that our keys are where we believe we left them as opposed to checking that they are not where it follows from our belief that they should not be. What we see here, then, is a sound heuristic rather than a bias.

This heuristic misleads participants in this case only because of some very peculiar (and expressly designed) features of the task. What is really striking is the failure of attempts to get participants to reason in order to correct their ineffective approach. It has been shown that, even when instructed to try to falsify the hypotheses they generate, fewer than one participant in ten is able to do so (Poletiek 1996; Tweney et al. 1980). Since the hypotheses are generated by the participants themselves, this is what we should expect in the current framework: The situation is not an argumentative one and does not activate reasoning. However, if a hypothesis is presented as coming from someone else, it seems that more participants will try to falsify it and will give it up much more readily in favor of another hypothesis (Cowley & Byrne 2005). The same applies if the hypothesis is generated by a minority member in a group setting (Butera et al. 1992). Thus, falsification is accessible provided that the situation encourages participants to argue against a hypothesis that is not their own. [...]

When one is alone or with people who hold similar views, one’s arguments will not be critically evaluated. This is when the confirmation bias is most likely to lead to poor outcomes. However, when reasoning is used in a more felicitous context – that is, in arguments among people who disagree but have a common interest in the truth – the confirmation bias contributes to an efficient form of division of cognitive labor.

When a group has to solve a problem, it is much more efficient if each individual looks mostly for arguments supporting a given solution. They can then present these arguments to the group, to be tested by the other members. This method will work as long as people can be swayed by good arguments, and the results reviewed in section 2 show that this is generally the case. This joint dialogic approach is much more efficient than one where each individual on his or her own has to examine all possible solutions carefully. The advantages of the confirmation bias are even more obvious given that each participant in a discussion is often in a better position to look for arguments in favor of his or her favored solution (situations of asymmetrical information). So group discussions provide a much more efficient way of holding the confirmation bias in check. By contrast, the teaching of critical thinking skills, which is supposed to help us overcome the bias on a purely individual basis, does not seem to yield very good results (Ritchart & Perkins 2005; Willingham 2008).
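
The rule discovery task described in the excerpt above can be sketched in a few lines. One assumption goes beyond the excerpt: the experimenter's actual rule in Wason's task was simply "any three numbers in ascending order", which the quoted passage does not state. The function names and the particular test triples are illustrative only.

```python
def true_rule(triple):
    """Experimenter's rule (assumed here: any ascending triple)."""
    a, b, c = triple
    return a < b < c


def hypothesis(triple):
    """Participant's guess: three even numbers in ascending order."""
    a, b, c = triple
    return a < b < c and all(x % 2 == 0 for x in triple)


def is_informative(triple):
    """A test can expose the hypothesis only if the two rules disagree on it."""
    return hypothesis(triple) != true_rule(triple)


# Positive tests: triples the participant expects to fit the hypothesis.
positive_tests = [(8, 10, 12), (20, 22, 24), (2, 6, 10)]
# Negative tests: triples the participant expects NOT to fit.
negative_tests = [(1, 3, 5), (6, 4, 2)]

if __name__ == "__main__":
    for t in positive_tests + negative_tests:
        print(t, "hypothesis:", hypothesis(t), "rule:", true_rule(t),
              "informative:", is_informative(t))
```

Every positive test gets a "yes" from the experimenter and so never contradicts the hypothesis; only a negative test such as (1, 3, 5), which the hypothesis rejects but the rule accepts, reveals that the hypothesis is too narrow. This is the peculiar feature of the task that makes the otherwise sound positive test strategy misleading.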
Comment by kaj_sotala on Demons in Imperfect Search · 2020-02-12T15:00:19.227Z · score: 5 (2 votes) · LW · GW


Why are boys and girls born in roughly equal numbers? (Leaving aside crazy countries that use artificial gender selection technologies.) To see why this is surprising, consider that 1 male can impregnate 2, 10, or 100 females; it wouldn't seem that you need the same number of males as females to ensure the survival of the species. This is even more surprising in the vast majority of animal species where the male contributes very little to raising the children—humans are extraordinary, even among primates, for their level of paternal investment. Balanced gender ratios are found even in species where the male impregnates the female and vanishes into the mist.

Consider two groups on different sides of a mountain; in group A, each mother gives birth to 2 males and 2 females; in group B, each mother gives birth to 3 females and 1 male. Group A and group B will have the same number of children, but group B will have 50% more grandchildren and 125% more great-grandchildren. You might think this would be a significant evolutionary advantage.

But consider: The rarer males become, the more reproductively valuable they become—not to the group, but to the individual parent. Every child has one male and one female parent. Then in every generation, the total genetic contribution from all males equals the total genetic contribution from all females. The fewer males, the greater the individual genetic contribution per male. If all the females around you are doing what's good for the group, what's good for the species, and birthing 1 male per 10 females, you can make a genetic killing by birthing all males, each of whom will have (on average) ten times as many grandchildren as their female cousins.

So while group selection ought to favor more girls, individual selection favors equal investment in male and female offspring.
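
The arithmetic in the quoted passage can be checked with a short sketch. It assumes a deliberately simplified model that the quote only implies: every female reproduces, there are always enough males to fertilize them, and every mother has the same litter composition. The function names here are hypothetical, chosen for illustration only.

```python
from fractions import Fraction


def generation_sizes(females_per_mother, males_per_mother, generations):
    """Births per generation descending from a single founding mother.

    Toy assumption: every female reproduces and males are never scarce
    enough to limit reproduction.
    """
    litter = females_per_mother + males_per_mother
    female_share = Fraction(females_per_mother, litter)
    sizes = []
    mothers = Fraction(1)
    for _ in range(generations):
        born = mothers * litter
        sizes.append(born)
        # Only the female share of this generation gives birth in the next.
        mothers = born * female_share
    return sizes


def male_to_female_offspring_ratio(n_males, n_females):
    """Expected offspring per male divided by expected offspring per female.

    Every child has exactly one father and one mother, so both sexes
    account for the same total number of offspring; the per-individual
    ratio is therefore just n_females / n_males.
    """
    return Fraction(n_females, n_males)


group_a = generation_sizes(2, 2, 3)  # children, grandchildren, great-grandchildren
group_b = generation_sizes(3, 1, 3)
extra_grandchildren = group_b[1] / group_a[1] - 1
extra_great_grandchildren = group_b[2] / group_a[2] - 1
```

Under these assumptions, group B ends up with one half more grandchildren and one and a quarter times more great-grandchildren than group A, matching the 50% and 125% figures in the quote, while a population with 1 male per 10 females gives each male ten times the expected offspring of each female.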

Comment by kaj_sotala on What can the principal-agent literature tell us about AI risk? · 2020-02-12T10:59:23.527Z · score: 11 (6 votes) · LW · GW

Curated. This post represents a significant amount of research, looking into the question of whether an established area of literature might be informative to concerns about AI alignment. It looks at that literature, examines its relevance in light of the questions that have been discussed so far, and checks the conclusions with existing domain experts. Finally, it suggests further work that might provide useful insights to these kinds of questions.

I do have the concern that currently, the post relies a fair bit on the reader trusting the authors to have done a comprehensive search - the post mentions having done "extensive searching", but besides the mention of consulting domain experts, does not elaborate on how that search process was carried out. This is a significant consideration since a large part of the post's conclusions relies on negative results (there not being papers which examine the relevant assumptions). I would have appreciated seeing some kind of a description of the search strategy, similar in spirit to the search descriptions included in systematic reviews. This would have allowed readers both to reproduce the search steps and to notice any possible shortcomings that might have led to relevant literature being missed.

Nonetheless, this is an important contribution, and I'm very happy both to see this kind of work done and to see it written up in a clear form on LW.

Comment by kaj_sotala on How to Frame Negative Feedback as Forward-Facing Guidance · 2020-02-10T22:19:41.232Z · score: 10 (4 votes) · LW · GW
When you're thinking about how to tell Fred that he's talking too much in staff meetings, start by asking yourself what it would look like if Fred were exceptionally awesome at that instead of deficient. This helps you visualize a complete forward-to-backward axis. Then you can frame your message to Fred in terms of moving forward on the spectrum toward awesomeness.

This somewhat reminds me of the approach used in e.g. solution-focused brief therapy, which starts by getting the client to describe what would look like a better state to them, and then proceeds to figure out steps that would lead there:

In a specific situation, the counselor may ask,
"If you woke up tomorrow, and a miracle happened so that you no longer easily lost your temper, what would you see differently?" "What would the first signs be that the miracle occurred?"
The client, in this example, (a child) may respond by saying,
"I would not get upset when somebody calls me names."
The counselor wants the client to develop positive goals, or what they will do—rather than what they will not do—to better ensure success. So, the counselor may ask the client, "What will you be doing instead when someone calls you names?"


"Suppose tonight, while you slept, a miracle occurred. When you awake tomorrow, what would be some of the things you would notice that would tell you life had suddenly gotten better?"
The therapist stays with the question even if the client describes an "impossible" solution, such as a deceased person being alive, and acknowledges that wish and then asks "how would that make a difference in your life?"  Then as the client describes that he/she might feel as if they have their companion back again, the therapist asks "how would that make a difference?"  With that, the client may say, "I would have someone to confide in and support me."  From there, the therapist would ask the client to think of others in the client's life who could begin to be a confidant in a very small manner.
Comment by kaj_sotala on Did AI pioneers not worry much about AI risks? · 2020-02-10T17:55:37.762Z · score: 10 (6 votes) · LW · GW

The notion of AI risk might also have seemed less compelling in a period before widely networked computers. Sometimes people say that you could just pull the plug on a computer that misbehaved, which seems a little silly in an era where physical installations can be damaged by hacking and where it can be impossible to get software that has been uploaded to the Internet removed... but it probably felt a lot more plausible in an era where networking was mostly limited to military systems at first, and university networks later.

Comment by kaj_sotala on Open & Welcome Thread - February 2020 · 2020-02-10T15:03:30.971Z · score: 11 (6 votes) · LW · GW

The syllabus also includes (either as required or optional reading) , , , , , and ; its "other resources" sections also include the following mentions: is a forum for the “Rationality community,” an informal network of bloggers who seek to call attention to biases and fallacies and apply reason more rigorously (sometimes to what may seem like extreme lengths).

Slate Star Codex is an anagram of “Scott Alexander,” the author of the tutorial recommended above and a prominent member of the “rationality community.” This deep and witty blog covers diverse topics in social science, medicine, events, and everyday life.

80,000 Hours, an allusion to the number of hours in your career, is a non-profit that provides research and advice on how you can best make a difference through your career.
Comment by kaj_sotala on A Cautionary Note on Unlocking the Emotional Brain · 2020-02-09T09:15:48.502Z · score: 6 (3 votes) · LW · GW
I suspect maybe the difference is that in IFS they make a huge deal about honoring the 'parts' including exiles. In your terms this would be the unhelpful beliefs. You need ideally to fully accept that they are there for a reason and have good intentions. In IFS it is a common rookie mistake to try to shove 'bad' "parts" (in IFS terms) away prematurely and tell them to stop doing or believing that thing right away. If you do this they will often resist vehemently in open or in covert ways. Once you do get to know them, appreciate them, acknowledge their good intentions, they are then often very willing to form the intention to change, and in this case they will not resist.
So my suggestion would be to try to get to know the 'false' belief better and to acknowledge why it is there, the good it did, the good intention behind it - and with associated beliefs - there can be quite a complex structure of chained beliefs and practices. Only then do you ask it, are you happy with the current set-up? Would you like to change anything? Ask if you do really want to change the belief in every bone of your body. Usually at this point it is pretty easy to change and you are done.

This agrees with my experience.

So when everything else did not succeed entirely I tried the "nuclear option" - rewriting history. I implanted a belief that the very first time she exhibited her toxic behavior a group of parents stormed into the classroom, beat her up, threw her out of the school, and warned her never to set foot in a school again, which she never did (in the rewritten history). We reverted back to our previous teacher who was lovely. This worked, even though - at some level - I know it is false.

My model of "rewriting history" is that it still requires something that your mind believes could in principle have happened, and is a way of integrating those true beliefs in the form of an experience which an emotional part can believe in. Part of what's going on in such a memory is a fear that if this were to happen again, there would be no way to escape the situation, and you would be totally helpless. A parental intervention is something that could in principle have happened even back in that situation (even if its imagined concrete manifestation was a bit over-the-top), so once the "stuck" part of the mind has updated on it being at least possible to get out of the situation, it can relax a little and allow other relevant information to be updated.

I think that the feeling of total helplessness, in particular, is a big factor in trauma memories - the therapeutic literature seems to argue it, and it also makes sense in theoretical terms. If you get into a state where absolutely nothing you do can make any difference to the negative situation that you are in, that could get almost unboundedly bad. From that perspective, it's not surprising that upon detecting the potential for such a situation, some parts of the brain would become obsessed with bringing up the possibility of that situation, until a way had been found to avoid it.

There was a related excerpt in Unlocking the Emotional Brain, which talked about imaginary re-enactment: taking some bad situation that you were in, and imagining how you yourself could have gone about it differently. This would update your belief that there was nothing that you could have done in that situation, and make the memory less traumatic. But it notes that the therapist should ensure that an alternate way of acting would in fact have been possible:

As understood in terms of the therapeutic reconsolidation process, enacting the natural, self-protective response is de-traumatizing because the experience of the empowered response creates new knowings that disconfirm and dissolve the model and the feeling of being powerless that had formed in the original traumatic learning experience.
It is important to note that the re-enactment technique is appropriate only if the original situation in fact gave the client early signs of danger or trouble, so that in re-enacting, the client can respond sooner and more assertively and self-protectively, and in that way can experience the ability to avoid harm. An example of a trauma that is inappropriate for re-enactment is the experience of a bomb exploding. In that case there is no way to respond more self-protectively, so re-enactment would only be re-traumatizing. In such cases, different techniques of traumatic memory transformation are needed.

Given that quote, it is interesting that "someone else could have come in and helped me" re-enactment sometimes works. It is not entirely clear to me why and when it does (it doesn't always seem to help me), but one belief that people sometimes internalize from e.g. being terrorized by an authority figure is that they are worthless and deserved the terrorizing. If you know that parents generally would not accept this kind of a behavior from a teacher, then that belief contains the generalized belief of "no child deserves this kind of a treatment"; which, when applied to the original memory, may be a way of turning that abstract belief concrete and removing the belief that nobody would care about that happening to your childhood self. (Just wildly speculating here.)

Comment by kaj_sotala on A Cautionary Note on Unlocking the Emotional Brain · 2020-02-09T08:44:23.914Z · score: 13 (6 votes) · LW · GW

Someone I know has reported something similar. She had both negative and positive beliefs about another person, and felt that the negative beliefs were wrong. After trying to do reconsolidation, she found that the negative beliefs only got stronger. Not only was this an unwanted result; it also didn't feel more true, and it felt really distressing. She did eventually get it fixed, and is still using the technique, but is now more cautious about it.

Personally I haven't had this kind of an issue: I find that if I'm in a stance where I have already decided that a certain belief is wrong and am trying to force my brain to update on that, the update process just won't go through, or produces a brief appearance of going through but doesn't really change anything. This seems fortunate, since it forces me to switch to more of a mode of exploration: is this belief false, or might it in fact be true? (Note that UtEB also explicitly cautions against trying to explicitly argue against or disprove a belief.)

If you go through a belief update process and it feels like the wrong belief got confirmed, the fact that you feel like the wrong belief won means that there's still some other belief in your brain disagreeing with that winner. In those kinds of situations, if I am approaching this from a stance of open exploration, I can then ask "okay, so I did this update but some part of my mind still seems to disagree with the end result; what's the evidence behind that disagreement, and can I integrate that?"

In my experience, if I find myself really strongly insisting that a belief must be false and disproven, then that may actually be because a part of my mind thinks that it would be really bad for the belief to be true. Maybe it would be really unpleasant to believe in existential risk being a serious issue, and then I get blended with the part that really doesn't want it to be true. Then I try to prove x-risk concerns false, which repeatedly fails because the issue isn't them being false, the issue is me not wanting to believe them true. mr-hire has a good piece of advice relating to this:

For every belief schema you're working with, there's (at least) two belief schema's at play. There's the side that believes a particular thing, and then there's a side that wants you to question the belief in that thing. As a general rule, you should always start with the side that's more cognitively fused.
As an example, I was working with someone who was having issues going to bed on time, and wanted to change that. Before we started looking at the schema of "I should avoid ruminating by staying up late," We first examined the schema of "I should get more sleep."
By starting with the schema that you're more cognitively fused with, you avoid confirmation bias and end up with more accurate beliefs at the end.

Note also that it may be the case that you really want some belief to be false, and it is in fact false. But the above bit is good advice even in that situation: even if the belief is false, you are less likely to be able to update it if your mind is stuck on wanting to disprove it, because you need to experience it as genuinely true in order to make progress. As I've mentioned:

Something that has been useful to me recently has been remembering that according to memory reconsolidation principles, experiencing an incorrect emotional belief as true is actually necessary for revising it. Then, when I get an impulse to push the wrong-feeling belief out of my mind, I instead take the objecting part or otherwise look for counterevidence and let the counterbelief feel simultaneously true as well. That has caused rapid updates the way Unlocking the Emotional Brain describes.
I think that basically the same kind of thing (don't push any part out of your mind without giving it a say) has already been suggested in IDC, IFS etc.; but in those, I've felt like the framing has been more along the lines of "consider that the irrational-seeming belief may still have an important point", which has felt hard to apply in cases where I feel very strongly that one of the beliefs is actually just false. Thinking in terms of "even if this belief is false, letting myself experience it as true allows it to be revised" has been useful for those situations.

All of that said, I do agree that there is always the risk of more extensive integration actually leading to incorrect beliefs. In expectation, learning more about the world is going to make you smarter, but there's always the chance of buying into a crazy theory that makes you dumber and integrating your beliefs to be more consistent with it - or even buying into a correct theory that makes you dumber. But of course, if you don't try to learn or integrate your models more, you're not going to have very good models either.

Comment by kaj_sotala on A Cautionary Note on Unlocking the Emotional Brain · 2020-02-09T07:54:16.363Z · score: 5 (2 votes) · LW · GW

I would also be curious to hear of more specific examples (though of course they might be too personal to share).

Comment by kaj_sotala on The Curse Of The Counterfactual · 2020-02-07T20:50:33.525Z · score: 22 (7 votes) · LW · GW

Curated. I have been using the content in this post a fair bit since it was posted. In particular, I've gotten value out of the following pieces of theoretical discussion:

  • The notion that once the moral judgment kicks in, it only cares about punishing wrongdoers, even if this blocks thinking about the actual object-level issue; and that the punishment machinery does not actually motivate us to do anything.
  • The Nice Guy Paradigm, especially the "backwards chaining" component of it; "if I don't get what I want, then it's my fault".

These have helped explain several behaviors in people (both myself and others) which I would have found puzzling before. I've probably mentally referenced these concepts dozens of times since reading them.

Additionally, as I noted in the comments, the specific technique of The Work was very useful to me for all kinds of mindhacking. Among other things, it allowed me to figure out what exactly was going on with one particular issue which I had failed to understand despite several years of working on it. I've since applied it as a general tool for investigating issues which might be in need of reconsolidation, and found it very effective.

A large part of the value that I got with regard to The Work came not from the post itself, but from one of pjeby's comments below; in particular, I thought that this paragraph summarized a lot of the value of using The Work, and I've found myself generally agreeing with it:

The reason I've moved towards using the Work as a prime investigative tool is that it lets you walk the belief network really fast compared to other methods. Getting your brain to object to getting rid of a belief forces it to reveal what the next belief up the branch is with far less wasted movement.
Comment by kaj_sotala on Plausibly, almost every powerful algorithm would be manipulative · 2020-02-06T13:44:25.385Z · score: 25 (9 votes) · LW · GW

The dataset example reminds me of the "playing dead" example from The Surprising Creativity of Digital Evolution:

In research focused on understanding how organisms evolve to cope with high-mutation-rate environments [50], Ofria sought to disentangle the beneficial effects of performing tasks (which would allow an organism to execute its code faster and thus replicate faster) from evolved robustness to the harmful effect of mutations. To do so, he tried to disable mutations that improved an organism’s replication rate (i.e. its fitness). He configured the system to pause every time a mutation occurred, and then measured the mutant’s replication rate in an isolated test environment. If the mutant replicated faster than its parent, then the system eliminated the mutant; otherwise, the mutant would remain in the population. He thus expected that replication rates could no longer improve, thereby allowing him to study the effect of mutational robustness more directly. However, while replication rates at first remained constant, they later unexpectedly started again rising. After a period of surprise and confusion, Ofria discovered that he was not changing the inputs provided to the organisms in the isolated test environment. The organisms had evolved to recognize those inputs and halt their replication. Not only did they not reveal their improved replication rates, but they appeared to not replicate at all, in effect “playing dead” when presented with what amounted to a predator.

Ofria then took the logical step to alter the test environment to match the same random distribution of inputs as would be experienced in the normal (non-isolated) environment. While this patch improved the situation, it did not stop the digital organisms from continuing to improve their replication rates. Instead they made use of randomness to probabilistically perform the tasks that accelerated their replication. For example, if they did a task half of the time, they would have a 50% chance of slipping through the test environment; then, in the actual environment, half of the organisms would survive and subsequently replicate faster. In the end, Ofria eventually found a successful fix, by tracking organisms’ replication rates along their lineage, and eliminating any organism (in real time) that would have otherwise out-replicated its ancestors.
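
The probabilistic "playing dead" strategy described in the excerpt can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the experimental platform used in the paper: each organism is reduced to a single hypothetical parameter `p_task`, the chance that it performs the replication-boosting task on any given occasion, and the filter is assumed to eliminate an organism exactly when it performs the task while being measured.

```python
import random


def fraction_passing_filter(p_task, n_organisms=10_000, seed=0):
    """Estimate the share of organisms that slip through the test environment.

    Toy assumption: an organism is eliminated if and only if it performs
    the speed-up task while being measured, which happens with
    probability p_task.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    passed = 0
    for _ in range(n_organisms):
        performs_task_in_test = rng.random() < p_task
        if not performs_task_in_test:
            passed += 1
    return passed / n_organisms


always_fast = fraction_passing_filter(1.0)   # always reveals its speed-up
never_fast = fraction_passing_filter(0.0)    # never gains any advantage
coin_flipper = fraction_passing_filter(0.5)  # performs the task half the time
```

In this model an organism that always performs the task never survives the filter, one that never performs it always survives but gains nothing, and one that performs it half the time survives the filter roughly half the time, which is the 50% figure the excerpt gives.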
Comment by kaj_sotala on What Money Cannot Buy · 2020-02-03T11:32:36.232Z · score: 4 (2 votes) · LW · GW

Google Scholar seems to recommend new papers to me based on, I think, works that I have cited in my own previous publications. The recommendations seem about as decent as feels fair to expect from our current level of AI.

Comment by kaj_sotala on The two-layer model of human values, and problems with synthesizing preferences · 2020-01-29T12:28:23.899Z · score: 5 (2 votes) · LW · GW
Fantastic post


Comment by kaj_sotala on The two-layer model of human values, and problems with synthesizing preferences · 2020-01-29T12:04:40.489Z · score: 5 (2 votes) · LW · GW

Hmm... several thoughts about that.

One is that I don't think we really know what the player does value. I had some guesses and hand-waving in the post, but nothing that I would feel confident enough about to use as the basis for preference synthesis or anything like that. I'm not even certain that our values can be very cleanly split into a separate character and player, though I do think that the two-layer model is less wrong than the naive alternative.

In Sarah's original analogy, the player first creates the character; then the character acts based on the choices that the player has made beforehand. But I should have mentioned in the post that one aspect in which I think the analogy is wrong, is that the player keeps changing the character. (Maybe you could think of this as one of those games that give you the option to take back the experience points that you've used on your character and then lets you re-assign them...)

Part of normal learning and change is that when you have new experiences, the learning process which I've been calling the player is involved in determining how those experiences affect your desires and personality. E.g. the changes in values and preferences that many people experience after having their first child - that might be described as the work of the player writing the "parental values" attribute into the character sheet. Or someone who goes to college, uncertain of what they want to study, tries out a few different subjects, and then switches their major to something which they found surprisingly interesting and motivating - the player giving them a preference to study that thing.

Those examples seem complicated enough that it seems a little too simplified to say that the player values emotional states; to some extent it seems to, but it also seems to itself create emotional states as suit its purposes. Probably what it "values" can't be simplified into any brief verbal description; it's more like it's a godshatter with a thousand different optimization criteria, all being juggled together to create something like the character.

I read your original comment as suggesting that we give the player sufficient pleasure that it is content; and then we also satisfy the character's preferences. But

1. Assuming for the sake of argument that this was possible, it's not clear what "the player being content" would do to a person's development. One possibility is that they would stop growing and responding to changed circumstances at all, because the mechanisms that were updating their behavior and thinking were all in the player. (Maybe even up to the point of e.g. not developing new habits in response to having moved to a new home with different arrangements, or something similar.)

2. There's anecdotal evidence suggesting that the pursuit of pleasure is actually also one of those character-level things. In "Happiness is a chore", the author makes the claim that even if you give people a technique which would consistently make them happy, and people try it out and become convinced of this, they might still end up not using it - because although "the pursuit of happiness" is what the character thinks they are doing, it is actually not what the player is optimizing for. If it was, it might be in the player's power to just create the happiness directly. Compare e.g. pjeby's suggestion that things like happiness etc. are things that we feel by default, but the brain learns to activate systems which block happiness, because the player considers that necessary for some purpose:

So if, for example, we don't see ourselves as worthless, then experiencing ourselves as "being" or love or okayness is a natural, automatic consequence. Thus I ended up pursuing methods that let us switch off the negatives and deal directly with what CT and IFS represent as objecting parts, since these objections are the constraint on us accessing CT's "core states" or IFS's self-leadership and self-compassion.

These claims also match my personal experience; I have, at various times, found techniques that I know would make me happy, but then I find myself just not using them. At one point I wrote "I have available to me some mental motions for reaching inside myself and accessing a source of happiness, but it would require a bit of an active effort, and I find that just being neutral is already good enough, so I can't be bothered." Ironically, I think I've forgotten what exactly that particular mental move was, because I ended up not using it very much...

There's also a thing in meditative traditions where people develop the ability to access some really pleasant states of mind ("jhanas"). But then, although some people do become "jhana-junkies" and mostly just want to hang out in them, a lot of folks don't. One friend of mine who knows how to access the jhanas was once asked something along the lines of "well, if you can access pure pleasure, why aren't you doing it all the time". That got him thoughtful, and then he afterwards mentioned something about pure bliss just getting kinda stale / boring after a while. Also, getting into a jhana requires some amount of effort and energy, and he figures that he might as well spend that effort and energy on something more meaningful than just pure pleasure.

3. "Not getting satisfied" seems like a characteristic thing of the player. The character thinks that they might get satisfied: "once I have that job that I've always wanted, then I'll be truly happy"... and then after a while they aren't anymore. If we model people's goals as setpoints, it seems like frequently when one setpoint has been reached (which the previous character would have been satisfied with), the player looks around and changes the character to give it a new target setpoint. (I saw speculation somewhere that this is an evolutionary hack for getting around the fact that the brain has only a limited range of utility that it can represent - by redefining the utility scale whenever you reach a certain threshold, you can effectively have an unbounded utility function even though your brain can only represent bounded utility. Of course, it comes with costs such as temporally inconsistent preferences.)

Comment by kaj_sotala on Healing vs. exercise analogies for emotional work · 2020-01-27T21:58:50.587Z · score: 7 (3 votes) · LW · GW

"Emotional work is endless boring cleaning" doesn't sound as attractive as either healing or exercise, though. :-)

Comment by kaj_sotala on Healing vs. exercise analogies for emotional work · 2020-01-27T21:58:08.038Z · score: 3 (1 votes) · LW · GW

Yeah, I was specifically thinking about persistent ongoing work more than occasional epiphanies, though of course sometimes the epiphanies can actually be transforming too, and ongoing work is likely to eventually produce them.

Comment by kaj_sotala on The two-layer model of human values, and problems with synthesizing preferences · 2020-01-27T17:07:45.607Z · score: 3 (1 votes) · LW · GW

Hmm... it's hard for me to get what you mean from a comment this short, but just the fact that I seem to have a lot of difficulty connecting your comment with my own model suggests that I didn't communicate mine very well. Could you say more about how you understood it?

Comment by kaj_sotala on 2018 Review: Voting Results! · 2020-01-27T09:29:10.772Z · score: 12 (2 votes) · LW · GW

That's an interesting measure, let's plot that too. (Ranks reversed so that rank 1 is represented as 74, rank 2 as 73, and so on.)

Reverse rank vs. karma
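
The rank reversal used for the plot can be written as a one-line transformation. This is a minimal sketch; the total of 74 ranked items is an assumption inferred from rank 1 being represented as 74, and `reverse_rank` is a hypothetical helper name.

```python
def reverse_rank(rank, n_items=74):
    """Map rank 1 to n_items, rank 2 to n_items - 1, and so on.

    n_items=74 is an assumption inferred from the comment's example,
    not a figure taken from the underlying data.
    """
    if not 1 <= rank <= n_items:
        raise ValueError(f"rank must be between 1 and {n_items}")
    return n_items + 1 - rank


reversed_ranks = [reverse_rank(r) for r in (1, 2, 74)]
```

With this mapping, higher values on the reversed axis correspond to better-ranked posts, so a positive relationship with karma slopes upward.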

Comment by kaj_sotala on The two-layer model of human values, and problems with synthesizing preferences · 2020-01-26T16:11:16.480Z · score: 6 (3 votes) · LW · GW

Great comment, thanks!

Is it really "wrong"? It's a normative assumption ... we get to decide what values we want, right? As "I" am a character, I don't particularly care what the player wants :-P

Well, to make up a silly example, let's suppose that you have a conscious belief that you want there to be as much cheesecake as possible. This is because you are feeling generally unsafe, and a part of your brain has associated cheesecakes with a feeling of safety, so it has formed the unconscious prediction that if only there was enough cheesecake, then you would finally feel good and safe.

So you program the AI to extract your character-level values, it correctly notices that you want to have lots of cheesecake, and goes on to fill the world with cheesecake... only for you to realize that now that you have your world full of cheesecake, you still don't feel as happy as you were on some level expecting to feel, and all of your elaborate rational theories of how cheesecake is the optimal use of atoms start feeling somehow hollow.

Comment by kaj_sotala on Is the Reversal Test overrated? · 2020-01-25T19:59:40.437Z · score: 10 (7 votes) · LW · GW

Worth noting that the original paper mentions several potential reasons to prefer the status quo, which can in fact be valid arguments rather than bias. Your body temperature example is an instance of the first one, the argument from evolutionary adaptation:

Obviously, the Reversal Test does not show that preferring the status quo is always unjustified. In many cases, it is possible to meet the challenge posed by the Reversal Test and thus to defeat the suspicion of status quo bias. Let us examine some of the possible ways in which one could try to do this [...]

The Argument from Evolutionary Adaptation

For some biological parameters, one may argue on evolutionary grounds that it is likely that the current value is a local optimum. The idea is that we have adapted to live in a certain kind of environment, and that if a larger or a smaller value of the parameter had been a better adaptation, then evolution would have ensured that the parameter would have had this optimal value. For example, one could argue that the average ratio between heart size and body size is at a local optimum, because a suboptimal ratio would have been selected against. This argument would shift the burden of proof back on somebody who maintains that a particular person’s heart—or the average human heart-to-body-size ratio—is too large or too small. [...]

The Argument from Transition Costs

Consider the reluctance of the United States to move to the metric system of measurement units. While few would doubt the superiority of the metric system, it is nevertheless unclear whether the United States should adopt it. In cases like this, the transition costs are potentially so high as to overwhelm the benefits to be gained from the new situation. Those who oppose both increasing and decreasing some parameter can potentially appeal to such a rationale to explain why we should retain the status quo without having to insist that the status quo is (locally) optimal. [...]

The Argument from Risk

Even if it is agreed that we are probably not at a local optimum with respect to some parameter under consideration, one could still mount an argument from the risk against varying the parameter. If it is suspected that the potential gains from varying the parameter are quite low and the potential losses very high, it may be prudent to leave things as they are (fig. 2).

Comment by kaj_sotala on 2018 Review: Voting Results! · 2020-01-24T11:33:03.488Z · score: 19 (6 votes) · LW · GW

The "Click Here If You Would Like A More Comprehensive Vote Data Spreadsheet" link includes both vote totals and karma, making it easy to calculate the correlation using Google Sheets' CORREL function. Pearson correlation between karma and vote count is 0.355, or if we throw away the outlier of "Affordance Widths" that was heavily downvoted due to its author, 0.425.

Scatterplots with "Affordance Widths" removed:

Karma vs. Total

Total vs. karma
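
For anyone who would rather reproduce the correlation outside a spreadsheet, here is a minimal sketch of the same calculation. The post titles and numbers below are invented purely for illustration and are not the actual review data; `pearson` is a hand-written helper rather than a library call:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical (title, karma, vote total) rows, NOT the real spreadsheet.
posts = [
    ("Post A", 120, 40),
    ("Post B", 90, 35),
    ("Post C", 60, 20),
    ("Post D", 150, 30),
    ("Outlier", 100, -50),  # stands in for a heavily downvoted post
]

# Correlation over all rows.
r_all = pearson([k for _, k, _ in posts], [v for _, _, v in posts])

# Same calculation with the outlier row dropped.
kept = [p for p in posts if p[0] != "Outlier"]
r_trimmed = pearson([k for _, k, _ in kept], [v for _, _, v in kept])

print(f"all rows: {r_all:.3f}, outlier removed: {r_trimmed:.3f}")
```

As in the real data, a single post with decent karma but a strongly negative vote total drags the coefficient down, which is why it is worth reporting both figures.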

Comment by kaj_sotala on Player vs. Character: A Two-Level Model of Ethics · 2020-01-20T17:46:20.866Z · score: 12 (5 votes) · LW · GW

I didn't feel like I fully understood this post at the time when it was written, but in retrospect it feels like it's talking about essentially the same thing as Coherence Therapy does, just framed differently.

Any given symptom is coherently produced, in other words, by either (1) how the individual strives, without conscious awareness, to carry out strategies for safety or well-being; or (2) how the individual responds to having suffered violations of safety or well-being. This model of symptom production is squarely in accord with the constructivist view of the self as having profound if unrecognized agency in shaping experience and behavior. Coherence therapy is centrally focused on ushering clients into a direct, noninterpretive experience of their agency in generating the symptom.

Symptom coherence was also defined by Ecker and Hulley (2004) as a heuristic principle of mental functioning, as follows: The brain-mind-body system can purposefully produce any of its possible conditions or states, including any kind of clinical symptom, in order to carry out any purpose that it is capable of forming.

This principle of general coherence is, of course, quite foreign to the therapy field’s prevailing, pathologizing models of symptom production. Underscoring the paradigmatic difference, Ecker and Hulley (2004, p. 3), addressing trainees, comment:

You won’t fully grasp this methodology until you grasp the nimble, active genius of the psyche not only in constructing personal reality, but also in purposefully manifesting any one of its myriad possible states to carry out any of its myriad possible purposes. The client’s psyche is always coherent, always in control of producing the symptom—knowing why and when to produce it and when not to produce it.

-- Toomey & Ecker 2007 (sci-hub)

Comment by kaj_sotala on Reality-Revealing and Reality-Masking Puzzles · 2020-01-20T12:00:04.630Z · score: 49 (13 votes) · LW · GW

Over the last 12 years, I’ve chatted with small hundreds of people who were somewhere “in process” along the path toward “okay I guess I should take Singularity scenarios seriously.” From watching them, my guess is that the process of coming to take Singularity scenarios seriously is often even more disruptive than is losing a childhood religion. Among many other things, I have seen it sometimes disrupt:

I feel like I was hit by most of these disruptions myself, and eventually managed to overcome them. But the way in which I overcame them suggests to me that there might be one more piece to the puzzle which hasn't been mentioned here.

A concept which I've seen thrown around in a few places is that of an "exile-driven life"; "exile" referring to the Internal Family Systems notion of strong painful feelings which a person is desperate to keep buried. Your life or some aspect of your life being exile-driven, means that keeping those painful feelings suppressed is one of the primary motivations behind your choices. The alcoholic who drinks to make their feelings of shame go away is exile-driven, but one can also have an exile-driven career that looks successful from the outside, or an exile-driven relationship where someone is primarily in the relationship for the sake of e.g. getting validation from their partner, and gets desperate whenever they don't get enough of it.

In retrospect, it looks to me like most of my disruptions - such as losing the belief of having a right to rest etc. - were ultimately linked to strong feelings of moral obligation, guilt, and worthlessness which have also popped up in other contexts. For example, it has happened more than once that a friend has gotten very depressed and suicidal, and then clutched onto me for help; and this has triggered exactly the same kind of reasoning as the various Singularity scenarios. "What right do I have to rest when this other person is much more badly off", and other classic codependency symptoms. (Looking at that list of codependency symptoms actually makes for a very interesting parallel to "Singularity disorder", now that I think of it.)

Now, I do agree that there's something to the "eliminating antibodies" framing - in each of those cases, there have been related thoughts about consequentialism and (this was particularly toxic) heroic responsibility saying that yes, if I don't manage to help this person, then their suffering and possibly death is my fault.

But the "eliminating antibodies" framing suggests that this is something that could happen to anyone. And maybe it could: part of my recovery involved starting to explicitly reject excessive consequentialism and utilitarianism in my thinking. Still, it wasn't until I found ways to address the underlying emotional flaws themselves that the kinds of failure modes that you described also started fixing themselves more thoroughly.

So at least my own experience was less "eliminating these antibodies caused me to overgeneralize factual beliefs" and more "there were pre-existing parts of my mind that believed that I was worthless, and all the rationalist stuff handed them even more evidence that they could use for making that case, eliminating existing defenses against the belief". If I hadn't had those pre-existing vulnerabilities, I suspect that I wouldn't have been disrupted to the same extent.

Qiaochu and others have been making the observation that the rationalist community seems to have a large share of people who are traumatized; it's been remarked that self-improvement communities in general attract the walking wounded. At my IFS training, it was remarked that manager parts that are struggling to keep exiles at bay tend to be really strongly attracted to any systems which offer a promise of control and predictability, such as what you might get from the original Sequences - "here are the mathematically correct ways of reasoning and acting, just follow these instructions and you're doing as well as a human can!". There's the thought that if only you can work yourself hard enough, and follow the dictates of this new system faithfully enough, then the feelings of guilt and worthlessness will stop. But since consequentialism is more demanding than what any human is ever capable of, you can never say "okay, now I've done enough and can rest", and those feelings of worthlessness will just continue to recur.

This would suggest that not only are there pre-existing vulnerabilities that make some people more susceptible to being disrupted by rationalist memes, those are also exactly the same kinds of people who frequently get drawn to rationalist memes, since in the view of some of their parts, the "disruption" is actually a way to redeem themselves.

Comment by kaj_sotala on Why Do You Keep Having This Problem? · 2020-01-20T11:07:07.426Z · score: 11 (7 votes) · LW · GW

Closely related to this is the issue where people try to do something, fail, and figure that they will "try harder" the next time. Frequently this just means that they will fail again, because their understanding of what they are doing wrong isn't sufficiently gears-level to allow them to isolate the bug in question; "I will try harder" tends to mean "I don't know why exactly I failed, so I'll just try again and hope that it works this time around".

I did some peer coaching at one point, and a common thing was that one of us would make a plan in order to do Y; a week later, the plan had failed and Y remained undone. The one doing the coaching would then ask what went wrong and how the other person could fix that failure, producing a revised plan for next week. Often, drilling down would produce something specific, such as "I had planned to get exercise by going out on a run, but then I was busy on a few days and it rained on the rest". Then you could ask yourself what kind of a plan would avoid those failure modes, and generate a less fragile approach.

That makes "why do I keep having this problem" and "what have I tried before and why hasn't it worked" very useful questions, and might help reframe failure not as failure, but as progress towards reaching the goal. Yes, you didn't succeed at the goal right away, but you got more information about what works and what doesn't, making you better-positioned to reach it the next time around.

This is also good to combine with Murphyjitsu: after forming a new plan, imagine that the plan has failed and ask yourself how surprising that would feel. If it wouldn't feel very surprising at all, ask your brain what it expects the failure to have been caused by, and plan around that.

It's also worth noting that sometimes you go through a few iterations of this and new bugs just keep popping up, or alternatively it feels like you can't imagine anything in particular that would go wrong, but your inner simulator still expects this to fail. That might point to there being some emotional issue, such as your brain predicting that success will be dangerous for some reason. That's then another thing that you can try to tackle.

Comment by kaj_sotala on Reality-Revealing and Reality-Masking Puzzles · 2020-01-17T13:57:58.787Z · score: 8 (3 votes) · LW · GW

I want a similarly clear-and-understood generalization of the “reasoning vs rationalizing” distinction that applies also to processes spread across multiple heads. I don’t have that yet. I would much appreciate help toward this.

I feel like Vaniver's interpretation of self vs. no-self is pointing at a similar thing; would you agree?

I'm not entirely happy with any of the terminology suggested in that post; something like "seeing your preferences realized" vs. "seeing the world clearly" would in my mind be better than either "self vs. no-self" or "design specifications vs. engineering constraints".

In particular, Vaniver's post makes the interesting contribution of pointing out that while "reasoning vs. rationalization" suggests that the two would be opposed, seeing the world clearly vs. seeing your preferences realized can be opposed, mutually supporting, or orthogonal. You can come to see your preferences more realized by deluding yourself, but you can also deepen both, seeing your preferences realized more because you are seeing the world more clearly.

In that ontology, instead of something being either reality-masking or reality-revealing, it can

  • A. Cause you to see your preferences more realized and the world more clearly
  • B. Cause you to see your preferences more realized but the world less clearly
  • C. Cause you to see your preferences less realized but the world more clearly
  • D. Cause you to see your preferences less realized and the world less clearly

But the problem is that a system facing a choice between several options has no general way to tell whether some option it could take is actually an instance of A, B, C or D, or whether there is a local maximum, meaning that choosing one possibility increases one variable a little but another option would have increased it even more in the long term.

E.g. learning about the Singularity makes you see the world more clearly, but it also makes you see that fewer of your preferences might get realized than you had thought. But then the need to stay alive and navigate the Singularity successfully pushes you into D, where you are so focused on trying to invest all your energy into that mission that you fail to see how this prevents you from actually realizing any of your preferences... but since you see yourself as being very focused on the task and ignoring "unimportant" things, you think that you are doing A while you are actually doing D.

Comment by kaj_sotala on Toon Alfrink's sketchpad · 2020-01-17T09:51:32.506Z · score: 3 (1 votes) · LW · GW

Well, it sounds to me like it's more of a heterarchy than a hierarchy, but yeah.

Comment by kaj_sotala on Toon Alfrink's sketchpad · 2020-01-16T19:25:23.623Z · score: 3 (1 votes) · LW · GW

This BBC article discusses it a bit:

There is a further problem with Maslow's work. Margie Lachman, a psychologist who works in the same office as Maslow at his old university, Brandeis in Massachusetts, admits that her predecessor offered no empirical evidence for his theory. "He wanted to have the grand theory, the grand ideas - and he wanted someone else to put it to the hardcore scientific test," she says. "It never quite materialised."

However, after Maslow's death in 1970, researchers did undertake a more detailed investigation, with attitude-based surveys and field studies testing out the Hierarchy of Needs.

"When you analyse them, the five needs just don't drop out," says Hodgkinson. "The actual structure of motivation doesn't fit the theory. And that led to a lot of discussion and debate, and new theories evolved as a consequence."

In 1972, Clayton Alderfer whittled Maslow's five groups of needs down to three, labelled Existence, Relatedness and Growth. Although elements of a hierarchy remain, "ERG theory" held that human beings need to be satisfied in all three areas - if that's not possible then their energies are redoubled in a lower category. So for example, if it is impossible to get a promotion, an employee might talk more to colleagues and get more out of the social side of work.

More sophisticated theories followed. Maslow's triangle was chopped up, flipped on its head and pulled apart into flow diagrams.

Of course, this doesn't really contradict your point of there being separable, factorable goals. AFAIK, the current mainstream model of human motivation and basic needs is self-determination theory, which explicitly holds that there exist three separate basic needs:

  • Autonomy: people have a need to feel that they are the masters of their own destiny and that they have at least some control over their lives; most importantly, people have a need to feel that they are in control of their own behavior.
  • Competence: another need concerns our achievements, knowledge, and skills; people have a need to build their competence and develop mastery over tasks that are important to them.
  • Relatedness (also called Connection): people need to have a sense of belonging and connectedness with others; each of us needs other people to some degree.

Comment by kaj_sotala on Ascetic aesthetic · 2020-01-15T14:29:13.846Z · score: 4 (2 votes) · LW · GW

Well there's that too. I liked this take on it:

Everything is experienced through a series of filters. Filters are created by instinct and experience… The entire world looks and feels COMPLETELY different for each person, because no one’s set of filters is the same.

The “deeper” the filter, the more influence it has over your perception of reality. Evolutionarily developed filters (instinct) occurred from millions of years of selection, and are very deep. Filters originated in your childhood that survive into adulthood are generally deep, etc. Someone calling you a mean word adds a filter that might last an hour.

These filters are stacked on top of each other like a house of cards. The deeper the filter, the more influential. But every filter affects one’s perception of reality… There are deep filters based on physical brain chemistry, instinct based on human evolution, etc. Layers of filter can be peeled away to see reality in a more “pure” way. Peeling away layers allows one to see things (reality) in a way they didn’t before, and reconsider “their” reality. Note: peeling away layers does not necessarily mean one will then go on to form a truer view of reality.

Some of the filters defining what's beautiful and what's ugly are going to be learned and others innate, with the learned ones being formed on the basis of the innate ones.

Comment by kaj_sotala on Circling as Cousin to Rationality · 2020-01-14T20:00:38.365Z · score: 6 (2 votes) · LW · GW

That makes sense, thanks!

Comment by kaj_sotala on Ascetic aesthetic · 2020-01-14T19:32:33.809Z · score: 9 (4 votes) · LW · GW

I generally agree, but I think that there's also a sense in which aesthetics comes from facts. See Propagating Facts into Aesthetics (which is exactly about that), Identities are (Subconscious) Strategies (one's sense of identity often includes lots of aesthetic considerations as well), and Book summary: Unlocking the Emotional Brain (the kind of emotional learning described there probably drives many of these aesthetics).

Comment by kaj_sotala on Circling as Cousin to Rationality · 2020-01-14T12:03:57.016Z · score: 6 (2 votes) · LW · GW

Would be curious to hear how well you and Vaniver think that my recent post on meditation makes the case for "meditation as a form of empiricism".

Comment by kaj_sotala on Key Decision Analysis - a fundamental rationality technique · 2020-01-12T16:34:42.586Z · score: 5 (3 votes) · LW · GW

I find that the greatest challenge in starting to employ something like this is learning to recognize the things that count as decisions to be recorded. To the extent that they are not too private, could you share more examples of the kinds of decisions that you have used this on?