(Humor) AI Alignment Critical Failure Table 2020-08-31T19:51:18.266Z · score: 25 (9 votes)
nostalgebraist: Recursive Goodhart's Law 2020-08-26T11:07:46.690Z · score: 56 (17 votes)
Collection of GPT-3 results 2020-07-18T20:04:50.027Z · score: 84 (35 votes)
Are there good ways to find expert reviews of popular science books? 2020-06-09T14:54:23.102Z · score: 25 (6 votes)
Three characteristics: impermanence 2020-06-05T07:48:02.098Z · score: 52 (20 votes)
On the construction of the self 2020-05-29T13:04:30.071Z · score: 47 (18 votes)
From self to craving (three characteristics series) 2020-05-22T12:16:42.697Z · score: 39 (18 votes)
Craving, suffering, and predictive processing (three characteristics series) 2020-05-15T13:21:50.666Z · score: 57 (24 votes)
A non-mystical explanation of "no-self" (three characteristics series) 2020-05-08T10:37:06.591Z · score: 70 (30 votes)
A non-mystical explanation of insight meditation and the three characteristics of existence: introduction and preamble 2020-05-05T19:09:44.484Z · score: 97 (34 votes)
Stanford Encyclopedia of Philosophy on AI ethics and superintelligence 2020-05-02T07:35:36.997Z · score: 42 (18 votes)
Healing vs. exercise analogies for emotional work 2020-01-27T19:10:01.477Z · score: 41 (21 votes)
The two-layer model of human values, and problems with synthesizing preferences 2020-01-24T15:17:33.638Z · score: 71 (23 votes)
Under what circumstances is "don't look at existing research" good advice? 2019-12-13T13:59:52.889Z · score: 73 (22 votes)
A mechanistic model of meditation 2019-11-06T21:37:03.819Z · score: 109 (38 votes)
On Internal Family Systems and multi-agent minds: a reply to PJ Eby 2019-10-29T14:56:19.590Z · score: 38 (16 votes)
Book summary: Unlocking the Emotional Brain 2019-10-08T19:11:23.578Z · score: 189 (84 votes)
System 2 as working-memory augmented System 1 reasoning 2019-09-25T08:39:08.011Z · score: 93 (30 votes)
Subagents, trauma and rationality 2019-08-14T13:14:46.838Z · score: 75 (42 votes)
Subagents, neural Turing machines, thought selection, and blindspots 2019-08-06T21:15:24.400Z · score: 68 (26 votes)
On pointless waiting 2019-06-10T08:58:56.018Z · score: 44 (23 votes)
Integrating disagreeing subagents 2019-05-14T14:06:55.632Z · score: 97 (31 votes)
Subagents, akrasia, and coherence in humans 2019-03-25T14:24:18.095Z · score: 104 (36 votes)
Subagents, introspective awareness, and blending 2019-03-02T12:53:47.282Z · score: 72 (29 votes)
Building up to an Internal Family Systems model 2019-01-26T12:25:11.162Z · score: 170 (71 votes)
Book Summary: Consciousness and the Brain 2019-01-16T14:43:59.202Z · score: 120 (48 votes)
Sequence introduction: non-agent and multiagent models of mind 2019-01-07T14:12:30.297Z · score: 95 (42 votes)
18-month follow-up on my self-concept work 2018-12-18T17:40:03.941Z · score: 58 (17 votes)
Tentatively considering emotional stories (IFS and “getting into Self”) 2018-11-30T07:40:02.710Z · score: 40 (12 votes)
Incorrect hypotheses point to correct observations 2018-11-20T21:10:02.867Z · score: 86 (35 votes)
Mark Eichenlaub: How to develop scientific intuition 2018-10-23T13:30:03.252Z · score: 81 (31 votes)
On insecurity as a friend 2018-10-09T18:30:03.782Z · score: 38 (20 votes)
Tradition is Smarter Than You Are 2018-09-19T17:54:32.519Z · score: 67 (25 votes)
nostalgebraist - bayes: a kinda-sorta masterpost 2018-09-04T11:08:44.170Z · score: 24 (8 votes)
New paper: Long-Term Trajectories of Human Civilization 2018-08-12T09:10:01.962Z · score: 34 (16 votes)
Finland Museum Tour 1/??: Tampere Art Museum 2018-08-03T15:00:05.749Z · score: 20 (6 votes)
What are your plans for the evening of the apocalypse? 2018-08-02T08:30:05.174Z · score: 24 (11 votes)
Anti-tribalism and positive mental health as high-value cause areas 2018-08-02T08:30:04.961Z · score: 26 (10 votes)
Fixing science via a basic income 2018-08-02T08:30:04.380Z · score: 30 (14 votes)
Study on what makes people approve or condemn mind upload technology; references LW 2018-07-10T17:14:51.753Z · score: 21 (11 votes)
Shaping economic incentives for collaborative AGI 2018-06-29T16:26:32.213Z · score: 47 (13 votes)
Against accusing people of motte and bailey 2018-06-03T21:31:24.591Z · score: 88 (29 votes)
AGI Safety Literature Review (Everitt, Lea & Hutter 2018) 2018-05-04T08:56:26.719Z · score: 37 (10 votes)
Kaj's shortform feed 2018-03-31T13:02:47.793Z · score: 13 (3 votes)
Helsinki SSC March meetup 2018-03-26T19:27:17.850Z · score: 12 (2 votes)
Is the Star Trek Federation really incapable of building AI? 2018-03-18T10:30:03.320Z · score: 30 (9 votes)
My attempt to explain Looking, insight meditation, and enlightenment in non-mysterious terms 2018-03-08T07:37:54.532Z · score: 303 (126 votes)
Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk” 2018-02-12T12:30:04.401Z · score: 68 (20 votes)
On not getting swept away by mental content 2018-01-25T20:30:03.750Z · score: 25 (9 votes)
Papers for 2017 2018-01-04T13:30:01.406Z · score: 32 (8 votes)


Comment by kaj_sotala on Against Victimhood · 2020-09-18T20:24:58.655Z · score: 4 (3 votes) · LW · GW

I like this post, but I can't help noticing that I expect it to be unhelpful to the people who most need it. Someone deeply mired in victim mentality won't be convinced by an intellectual argument for why victim mentality is bad, since to them their victimhood feels like just how things are.

On the other hand, I guess it's unavoidable that the people who most need to hear a particular piece of advice are incapable of hearing it, and this post can still be helpful to people who might otherwise slightly lean that way. (Personally, seeing some people's victim mentality has given me a strong incentive to never be like that myself, and it has felt helpful, so I expect that this post will also be helpful to some.)

Comment by kaj_sotala on Open & Welcome Thread - September 2020 · 2020-09-18T19:52:53.660Z · score: 5 (3 votes) · LW · GW
This is not a reason to ban him, or anyone. Being disliked is not a reason for punishment.

The traditional guidance for up/downvotes has been "upvote what you would like to see more of, downvote what you would like to see less of". If this is how votes are interpreted, then heavy downvotes imply "the forum's users would on average prefer to see less content of this kind". Someone posting the kind of content that's unwanted on a forum seems like a reasonable reason to bar that person from the forum in question.

I agree with "being disliked is not a reason for punishment", but people also have the right to choose who they want to spend their time with, even if the person they prefer not to spend time with views that as being punished. In my book, banning people from a private forum is more like "choosing not to invite someone to your party again, after they previously caused others to have a bad time" than it is like "punishing someone".

Comment by kaj_sotala on Why haven't we celebrated any major achievements lately? · 2020-09-16T16:52:16.317Z · score: 2 (1 votes) · LW · GW
I suddenly realized Notepad.exe now has a "New Window" menu option, which simply spawns another instance of Notepad.

Oh I hadn't noticed that, thanks! No longer need to go through the Taskbar for that. :D

Comment by kaj_sotala on Spoiler-Free Review: Orwell · 2020-09-16T13:55:41.913Z · score: 4 (2 votes) · LW · GW

I’d rank the game as lower Tier 3 – it’s good at its job, but not essential.

Which games in the genre would rank as Tier 1?

Comment by kaj_sotala on Book Review: Working With Contracts · 2020-09-16T11:38:19.922Z · score: 4 (2 votes) · LW · GW
I agree with this heuristic in general; when I say they don't seem to be good at this, I do mean that they don't seem to be good at it. It's entirely possible that there's some underlying purpose.

Fair enough!

First, modern contract law is relatively new; the uniform commercial code, for instance, only came along in 1952.

Huh, interesting. That's surprising to me; I expected contracts to have a sufficiently long history that there wouldn't be any recent major innovations. In retrospect, I realize a long history alone isn't enough to assume that: mathematics is also ancient but has seen its fair share of recent-ish innovations anyway.

Second, it took half a century for software-writing to come as far as it has, and the incentives for scalable legibility just don't seem as sharp in contracts - so it should take even longer. At the end of the day, most contracts operate in an environment where people are invested in reputations and relationships; an oversight which could be abused usually isn't, an accidental breach can usually be worked out with the counterparty, and so forth. It's not like a computer which just executes whatever code is in front of it. (And even today, plenty of software engineers do throw patches on top of patches - it just seems more commonly understood in the software world that this is bad practice.)

I feel like this is part of what I was gesturing at with the "on the standards of a software developer" bit. If the nature of the domain is such that "throwing patches upon patches" actually works fine most of the time, then I wouldn't say that lawyers are bad at what they do for relying on that. One could flip it around and say that they're good at what they do, for not wasting effort on optimizations that largely wouldn't make a difference.

Comment by kaj_sotala on Open & Welcome Thread - September 2020 · 2020-09-16T11:28:29.725Z · score: 4 (3 votes) · LW · GW

Often, expressing any understanding of the motives of a "bad guy" is taken as signaling acceptance of their actions. There was e.g. controversy around the movie Downfall for this:

Downfall was the subject of dispute by critics and audiences in Germany before and after its release, with many concerned of Hitler's role in the film as a human being with emotions in spite of his actions and ideologies.[40][30][49] The portrayal sparked debate in Germany due to publicity from commentators, film magazines, and newspapers,[25][50] leading the German tabloid Bild to ask the question, "Are we allowed to show the monster as a human being?".[25]
It was criticized for its scenes involving the members of the Nazi party,[23] with author Giles MacDonogh criticizing the portrayals as being sympathetic towards SS officers Wilhelm Mohnke and Ernst-Günther Schenck,[51] the former of whom was accused of murdering a group of British prisoners of war in the Wormhoudt massacre.[N 1]
Comment by kaj_sotala on Book Review: Working With Contracts · 2020-09-15T18:55:25.110Z · score: 5 (3 votes) · LW · GW
Compared to (good) software developers, lawyers do not seem to be very good at this; they tend to throw patches on top of patches, creating more corner cases rather than fewer.

Do we actually know that they're not good at it? I realize that their solution looks bad when evaluated by the standards of a software developer, but for something like contract law that has been around for a really long time, my heuristic would be to assume that a seemingly bad solution has some hidden purpose behind it.

Comment by kaj_sotala on Against boots theory · 2020-09-14T19:43:32.151Z · score: 8 (4 votes) · LW · GW
After all, in reality, even a pair of good work boots is only going to last 6-12 months.

From siderea's post that was linked in the OP:

For most of my adult life, I had bought my winter boots by going to Filene's Basement – the original one, in downtown Boston – in August. This pretty reliably turned up something tolerable I could wear for about $20 (in 1990s dollars).
Then, one year, I failed to acquire boots in August. I don't recall the reason. I don't remember if I foolishly thought, "Well, last year's boots are still sound" and didn't bother to go, or whether I went and didn't find anything suitable. So I entered the autumn with only the previous year's boots.
Unsurprisingly, after winter had already started, my worn, cheap boots failed me. Alas, like trying to harvest apples in spring, I could not find suitable boots at Filene's Basement in that season.
tn3270 gallantly offered to take me boot shopping. He drove me to the Burlington Mall, and walked me into the first place that seemed to have the sort of product I described wanting. [...]
"These boots," I said gesturing at what I was trying on, on my feet, "cost $200. Given that I typically buy a pair for $20 every year, that means these boots have to last 10 years to recoup the initial investment."
That was on January 17, 2005. They died earlier this month – that is in the first week of December, 2018. So: almost but not quite 14 years.
So, purely as an investment, they returned a bit under $80, which is a 40% ROI.
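
The arithmetic in the quoted passage can be checked with a minimal sketch. The function name and parameters below are hypothetical, and the 14-year lifespan is an upper bound, since the quote says the boots lasted "almost but not quite 14 years":

```python
def boot_investment(expensive_price, cheap_price_per_year, years_lasted):
    """Compare one expensive pair of boots against buying a cheap pair every year.

    Returns (break_even_years, net_saving, roi), where roi is the net saving
    divided by the price of the expensive pair.
    """
    # Years the expensive pair must last to match the cost of yearly cheap pairs.
    break_even_years = expensive_price / cheap_price_per_year
    # Money that would otherwise have been spent on cheap boots.
    avoided_spending = cheap_price_per_year * years_lasted
    net_saving = avoided_spending - expensive_price
    roi = net_saving / expensive_price
    return break_even_years, net_saving, roi


# Figures from the quote: $200 boots, $20 per year for cheap boots, about 14 years.
break_even, saving, roi = boot_investment(200, 20, 14)
print(break_even, saving, roi)
```

With a full 14 years this gives a 10-year break-even point, an $80 saving and a 40% return, which matches the quote's "a bit under $80" once the slightly shorter actual lifespan is taken into account.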
Comment by kaj_sotala on Social Capital Paradoxes · 2020-09-14T14:08:27.460Z · score: 13 (3 votes) · LW · GW
Why do free-market societies have higher social capital? How can this be fit into a larger picture in which horizontal transmission structures / few-shot interactions incentivize less cooperative strategies?

You mentioned that in small-town culture, there's a lot of iterated interaction, but people are slow to trust outsiders. That seems to suggest that there's high social capital within a small group, but low social capital with outsiders. As small-towners will prefer to only interact with people who are known to be trustworthy, they will not have the opportunity to come to trust outsiders in general.

In contrast, if you live in a large market economy where your social environment incentivizes few-shot interaction and most people turn out to cooperate even in few-shot interactions (possibly because their brains are still running strategies that were evolved for iterated interaction), then you will learn that most people are generally reliable. While you might never develop the level of extreme high trust with a few select people that you'd get in a small town, you will have much more trust for random strangers than the small-towners would. That might translate to higher social capital overall.

Comment by kaj_sotala on What's Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers · 2020-09-13T19:31:03.325Z · score: 6 (3 votes) · LW · GW
Even the crème de la crème of economics journals barely manage a ⅔ expected replication rate.

Is a two-thirds replication rate necessarily bad? This is an honest question, since I don't know what the optimal replication rate would be. It seems worth noting that a) a 100% replication rate seems too high, since it would indicate that people were only doing boring experiments that were certain to replicate, and b) "replication rate" seems to mean "does the first replication attempt succeed", and some fraction of replication attempts will fail due to random chance even if the effect is genuine.
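
To make point b) concrete, here is a rough sketch of how the expected first-attempt replication rate depends on statistical power. All the numbers are illustrative assumptions rather than figures from the post, and the model is deliberately simplified: a replication of a genuine effect succeeds with probability equal to its power, and a replication of a spurious effect "succeeds" with probability equal to the significance level.

```python
def expected_replication_rate(share_true, power, alpha=0.05):
    """Expected share of first replication attempts that come out significant.

    share_true: assumed fraction of original findings that are genuine effects
    power:      assumed probability that a replication detects a genuine effect
    alpha:      significance level, used as the chance that a spurious effect
                "replicates" by luck (a simplification)
    """
    return share_true * power + (1.0 - share_true) * alpha


# Even if every original finding were genuine, 80% power caps the rate at 80%.
print(expected_replication_rate(share_true=1.0, power=0.8))

# If 80% of findings are genuine and power is 80%, the rate is roughly two thirds.
print(expected_replication_rate(share_true=0.8, power=0.8))
```

Under these assumed numbers, a rate of about two thirds is compatible with most of the original findings being real, which is why the figure alone doesn't say how bad things are.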

Comment by kaj_sotala on ‘Ugh fields’, or why you can’t even bear to think about that task (Rob Wiblin) · 2020-09-12T06:01:26.001Z · score: 6 (3 votes) · LW · GW

I don't recall encountering one, at least; I think psychologists would just say something like "X feels aversive".

Comment by kaj_sotala on Open & Welcome Thread - September 2020 · 2020-09-11T12:53:19.994Z · score: 4 (2 votes) · LW · GW

So which simulacrum level are ants on when they are endlessly following each other in a circle?

Comment by kaj_sotala on Emotional valence vs RL reward: a video game analogy · 2020-09-04T14:19:14.594Z · score: 4 (2 votes) · LW · GW
What does it correspond to in the brain

lukeprog talks a bit about this in "The Neuroscience of Pleasure".

Comment by kaj_sotala on If there were an interactive software teaching Yudkowskian rationality, what concepts would you want to see it teach? · 2020-09-03T07:27:44.292Z · score: 2 (1 votes) · LW · GW
What's a MOOC?

Massive Open Online Course.

Comment by kaj_sotala on When can Fiction Change the World? · 2020-08-26T20:47:07.505Z · score: 0 (2 votes) · LW · GW

I don't think that "we manage to find a smart way to avoid a disaster, though we almost lose anyway" implies "being smart automatically means that we win".

Comment by kaj_sotala on nostalgebraist: Recursive Goodhart's Law · 2020-08-26T20:34:04.106Z · score: 4 (2 votes) · LW · GW

Could you elaborate on that? The two posts seem to be talking about different things as far as I can tell: e.g. nostalgebraist doesn't say anything about the Optimizer's Curse, whereas your post relies on it.

I do see that there are a few paragraphs that seem to reach similar conclusions (both say that overly aggressive optimization of any target is bad), but the reasoning used for reaching that conclusion seems different.

(By the way, I don't quite get your efficiency example? I interpret it as saying that you spent a lot of time and effort on optimizations that didn't pay themselves back. I guess you might mean something like "I had a biased estimate of how much time my optimizations would save, so I chose expensive optimizations that turned out to be less effective than I thought." But the example already suggests that you knew beforehand that the time saved would be on the order of a minute or so, so I'm not sure how the example is about Goodhart's Curse.)
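
For reference, the "biased estimate" reading above is the pattern usually called the Optimizer's Curse: if you pick the option with the highest noisy estimate, that estimate tends to overstate the option's true value. A minimal simulation sketch follows; the distributions, parameter values and function name are all assumptions chosen for illustration.

```python
import random


def mean_overestimate(n_options=20, noise_sd=1.0, trials=2000, seed=0):
    """Average (estimate - true value) for the option with the highest estimate."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    total_gap = 0.0
    for _ in range(trials):
        # True payoffs of the candidate optimizations (assumed standard normal).
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        # Unbiased but noisy estimates of those payoffs.
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_values]
        # Choose whichever option looks best according to the estimates.
        best = max(range(n_options), key=lambda i: estimates[i])
        total_gap += estimates[best] - true_values[best]
    return total_gap / trials


# With many options to choose from, the winner is overestimated on average.
print(mean_overestimate(n_options=20))
# With a single option there is no selection, so the average gap is close to zero.
print(mean_overestimate(n_options=1))
```

Each individual estimate here is unbiased; the overestimate comes purely from selecting the maximum, which is the feature that the efficiency example would need to have in order to be an instance of the curse.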

Comment by kaj_sotala on nostalgebraist: Recursive Goodhart's Law · 2020-08-26T17:20:39.943Z · score: 6 (3 votes) · LW · GW

But "COVID-19 cases decreasing" is probably not your ultimate goal: more likely, it's an instrumental goal for something like "prevent humans from dying" or "help society" or whatever... in other words, it's a proxy for some other value. And if you walk back the chain of goals enough, you are likely to arrive at something that isn't well defined anymore.

Comment by kaj_sotala on The two-layer model of human values, and problems with synthesizing preferences · 2020-08-25T20:55:42.977Z · score: 2 (1 votes) · LW · GW

Good question. I think that at least some approaches to no-self do break down the mechanisms by which the appearance of a character is maintained, but the extent to which this actually gives insight into the nature of the player (as opposed to giving insight into the non-existence of the character) is unclear to me.

Comment by kaj_sotala on When can Fiction Change the World? · 2020-08-24T14:59:08.098Z · score: 2 (1 votes) · LW · GW

Nice post!

Related excerpt from Misinformation and Its Correction: Continued Influence and Successful Debiasing on people's tendency to pick up beliefs from fiction (note that this is a pre-replication crisis social psychology paper, so take it with a grain of salt):

A related but perhaps more surprising source of misinformation is literary fiction. People extract knowledge even from sources that are explicitly identified as fictional. This process is often adaptive, because fiction frequently contains valid information about the world. For example, non-Americans’ knowledge of U.S. traditions, sports, climate, and geography partly stems from movies and novels, and many Americans know from movies that Britain and Australia have left-hand traffic. By definition, however, fiction writers are not obliged to stick to the facts, which creates an avenue for the spread of misinformation, even by stories that are explicitly identified as fictional. A study by Marsh, Meade, and Roediger (2003) showed that people relied on misinformation acquired from clearly fictitious stories to respond to later quiz questions, even when these pieces of misinformation contradicted common knowledge. In most cases, source attribution was intact, so people were aware that their answers to the quiz questions were based on information from the stories, but reading the stories also increased people’s illusory belief of prior knowledge. In other words, encountering misinformation in a fictional context led people to assume they had known it all along and to integrate this misinformation with their prior knowledge (Marsh & Fazio, 2006; Marsh et al., 2003).

The effects of fictional misinformation have been shown to be stable and difficult to eliminate. Marsh and Fazio (2006) reported that prior warnings were ineffective in reducing the acquisition of misinformation from fiction, and that acquisition was only reduced (not eliminated) under conditions of active on-line monitoring—when participants were instructed to actively monitor the contents of what they were reading and to press a key every time they encountered a piece of misinformation (see also Eslick, Fazio, & Marsh, 2011). Few people would be so alert and mindful when reading fiction for enjoyment. These links between fiction and incorrect knowledge are particularly concerning when popular fiction pretends to accurately portray science but fails to do so, as was the case with Michael Crichton’s novel State of Fear. The novel misrepresented the science of global climate change but was nevertheless introduced as “scientific” evidence into a U.S. Senate committee (Allen, 2005; Leggett, 2005)
Comment by kaj_sotala on Mesa-Search vs Mesa-Control · 2020-08-19T05:13:52.694Z · score: 6 (3 votes) · LW · GW

It sounds a bit absurd: you've already implemented a sophisticated RL algorithm, which keeps track of value estimates for states and actions, and propagates these value estimates to steer actions toward future value. Why would the learning process re-implement a scheme like that, nested inside of the one you implemented? Why wouldn't it just focus on filling in the values accurately?

I've thought of two possible reasons so far.

  1. Perhaps your outer RL algorithm is getting very sparse rewards, and so does not learn very fast. The inner RL could implement its own reward function, which gives faster feedback and therefore accelerates learning. This is closer to the story in Evan's mesa-optimization post, just replacing search with RL.
  2. More likely perhaps (based on my understanding), the outer RL algorithm has a learning rate that might be too slow, or is not sufficiently adaptive to the situation. The inner RL algorithm adjusts its learning rate to improve performance.

Possibly obvious, but just to point it out: both of these seem like they also describe the case of genetic evolution vs. brains.

Comment by kaj_sotala on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-19T05:01:12.045Z · score: 4 (2 votes) · LW · GW

Good point, I wasn't thinking of social effects changing the incentive landscape.

Comment by kaj_sotala on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-19T04:55:08.261Z · score: 4 (2 votes) · LW · GW

That seems like a reasonable paraphrase, at least if you include the qualification that the "quickly" is relative to the amount of structure that the inner layer has accumulated, so might not actually happen quickly enough to be useful in all cases.

For example, it seems plausible to me that the inner layer might come to optimize for its proxy estimations of outer reward more than for outer reward itself, and that those two things could become decoupled.

Sure, e.g. lots of exotic sexual fetishes look like that to me. Hmm, though actually that example makes me rethink the argument that you just paraphrased, given that those generally emerge early in an individual's life and then generally don't get "corrected".

Comment by kaj_sotala on Does crime explain the exceptional US incarceration rate? · 2020-08-17T14:22:20.828Z · score: 10 (6 votes) · LW · GW
Related to this is that so far we’ve basically taken the homicide rate as exogenous, but of course there’s reverse causality. Having a large chunk of the population in prison will affect the murder rate. [...] Another way out for them is that maybe all the countries with similar homicide rates should imprison people as much as the US, but their institutions don’t function well enough.

Note that some people make the reverse argument: that a high imprisonment rate makes things worse, especially if the sentences are long and prison conditions are harsh and tending towards punishment rather than rehabilitation. People in prison end up socialized into interacting with other prisoners, which gets first-timers into a stronger criminal mindset. Once they get out, they might not have many opportunities available other than going back into crime.

At least this article notes that e.g. Finland has a low incarceration rate as well as a low recidivism rate, though the report that it cites for the recidivism figure explicitly concludes that the rates are not directly comparable between countries, so take that with a grain of salt.

Comment by kaj_sotala on Building up to an Internal Family Systems model · 2020-08-16T20:57:25.979Z · score: 4 (3 votes) · LW · GW

Happy to hear that the post was useful to you!

After identifying a part that I want to work with, I immediately intellectualize that part and build a predictive model of what the part may possibly respond to some inquiries that I have in mind

First piece of advice: don't do that. :-) I feel pretty comfortable saying that this approach is guaranteed not to produce any results. Intellectualizing parts will basically only give you the kind of information that you could produce by intellectual analysis, and for intellectual analysis you don't need IFS in the first place. Even if your guesses are right, they will not produce the kind of emotional activation that's necessary for change.

A few thoughts on what to do instead...

Is Procrastination its own part? Maybe so. I'll give him a character. I had a roommate ("John") who had a lot of issues with procrastination, so his visual image feels appropriate.

It sounds (correct me if I'm wrong) like you are giving the part a visual appearance by thinking of the nature of the problem, and choosing an image which seems suitably symbolic of it; then you try to interact with that image.

In that case, you are picking a mental image, but the image isn't really "connected" to the part, so the interaction is not going to work. What you want to do is to first get into contact with the part, and then let a visual image emerge on its own. (An important note: parts don't have to have a visual appearance! I expect that one could do IFS even if one had aphantasia. If you try to get a visual appearance and nothing comes up, don't force it, just work with what you do have.)

So I would suggest doing something like this:

  • Think of some concrete situation in which you usually procrastinate. If you have a clear memory of a particular time, let that memory come back to mind. Or you could imagine that you are about to do something that you've usually been procrastinating on. Or you could just pick something that you've been procrastinating on and try doing it right now, to get that procrastination response.
  • Either way, what you are going for are the kinds of thoughts, feelings, and bodily sensations that are normally associated with you procrastinating. Pay particular attention to any sensations in your body. Whatever it is that you are experiencing, try describing it out loud. For example: "when I think of working on my project, I get an unpleasant feeling in my back... it's a kind of nervous energy. And when I try to focus my thoughts on what I'm supposed to do, I... my attention just keeps kind of sliding off to something else."
    • The ellipses in that example are to highlight that there's no rush. Take your time settling into the sensations. Often, if you start with a coarse description, such as "an unpleasant feeling", you might get more details if you just keep your attention on it and see whether you could describe it more precisely: "... it's a kind of nervous energy".
    • You're not thinking about parts yet. You're just imagining yourself in a situation and then describing whatever sensations and thoughts are coming up.
    • If you find yourself describing everything very quickly, you are probably not paying attention to the actual sensations. If you find yourself pausing, looking for the right word, finding a word that's almost it but having an even better one lurking on the tip of your tongue... then you're much more likely to be doing it right.
    • Sometimes you don't get bodily sensations, but you might get various thoughts, mental images, or desires. That's fine too. Describe them in a similar way.
    • If you find yourself being too impatient to do this properly, working with a friend whose only job is to sit there and listen often helps. You can think of yourself as doing your best to communicate the experience to your friend.
  • Once you have a good handle on the sensations, you can let your attention rest on them and ask yourself, "if these sensations had a visual appearance, what would it be?".
    • Don't try to actively come up with an answer. Just keep your attention on the sensations, ask yourself the question, and see if any visual image emerges on its own. If you get a sense of something but it's vague, you can try saying a few words about what you do manage to make out and see if that brings out additional details.
    • "Ask yourself" here doesn't mean that you would need to address any external entity, or do anything else special. Rather, just... kind of let your mind wonder about the question, and see if any answer emerges.
    • The image doesn't need to look like anything in particular. It doesn't need to be a human, or even a living being. Though it can be! But it can be a swirling silver vortex, or a wooden duck, or whatever feels right.
    • If no visual image emerges, don't sweat it, and don't try to force one. Just stay with the sensations.
  • At this point, you can see if you could give this bundle of sensations and (maybe) images a name. Again, don't think about it too intellectually, just see if there would be anything that fits your experience. If you had a nervous energy in your back, maybe it's called "nervousness". If the mental image you got was of a swirling silver vortex in your back, maybe it's "silver vortex".
  • Now you can start doing things like seeing if you could communicate with this part, check how you feel towards it, etc.
    • When you are asking the part questions, its answers don't need to actually be any kind of mental speech. For instance, if you ask it what it is trying to do, you might get a vague intuition, a flash of memory, or a mental image. The answer might feel cryptic at first. If so, you can again describe it out loud, and wait to see if more details emerge.
      • If you think you have a hunch of what it's about, you can try asking the part whether you've understood correctly. Asking verbally is one way, but you can also just kind of... hold up your current understanding against the part, and see whether you get a sense of it resonating.
        • If the part tells you that you did understand it correctly, you can then use the same approach to ask it whether you've understood everything about this, or whether there are still more pieces that you are missing.
      • Generally avoid the temptation to go into intellectual analysis to figure out what this is about. (You can ask any intellectualizing parts to move aside.) Often there's an emotional logic which will make sense in retrospect, but which is impossible to figure out on an intellectual level beforehand. If you - say - get a particular memory which you recognize but don't understand how it's related to this topic, just stay with the memory, maybe describe it out loud, and see whether more details would emerge.
      • It's okay if you don't figure it out during one session. Let your brain process it.
  • You might arrive at something like a "classic IFS" situation, where a part has a distinct anthropomorphic appearance and you are literally having a conversation with it. Or your parts might be nothing like this, and be just a bundle of sensations whose "answers" consist of more sensations and memories coming to your mind. Either one is fine.
  • Throughout the process, the main thing is to work with that which comes naturally, and not try to force anything. (If you do feel a desire to force things into a particular shape, or guide the process to happen in a particular way, that's a part. See what it's trying to do and whether it would be willing to move aside.)
Comment by kaj_sotala on Adele Lopez's Shortform · 2020-08-16T07:05:27.581Z · score: 4 (2 votes) · LW · GW

most rural villages have a lot less privacy than most of us would be used to (because everyone knows you and talks about you).

A lot of people who have moved to cities from such places seem to mention this as exactly the reason why they wanted out.

That said, this is often because the others are judgmental etc., which wouldn't need to be the case with an AGI.

Comment by kaj_sotala on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-14T18:49:52.667Z · score: 2 (1 votes) · LW · GW
Or e.g. that it always leads to the organism optimizing for a set of goals which is unrecognizably different from the base objective. I don't think you see these things, and I'm interested in figuring out how evolution prevented them.

As I understand it, Wang et al. found that their experimental setup trained an internal RL algorithm that was more specialized for this particular task, but was still optimizing for the same task that the RNN was being trained on? And it was selected exactly because it did that very goal better. If the circumstances changed so that the more specialized behavior was no longer appropriate, then (assuming the RNN's weights hadn't been frozen) the feedback to the outer network would gradually end up reconfiguring the internal algorithm as well. So I'm not sure how it even could end up with something that's "unrecognizably different" from the base objective - even after a distributional shift, the learned objective would probably still be recognizable as a special case of the base objective, until it updated to match the new situation.

The thing that I would expect to see from this description is that humans who were e.g. practicing a particular skill might end up becoming overspecialized to the circumstances around that skill, and need to occasionally relearn things to fit a new environment. And that certainly does seem to happen. Likewise for more general/abstract skills, like "knowing how to navigate your culture/technological environment", where older people's strategies are often more adapted to how society used to be rather than how it is now - but they still aren't incapable of updating.

Catastrophic misalignment seems more likely to happen in the case of something like evolution, where the two learning algorithms operate on vastly different timescales, and it takes a very long time for evolution to correct after a drastic distributional shift. But the examples in Wang et al. lead me to think that in the brain, even the slower process operates on a timescale that's on the order of days rather than years, allowing for reasonably rapid adjustments in response to distributional shifts. (Though it's plausible that the more structure there is in need of readjustment, the slower the reconfiguration process will be - which would fit the behavioral calcification that we see in e.g. some older people.)

Comment by kaj_sotala on Tagging FAQ · 2020-08-13T14:27:36.894Z · score: 4 (2 votes) · LW · GW
Further, since tag names are often changed, you can cause a tag link to also use the current name with the url parameter:

This sentence is ambiguous, since "current name" could refer either to the name the tag had when the link was created (current now, at the time of link creation), or it could refer to the tag's latest name (current whenever the link happens to be viewed).

There's a paragraph below that clarifies that this should be interpreted as "latest", but I missed that part earlier and got it the wrong way around, and have misused the parameter a few times as a result.

Comment by kaj_sotala on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-12T14:36:33.404Z · score: 20 (11 votes) · LW · GW
That said, if mesa-optimization is a standard feature[4] of brain architecture, it seems notable that humans don’t regularly experience catastrophic inner alignment failures.

What would it look like if they did?

Comment by kaj_sotala on AlphaStar: Impressive for RL progress, not for AGI progress · 2020-08-09T06:44:04.434Z · score: 10 (4 votes) · LW · GW

I have been wanting to curate this for a long time. As AlphaStar seemed really powerful at the time, it was useful to read an analysis of where it goes wrong: I felt that the building placement was an excellent concrete example of what a lack of causal reasoning really means. Not only is it useful for thinking about AlphaStar, the same weaknesses apply to GPT, which we have been discussing a lot now: it only takes a bit of playing around with say AI Dungeon before this becomes very obvious.

Comment by kaj_sotala on Tags Discussion/Talk Thread · 2020-08-07T12:38:38.675Z · score: 3 (2 votes) · LW · GW

A tag that I'm about to create would have the following description:

________ is a strategy for dealing with confusing questions or points of disagreement, such as "do humans have free will" or "when a tree falls in a forest with no-one to hear, does it make a sound". Rather than trying to give an answer in the form of "yes", "no", or "the question is incoherent", one seeks to understand the cognitive algorithm that gave rise to the confusion, so that at the end there is nothing left to explain.

Eliezer originally called this strategy "dissolving the question" (in the first linked post), but an important part of it is thinking in terms of "how an algorithm feels from the inside" (the second linked post), and I tend to think of these interchangeably. "Dissolving the Question" says, among other things:

What kind of cognitive algorithm, as felt from the inside, would generate the observed debate about "free will"?

In fact, until I looked up the relevant posts, I remembered the name of the strategy as being "dissolving the algorithm" rather than "dissolving the question".

Given these considerations, should the tag be called:

  • Dissolving the Question
  • Dissolving the Cognitive Algorithm
  • How an Algorithm Feels From Inside
  • Something else?
Comment by kaj_sotala on Tags Discussion/Talk Thread · 2020-08-07T10:51:17.570Z · score: 8 (4 votes) · LW · GW

I'd say that the appropriate number of tags for a post is "as many or few as seem to match the contents of the post".

I expect that many posts that have just a few tags are that way simply because the tagger didn't happen to think of the other possible ones. Or someone might be focusing on some particular tag and tagging all relevant posts with that, without also looking at those posts to see what other tags might be suitable.

Comment by kaj_sotala on Tags Discussion/Talk Thread · 2020-08-07T06:36:30.091Z · score: 4 (2 votes) · LW · GW

That's a reasonable point. After this discussion, I think that I do lean towards just renaming it after all.

Comment by kaj_sotala on Tags Discussion/Talk Thread · 2020-08-06T21:44:24.826Z · score: 3 (2 votes) · LW · GW

I think that people knowing what the tag means right away is potentially a problem if that instant understanding is slightly wrong. E.g. if people only look at the tag's name (which is what they'll generally do if they have no reason to explicitly look up the description), they might feel that some posts that fit CI but not Tribalism are mis-tagged and downvote the tag relevance. Coalitional Instincts being less self-explanatory has the advantage that people are less likely to assume they know what it means without looking at the tag description.

Comment by kaj_sotala on Tags Discussion/Talk Thread · 2020-08-06T15:57:48.296Z · score: 5 (3 votes) · LW · GW
Merge Blues and Greens and Coalitional Instincts; they're about basically the same thing. I don't like either name; "Tribalism" would probably be better. Blues and Greens is jargon that's not used enough, and coalitional instincts is too formal.

I don't have an opinion on the Blues and Greens merge - I wouldn't expect anyone to be specifically interested in posts that happen to use that particular analogy - but I would somewhat lean towards keeping the Coalitional Instincts term.

I considered several terms for that tag, including "Tribalism", but I feel like there's an underlying concept cluster that's worth carving out and which is better described by Coalitional Instincts. Though this feels like a somewhat subtle difference in what I feel that "Tribalism" connotes, and if others disagree with me on those connotations, then I'm certainly willing to switch.

Basically, it feels to me like "Tribalism" generally refers to a somewhat narrow set of behaviors, whereas Coalitional Instincts includes those but also includes the underlying psychological mechanisms and somewhat broader behaviors. For example, Eliezer's post Professing and Cheering includes this excerpt:

But even the concept of “religious profession” doesn’t seem to cover the pagan woman’s claim to believe in the primordial cow. If you had to profess a religious belief to satisfy a priest, or satisfy a co-religionist—heck, to satisfy your own self-image as a religious person—you would have to pretend to believe much more convincingly than this woman was doing. As she recited her tale of the primordial cow, she wasn’t even trying to be persuasive on that front—wasn’t even trying to convince us that she took her own religion seriously. I think that’s the part that so took me aback. I know people who believe they believe ridiculous things, but when they profess them, they’ll spend much more effort to convince themselves that they take their beliefs seriously.
It finally occurred to me that this woman wasn’t trying to convince us or even convince herself. Her recitation of the creation story wasn’t about the creation of the world at all. Rather, by launching into a five-minute diatribe about the primordial cow, she was cheering for paganism, like holding up a banner at a football game. A banner saying Go Blues isn’t a statement of fact, or an attempt to persuade; it doesn’t have to be convincing—it’s a cheer.
That strange flaunting pride . . . it was like she was marching naked in a gay pride parade. It wasn’t just a cheer, like marching, but an outrageous cheer, like marching naked—believing that she couldn’t be arrested or criticized, because she was doing it for her pride parade.
That’s why it mattered to her that what she was saying was beyond ridiculous. If she’d tried to make it sound more plausible, it would have been like putting on clothes.

To me, "believing in something because it is outrageous, and not even trying to make it legible according to the other people's epistemology" is a phenomenon that's covered by coalitional instincts - it's holding up the standards of explanation of your group and flaunting the fact that you don't care about the other side's. But it doesn't quite fit tribalism as I usually understand it? I feel like "tribalism" usually refers to things that are more explicitly about "explicitly attacking anything that is seen to support the other side", and not so much about "simply being proud about your side".

But I'm not completely certain of that myself, and if others disagree with this assessment, then I'd be willing to change the name to tribalism.

Comment by kaj_sotala on Tagging Open Call / Discussion Thread · 2020-08-06T15:36:52.869Z · score: 2 (1 votes) · LW · GW

I agree with this; I wouldn't expect Empiricism to have posts like e.g. Is Science Slowing Down? or Ed Boyden on the State of Science.

Comment by kaj_sotala on Property as Coordination Minimization · 2020-08-04T21:03:12.364Z · score: 15 (8 votes) · LW · GW

A related argument for property rights as solving a coordination problem is David Friedman's A Positive Account of Property Rights. In a nutshell, his argument is that:

  • Continuous bargaining is time-consuming and hard. Suppose that you and I are given a dollar to divide between ourselves. My incentive is to argue for as large a share as possible for myself, and your incentive is to do the same for yourself. We could in principle spend a lot of time on this. For instance, I might say that I'm going to insist on getting 70 cents and that I will settle for nothing less. You suspect that actually, I am willing to settle on 60 cents, so you refuse to let the deal go through until I lower my demand.
  • However, there exists a unique proposal for sharing the dollar: that both of us get 50 cents. What's more, both of us know (and know each other to know) that the 50-50 split is a unique coordination point for situations like this one. As a consequence, if I say that I'm going to insist on getting 50 cents and will settle for nothing less, then this is a believable claim.
    • As Friedman says: "... suppose there is one outcome that is seen as unique. A player who proposes that outcome may be perceived as offering, not a choice between that outcome, another slightly different, another different still, . . . but a choice between that outcome and continued bargaining. A player who says that he insists on the unique outcome and will not settle for anything less may be believable, where a similar statement about a different outcome would not be. He can convincingly argue that he will stand by his proposed outcome because, once he gives it up, he has no idea where he will end up or how high the costs of getting there will be."
  • Extending this, let's imagine that we live in a Hobbesian society with no rule of law. Even without courts to enforce contracts, it can still be beneficial for us to formally make a contract. A contract establishes another unique commitment point: I expect you to uphold your part of the bargain, and you expect me to uphold mine. If you violate the contract, I can make a credible commitment to retaliating until the previous contract has been upheld. If not for the contract, we would again be in the position of endless bargaining, where I can threaten to hurt you if you don't do what I want, but you might doubt my commitment to this threat.
  • Extending this further gets a system of norms about property in general: we could have endless negotiations about who gets to use what, but if an accepted system of property rights exists, this coordination cost is eliminated. This is similar to, though slightly different from, your argument: you focus on the fact that property allows a single decision-maker to act without needing to consult other people. Friedman's argument is that property rights save us from paying the transaction cost of endless negotiation to establish who gets to use what.
    • To use your example of a landlord, suppose that we didn't have a system of laws and norms establishing impersonal property. Now, someone who owned a house could offer to rent it out to other people, but once those people were in the house, they could suddenly decide that they didn't want to pay the rent anymore. The (ex-)landlord would then be forced to figure out whether paying the cost of forcefully evicting them would be worth it, and have an incentive to bluff that (s)he did, and the people in the house would need to figure out whether the threat was credible... and so on again.
I believe I have now resolved the apparent paradox of contracting out of the Hobbesian jungle. The process of contracting changes the situation because it establishes new Schelling points, which in turn affect the strategic situation and its outcome. The same analysis can be used from the other side to explain what constitutes civil society. The laws and customs of civil society are an elaborate network of Schelling points. If my neighbor annoys me by growing ugly flowers, I do nothing. If he dumps his garbage on my lawn, I retaliate—possibly in kind. If he threatens to dump garbage on my lawn, or play a trumpet fanfare at 3 A.M. every morning, unless I pay him a modest tribute I refuse—even if I am convinced that the available legal defenses cost more than the tribute he is demanding.
If a policeman arrests me—even for a crime I did not commit—I go along peacefully. If he tries to rob my house, I fight, even if the cost of doing so is more than the direct cost of letting him rob me. Each of us knows what behavior by everyone else is within the rules and what behaviour implies unlimited demands, the violation of the Schelling point, and the ultimate return to the Hobbesian jungle. The latter behaviour is prevented by the threat of conflict even if (as in the British defense of the Falklands) the direct costs of surrender are much lower than the direct costs of conflict.
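
One rough way to see why the bargaining over the dollar has no obvious stopping point is to model it as a Nash demand game. That formalization is my assumption here, not something Friedman's essay spells out: each of us names a demand in cents, and we get paid only if the two demands fit within the dollar. The sketch below brute-forces the equilibria; the names `payoff` and `is_equilibrium` are just illustrative.

```python
from itertools import product

PIE = 100  # one dollar, in cents

def payoff(mine: int, theirs: int) -> int:
    """Assumed rule: compatible demands are paid out, otherwise nobody gets anything."""
    return mine if mine + theirs <= PIE else 0

def is_equilibrium(a: int, b: int) -> bool:
    """True if neither player can gain by unilaterally changing their own demand."""
    best_a = max(payoff(d, b) for d in range(PIE + 1))
    best_b = max(payoff(d, a) for d in range(PIE + 1))
    return payoff(a, b) == best_a and payoff(b, a) == best_b

# Every pair of demands that is stable against unilateral deviation.
equilibria = [(a, b) for a, b in product(range(PIE + 1), repeat=2)
              if is_equilibrium(a, b)]
# Equilibria that hand out the whole dollar.
efficient = [(a, b) for a, b in equilibria if a + b == PIE]
# Equilibria that treat both players identically and waste nothing.
symmetric = [(a, b) for a, b in efficient if a == b]

print(len(efficient), symmetric)
```

Under these assumptions every exact split of the dollar is an equilibrium, so the payoffs alone don't say where the bargaining should end. The 50-50 split stands out only because it is the one efficient outcome that is symmetric, which is the kind of uniqueness that makes insisting on it believable.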
Comment by kaj_sotala on What are examples of 'scientific' studies that contradict what you believe about yourself? · 2020-08-03T15:15:46.721Z · score: 16 (8 votes) · LW · GW

All of this seems to assume that knowing things about the plot will reduce all enjoyment of the story; my experience is closer to something like "consuming a work for the first time is a different kind of experience from re-consuming something that I already know". Spoilers can damage the first-time enjoyment, while not affecting the later occasions.

That said, it does also feel to me that I don't reconsume works very much personally, and this feels at least partially because consuming them a second time does feel much less interesting than the first time. Some people clearly like re-consuming works more often, so it may be that some people prefer the first-time experience more than others.

Comment by kaj_sotala on Tagging Open Call / Discussion Thread · 2020-08-02T21:55:29.873Z · score: 4 (2 votes) · LW · GW

I dislike Probability Calibration because I dislike leading adjectives/modifiers and prefer the main thing to be the first word in the noun phrase (some languages like Hebrew do this). I expect people to be looking for the core thing, e.g. Relationships, "R", and if you put modifiers in front, e.g. "Business", "Personal/Interpersonal", "Romantic", "Conceptual", you then require someone to guess which modifier you used, and also split up Relationship tags from being adjacent in an alphabetical list.

There is that, but at the same time, you probably wouldn't want tags like Experiences (Anticipated) or Induction (Solomonoff). I don't have any principled argument for this, but to me "Probability Calibration" feels more like one of those examples. It being put alphabetically close to "Probability" may also be good.

(I also keep feeling confused by Relationships (Interpersonal) each time I see it, though that's probably in part because there's no other 'Relationships', so I just think 'well what other relationships could you even mean' and then don't find another Relationships that it would be contrasted with.)

Comment by kaj_sotala on Tagging Open Call / Discussion Thread · 2020-08-02T20:25:04.304Z · score: 2 (1 votes) · LW · GW

"Probability Calibration" rather than "Calibration (Probability)" feels like a more natural name for the tag, while keeping the disambiguation.

Comment by kaj_sotala on "Can you keep this confidential? How do you know?" · 2020-07-29T16:57:36.883Z · score: 6 (3 votes) · LW · GW

Secrets seem necessary for maintaining people's privacy, and privacy seems like it's necessary for thinking clearly.

Comment by kaj_sotala on Billionaire Economics · 2020-07-29T15:49:35.635Z · score: 4 (2 votes) · LW · GW

Housing first schemes seem to pay for themselves, though [1, 2].

Comment by kaj_sotala on Tagging Open Call / Discussion Thread · 2020-07-29T13:24:11.855Z · score: 4 (2 votes) · LW · GW
Alternatively, we have an automatically updating spreadsheet (every 5 min) that tracks the tags on the most viewed posts according to our data and their current tags.

Note that this spreadsheet seems to open by default to the "sorted by karma" tab, and you have to manually switch to the "sorted by view rank" tab (spent half a minute wondering whether the link was incorrect before happening to look at the tab list).

(Also: Oh wow, our most viewed post is one with four karma?)

Comment by kaj_sotala on How will internet forums like LW be able to defend against GPT-style spam? · 2020-07-28T20:39:44.340Z · score: 16 (9 votes) · LW · GW

GPT-generated spam seems like a worse problem for things like product reviews than for a site like LW, where comments are generally evaluated by the quality of their content. If GPT produces low-quality comments, they'll be downvoted; if it produces high-quality comments, then great.

Comment by kaj_sotala on Hierarchy of Evidence · 2020-07-26T18:54:08.399Z · score: 2 (1 votes) · LW · GW
And by what metric do you separate the competent experts from the non-competent experts?

There's no hard-and-fast rule, obviously, just as there's no hard-and-fast rule for figuring out which meta-analyses you can trust (for problems with meta-analyses, see e.g. [1, 2, 3, 4]). But if the experts explicitly discuss the reasons behind their opinions and e.g. why they think that one particular meta-analysis is decent but another one is flawed, you can try to evaluate how reasonable their claims sound.

Comment by kaj_sotala on Reveal Culture · 2020-07-25T08:18:37.128Z · score: 7 (4 votes) · LW · GW

This is great! I particularly like the bit about superficially adopting one set of norms without realizing that you are still running with a different set of assumptions; it's a thing that seems to pop up in lots of different contexts. E.g. running with the assumption that you always need to look like a fast learner, and then trying to adopt norms of behaviors that look less like this, in order to demonstrate how fast you learned that this is bad. Some Christians also talk about how some seemingly humble behaviors are actually manifesting pride over how humble you are.

Another example that I've heard is that of stereotypical hippie communes that decided to found themselves on principles of free love and communal property, without realizing how many monogamous and personal ownership-type assumptions their subconscious was actually still operating on.

But in practice, most people who think they’re “doing Tell Culture” have been holding assumptions that look much more like Ask Culture.

This seems very plausible to me, but since I have not been exposed to a lot of Tell Culture "in the wild", a concrete example would be useful.

Comment by kaj_sotala on $1000 bounty for OpenAI to show whether GPT3 was "deliberately" pretending to be stupider than it is · 2020-07-22T21:35:07.602Z · score: 24 (6 votes) · LW · GW
I don't feel at all tempted to do that anthropomorphization, and I think it's weird that EY is acting as if this is a reasonable thing to do.

"It's tempting to anthropomorphize GPT-3 as trying its hardest to make John smart" seems obviously incorrect if it's explicitly phrased that way, but e.g. the "Giving GPT-3 a Turing Test" post seems to implicitly assume something like it:

This gives us a hint for how to stump the AI more consistently. We need to ask questions that no normal human would ever talk about.

Q: How many eyes does a giraffe have?
A: A giraffe has two eyes.

Q: How many eyes does my foot have?
A: Your foot has two eyes.

Q: How many eyes does a spider have?
A: A spider has eight eyes.

Q: How many eyes does the sun have?
A: The sun has one eye.

Q: How many eyes does a blade of grass have?
A: A blade of grass has one eye.
Now we’re getting into surreal territory. GPT-3 knows how to have a normal conversation. It doesn’t quite know how to say “Wait a moment… your question is nonsense.” It also doesn’t know how to say “I don’t know.”

The author says that this "stumps" GPT-3, which "doesn't know how to" say that it doesn't know. That's as if GPT-3 was doing its best to give "smart" answers, and just was incapable of doing so. But Nick Cammarata showed that if you just give GPT-3 a prompt where nonsense answers are called out as such, it will do just that.
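
As a minimal sketch of what such a prompt could look like: the few-shot examples below answer sensible questions normally and mark nonsense ones with a fixed phrase. The phrase, the example answers, and the `build_prompt` helper are all hypothetical, and the actual call to a language model is deliberately left out.

```python
from typing import List, Tuple

# Hypothetical marker for nonsense questions; any fixed phrase would do.
NONSENSE_REPLY = "That question doesn't make sense."

def build_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Format Q/A demonstrations, then the new question with an open answer slot."""
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)

# Demonstrations mix ordinary answers with explicit call-outs of nonsense.
EXAMPLES = [
    ("How many eyes does a giraffe have?", "A giraffe has two eyes."),
    ("How many eyes does my foot have?", NONSENSE_REPLY),
    ("How many eyes does a spider have?", "A spider has eight eyes."),
    ("How many eyes does the sun have?", NONSENSE_REPLY),
]

prompt = build_prompt(EXAMPLES, "How many eyes does a blade of grass have?")
print(prompt)
```

The point of the design is that the model is continuing a text in which calling out nonsense is the established pattern, rather than one in which every question gets a confident answer.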

Comment by kaj_sotala on Collection of GPT-3 results · 2020-07-20T15:37:33.891Z · score: 2 (1 votes) · LW · GW

The free version appears to be GPT-2, given that they specifically mention having GPT-3 on the premium side (note that you'll have to explicitly enable it in the settings after getting premium):

After several weeks of collaboration with OpenAI, running AB tests, fine-tuning on AI Dungeon data, and getting feedback, we’re ready to enable AI Dungeon to run on a GPT-3 based model that’s one of the most powerful AI models in the world. We’re calling the AI Dungeon version of this new model “Dragon”. It’s available now for premium users.

Note that there's a one-week free trial for the premium version.

Comment by kaj_sotala on Praise of some popular LW articles · 2020-07-20T09:16:36.705Z · score: 6 (3 votes) · LW · GW

Cool to have an old article of mine analyzed in this much detail. :) The comparison to another article was interesting, too.

In retrospect, I think that the "it's okay to be a little irrational" thing works by allowing the irrational behavior or thought to be experienced more completely rather than being pushed down, thus allowing for memory reconsolidation.

Something that the authors emphasize is that when the target schema is activated, there should be no attempt to explicitly argue against it or disprove it, as this risks pushing it down. Rather, the belief update happens when one experiences their old schema as vividly true, while also experiencing an entirely opposite belief as vividly true. It is the juxtaposition of believing X and not-X at the same time, which triggers an inbuilt contradiction-detection mechanism in the brain and forces a restructuring of one’s belief system to eliminate the inconsistency.
Comment by kaj_sotala on My Dating Plan ala Geoffrey Miller · 2020-07-17T15:04:31.500Z · score: 5 (4 votes) · LW · GW

My personal experience with dating apps has been pretty negative. It feels like it's very hard to effectively filter for people I'd actually be interested in, requiring an enormous number of dates. That wouldn't necessarily be a bad thing, but dates also feel like a really unnatural and awkward situation to me, since both of you are basically trying to evaluate whether the other person is "good enough" to see again while also trying to act and feel natural. (Basically, "there's mutual knowledge that you're both evaluating each other's compatibility toward your mating goals" is exactly the reason why I don't like dates.)

So far my experience has been that the organic way of gradually making friends and then possibly turning the friendship into a relationship feels much better and more natural. And if you don't get a partner, you'll still get a friend. But that's also a pretty slow method, so I'm kind of stumped about what to do since both approaches seem to have major flaws.

Comment by kaj_sotala on My Dating Plan ala Geoffrey Miller · 2020-07-17T14:51:40.627Z · score: 4 (4 votes) · LW · GW

I don't think that having dating advice here is necessarily a bad thing, but I also had a negative reaction to the title.