Rob B's Shortform Feed

post by Rob Bensinger (RobbBB) · 2019-05-10T23:10:14.483Z · LW · GW · 79 comments

This is a repository for miscellaneous short things I want to post. Other people are welcome to make top-level comments here if they want. (E.g., questions for me you'd rather discuss publicly than via PM; links you think will be interesting to people in this comment section but not to LW as a whole; etc.)

Comments sorted by top scores.

comment by Rob Bensinger (RobbBB) · 2021-06-08T00:49:35.133Z · LW(p) · GW(p)

Shared with permission, a google doc exchange confirming Eliezer still finds the arguments for alignment optimism, slower takeoffs, etc. unconvincing:

Daniel Filan: I feel like a bunch of people have shifted a bunch in the type of AI x-risk that worries them (representative phrase is "from Yudkowsky/Bostrom to What Failure Looks Like ~~part 2~~ part 1") and I still don't totally get why.
Eliezer Yudkowsky: My bitter take: I tried cutting back on talking to do research; and so people talked a bunch about a different scenario that was nicer to think about, and ended up with their thoughts staying there, because that's what happens if nobody else is arguing them out of it.

That is: this social-space's thought processes are not robust enough against mildly adversarial noise, that trying a bunch of different arguments for something relatively nicer to believe, won't Goodhart up a plausible-to-the-social-space argument for the thing that's nicer to believe. If you talk people out of one error, somebody else searches around in the space of plausible arguments and finds a new error. I wasn't fighting a mistaken argument for why AI niceness isn't too intractable and takeoffs won't be too fast; I was fighting an endless generator of those arguments. If I could have taught people to find the counterarguments themselves, that would have been progress. I did try that. It didn't work because the counterargument-generator is one level of abstraction higher, and has to be operated and circumstantially adapted too precisely for the social-space to be argued into it using words.

You can sometimes argue people into beliefs. It is much harder to argue them into skills. The negation of Robin Hanson's rosier AI scenario was a belief. Negating an endless stream of rosy scenarios is a skill.

Caveat: this was a private reply I saw and wanted to share (so people know EY's basic epistemic state, and therefore probably the state of other MIRI leadership). This wasn't an attempt to write an adequate public response to any of the public arguments put forward for alignment optimism or non-fast takeoff, etc., and isn't meant to be a replacement for public, detailed, object-level discussion. (Though I don't know when/if MIRI folks plan to produce a proper response, and if I expected such a response soonish I'd probably have just waited and posted that instead.)

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-06-23T13:54:54.816Z · LW(p) · GW(p)

FWIW, I think Yudkowsky is basically right here and would be happy to explain why if anyone wants to discuss. I'd likewise be interested in hearing contrary perspectives.

Replies from: riceissa, matthew-barnett, TAG

↑ comment by riceissa · 2021-07-28T01:23:03.838Z · LW(p) · GW(p)

Which of the "Reasons to expect fast takeoff" from Paul's post do you find convincing, and what is your argument against what Paul says there? Or do you have some other reasons for expecting a hard takeoff?

I've seen this post [LW · GW] of yours, but as far as I know, you haven't said much about hard vs soft takeoff in general.

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-07-28T09:06:03.953Z · LW(p) · GW(p)

It's a combination of not finding Paul+Katja's counterarguments convincing (AI Impacts has a slightly different version of the post; I think of this as the Paul+Katja post since I don't know how much each of them did), having various other arguments that they didn't consider, and thinking they may be making mistakes in how they frame things and what questions they ask. I originally planned to write a line-by-line rebuttal of the Paul+Katja posts, but instead I ended up writing a sequence of posts [? · GW] that collectively constitute my (indirect) response. If you want a more direct response, I can put it on my list of things to do, haha... sorry... I am a bit overwhelmed... OK here's maybe some quick (mostly cached) thoughts:

1. What we care about is point of no return, NOT GDP doubling in a year or whatever.

2. PONR seems not particularly correlated with GDP acceleration time or speed, and thus maybe Paul and I are just talking past each other -- he's asking and answering the wrong questions.

3. Slow takeoff means shorter timelines, so if our timelines are independently pretty short, we should update against slow takeoff. My timelines are independently pretty short. (See my other sequence.) Paul runs this argument in the other direction I think; since takeoff will be slow, and we aren't seeing the beginnings of it now, timelines must be long. (I don't know how heavily he leans on this argument though, probably not much. Ajeya does this too, and does it too much I think.) Also, concretely, if crazy AI stuff happens in <10 years, probably the EMH has failed in this domain and probably we can get AI by just scaling up stuff and therefore probably takeoff will be fairly fast (at least, it seems that way extrapolating from GPT-1, GPT-2, and GPT-3. One year apart, significantly qualitatively and quantitatively better. If that's what progress looks like when we are entering the "human range" then we will cross it quickly, it seems.)

4. Discontinuities totally do sometimes happen. I think we shouldn't expect them by default, but they aren't super low-prior either; thus, we should do gears-level modelling of AI rather than trying to build a reference class or analogy to other tech.

5. Most of Paul+Katja's arguments seem to be about continuity vs. discontinuity, which I think is the wrong question to be asking. What we care about is how long it takes (in clock time, or perhaps clock-time-given-compute-and-researcher-budget-X, given current and near-future ideas/algorithms) for AI capabilities to go from "meh" to "dangerous." THEN once we have an estimate of that, we can use that estimate to start thinking about whether this will happen in a distributed way across the whole world economy, or in a concentrated way in a single AI project, etc. (Analogy: We shouldn't try to predict greenhouse gas emissions by extrapolating world temperature trends, since that gets the causation backwards.)

6. I think the arguments Paul+Katja make aren't super convincing on their own terms. They are sufficient to convince me that the slow takeoff world they describe is possible and deserves serious consideration (more so than e.g. Age of Em or CAIS) but not overall convincing enough for me to say "Bostrom and Yudkowsky were probably wrong." I could go through them one by one but I think I'll stop here for now.

Replies from: riceissa

↑ comment by riceissa · 2021-07-28T23:28:19.795Z · LW(p) · GW(p)

Thanks! My understanding of the Bostrom+Yudkowsky takeoff argument goes like this: at some point, some AI team will discover the final piece of deep math needed to create an AGI; they will then combine this final piece with all of the other existing insights and build an AGI, which will quickly gain in capability and take over the world. (You can search "a brain in a box in a basement" on this page or see here for some more quotes.)

In contrast, the scenario you imagine seems to be more like (I'm not very confident I am getting all of this right): there isn't some piece of deep math needed in the final step. Instead, we already have the tools (mathematical, computational, data, etc.) needed to build an AGI, but nobody has decided to just go for it. When one project finally decides to go for an AGI, this EMH failure allows them to maintain enough of a lead to do crazy stuff (conquistadors, persuasion tools, etc.), and this leads to DSA. Or maybe the EMH failure isn't even required, just enough of a clock time lead to be able to do the crazy stuff.

If the above is right, then it does seem quite different from Paul+Katja, but also different from Bostrom+Yudkowsky, since the reason why the outcome is unipolar is different. Whereas Bostrom+Yudkowsky say the reason one project is ahead is because there is some hard step at the end, you instead say it's because of some combination of EMH failure and natural lag between projects.

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-07-29T09:38:41.395Z · LW(p) · GW(p)

Ah, this is helpful, thanks -- I think we just have different interpretations of Bostrom+Yudkowsky. You've probably been around longer than I have and read more of their stuff, but I first got interested in this around 2013, pre-ordered Superintelligence and read it with keen interest, etc. and the scenario you describe as mine is what I always thought Bostrom+Yudkowsky believed was most likely, and the scenario you describe as theirs -- involving "deep math" and "one hard step at the end" -- is something I thought they held up as an example of how things could be super fast, but not as what they actually believed was most likely.

From what I've read, Yudkowsky did seem to think there would be more insights and less "just make blob of compute bigger" about a decade or two ago, but he's long since updated towards "dear lord, people really are just going to make big blobs of inscrutable matrices, the fools!" and I don't think this counts as a point against his epistemics because predicting the future is hard and most everyone else around him did even worse, I'd bet.

Replies from: riceissa

↑ comment by riceissa · 2021-07-30T07:36:52.329Z · LW(p) · GW(p)

Ok I see, thanks for explaining. I think what's confusing to me is that Eliezer did stop talking about the deep math of intelligence sometime after 2011 and then started talking about big blobs of matrices as you say starting around 2016, but as far as I know he has never gone back to his older AI takeoff writings and been like "actually I don't believe this stuff anymore; I think hard takeoff is actually more likely to be due to EMH failure and natural lag between projects". (He has done similar things for older writings of his that he no longer thinks are true, so I would have expected him to do the same for takeoff stuff if his beliefs had indeed changed.) So I've been under the impression that Eliezer actually believes his old writings are still correct, and that somehow his recent remarks and old writings are all consistent. He also hasn't (as far as I know) written up a more complete sketch of how he thinks takeoff is likely to go given what we now know about ML. So when I see him saying things like what's quoted in Rob's OP, I feel like he is referring to the pre-2012 "deep math" takeoff argument. (I also don't remember if Bostrom gave any sketch of how he expects hard takeoff to go in Superintelligence; I couldn't find one after spending a bit of time.)

If you have any links/quotes related to the above, I would love to know!

(By the way, I was a lurker on LessWrong starting back in 2010-2011, but was only vaguely familiar with AI risk stuff back then. It was only around the publication of Superintelligence that I started following along more closely, and only much later in 2017 that I started putting significant amounts of my time into AI safety and making it my overwhelming priority. I did write several timelines [LW · GW] though, and recently did a pretty thorough reading of AI takeoff arguments for a modeling project, so that is mostly where my knowledge of the older arguments comes from.)

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-07-30T08:13:43.255Z · LW(p) · GW(p)

For all I know you are right about Yudkowsky's pre-2011 view about deep math. However, (a) that wasn't Bostrom's view AFAICT, and (b) I think that's just not what this OP quote is talking about. From the OP:

I feel like a bunch of people have shifted a bunch in the type of AI x-risk that worries them (representative phrase is "from Yudkowsky/Bostrom to What Failure Looks Like ~~part 2~~ part 1") and I still don't totally get why.

It's Yudkowsky/Bostrom, not Yudkowsky. And it's WFLLp1, not p2. Part 2 is the one where the AIs do a treacherous turn; part 1 is where actually everything is fine except that "you get what you measure" and our dumb obedient AIs are optimizing for the things we told them to optimize for rather than for what we want.

I am pretty confident that WFLLp1 is not the main thing we should be worrying about; WFLLp2 is closer, but even it involves this slow-takeoff view (in the strong sense, in which the economy is growing fast before the point of no return) which I've argued against. I do not think that the reason people shifted from "yudkowsky/bostrom" (which in this context seems to mean "single AI project builds AI in the wrong way, AI takes over world") to WFLLp1 is that people rationally considered all the arguments and decided that WFLLp1 was on balance more likely. I think instead that probably some sort of optimism bias was involved, and more importantly win by default (Yud + Bostrom stopped talking about their scenarios and arguing for them, whereas Paul wrote a bunch of detailed posts laying out his scenarios and arguments, and so in the absence of visible counterarguments Paul wins the debate by default). Part of my feeling about this is that it's a failure on my part; when Paul+Katja wrote their big post on takeoff speeds I disagreed with it and considered writing a big point-by-point response, but never did, even after various people posted questions asking "has there been any serious response to Paul+Katja?"

Replies from: riceissa

↑ comment by riceissa · 2021-08-10T20:27:26.863Z · LW(p) · GW(p)

Re (a): I looked at chapters 4 and 5 of Superintelligence again, and I can kind of see what you mean, but I'm also frustrated that Bostrom seems really non-committal in the book. He lists a whole bunch of possibilities but then doesn't seem to actually come out and give his mainline visualization/"median future". For example he looks at historical examples of technology races and compares how much lag there was, which seems a lot like the kind of thinking you are doing, but then he also says things like "For example, if human-level AI is delayed because one key insight long eludes programmers, then when the final breakthrough occurs, the AI might leapfrog from below to radically above human level without even touching the intermediary rungs." which sounds like the deep math view. Another relevant quote:

Building a seed AI might require insights and algorithms developed over many decades by the scientific community around the world. But it is possible that the last critical breakthrough idea might come from a single individual or a small group that succeeds in putting everything together. This scenario is less realistic for some AI architectures than others. A system that has a large number of parts that need to be tweaked and tuned to work effectively together, and then painstakingly loaded with custom-made cognitive content, is likely to require a larger project. But if a seed AI could be instantiated as a simple system, one whose construction depends only on getting a few basic principles right, then the feat might be within the reach of a small team or an individual. The likelihood of the final breakthrough being made by a small project increases if most previous progress in the field has been published in the open literature or made available as open source software.

Re (b): I don't disagree with you here. (The only part that worries me is, I don't have a good idea of what percentage of "AI safety people" shifted from one view to the other, whether there were also new people with different views coming into the field, etc.) I realize the OP was mainly about failure scenarios, but it did also mention takeoffs ("takeoffs won't be too fast") and I was most curious about that part.

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-08-10T21:35:52.531Z · LW(p) · GW(p)

I also wish I knew what Bostrom's median future was like, though I perhaps understand why he didn't put it in his book -- the incentives all push against it. Predicting the future is hard and people will hold it against you if you fail, whereas if you never try at all and instead say lots of vague prophecies, people will laud you as a visionary prophet.

Re (b) cool, I think we are on the same page then. Re takeoff being too fast--I think a lot of people these days think there will be plenty of big scary warning shots and fire alarms that motivate lots of people to care about AI risk and take it seriously. I think that suggests that a lot of people expect a fairly slow takeoff, slower than I think is warranted. Might happen, yes, but I don't think Paul & Katja's arguments are that convincing that takeoff will be this slow. It's a big source of uncertainty for me though.

↑ comment by Matthew Barnett (matthew-barnett) · 2021-07-28T02:45:50.494Z · LW(p) · GW(p)

I'd personally like to find some cruxes between us some time, though I don't yet know the best format to do that. I think I'll wait to see your responses to Issa's question first.

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-07-28T09:07:12.912Z · LW(p) · GW(p)

Likewise! I'm up for a video call if you like. Or we could have a big LW thread, or an email chain. I think my preference would be a video call. I like Walled Garden, we could do it there and invite other people maybe. IDK.

Replies from: Raemon

↑ comment by Raemon · 2021-11-08T01:50:46.748Z · LW(p) · GW(p)

Did this ever happen?

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-11-08T11:20:05.375Z · LW(p) · GW(p)

I don't think so? It's possible that it did and I forgot.

↑ comment by TAG · 2021-06-23T15:40:45.503Z · LW(p) · GW(p)

You can sometimes argue people into beliefs. It is much harder to argue them into skills. The negation of Robin Hanson’s rosier AI scenario was a belief. Negating an endless stream of rosy scenarios is a skill

A belief can be a negation in the sense of a contradiction, whilst not being a negation in the sense of a disproof. I don't think EY disproved RH's position. I don't think he is confident he did himself, since his summary was called "what I believe if not why I believe it". And I don't think lack of time was the problem, since the debate was immense.

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-06-23T16:33:39.086Z · LW(p) · GW(p)

Interesting, yeah I wonder why he titled it that. Still though it seems like he is claiming here to have disproved RH's position to some extent at least. I for one think RH's position is pretty implausible, for reasons Yudkowsky probably mentioned (I don't remember exactly what Yud said).

Replies from: TAG

↑ comment by TAG · 2021-06-23T19:10:11.824Z · LW(p) · GW(p)

Why the "seems"? A master rationalist should be able to state things clearly, surely?

comment by Rob Bensinger (RobbBB) · 2019-09-23T16:45:59.284Z · LW(p) · GW(p)

Rolf Degen, summarizing part of Barbara Finlay's "The neuroscience of vision and pain":

Humans may have evolved to experience far greater pain, malaise and suffering than the rest of the animal kingdom, due to their intense sociality giving them a reasonable chance of receiving help.

From the paper:

Several years ago, we proposed the idea that pain, and sickness behaviour had become systematically increased in humans compared with our primate relatives, because human intense sociality allowed that we could ask for help and have a reasonable chance of receiving it. We called this hypothesis ‘the pain of altruism’ [68]. This idea derives from, but is a substantive extension of Wall’s account of the placebo response [43]. Starting from human childbirth as an example (but applying the idea to all kinds of trauma and illness), we hypothesized that labour pains are more painful in humans so that we might get help, an ‘obligatory midwifery’ which most other primates avoid and which improves survival in human childbirth substantially ([67]; see also [69]). Additionally, labour pains do not arise from tissue damage, but rather predict possible tissue damage and a considerable chance of death. Pain and the duration of recovery after trauma are extended, because humans may expect to be provisioned and protected during such periods. The vigour and duration of immune responses after infection, with attendant malaise, are also increased. Noisy expression of pain and malaise, coupled with an unusual responsivity to such requests, was thought to be an adaptation.

We noted that similar effects might have been established in domesticated animals and pets, and addressed issues of ‘honest signalling’ that this kind of petition for help raised. No implication that no other primate ever supplied or asked for help from any other was intended, nor any claim that animals do not feel pain. Rather, animals would experience pain to the degree it was functional, to escape trauma and minimize movement after trauma, insofar as possible.

Finlay's original article on the topic: "The pain of altruism".

Replies from: RobbBB, RobbBB, rudi-c

↑ comment by Rob Bensinger (RobbBB) · 2019-09-23T16:56:36.999Z · LW(p) · GW(p)

[Epistemic status: Thinking out loud]

If the evolutionary logic here is right, I'd naively also expect non-human animals to suffer more to the extent they're (a) more social, and (b) better at communicating specific, achievable needs and desires.

There are reasons the logic might not generalize, though. Humans have fine-grained language that lets us express very complicated propositions about our internal states. That puts a lot of pressure on individual humans to have a totally ironclad, consistent "story" they can express to others. I'd expect there to be a lot more evolutionary pressure to actually experience suffering, since a human will be better at spotting holes in the narratives of a human who fakes it (compared to, e.g., a bonobo trying to detect whether another bonobo is really in that much pain).

It seems like there should be an arms race across many social species to give increasingly costly signals of distress, up until the costs outweigh the amount of help they can hope to get. But if you don't have the language to actually express concrete propositions like "Bob took care of me the last time I got sick, six months ago, and he can attest that I had a hard time walking that time too", then those costly signals might be mostly or entirely things like "shriek louder in response to percept X", rather than things like "internally represent a hard-to-endure pain-state so I can more convincingly stick to a verbal narrative going forward about how hard-to-endure this was".

↑ comment by Rob Bensinger (RobbBB) · 2020-07-01T16:09:29.610Z · LW(p) · GW(p)

[Epistemic status: Piecemeal wild speculation; not the kind of reasoning you should gamble the future on.]

Some things that make me think suffering (or 'pain-style suffering' specifically) might be surprisingly neurologically conditional and/or complex, and therefore more likely to be rare in non-human animals (and in subsystems of human brains, in AGI subsystems that aren't highly optimized to function as high-fidelity models of humans, etc.):

1. Degen and Finlay's social account of suffering above.

2. Which things we suffer from seems to depend heavily on mental narratives and mindset. See, e.g., Julia Galef's Reflections on Pain, from the Burn Unit.

Pain management is one of the main things hypnosis appears to be useful for. Ability to cognitively regulate suffering is also one of the main claims of meditators, and seems related to existential psychotherapy's claim that narratives are more important for well-being than material circumstances.

Even if suffering isn't highly social (pace Degen and Finlay), its dependence on higher cognition suggests that it is much more complex and conditional than it might appear on initial introspection, which on its own reduces the probability of its showing up elsewhere: complex things are relatively unlikely a priori, are especially hard to evolve, and demand especially strong selection pressure if they're to evolve and if they're to be maintained.

(Note that suffering introspectively feels relatively basic, simple, and out of our control, even though it's not. Note also that what things introspectively feel like is itself under selection pressure. If suffering felt complicated, derived, and dependent on our choices, then the whole suite of social thoughts and emotions related to deception and manipulation would be much more salient, both to sufferers and to people trying to evaluate others' displays of suffering. This would muddle and complicate attempts by sufferers to consistently socially signal that their distress is important and real.)

3. When humans experience large sudden neurological changes and are able to remember and report on them, their later reports generally suggest positive states more often than negative ones. This seems true of near-death experiences and drug states, though the case of drugs is obviously filtered: the more pleasant and/or reinforcing drugs will generally be the ones that get used more.

Sometimes people report remembering that a state change was scary or disorienting. But they rarely report feeling agonizing pain, and they often either endorse having had the experience (with the benefit of hindsight), or report having enjoyed it at the time, or both.

This suggests that humans' capacity for suffering (especially more 'pain-like' suffering, as opposed to fear or anxiety) may be fragile and complex. Many different ways of disrupting brain function seem to prevent suffering, suggesting suffering is the more difficult and conjunctive state for a brain to get itself into; you need more of the brain's machinery to be in working order in order to pull it off.

4. Similarly, I frequently hear about dreams that are scary or disorienting, but I don't think I've ever heard of someone recalling having experienced severe pain from a dream, even when they remember dreaming that they were being physically damaged.

This may be for reasons of selection: if dreams were more unpleasant, people would be less inclined to go to sleep and their health would suffer. But it's interesting that scary dreams are nonetheless common. This again seems to point toward 'states that are further from the typical human state are much more likely to be capable of things like fear or distress, than to be capable of suffering-laden physical agony.'

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2020-07-06T00:06:39.595Z · LW(p) · GW(p)

Devoodooifying Psychology says "the best studies now suggest that the placebo effect is probably very weak and limited to controlling pain".

↑ comment by Rudi C (rudi-c) · 2020-07-02T10:05:12.205Z · LW(p) · GW(p)

How is the signal being kept “costly/honest” though? Is the pain itself the cost? That seems somewhat weird ...

comment by Rob Bensinger (RobbBB) · 2019-09-08T02:58:32.070Z · LW(p) · GW(p)

Facebook comment I wrote in February, in response to the question 'Why might having beauty in the world matter?':

I assume you're asking about why it might be better for beautiful objects in the world to exist (even if no one experiences them), and not asking about why it might be better for experiences of beauty to exist.

[... S]ome reasons I think this:

1. If it cost me literally nothing, I feel like I'd rather there exist a planet that's beautiful, ornate, and complex than one that's dull and simple -- even if the planet can never be seen or visited by anyone, and has no other impact on anyone's life. This feels like a weak preference, but it helps get a foot in the door for beauty.

(The obvious counterargument here is that my brain might be bad at simulating the scenario where there's literally zero chance I'll ever interact with a thing; or I may be otherwise confused about my values.)

2. Another weak foot-in-the-door argument: People seem to value beauty, and some people claim to value it terminally. Since human value is complicated and messy and idiosyncratic (compare person-specific ASMR triggers or nostalgia triggers or culinary preferences) and terminal and instrumental values are easily altered and interchanged in our brain, our prior should be that at least some people really do have weird preferences like that at least some of the time.

(And if it's just a few other people who value beauty, and not me, I should still value it for the sake of altruism and cooperativeness.)

3. If morality isn't "special" -- if it's just one of many facets of human values, and isn't a particularly natural-kind-ish facet -- then it's likelier that a full understanding of human value would lead us to treat aesthetic and moral preferences as more coextensive, interconnected, and fuzzy. If I can value someone else's happiness inherently, without needing to experience or know about it myself, it then becomes harder to say why I can't value non-conscious states inherently; and "beauty" is an obvious candidate. My preferences aren't all about my own experiences, and they aren't simple, so it's not clear why aesthetic preferences should be an exception to this rule.

4. Similarly, if phenomenal consciousness is fuzzy or fake, then it becomes less likely that our preferences range only and exactly over subjective experiences (or their closest non-fake counterparts). Which removes the main reason to think unexperienced beauty doesn't matter to people.

Combining the latter two points, and the literature on emotions like disgust and purity which have both moral and non-moral aspects, it seems plausible that the extrapolated versions of preferences like "I don't like it when other sentient beings suffer" could turn out to have aesthetic aspects or interpretations like "I find it ugly for brain regions to have suffering-ish configurations".

Even if consciousness is fully a real thing, it seems as though a sufficiently deep reductive understanding of consciousness should lead us to understand and evaluate consciousness similarly whether we're thinking about it in intentional/psychologizing terms or just thinking about the physical structure of the corresponding brain state. We shouldn't be more outraged by a world-state under one description than under an equivalent description, ideally.

But then it seems less obvious that the brain states we care about should exactly correspond to the ones that are conscious, with no other brain states mattering; and aesthetic emotions are one of the main ways we relate to things we're treating as physical systems.

As a concrete example, maybe our ideal selves would find it inherently disgusting for a brain state that sort of almost looks conscious to go through the motions of being tortured, even when we aren't the least bit confused or uncertain about whether it's really conscious, just because our terminal values are associative and symbolic. I use this example because it's an especially easy one to understand from a morality- and consciousness-centered perspective, but I expect our ideal preferences about physical states to end up being very weird and complicated, and not to end up being all that much like our moral intuitions today.

Addendum: As always, this kind of thing is ridiculously speculative and not the kind of thing to put one's weight down on or try to "lock in" for civilization. But it can be useful to keep the range of options in view, so we have them in mind when we figure out how to test them later.

Replies from: jimrandomh, RobbBB

↑ comment by jimrandomh · 2019-09-09T23:56:10.224Z · LW(p) · GW(p)

Somewhat more meta level: Heuristically speaking, it seems wrong and dangerous for the answer to "which expressed human preferences are valid?" to be anything other than "all of them". There's a common pattern in metaethics which looks like:

1. People seem to have preference X

2. X is instrumentally valuable as a source of Y and Z. The instrumental-value relation explains how the preference for X was originally acquired.

3. [Fallacious] Therefore preference X can be ignored without losing value, so long as Y and Z are optimized.

In the human brain algorithm, if you optimize something instrumentally for a while, you start to value it terminally. I think this is the source of a surprisingly large fraction of our values.

↑ comment by Rob Bensinger (RobbBB) · 2019-09-08T20:18:23.896Z · LW(p) · GW(p)

Old discussion of this on LW: https://www.lesswrong.com/s/fqh9TLuoquxpducDb/p/synsRtBKDeAFuo7e3 [? · GW]

comment by Rob Bensinger (RobbBB) · 2022-06-29T20:55:49.986Z · LW(p) · GW(p)

Collecting all of the quantitative AI predictions I know of MIRI leadership making on Arbital (let me know if I missed any):

Aligning an AGI adds significant development time: Eliezer 95%
Almost all real-world domains are rich: Eliezer 80%
Complexity of value: Eliezer 97%, Nate 97%
Distant superintelligences can coerce the most probable environment of your AI: Eliezer 66%
Meta-rules for (narrow) value learning are still unsolved: Eliezer 95%
Natural language understanding of "right" will yield normativity: Eliezer 10%
Relevant powerful agents will be highly optimized: Eliezer 75%
Some computations are people: Eliezer 99%, Nate 99%
Sufficiently optimized agents appear coherent: Eliezer 85%

Some caveats:

Arbital predictions range from 1% to 99%.
I assume these are generally ~5 years old. Views may have shifted.
By default, I assume that the standard caveats for probabilities like these apply: I treat these as off-the-cuff ass numbers [LW · GW] unless stated otherwise, products of 'thinking about the problem on and off for years and then querying my gut about what it expects to actually see', more so than of building Guesstimate models or trying too hard to make sure all the probabilities are perfectly coherent.

Inconsistencies are flags 'something is wrong here', but ass numbers are vague and unreliable enough that they're to be expected to some degree. Similarly, ass numbers are often unstable hour-to-hour and day-to-day.

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2022-06-29T20:56:23.866Z · LW(p) · GW(p)

On my model, the point of ass numbers isn't to demand perfection of your gut (e.g., of the sort that would be needed to avoid multiple-stage fallacies when trying to conditionalize a lot), but to:

Communicate with more precision than English-language words like 'likely' or 'unlikely' allow. Even very vague or uncertain numbers will, at least some of the time, be a better guide than natural-language terms that weren't designed to cover the space of probabilities (and that can vary somewhat in meaning from person to person).
At least very vaguely and roughly bring your intuitions into contact with reality, and with each other, so you can more readily notice things like 'I'm miscalibrated', 'reality went differently than I expected', 'these two probabilities don't make sense together', etc.

It may still be a terrible idea to spend too much time generating ass numbers [LW · GW], since "real numbers" are not the native format human brains compute probability with, and spending a lot of time working in a non-native format may skew your reasoning.

(Maybe there's some individual variation here?)

But they're at least a good tool to use sometimes, for the sake of crisper communication, calibration practice (so you can generate non-awful future probabilities when you need to), etc.
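As a rough illustration of the "calibration practice" point, here is a minimal sketch of how one might score a log of gut probabilities after the fact. Everything in it is an assumption for illustration: the function names `brier_score` and `calibration_table`, the bucket count, and the sample forecasts are all made up, and none of this reflects how anyone mentioned above actually tracks their numbers.

```python
from collections import defaultdict


def brier_score(forecasts):
    """Mean squared gap between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - float(outcome)) ** 2 for p, outcome in forecasts) / len(forecasts)


def calibration_table(forecasts, n_buckets=5):
    """Group forecasts by stated probability and compare against observed frequency.

    Returns a list of (mean stated probability, observed frequency, count) rows.
    """
    buckets = defaultdict(list)
    for p, outcome in forecasts:
        # Clamp so that p == 1.0 lands in the top bucket rather than a new one.
        index = min(int(p * n_buckets), n_buckets - 1)
        buckets[index].append((p, outcome))

    table = []
    for index in sorted(buckets):
        items = buckets[index]
        mean_stated = sum(p for p, _ in items) / len(items)
        observed = sum(1 for _, happened in items if happened) / len(items)
        table.append((mean_stated, observed, len(items)))
    return table


# Hypothetical log of (stated probability, did it happen?) pairs.
forecasts = [
    (0.9, True), (0.9, True), (0.9, False),
    (0.7, True), (0.7, False),
    (0.2, False), (0.2, False), (0.2, True),
]

print("Brier score:", brier_score(forecasts))
for stated, observed, count in calibration_table(forecasts):
    print(f"stated ~{stated:.2f}, observed {observed:.2f}, n={count}")
```

A large gap between the stated and observed columns in a bucket is the 'I'm miscalibrated' signal; with only a handful of forecasts per bucket, though, the observed frequencies are too noisy to read much into.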

comment by Rob Bensinger (RobbBB) · 2022-07-04T11:21:49.328Z · LW(p) · GW(p)

Suppose most people think there's a shrew in the basement, and Richard Feynman thinks there's a beaver. If you're pretty sure it's not a shrew, two possible reactions include:

- 'Ah, the truth is probably somewhere in between these competing perspectives. So maybe it's an intermediate-sized rodent, like a squirrel.'

- 'Ah, Feynman has an absurdly good epistemic track record, and early data does indicate that the animal's probably bigger than a shrew. So I'll go with his guess and say it's probably a beaver.'

But a third possible response is:

- 'Ah, if Feynman's right, then a lot of people are massively underestimating the rodent's size. Feynman is a person too, and might be making the same error (just to a lesser degree); so my modal guess will be that it's something bigger than a beaver, like a capybara.'

In particular, you may want to go more extreme than Feynman if you think there's something systematically causing people to underestimate a quantity (e.g., a cognitive bias -- the person who speaks out first against a bias might still be affected by it, just to a lesser degree), or systematically causing people to make weaker claims than they really believe (e.g., maybe people don't want to sound extreme or out-of-step with the mainstream view).

Replies from: jimrandomh, thomas-kwa

↑ comment by jimrandomh · 2022-07-04T20:29:18.432Z · LW(p) · GW(p)

In particular, you may want to go more extreme than Feynman if you think there's something systematically causing people to underestimate a quantity (e.g., a cognitive bias -- the person who speaks out first against a bias might still be affected by it, just to a lesser degree), or systematically causing people to make weaker claims than they really believe (e.g., maybe people don't want to sound extreme or out-of-step with the mainstream view).

This is true! But I think it's important to acknowledge that this depends a lot on details of Feynman's reasoning process, and it doesn't go in a consistent direction. If Feynman is aware of the bias, he may have already compensated for it in his own estimate, so compensating on his behalf would be double-counting the adjustment. And sometimes the net incentive is to overestimate, not to underestimate, because you're trying to sway the opinion of averagers, or because being more contrarian gets attention, or because shrew-thinkers feel like an outgroup.

In the end, you can't escape from detail. But if you were to put full power into making this heuristic work, the way to do it would be to look at past cases of Feynman-vs-world disagreement (broadening the "Feynman" and "world" categories until there's enough training data), and try to get a distribution empirically.
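One hedged way to sketch "get a distribution empirically": for each past disagreement, measure where the truth landed on the line running from the consensus guess to the expert's guess, then apply the typical ratio to a new disagreement. The function names, the use of the median, and the sample cases below are all hypothetical; the sketch also assumes the quantity is a single number where linear scaling makes sense, which often won't hold.

```python
import statistics


def overshoot_ratios(cases):
    """For each past case, locate the truth on the consensus-to-expert line.

    0.0 means the consensus was right, 1.0 means the expert was right, and
    values above 1.0 mean the truth was more extreme than the expert's guess.
    """
    ratios = []
    for consensus, expert, truth in cases:
        gap = expert - consensus
        if gap == 0:
            continue  # no disagreement, so the case carries no information
        ratios.append((truth - consensus) / gap)
    return ratios


def adjusted_estimate(consensus, expert, ratios):
    """Scale the new consensus-to-expert gap by the median historical ratio."""
    return consensus + statistics.median(ratios) * (expert - consensus)


# Hypothetical past disagreements: (consensus guess, expert guess, true value).
past_cases = [
    (10.0, 20.0, 25.0),
    (5.0, 9.0, 8.0),
    (100.0, 140.0, 160.0),
]

ratios = overshoot_ratios(past_cases)
print("historical ratios:", ratios)
print("adjusted guess:", adjusted_estimate(1.0, 3.0, ratios))
```

If the historical ratios cluster above 1.0, that supports going more extreme than the expert; if they cluster below 1.0, the expert has tended to overshoot, which is the double-counting and contrarian-incentive worry raised above.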

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2022-07-04T22:32:23.092Z · LW(p) · GW(p)

Endorsed!

↑ comment by Thomas Kwa (thomas-kwa) · 2022-08-05T17:48:41.284Z · LW(p) · GW(p)

Have you seen this ever work for an advance prediction? It seems like you need to be in a better epistemic position than Feynman, which is pretty hard.

comment by Rob Bensinger (RobbBB) · 2022-04-06T20:22:58.184Z · LW(p) · GW(p)

From Twitter:

I am not sure I can write out the full AI x-risk scenario.
1. AI quickly becomes super clever
2. Alignment is hard, like getting your great x10 grandchildren to think you're a good person
3. The AI probably embarks on a big project which ignores us and accidentally kills us
Where am I wrong? Happy to be sent stuff to read.

I replied:

"1. AI quickly becomes super clever"
My AI risk model (which is not the same as everyone's) more specifically says:
1a. We'll eventually figure out how to make AI that's 'generally good at science' -- like how humans can do sciences that didn't exist when our brains evolved.
1b. AGI / STEM AI will have a large, fast, and discontinuous impact. Discontinuous because it's a new sort of intelligence (not just AlphaGo 2 or GPT-5); large and fast because STEM is powerful, plus humans suck at STEM and aren't cheap software that scales as you add hardware.
(Warning: argument is compressed for Twitter character count. There are other factors too, like recursive self-improvement.)
"2. Alignment is hard, like getting your great x10 grandchildren to think you're a good person"
I'd say it's hard like building a large/complex, novel software system that exhibits some strong robustness/security properties, on the first try, in the face of adverse optimization.
My recommended reading would be:
https://intelligence.org/2017/11/25/security-mindset-ordinary-paranoia/
https://intelligence.org/2017/11/26/security-mindset-and-the-logistic-success-curve/
Goal stability over time is part of the problem, but not the core problem. The core problem (for ML) is 'ML models are extremely opaque, and there's no way to robustly get any complex real-world goal into a sufficiently opaque system'. The goal isn't instilled in the first place.
"3. The AI probably embarks on a big project which ignores us and accidentally kills us"
Rather: which deliberately kills us because (1) we're made of atoms that can be used for the project, and (2) we're a threat. (E.g., we could build a rival superintelligence.)

comment by Rob Bensinger (RobbBB) · 2021-03-08T00:15:19.040Z · LW(p) · GW(p)

Chana Messinger, replying to Brandon Bradford:

I find this very deep
"Easy to make everything a conspiracy when you don't know how anything works."
Everything literally is a conspiracy (in some nonstandard technical sense), and if you don't know how anything works, then it's a secret conspiracy.
How does water get to your faucet? How many people are responsible for your internet? What set of events had to transpire to make you late for work? How does one build a microwave?
Something about this points at how complicated everything is and how little we individually know about it.

Replies from: gworley

↑ comment by Gordon Seidoh Worley (gworley) · 2021-03-08T02:26:07.825Z · LW(p) · GW(p)

It's a conspiracy, man. You gotta pay for water that's just sitting there in the ground. It's the corporations tricking people into thinking they gotta drink water that came from their pipes because it's been "sanitized". Wake up sheeple, they're sanitizing your brains!!!

And don't get me started on showing up later for work! We all know I could be there in just 15 minutes if there were no traffic, but you see the corporations make more money the longer you drive your car, so they make sure everyone has to be at work at the same time so there's a traffic jam. Then we all use more fuel, end up buying fast food and other junk because we lost so much time in traffic, and gotta put in extra effort at work because the boss is always mad about how we're showing up late. And to top it all off, all those exhaust fumes are causing ozone holes and global warming which is just what they want because then they can use it to sell you more stuff like sunscreen or to control your life to fight "climate change".

As for the microwave? Well, let's just say it's not a coincidence they blow up if you put tin foil in them. Me? I'm keeping my tin foil right where intended: on my head to block out the thought rays and subliminal messages sent out by their so-called "internet".

comment by Rob Bensinger (RobbBB) · 2020-06-30T15:39:54.084Z · LW(p) · GW(p)

From an April 2019 Facebook discussion:

Rob Bensinger: avacyn:

I think one strong argument in favor of eating meat is that beef cattle (esp. grass-fed) might have net positive lives. If this is true, then the utilitarian line is to 1) eat more beef to increase demand, 2) continue advocating for welfare reforms that will make cows' lives even more positive.
Beef cattle are different than e.g. factory farmed chicken in that they live a long time (around 3 years on average vs 6-7 weeks for broilers), and spend much of their lives grazing on stockers where they might have natural-ish lives.
Another argument in favor of eating beef is that it tends to lead to deforestation, which decreases total wild animal habitat, which one might think is worse than beef farms.

... I love how EA does veganism / animal welfare things. It's really good.

(From the comment section on https://forum.effectivealtruism.org/posts/TyLxMrByKuCmzZx6b/reasons-to-eat-meat [EA · GW])

[... Note that in posting this I'm not intending] to advocate for a specific intervention; it's more that it makes me happy to see thorough and outside-the-box reasoning from folks who are trying to help others, whether or not they have the same background views as me.

Jonathan Salter: Even if this line of reasoning might technically be correct in a narrow, first order effects type way, my intuition tells me that that sort of behaviour would lessen EAs' credibility when pushing animal welfare messages, and that spreading general anti-speciesist norms and values is more important in the long run. Just my two cents though.

Rob Bensinger: My model of what EA should be shooting for is that it should establish its reputation as

'that group of people that engages in wonkish analyses and debates of moral issues at great length, and then actually acts on the conclusions they reach'

'that group of people that does lots of cost-benefit analyses and is willing to consider really counter-intuitive concerns rather than rejecting unusual ideas out of hand'

'that group of people that seems to be super concerned about its actual impact and nailing down all the details, rather than being content with good PR or moral signaling'

I think that's the niche EA would occupy if it were going to have the biggest positive impact in the future. And given how diverse EA is and how many disagreements there already are, the ship may have already sailed on us being able to coordinate and converge on moral interventions without any public discussion of things like wild animal suffering.

This is similar to a respect in which my views have changed about whether EAs and rationalists should become vegan en masse. In the past, I've given arguments like [in Inhuman Altruism and Revenge of the Meat People]:

A lot more EAs and rationalists should go vegan, because it really does seem like future generations will view 21st-century factory farming similar to how we view 19th-century slavery today. It would be great to be "ahead of the curve" for once, and to clearly show that we're not just 'unusually good on some moral questions' but actually morally exemplary in all the important ways that we can achieve.

I think this is really standing in for two different arguments:

First, a reputational argument saying 'veganism is an unusually clear signal that we're willing to take big, costly steps to do the right thing, and that we're not just armchair theorists or insular contrarians; so we should put more value on paying that signal in order to convince other people that we're really serious about this save-the-world, help-others, actually-act-based-on-the-abstract-arguments thing'.

Second, an ideal-advisor-style argument saying 'meat-eating is probably worse than it seems, because the analytic arguments strongly support veganism but social pressure and social intuitions don't back up those arguments, so we probably won't emotionally feel their full moral force'.

One objection I got to the first argument is that it seems like the marginal effort and attention of a lot of EAs could save a lot more lives if it were going to things that can have global effects, rather than small-scale personal effects. The reputational argument weighs against this, but there's a reputational argument going in the other way (I believe due to Katja Grace [update: Katja Grace wrote an in-depth response at the time, but this particular argument seems to be due to Paul Christiano and Oliver Habryka]):

'What makes EA's brand special and distinctive, and puts us in an unusual position to have an outsized impact on the world, is that we're the group that gets really finicky and wonkish about EV and puts its energy into the things that seem highest-EV for the world. Prioritizing personal dietary choices over other, better uses of our time, and especially doing so for reputational or signaling reasons, seems like it actively goes against that unique aspect of EA, which makes it very questionable as a PR venture in the first place.'

This still left me feeling, on a gut level, like the 'history will view us as participants in an atrocity' argument is a strong one -- not as a reputational argument, just as an actual argument (by historical analogy) that there's something morally wrong with participating in factory farming at all, even if we're (in other aspects of our life) trying to actively oppose factory farming or even-larger atrocities.

Since then, a few things have made me feel like the latter argument's force isn't so strong. First, I've updated some on the object level about the probability that different species are conscious and that different species in particular circumstances have net-negative lives (though I still think there's a high-enough-to-be-worth-massively-worrying-about probability that farmed chicken, beef, etc. are all causing immense amounts of suffering).

Second, I've realized that when I've done my 'what would future generations think?' ideal-advisor test in the past, I've actually been doing something weird. I'm taking 21st-century intuitions about which things matter most and are most emotionally salient, and projecting them forward to imagine a society where salience works the same way on the meta level, but the object-level social pressures/dynamics are a bit different. But it seems like that heuristic might have been the wrong one for past generations to use, if they wanted to make proper use of this ideal-advisor heuristic.

Jeremy Bentham's exemplary forward-thinking moral views, for example, seem like a thing you'd achieve by going 'imagine future generations that are just super reasonable and analytical about all these things, and view things as atrocities in proportion to what the strongest arguments say', rather than by drawing analogies to things that present-day people find especially atrocious about the past.

(People who have read Bentham: did Bentham ever use intuition pumps like either of these? If so, did either line of thinking seem like it actually played a role in how he reached his conclusions, as opposed to being arguments for persuading others?)

Imagine instead a future society that's most horrified, above all else, by failures of reasoning process like 'foreseeably allocating attention and effort to something other than the thing that looks highest-EV to you'. Imagine a visceral gut-level reaction to systematic decision-making errors (that foreseeably have very negative EV) even more severe than modernity's most negative gut-level reactions to world events. Those failures of reasoning process, after all, are much more directly in your control (and much more influenceable by moral praise and condemnation) than action outcomes. That seems like a hypothetical that pushes in a pretty different direction in a lot of these cases. (And one that converges more with the obvious direct 'just do the best thing' argument, which doesn't need any defense.)

Replies from: habryka4

↑ comment by habryka (habryka4) · 2020-06-30T23:15:47.784Z · LW(p) · GW(p)

I really like the FB crossposts here, and also really like this specific comment. Might be worth polishing it into a top-level post, either here or on the EA Forum sometime.

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2020-07-01T00:21:11.373Z · LW(p) · GW(p)

Thanks! :) I'm currently not planning to polish it; part of the appeal of cross-posting from Facebook for me is that I can keep it timeboxed by treating it as an artifact of something I already said. I guess someone else could cannibalize it into a prettier stand-alone post.

↑ comment by lsusr · 2021-06-08T05:57:04.904Z · LW(p) · GW(p)

While your comment was clearly written in good faith, it seems to me like you're missing some context. You recommend that EY recommend that the detractors read books. EY doesn't just recommend people read books. He wrote the equivalent of like three books [? · GW] on the subjects relevant to this conversation in particular which he gives away for free. Also, most of the people in this conversation are already big into reading books.

It is my impression he also helped establish the Center for Applied Rationality, which has the explicit mission of training skills. (I'm not sure if he technically did but he was part of the community which did and he helped promote it in its early days.)

Replies from: elityre

↑ comment by Eli Tyre (elityre) · 2021-06-19T03:29:58.898Z · LW(p) · GW(p)

It is my impression he also helped establish the Center for Applied Rationality, which has the explicit mission of training skills. (I'm not sure if he technically did but he was part of the community which did and he helped promote it in its early days.)

Eliezer was involved with CFAR in the early days, but has not been involved since at least 2016.

comment by Rob Bensinger (RobbBB) · 2020-06-30T15:11:23.611Z · LW(p) · GW(p)

From an April 2019 Facebook discussion:

Rob Bensinger:

Julia Galef: Another one of your posts that has stayed with me is a post in which you were responding to someone's question -- I think the question was, “What are your favorite virtues?” And you described three. They were compassion for yourself; creating conditions where you'll learn the truth; and sovereignty. [...] Can you explain briefly what sovereignty means?
Kelsey Piper: Yeah, so I characterize sovereignty as the virtue of believing yourself qualified to reason about your life, and to reason about the world, and to act based on your understanding of it.
I think it is surprisingly common to feel fundamentally unqualified even to reason about what you like, what makes you happy, which of several activities in front of you you want to do, which of your priorities are really important to you.
I think a lot of people feel the need to answer those questions by asking society what the objectively correct answer is, or trying to understand which answer won't get them in trouble. And so I think it's just really important to learn to answer those questions with what you actually want and what you actually care about. [...]
Julia Galef: One insight that I had from reading your post in particular was that maybe a lot of debates over whether you should “trust your gut” are actually about sovereignty. [...]
Kelsey Piper: Yeah, I definitely think -- maybe replace “trust your gut” with --
Julia Galef: Consult?
Kelsey Piper: Yeah, check in with your gut. Treat your gut as some information.
Julia Galef: Yeah.
Kelsey Piper: And treat making your gut more informative as an important part of your growth as a person. [...] I’ve stewed over lots of hard questions. And I got a sense of when I've tended to be right, and when I tended to be wrong, and that informs my gut and the extent to which I feel able to trust it now.

Source: http://rationallyspeakingpodcast.org/show/rs-230-kelsey-piper-on-big-picture-journalism-covering-the-t.html

Based on https://theunitofcaring.tumblr.com/post/177842591031/what-are-your-favorite-virtues

Spencer Mulesky: Why is this good content? I'm not getting it.

Rob Bensinger: That seems hard to summarize!

The "trust your gut" portion maybe obscures the thing I think is important, because it seems more banal and specific. The important thing I think is being pointed at with "sovereignty" is more general than just "notice how you feel about things, and hone your intuitions through experience", though that's certainly a core thing people need to do.

One way of pointing at the more basic thing I have in mind is: by default, humans are pretty bad at being honest with themselves and others; are pretty bad at thinking clearly; are pretty bad at expressing and resolving disagreements, as opposed to conformity/mimicry or unproductive brawls; are pretty bad at taking risks and trying new things; are pretty bad at attending to argument structure, as opposed to status/authority/respectability.

We can build habits and group norms that make it a lot easier to avoid those problems, and to catch ourselves when we slip up. But this generally requires that people see past abstractions like "what I should do" and "what's normal to do" and "what's correct to do" and be able to observe and articulate what concrete things are going on in their head. A common thing that blocks this is that people feel like there's something silly or illegitimate or un-objective about reporting what's really going on inside their heads, so they feel a need to grasp at fake reasons that sound more normal/objective/impartial. Giving EAs/rationalists/etc. more social credit for something like "sovereignty", and giving them language for articulating this ideal, is one way of trying to fight back against this epistemically and instrumentally bad set of norms and mental habits.

Spencer Mulesky: Thanks!

Rob Bensinger: It might also help to give some random examples (with interesting interconnections) where I've found this helpful.

'I'm in a longstanding relationship that's turned sour. But I feel like I can't just leave (or make other changes to my life) because I'm not having enough fun / my life isn't satisfying as many values as I'd like; I feel like I need to find something objectively Bad my partner has done, so that I can feel Justified and Legitimate in leaving.' People often feel like they're not "allowed" to take radical action to improve their lives, because of others' seeming claims on their life.
A lot of the distinct issues raised on https://www.facebook.com/robbensinger/posts/10160749026995447, like Jessica Taylor's worry about using moral debt as a social lever to push people around. In my experience, this is not so dissimilar from the relationship case above; people think about their obligations in fuzzy ways that make it hard to see what they actually want and easy to get trapped by others' claims on their output.
People feel like they're being looked down on or shamed or insufficiently socially rewarded/incentivized/respected for things about how they're trying to do EA. Examples might include 'starting risky projects', 'applying to EA jobs', 'applying to non-EA jobs', 'earning to give', 'not earning to give', 'producing ideas that aren't perfectly vetted or write-ups that aren't perfectly polished'. (See e.g. the comments on https://www.facebook.com/robbensinger/posts/10161249846505447; or for the latter point, https://www.lesswrong.com/posts/7YG9zknYE8di9PuFg/epistemic-tenure [LW · GW] and a bunch of other recent writings share a theme of 'highly productive intellectuals are feeling pressure to not say things publicly until they're super super confident of them').
People feel like they can't (even in private conversation) acknowledge social status and esteem (https://www.facebook.com/robbensinger/posts/10160475363630447), respectability, or 'Ra'-type things (https://srconstantin.wordpress.com/2016/10/20/ra), or how those phenomena affect everyone's preferences or judgments.
Modest epistemology in Inadequate Equilibria (https://equilibriabook.com/toc/)
A lot of CFAR-style personal debugging I've done has depended on my ability to catch myself in the mental motion of punishing myself (or disregarding myself, etc.) on the timescale of 'less than a second'. And then stopping to change that response or analyze why that's happening, so I can drill down better on underlying dispositions I want to improve. Cf. https://wiki.lesswrong.com/wiki/Ugh_field and https://www.lesswrong.com/posts/5dhWhjfxn4tPfFQdi/physical-and-mental-behavior [LW · GW].

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2020-06-30T16:02:05.350Z · LW(p) · GW(p)

Rob Wiblin, August 2019:

It seems to me like guilt and shame function surprisingly poorly as motivators for good work in the modern world. Not only do they often not result in people getting things done at the time, they can create a positive feedback loop that makes people depressed and unproductive for months.
But then why have we evolved to feel them so strongly?
One possibility is that guilt and shame do work well, but their function is to stop us from doing bad things. In a world where there's only a few things you can do, it's clear how to do them all, and the priority is to stay away from a few especially bad options, that's helpful.
But to do good skilled work, it's not enough to know what you shouldn't do — e.g. procrastinate. The main problem is figuring out what out of the million things you might do is the right one, and staying focussed on it. And for that curiosity or excitement or pride are much more effective. You need to be pulled in the right direction, not merely pushed away from doing nothing, or severely violating a social norm.
A second variation on the same theme would be that modern work is different from the tasks our hunter gatherer ancestors did in all sorts of ways that can make it less motivating. In the past just feeling guilt about e.g. being lazy, was enough to get us to go gather some berries, but now for most of us, it isn't. So guilt fails, and then we feel even more guilty, and then we're even less energetic, so it fails again, etc.
A third possibility is that shame and guilt are primarily about motivating you to fit into a group and go along with its peculiar norms. But in a modern workplace that's not the main thing most of us are lacking. Rather we need to be inspired by something we're working on and give it enough focussed attention long enough to produce an interesting product.
Any other theories? Or maybe you think guilt and shame do work well?

Cf. Brienne Yudkowsky on shame [LW(p) · GW(p)] and the discussion on https://www.facebook.com/robbensinger/posts/10160749026995447.

comment by Rob Bensinger (RobbBB) · 2022-12-14T02:45:07.477Z · LW(p) · GW(p)

Twitter thread collecting examples of alignment research MIRI has said relatively positive things about.

comment by Rob Bensinger (RobbBB) · 2020-12-31T17:25:53.742Z · LW(p) · GW(p)

Copied from some conversations on Twitter:

· · · · · · · · · · · · · · ·

Eric Rogstad: I think "illusionism" is a really misleading term. As far as I can tell, illusionists believe that consciousness is real, but has some different properties than others believe.

It's like if you called Einstein an "illusionist" w.r.t. space or time.

See my comments here:

https://www.lesswrong.com/posts/biKchmLrkatdBbiH8/book-review-rethinking-consciousness [LW · GW]

Rob Bensinger: I mostly disagree. It's possible to define a theory-neutral notion of 'consciousness', but I think it's just true that 'there's no such thing as subjective awareness / qualia / etc.', and I think this cuts real dang deep into the heart of what most people mean by consciousness.

Before the name illusionism caught on, I had to use the term 'eliminativism', but I had to do a lot of work to clarify that I'm not like old-school eliminativists who think consciousness is obviously or analytically fake. Glad to have a clearer term now.

I think people get caught up in knots about the hard problem of consciousness because they try to gesture at 'the fact that they have subjective awareness', without realizing they're gesturing at something that contains massive introspective misrepresentations / illusions.

Seeing that we have to throw out a key part of Chalmers' explanandum is an important insight for avoiding philosophical knots, even though it doesn't much help us build a positive account. That account matters, but epistemic spring cleaning matters too.

Eric Rogstad:

'the fact that they have subjective awareness', without realizing they're gesturing at something that contains massive introspective misrepresentations / illusions

@robbensinger I don't see how this is so different from my relativity example.

The situation still seems to be that there is a real thing that we're pointing at with "subjective awareness", and also people have a lot of wrong beliefs about it.

I guess the question is whether consciousness is more like "space" or more like "phlogiston".

It's possible to define a theory-neutral notion of 'consciousness'

@robbensinger If we defined a theory-neutral notion of 'consciousness' (to refer to whatever the thing is that causes us to talk about our experiences), would you still want to describe yourself as an 'illusionist' w.r.t that theory?

Keith Frankish: Yes! In fact that's exactly what I do. (The claim isn't that consciousness itself is illusory, only that qualia are)

Rob Bensinger: Yeah, I'm happy to define 'consciousness' in theory-neutral terms and say it exists in some sense. 'Qualia are an illusion' or 'phenomenal consciousness is an illusion' is more precise anyway.

I don't care how we define 'consciousness'. The claims I care about are:

1. Illusionism is asserting something substantive and (even among physicalists) controversial.

2. Illusionism is genuinely denying the existence of something widely seen as real and crucial (and self-evident!).

3. On the object level: phenomenal consciousness is in fact an introspective illusion.

'Phenomenal consciousness isn't real', 'phenomenal consciousness is real and reducible', and 'phenomenal consciousness is real and fundamental' are three substantively different views.

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2020-12-31T17:28:02.003Z · LW(p) · GW(p)

Hrothgar: What's your answer to the hard problem of consciousness?

Rob Bensinger: The hard problem makes sense, and seems to successfully do away with 'consciousness is real and reducible'. But 'consciousness is real and irreducible' isn't tenable: it either implies violations of physics as we know it (interactionism), or implies we can't know we're conscious (epiphenomenalism).

So we seem to be forced to accept that consciousness (of the sort cited in the hard problem) is somehow illusory. This is... very weird and hard to wrap one's head around. But some version of this view (illusionism) seems incredibly hard to avoid.

(Note: This is a twitter-length statement of my view, so it leaves out a lot of details. E.g., I think panpsychist views must be interactionist or epiphenomenalist, in the sense that matters. But this isn't trivial to establish.)

Hrothgar: What does "illusory" mean here? I think I'm interpreting as gesturing toward denying consciousness is happening, which is, like, the one thing that can't even be doubted (since the experience of doubt requires a conscious experiencer in the first place)

Rob Bensinger: I think "the fact that I'm having an experience" seems undeniable. E.g., it seems to just be a fact that I'm experiencing this exact color of redness as I look at the chair next to me. There's a long philosophical tradition of treating experience as 'directly given', the foundation on which all our other knowledge is built.

I find this super compelling and intuitive at a glance, even if I can't explain how you'd actually build a brain/computer that has infallible 'directly given' knowledge about some of its inner workings.

But I think the arguments alluded to above ultimately force us to reject this picture, and endorse the crazy-sounding view 'the character of my own experiences can be illusory, even though it seems obviously directly given'.

An attempt to clarify what this means: https://nothingismere.com/2017/02/23/phenomenal-consciousness-is-a-quasiperceptual-illusion-objections-and-replies/

I don't want to endorse the obviously false claim 'light isn't bouncing off the chair, hitting my eyes, and getting processed as environmental information by my brain.'

My brain is tracking facts about the environment. And it can accurately model many, many things about itself!

But I think my brain's native self-modeling gets two things wrong: (1) it models my subjective experience as a sort of concrete, 'manifest' inner world; (2) it represents this world as having properties that are too specific or arbitrary to logically follow from 'mere physics'.

I think there is a genuine perception-like (not 'hunch-like') introspective illusion that makes those things appear to be true (to people who are decent introspectors and have thought through the implications) -- even though they're not true. Like a metacognitive optical illusion.

And yes, this sounds totally incoherent from the traditional Descartes-inspired philosophical vantage point.

Optical illusions are fine; calling consciousness itself an illusion invites the question 'what is conscious of this illusion?'.

I nonetheless think this weird view is right.

I want to say: There's of course something going on here; and the things that seem present in my visual field must correspond to real things insofar as they have the potential to affect my actions. But my visual field as-it-appears-to-me isn't a real movie screen playing for an inner Me.

And what's more, the movie screen isn't translatable into neural firings that encode all the 'given'-seeming stuff. (!)

The movie screen is a lie the brain tells itself -- tells itself at the sensory, raw-feel level, not just at the belief/hunch level. (Illusion, rather than delusion.)

And (somehow! this isn't intuitive to me either!) since there's no homunculus outside the brain to notice all this, there's no 'check' on the brain forcing it to not trick itself in how it represents the most basic features of 'experience' to itself.

The way the brain models itself is entirely a product of the functioning of that very brain, with no law of physics or CS to guarantee the truth of anything! No matter how counter-intuitive that seems to the brain itself. (And yes, it's still counter-intuitive to me. I wouldn't endorse this view if I didn't think the alternatives were even worse!)

Core argument:

1. a Bayesian view of cognition. 'the exact redness of red' has to cause brain changes, or our brains can't know about it.

2. we know enough about physics to know these causes aren't coming from outside of physics.

3. hard problem: 'the exact redness of red' isn't reducible.

Thus, 'the exact redness of red' must somehow not be real. Secondarily, we can circle back and consider things that help make sense of this conclusion and help show it isn't nonsense:

4. thinking in detail about what cognition goes on in p-zombies' heads that makes them think there's a hard problem.

5. questioning the claim that (e.g.) my visual field is 'directly given' to me in an infallible way. questioning how you could design a computer that genuinely has infallible access to its internal states.

Hrothgar: But even if I grant that experience is illusion, the fact of 'experiencing illusion' is itself then undeniable. I don't consider it a philosophical tradition, just a description of reality 🤷

Whether this reconciles with physics etc seems like a downstream problem

Reading what you wrote again, I think it's likely I'm misunderstanding you.

What you're saying seems crazy or nonsensical to me, and/but I'm super appreciative that you wrote this all out, and I do intend to spend more time with your words (now or later) to see if i can catch more of your drift

(I don't claim to have it all figured out)

Rob Bensinger: Good, if it sounds crazy/nonsensical then I suspect that (a) I've communicated well, and (b) we share key background context: 'why does consciousness seem obviously real?', 'why does the hard problem seem so hard?', etc.

If my claims seemed obviously true, I'd be worried.

Hrothgar: I haven't read your blog post yet, but i suppose my main objection right now is something like, "Thinking is itself sensorial in nature, & that nature precedes its content. Effectively it seems like you're using thinking to try to refute thinking, & we get into gödel problems"

Rob Bensinger: I agree that thinking has an (apparent) phenomenal character, like e.g. seeing.

I don't think that per se raises a special problem. A calculator could introspect on its acts of calculating and wrongly perceive them as 'fluffy' or 'flibulous', while still getting 2+2=4 right.

Hrothgar: Why would fluffy or flibulous be wrong? I don't see what correctness has to do with it (fluffiness is neither wrong nor right) -- where is there a logical basis to evaluate "correctness" of that which isn't a proposition?

Rob Bensinger: If we take 'fluffy' literally, then the computations can't be fluffy because they aren't physical. It's possible to think that some property holds of your thoughts, when it simply doesn't.

Replies from: TAG

↑ comment by TAG · 2021-01-01T01:10:51.388Z · LW(p) · GW(p)

But ‘consciousness is real and irreducible’ isn’t tenable: it either implies violations of physics as we know it (interactionism), or implies we can’t know we’re conscious (epiphenomenalism).

Edit: What it implies is violations of physicalism. You can accept that physics is a map that predicts observations, without accepting that it is the map, to which all other maps must be reduced.

The epiphenomenalist worry is that, if qualia are not denied entirely, they have no causal role to play, since physical causation already accounts for everything that needs to be accounted for.

But physics is a set of theories and descriptions...a map. Usually, the ability of one map to explain something does not exclude another map's ability to do so. We can explain the death of Mr Smith as the result of a bullet entering his heart, or as the result of a finger squeezing a trigger, or as a result of the insurance policy recently taken out on his life, and so on.

So why can't we resolve the epiphenomenal worry by saying that physical causation and mental causation are just different, non-rivalrous maps? "I screamed because my pain fibres fired" alongside -- not versus -- "I screamed because I felt a sharp pain". It is not the case that there is physical stuff that is doing all the causation, and mental stuff that is doing none of it: rather there is a physical view of what is going on, and a mentalistic view.

Physicalists are reluctant to go down this route, because physicalism is based on the idea that there is something special about the physical map, which means it is not just another map. This special quality means that a physical explanation excludes others, unlike a typical map. But what is it?

It's rooted in reductionism, the idea that every other map (that is, every theory of the special sciences) can or should reduce to the physical map.

But the reducibility of consciousness is the center of the Hard Problem. If consciousness really is irreducible, and not just unreduced, then that is evidence against the reduction of everything to the physical, and, in turn, evidence against the special, exclusive nature of the physical map.

So, without the reducibility of consciousness, the epiphenomenal worry can be resolved by the two-view manoeuvre. (And without denying the very existence of qualia).

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2021-01-01T03:34:23.271Z · LW(p) · GW(p)

If the physics map doesn't imply the mind map (because of the zombie argument, the Mary's room argument, etc.), then how do you come to know about the mind map? The causal process by which you come to know the physics map is easy to understand:

Light leaves the Sun and strikes your shoelaces and bounces off; some photons enter the pupils of your eyes and strike your retina; the energy of the photons triggers neural impulses; the neural impulses are transmitted to the visual-processing areas of the brain; and there the optical information is processed and reconstructed into a 3D model that is recognized as an untied shoelace.

What is the version of this story for the mind map, once we assume that the mind map has contents that have no causal effect on the physical world? (E.g., your mind map had absolutely no effect on the words you typed into the LW page.)

At some point you didn't have a concept for "qualia"; how did you learn it, if your qualia have no causal effects?

At some point you heard about the zombie argument and concluded "ah yes, my mental map must be logically independent of my physical map"; how did you do that without your mental map having any effects?

I can imagine an interactionist video game, where my brain has more processing power than the game and therefore can't be fully represented in the game itself. It would then make sense that I can talk about properties that don't exist within the game's engine: I myself exist outside the game universe, and I can use that fact to causally change the game's outcomes in ways that a less computationally powerful agent could not.

Equally, I can imagine an epiphenomenal video game, where I'm strapped into a headset but forbidden from using the controls. I passively watch the events occurring in the game; but no event in the game ever reflects or takes note of the fact that I exist or have any 'unphysical' properties, and if there is an AI steering my avatar or camera's behavior, the AI knows zilch about me. (You could imagine a programmer deliberately designing the game to have NPCs talk about entities outside the game world; but then the programmer's game-transcending cognitive capacities are not epiphenomenal relative to the game.)

The thing that doesn't make sense is to import intuitions from the interactionist game to the epiphenomenal game, while insisting it's all still epiphenomenal.

Replies from: TAG

↑ comment by TAG · 2021-01-03T00:59:22.663Z · LW(p) · GW(p)

If the physics map doesn’t imply the mind map (because of the zombie argument, the Mary’s room argument, etc.), then how do you come to know about the mind map?

Direct evidence. That's the starting point of the whole thing. People think that they have qualia because it seems to them that they do.

Edit: In fact, it's the other way round: we are always using the mind map, but we remove the subjectivity, the "warm fuzzies", from it to arrive at the physics map. How do we know that physics is the whole story, when we start with our experience, and make a subset of it?

What is the version of this story for the mind map, once we assume that the mind map has contents that have no causal effect on the physical world?

I'm not assuming that. I'm arguing against epiphenomenalism.

So I am saying that the mental is causal, but I am not saying that it is a kind of physical causality, as per reductive physicalism. Reductive physicalism is false because consciousness is irreducible, as you agree. Since mental causation isn't a kind of physical causation, I don't have to give a physical account of it.

And I am further not saying that the physical and mental are two separate ontological domains, two separate territories. I am talking about maps, not territories.

Without ontological dualism, there are no issues of overdetermination or interaction.

comment by Rob Bensinger (RobbBB) · 2021-04-22T11:24:24.259Z · LW(p) · GW(p)

It's apparently not true that 90% of startups fail. From Ben Kuhn:

Hot take: the outside view is overrated.
(“Outside view” = e.g. asking “what % of startups succeed?” and assuming that’s ~= your chance of success.)
In theory it seems obviously useful. In practice, it makes people underrate themselves and prematurely give up their ambition.
One problem is that finding the right comparison group is hard.

For instance, in one commonly-cited statistic that “90% of startups fail,” (https://www.national.biz/2019-small-business-failure-rate-startup-statistics-industry/) “startup” meant all newly-started small businesses including eg groceries. Failure rates vary wildly by industry!
But it’s worse than that. Even if you knew the failure rate specifically of venture-funded tech startups, that leaves out a ton of important info. Consumer startups probably fail at a higher rate than B2B startups. Single-founder startups probably fail more than 2-founder ones.
OK, so what you really need to do is look at the failure rate of 2-founder B2B startups, right?

The problem is that nearly all the relevant information about whether your startup is going to succeed isn’t encoded in your membership in some legible subgroup like that.
Instead, it’s encoded in things like: How likely are you to break up with your cofounder? How good will you be at hiring? How determined are you to never ever give up?

Outside view fans anchor far too strongly on the base rate, and don’t update enough on inside views like these.
“If only 10% of startups succeed, how could I claim a 50% chance? That’d imply I’ve observed evidence with a 5:1 odds ratio! How presumptuous!”

Actually, evidence way stronger than 5:1 is everywhere: markxu.com/strong-evidence

(It’s rarer for startups than for names, but still.)
[...]
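As a rough check on the arithmetic in the quoted thread (a sketch, not something from Ben's post): moving from a 10% base rate to a 50% estimate is a 5x change in probability, which in odds form (1:9 to 1:1) works out to a likelihood ratio of about 9:1. The function names below are made up for illustration.

```python
def odds(p: float) -> float:
    """Convert a probability into odds in favor (p : 1-p)."""
    return p / (1.0 - p)


def likelihood_ratio_needed(prior: float, posterior: float) -> float:
    """Bayes factor required to move from `prior` to `posterior`."""
    return odds(posterior) / odds(prior)


def update(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability after multiplying the prior odds by a likelihood ratio."""
    post_odds = odds(prior) * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Numbers taken from the quoted example: 10% base rate, 50% claimed chance.
    print(likelihood_ratio_needed(0.10, 0.50))  # roughly 9
    print(update(0.10, 5.0))                    # roughly 0.36
```

Either way, the quoted point stands: evidence of this strength is not unusual.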

According to an answer on Quora, "the real percentage of venture-backed startups that fail—as defined by companies that provide a 1X return or less to investors—has not risen above 60% since 2001" (2017 source). Other Quora answers I saw claimed numbers as high as 98% (on different definitions, like ten-year "failure" (survival?) rate), but didn't cite their sources.

Replies from: romeostevensit

↑ comment by romeostevensit · 2021-04-27T23:29:59.325Z · LW(p) · GW(p)

I don't have a cite handy, as this is from memory (2014), but when I looked into it I recall that the 7-year failure rate, excluding the obvious dumb stuff like restaurants, was something like 70%. Importantly, the 70% number included acquisitions, so the actual failure rate was something like 60%-ish.

comment by Rob Bensinger (RobbBB) · 2022-07-29T21:47:55.776Z · LW(p) · GW(p)

A blurb for the book "The Feeling of Value":

This revolutionary treatise starts from one fundamental premise: that our phenomenal consciousness includes direct experience of value. For too long, ethical theorists have looked for value in external states of affairs or reduced value to a projection of the mind onto these same external states of affairs. The result, unsurprisingly, is widespread antirealism about ethics.
In this book, Sharon Hewitt Rawlette turns our metaethical gaze inward and dares us to consider that value, rather than being something “out there,” is a quality woven into the very fabric of our conscious experience, in a highly objective way. On this view, our experiences of pleasure and pain, joy and sorrow, ecstasy and despair are not signs of value or disvalue. They are instantiations of value and disvalue. When we feel pleasure, we are feeling intrinsic goodness itself. And it is from such feelings, argues Rawlette, that we derive the basic content of our normative concepts—that we understand what it means for something to be intrinsically good or bad.
Rawlette thus defends a version of analytic descriptivism. And argues that this view, unlike previous theories of moral realism, has the resources to explain where our concept of intrinsic value comes from and how we know when it objectively applies, as well as why we sometimes make mistakes in applying it. She defends this view against G. E. Moore’s Open Question Argument as well as shows how these basic facts about intrinsic value can ground facts about instrumental value and value “all things considered.” Ultimately, her view offers us the possibility of a robust metaphysical and epistemological justification for many of our strongest moral convictions.

My reply: Sounds descriptively false? I prefer lots of things that aren't a matter of valenced experience.

You don't need to have "direct experience" of all moral properties in order to be a moral realist, any more than you need direct experience of porcupines in order to be a porcupine realist. You can just acknowledge that moral knowledge is inferred indirectly, the same as our knowledge of most things. Seems like a classic case of philosophers reaching weird conclusions because they're desperate for certainty (rather than embracing Bayesian/probabilistic ways of thinking about stuff).

Likewise, you don't need direct experience or certainty in order to reconcile "is" and "ought". Just accept that "ought" facts look like hypothetical imperatives that we happen to care about a lot, or look like rules-of-a-game that we happen to deeply endorse everyone always playing. No deep riddles are created by the fact that "what is a legal move in chess?" is not reducible to conjunctions of claims about our universe's boundary conditions and laws of physics; we just treat chess-claims like math/logic claims and carry on with our lives. Treating our moral claims (insofar as they're coherent and consistent) in the same way raises no special difficulties.

This also feels to me like an example of the common philosopher-error "notice a super important fact about morality, and rush (in your excitement) to conclude that this must therefore be the important fact about morality, the one big thing morality is About".

Human morality is immensely complicated, because human brains are a complicated, incoherent mishmash of innumerable interacting preferences and experiences. Valenced experience indeed seems to be a super important piece of that puzzle, but we can acknowledge that without pretending that it Solves Everything or Exhausts The Phenomenon.

I suspect moral philosophy would have made a lot more progress by now if philosophers spent more of their time adding to the pool of claims about morality (so we can build a full understanding of what phenomenon we need to explain / account for in the first place), and less time trying to reduce all of those claims to a single simple principle.

In principle, I love theorizing and philosophizing about this stuff. In practice, seems like people have a strong tendency to fall in love with the first Grand Theory of Everything they discover in this domain, causing progress to stagnate (and unrealistic views to proliferate) relative to if we had more modest goals. Less "try to reduce all of morality to virtue cultivation", more "try to marginally improve our understanding of what virtue cultivation consists in".

comment by Rob Bensinger (RobbBB) · 2021-08-27T02:41:29.089Z · LW(p) · GW(p)

Ben Weinstein-Raun wrote on social media:

It seems to me that the basic appeal of panpsychism goes like "It seems really weird that you can put together some apparently unfeeling pieces, and then out comes this thing that feels. Maybe those things aren't actually unfeeling? That would sort of explain where the feeling-ness comes from."
But this feels kind of analogous to a being that doesn't have a good theory about houses, but is aware that some things are houses and some things aren't, by their experiences of those things. Such a being might analogously reason that *everything* is a little bit house-y. Panhousism isn't exactly wrong, but it's not actually very enlightening. It doesn't explain how the houseyness of a tree is increased when you rearrange the tree to be a log cabin. In fact it might naively want to deny that the total houseyness is increased.

I think panpsychism is outrageously false, and profoundly misguided as an approach to the hard problem. But I think I can describe it in a way that makes its appeal more obvious:

When I introspect, it seems like there's a lot of complexity to my experiences. But it doesn't seem like a complex fact that I'm conscious at all -- the distinction between conscious and unconscious seems very basic. It feels like there's 'what it's like to be an algorithm from the inside', and then there's the causal behavior of the algorithm, and that's all there is to it.

And thought experiments show that 'how an algorithm feels from inside' can't be exhaustively reduced to any functional/causal/'external' behavior. (See the knowledge argument, the zombie argument, etc.)

So maybe I should just think of 'algorithms have an inside' as a basic feature of the universe, an extension of 'everything has an inside (so to speak) and the only reason I feel "special" is that I happen to be this one part of the universe'.

Panprotopsychism might make this an easier pill to swallow. We can say that electrons aren't "conscious" in the fashion of a human; but they maybe have an "inside" (in the sense of "how an algorithm feels from inside") in the most rudimentary and abstract way possible, and the more complex "inside" of things like humans is built up out of all these smaller, more bare-bones "insides".

We can imagine that "how an electron feels from inside" is like an empty room. There's an "inside", but there's no content to it, just empty structure. This structure can then produce some amazing things, if you arrange an enormous number of parts in just the right way; but the important thing is to start from the kind of universe that has an "inside" at all, as opposed to the zombie universe.

The problem with this view, as usual, is that we've assumed that the "inside" can't causally affect the "outside"; and yet I just wrote a bunch of paragraphs about the "inside", presumably based on some knowledge I have about that inside.

So either my paragraphs must be straightforwardly false; or I must be confused about what I'm discussing (ascribing attributes to a nonfunctional, nonphysical thing that are really just run-of-the-mill descriptions of physical facts); or my statements must be miraculously true, even though there is no explanation for why I would have any knowledge of the things I'm discussing.

Without some further argument to make sense of the miracle, I think we have to reject the miracle, even if we still feel confused about what's actually going on with phenomenal consciousness.

Replies from: Natália Mendonça, TAG

↑ comment by Natália (Natália Mendonça) · 2021-08-27T03:24:40.625Z · LW(p) · GW(p)

I think panpsychism is outrageously false, and profoundly misguided as an approach to the hard problem.

What do you think of Brian Tomasik's flavor of panpsychism, which he says is compatible with (and, indeed, follows from) type-A materialism? As he puts it,

It's unsurprising that a type-A physicalist should attribute nonzero consciousness to all systems. After all, "consciousness" is a concept -- a "cluster in thingspace" -- and all points in thingspace are less than infinitely far away from the centroid of the "consciousness" cluster. By a similar argument, we might say that any system displays nonzero similarity to any concept (except maybe for strictly partitioned concepts that map onto the universe's fundamental ontology, like the difference between matter vs. antimatter). Panpsychism on consciousness is just one particular example of that principle.

Replies from: RobbBB, Natália Mendonça

↑ comment by Rob Bensinger (RobbBB) · 2021-08-28T10:59:53.225Z · LW(p) · GW(p)

I haven't read Brian Tomasik's thoughts on this, so let me know if you think I'm misunderstanding him / should read more.

The hard problem of consciousness at least gives us a prima facie reason to consider panpsychism. (Though I think this ultimately falls apart when we consider 'we couldn't know about the hard problem of consciousness if non-interactionist panpsychism were true; and interactionist panpsychism would mean new, detectable physics'.)

If we deny the hard problem, then I don't see any reason to give panpsychism any consideration in the first place. We could distinguish two panpsychist views here: 'trivial' (doesn't have any practical implications, just amounts to defining 'consciousness' so broadly as to include anything and everything); and 'nontrivial' (has practical implications, or at least the potential for such; e.g., perhaps the revelation that panpsychism is true should cause us to treat electrons as moral patients, with their own rights and/or their own welfare).

But I see no reason whatsoever to think that electrons are moral patients, or that electrons have any other nontrivial mental property. The mere fact that we don't fully understand how human brains work is not a reason to ask whether there's some new undiscovered feature of particles many orders of magnitude smaller than a human brain that explains the comically larger macro-process -- any more than limitations in our understanding of stomachs would be a reason to ask whether individual electrons have some hidden digestive properties.

↑ comment by Natália (Natália Mendonça) · 2021-08-27T04:13:04.401Z · LW(p) · GW(p)

(Brian Tomasik's view superficially sounds a lot like what Ben Weinstein-Raun is criticizing in his second paragraph, so I thought I'd add here the comment I wrote in response to Ben's post:

> Panhousism isn't exactly wrong, but it's not actually very enlightening. It doesn't explain how the houseyness of a tree is increased when you rearrange the tree to be a log cabin. In fact it might naively want to deny that the total houseyness is increased.

I really don’t see how that is what panhousism would say, at least what I have in mind when I think of panhousism (which is analogous to what I have in mind when I think of (type-A materialist[1]) panpsychism). If all that panhousism means is that (1) “house” is a cluster in thingspace and (2) nothing is infinitely far away from the centroid of the “house” cluster, then it seems very obvious to me that the distance of a tree from the “house” centroid decreases if you rearrange the tree into a log cabin. As an example, focus on the “suitability to protect humans from rain” dimension in thingspace. It’s very clear to me that turning a tree into a log cabin moves it closer to the “house” cluster in that dimension. And the same principle applies to all other dimensions. So I don’t see your point here.

I'm not sure if I should quote Ben's reply to me, since his post is not public, but he pretty much said that his original post was not addressing type-A physicalist panpsychism, although he finds this view unuseful for other reasons.

)
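A toy sketch of the "distance from the cluster centroid" point above, for anyone who wants it concrete. The feature dimensions and scores here are invented purely for illustration; nothing about them comes from Brian's or Ben's writing.

```python
import math

# Hypothetical feature scores in [0, 1]; dimensions and numbers are made up.
HOUSE_CENTROID = {"shelters_from_rain": 1.0, "has_walls": 1.0, "has_door": 1.0}
TREE = {"shelters_from_rain": 0.3, "has_walls": 0.0, "has_door": 0.0}
LOG_CABIN = {"shelters_from_rain": 0.9, "has_walls": 1.0, "has_door": 1.0}


def distance(a: dict, b: dict) -> float:
    """Euclidean distance between two points that share the same dimensions."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


if __name__ == "__main__":
    # Both distances are finite, but rearranging the tree into a cabin shrinks the distance.
    print(distance(TREE, HOUSE_CENTROID))
    print(distance(LOG_CABIN, HOUSE_CENTROID))
```

On this reading, "everything is a little bit house-y" only says that every distance is finite; it doesn't deny that the distance changes when the parts are rearranged.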

Replies from: Brian_Tomasik

↑ comment by Brian_Tomasik · 2021-08-29T02:40:35.661Z · LW(p) · GW(p)

Thanks for sharing. :) Yeah, it seems like most people have in mind type-F monism when they refer to panpsychism, since that's the kind of panpsychism that's growing in popularity in philosophy in recent years. I agree with Rob's reasons for rejecting that view.

↑ comment by TAG · 2021-09-02T00:26:10.340Z · LW(p) · GW(p)

There's another theory that isn't even on Chalmers's list: dual aspect neutral monism.

This holds that the physical sciences are one possible map of territory which is not itself, intrinsically, physical (or, for that matter, mental). Consciousness is another map, or aspect.

This approach has the advantage of dualism, in that there is no longer a need to explain the mental in terms of the physical, to reduce it to the physical, because the physical is no longer regarded as fundamental (nor is the mental, hence the "neutral"). Although an ontological identity between the physical and mental is accepted, the epistemic irreducibility of the mental to the physical is also accepted. Physicalism, in the sense that the physical sciences have a unique and privileged explanatory role, is therefore rejected.

Nonetheless, the fact that the physical sciences "work" in many ways, that the physical map can be accurate, is retained. Moreover, since Dual Aspect theory is not fully fledged dualism, it is able to sidestep most or all of the standard objections to dualism.

To take one example, since a conscious mental state and a physical brain state are ultimately the same thing, the expected relationships hold between them. For instance, mental states cannot vary without some change in the physical state (supervenience follows directly from identity, without any special apparatus); furthermore, since mental states are ultimately identical to physical brain states, they share the causal powers of brain states (again without the need to posit special explanatory apparatus such as "psychophysical laws").

The more familiar kinds of dualism are substance and property dualism. Both take a physical ontology "as is" and add something extra, and both have problems with explaining how the additional substances or properties interact with physical substances and properties, and both of course have problems with ontological parsimony (Occam's Razor).

In contrast to a substance or property, an aspect is a relational kind of thing. In Dual Aspect theory, a conscious state is interpreted as being based on the kind of relationship an entity has with itself, and the kind of interaction it has with itself. The physical is reinterpreted as a kind of interaction with and relation to the external. It is not clear whether this theory adds anything fundamentally new, ontologically, since most people will accept the existence of some kind of inner/outer distinction, although the distinction may be made to do more work in Dual Aspect theory. Reinterpreting the physical is a genuine third alternative to accepting (only) the physical, denying the physical, and supplementing the physical.

comment by Rob Bensinger (RobbBB) · 2020-06-30T16:56:08.932Z · LW(p) · GW(p)

[Epistemic status: Thinking out loud, just for fun, without having done any scholarship on the topic at all.]

It seems like a lot of horror games/movies are converging on things like 'old people', 'diseased-looking people', 'psychologically ill people', 'women', 'children', 'dolls', etc. as particularly scary.

Why would that be, from an evolutionary perspective? If horror is about fear, and fear is about protecting the fearful from threats, why would weird / uncanny / out-of-evolutionary-distribution threats have a bigger impact than e.g. 'lots of human warriors coming to attack you' or 'a big predator-looking thing stalking you', which are closer to the biggest things you'd want to worry about in our environment of evolutionary adaptedness? Why are shambling, decrepit things more of a horror staple than big bulky things with claws?

(I mean, both are popular, so maybe this isn't a real phenomenon. I at least subjectively feel as though those uncanny things are scarier than super-lions or super-snakes.)

Maybe we should distinguish between two clusters of fear-ish emotions:

Terror. This is closer to the fight-or-flight response of 'act quick because you're in imminent danger'. It's a panicky 'go go go go go!!' type of feeling, like when a jumpscare happens or when you're running from a monster in a game.
Dread. This is more like feeling freaked out or creeped out, and it can occur alongside terror, or it can occur separately. It seems to be triggered less by 'imminent danger' than by ambiguous warning signs of danger.

So, a first question is why uncanny, mysterious, 'unnatural' phenomena often cause the most dread, even though they thereby become less similar to phenomena that actually posed the largest dangers to us ancestrally. (E.g., big hulking people with giant spears or snakes/dragons or werewolves correlate more with 'things dangerous to our ancestors' than decrepit zombies. Sure, creepiness maybe requires that the threat be 'ambiguous', but then why isn't an ambiguous shadow of a maybe-snake or maybe-hulking-monster creepier than an obviously-frail/sickly monster?)

Plausibly part of the answer is that more mysterious, inexplicable phenomena are harder to control, and dread is the brain's way of saying something like 'this situation looks hard to control in a way that makes me want to avoid situations like this'.

Terror-inspiring things like jumpscares have relatively simple triggers corresponding to a relatively simple response -- usually fleeing. Dread-inspiring things like 'the local wildlife has gotten eerily quiet' have subtler and more context-sensitive triggers corresponding to responses like 'don't necessarily rush into any hasty action, but do pay extra close attention to your environment, and if you can do something to get away from the stimuli that are giving you these unpleasant uneasy feelings, maybe prioritize doing that'.

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2020-06-30T17:05:03.121Z · LW(p) · GW(p)

A second question is why horror films and games seem to be increasingly converging on the creepy/uncanny/mysterious cluster of things, rather than on the overtly physically threatening cluster -- assuming this is a real trend. Some hypotheses about the second question:

A: Horror games and movies are increasingly optimizing for dread instead of terror these days, maybe because it's novel -- pure terror feels overdone and out-of-fashion. Or because dread just lends itself to a more fun multi-hour viewing/playing experience, because it's more of a 'slow burn'. Or something else.
B: Horror games aren't optimizing for dread to the exclusion of terror; rather, they've discovered that dread is a better way to maximize terror.

Why would B be true?

One just-so story you could tell is that humans have multiple responses to possible dangers, ranging from 'do some Machiavellian scheming to undermine a political rival' to 'avoid eating that weird-smelling food' to 'be cautious near that precipice' to 'attack' to 'flee'. Different emotions correspond to different priors on 'what reaction is likeliest to be warranted here?', and different movie genres optimize for different sets of emotions. And optimizing for a particular emotion usually involves steering clear of things that prime a person to experience a different emotion -- people want a 'purer' experience.

So one possibility is: big muscular agents, lion-like agents, etc. are likelier to be dangerous (in reality) than a decrepit corpse or a creepy child or a mysterious frail woman; but the correct response to hulking masculine agents is much more mixed between 'fight / confront' and 'run away / avoid', whereas the correct response to situations that evoke disgust, anxiety, uncertainty, and dread is a lot more skewed toward 'run away / avoid'. And an excess of jumpscare-ish, heart-pounding terror does tend to incline people more toward running away than toward fighting back, so it might be that both terror and dread are better optimized in tandem, while 'fight back' partly competes with terror.

On this view, 'ratchet up the intensity of danger' matters less for fear intensity than 'eliminate likely responses to the danger other than being extra-alert or fleeing'.

... Maybe because movie/game-makers these days just find it really easy to max out our danger-intensity detectors regardless? Pretty much everything in horror movies is pretty deadly relative to the kind of thing you'd regularly encounter in the ancestral environment, and group sizes in horror contexts tend to be smaller than ancestral group sizes.

People who want to enjoy the emotions corresponding purely to the 'fight' response might be likelier to watch things like action movies. And indeed, action movies don't make much use of jumpscares or terror (though they do like tension and adrenaline-pumping intensity).

Or perhaps there's something more general going on, like:

Hypothesis C: Dread increases 'general arousal / sensitivity to environmental stimuli', and then terror can piggy-back off of that and get bigger scares.

Perhaps emotions like 'disgust' and 'uncertainty' also have this property, hence why horror movies often combine dread, disgust, and uncertainty with conventional terror. In contrast, hypothesis B seems to suggest that we should expect disgust and terror to mostly show up in disjoint sets of movies/games, because the correct response to 'disease-ish things' and the correct response to 'physical attackers' are very different.

comment by Rob Bensinger (RobbBB) · 2019-05-10T23:13:24.150Z · LW(p) · GW(p)

The wiki glossary for the sequences / Rationality: A-Z ( https://wiki.lesswrong.com/wiki/RAZ_Glossary ) is updated now with the glossary entries from the print edition of vol. 1-2.

New entries from Map and Territory:

anthropics, availability heuristic, Bayes's theorem, Bayesian, Bayesian updating, bit, Blue and Green, calibration, causal decision theory, cognitive bias, conditional probability, confirmation bias, conjunction fallacy, deontology, directed acyclic graph, elan vital, Everett branch, expected value, Fermi paradox, foozality, hindsight bias, inductive bias, instrumental, intentionality, isomorphism, Kolmogorov complexity, likelihood, maximum-entropy probability distribution, probability distribution, statistical bias, two-boxing

New entries from How to Actually Change Your Mind:

affect heuristic, causal graph, correspondence bias, epistemology, existential risk, frequentism, Friendly AI, group selection, halo effect, humility, intelligence explosion, joint probability distribution, just-world fallacy, koan, many-worlds interpretation, modesty, transhuman

A bunch of other entries from the M&T and HACYM glossaries were already on the wiki; most of these have been improved a bit or made more concise.

Replies from: SaidAchmiz, RobbBB

↑ comment by Said Achmiz (SaidAchmiz) · 2019-05-11T07:41:08.370Z · LW(p) · GW(p)

This reminds me of something I’ve been meaning to ask:

Last I checked, the contents of the Less Wrong Wiki were licensed under the GNU Free Documentation License, which is… rather inconvenient. Is it at all possible to re-license it (ideally as CC BY-NC-SA, to match R:AZ itself)?

(My interest in this comes from the fact that the Glossary is mirrored on ReadTheSequences.com, and I’d prefer not to have to deal with two different licenses, as I currently have to.)

Replies from: habryka4

↑ comment by habryka (habryka4) · 2019-05-11T08:24:11.348Z · LW(p) · GW(p)

I can reach out to Trike Apps about this, but can we actually do this? Seems plausible that we would have to ask for permission from all editors involved in a page before we can change the license.

Replies from: SaidAchmiz

↑ comment by Said Achmiz (SaidAchmiz) · 2019-05-11T08:50:40.021Z · LW(p) · GW(p)

I have no idea; I cannot claim to really understand the GFDL well enough to know… but if doable, this seems worthwhile, as there’s a lot of material on the wiki which I and others could do various useful/interesting things with, if it were released under a convenient license.

↑ comment by Rob Bensinger (RobbBB) · 2019-05-10T23:19:00.628Z · LW(p) · GW(p)

Are there any other OK-quality rationalist glossaries out there? https://wiki.lesswrong.com/wiki/Jargon is the only one I know of. I vaguely recall there being one on http://www.bayrationality.com/ at some point, but I might be misremembering.

Replies from: SaidAchmiz, jimrandomh

↑ comment by Said Achmiz (SaidAchmiz) · 2019-05-11T07:36:03.246Z · LW(p) · GW(p)

https://namespace.obormot.net/Jargon/Jargon

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2019-05-11T20:21:03.506Z · LW(p) · GW(p)

Fantastic!

↑ comment by jimrandomh · 2019-05-11T00:31:56.540Z · LW(p) · GW(p)

It's optimized on a *very* different axis, but there's the Rationality Cardinality card database.

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2019-05-11T20:18:55.445Z · LW(p) · GW(p)

That counts! :) Part of why I'm asking is in case we want to build a proper LW glossary, and Rationality Cardinality could at least provide ideas for terms we might be missing.

comment by Rob Bensinger (RobbBB) · 2022-02-26T23:43:40.498Z · LW(p) · GW(p)

Jeffrey Ladish asked on Twitter:

Do you think the singularity (technological singularity) is a useful term? I've been seeing it used less among people talking about the future of humanity and I don't understand why. Many people still think an intelligence explosion is likely, even if it's "slow"

I replied:

'Singularity' was vague (https://intelligence.org/2007/09/30/three-major-singularity-schools/) and got too associated with Kurzweilian magical thinking, so MIRI switched to something like:
'rapid capability gain' = progress from pretty-low-impact AI to astronomically high-impact AI is fast in absolute terms
'hard takeoff' = rapid capability gain that's discontinuous
'intelligence explosion' = hard takeoff via recursive self-improvement
Eliezer says:
"'Rapid capability gain' is, I'd say, going from 'capable enough to do moderately neat non-pivotal world-affecting things' to 'capable enough to destroy world' quickly in absolute terms.
"I don't think it's about 'subhuman' because, like, is Alpha Zero subhuman, things go superhuman in bits and pieces until, in some sense, all hell breaks loose"

FOOM is a synonym for intelligence explosion, based on the analogy where an AGI recursively self-improving to superintelligence is like a nuclear pile going critical.

Sometimes people also talk about already-pretty-smart-and-impactful AI "going FOOM", which I take to mean that they're shooting off to even higher capability levels.

Jeffrey replied:

Okay I appreciate these distinctions. I think the difficulty of replacing "singularity" with "intelligence explosion" is that the latter sounds like a process rather than an outcome. I want to refer to the outcome

To which Nate Soares responded:

In my vocab, "singularity" refers to something more like an event (& the term comes from Vinge noting that the dawn of superintelligence obscures our predictive vision). I still use "singularity" for the event, and "post-singularity" for the time regime.

Replies from: Dagon

↑ comment by Dagon · 2022-02-28T17:48:10.408Z · LW(p) · GW(p)

I suspect there's a school of thought for which "singularity" was massively overoptimistic - is this what you mean by Kurzweilian magical thinking? That it's a transition in a very short period of time from scarcity-based capitalism to post-scarcity utopia. Rather than a simple destruction of most of humanity, and of the freedom and value of those remaining.

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2022-02-28T19:31:46.015Z · LW(p) · GW(p)

That it's a transition in a very short period of time from scarcity-based capitalism to post-scarcity utopia.

No, that part of Kurzweil's view is 100% fine. In fact, I believe I expect a sharper transition than Kurzweil expects. My objection to Kurzweil's thinking isn't 'realistic mature futurists are supposed to be pessimistic across the board', it's specific unsupported flaws in his arguments:

Rejection of Eliezer's five theses [LW · GW] (which were written in response to Kurzweil): intelligence explosion, orthogonality, convergent instrumental goals, complexity of value, fragility of value.
Mystical, quasi-Hegelian thinking about surface trends like 'economic growth'. See the 'Actual Ray Kurzweil' quote in https://www.lesswrong.com/posts/ax695frGJEzGxFBK4/biology-inspired-agi-timelines-the-trick-that-never-works. [LW · GW]
Otherwise weird and un-Bayesian-sounding attitudes toward forecasting. Seems to think he has a crystal ball that lets him exactly time tech developments, even where he has no model of a causal path by which he could be entangled with evidence about that future development...?

comment by Rob Bensinger (RobbBB) · 2020-07-10T04:35:29.805Z · LW(p) · GW(p)

From Facebook:

Mark Norris Lance: [...] There is a long history of differential evaluation of actions taken by grassroots groups and similar actions taken by elites or those in power. This is evident when we discuss violence. If a low-power group places someone under their control it is kidnapping. If they assess their crimes or punish them for it, it is mob justice or vigilantism. [...]

John Maxwell: Does the low power group in question have a democratic process for appointing judges who then issue arrest warrants?

That's a key issue for me... "Mob rule" is bad because the process mobs use to make their judgements is bad. Doubly so if the mob attacks anyone who points that out.

A common crime that modern mobs accuse people of is defending bad people. But if people can be convicted of defending bad people, that corrupts the entire justice process, because the only way we can figure out if someone really is bad is by hearing what can be said in their defense.

comment by Rob Bensinger (RobbBB) · 2020-06-30T16:05:24.323Z · LW(p) · GW(p)

From https://twitter.com/JonHaidt/status/1166318786959609856:

Why are online political discussions perceived to contain elevated levels of hostility compared to offline discussions? In this manuscript, we leverage cross-national representative surveys and online behavioral experiments to [test] the mismatch hypothesis regarding this hostility gap. The mismatch hypothesis entails that novel features of online communication technology induce biased behavior and perceptions such that ordinary people are, e.g., less able to regulate negative emotions in online contexts. We test several versions of the mismatch hypothesis and find little to no evidence in all cases. Instead, online political hostility is committed by individuals who are predisposed to be hostile in all contexts. The perception that online discussions are more hostile seemingly emerges because other people are more likely to witness the actions of these individuals in the large, public network structure of online platforms compared to more private offline settings.

comment by Rob Bensinger (RobbBB) · 2021-03-31T17:42:53.592Z · LW(p) · GW(p)

Yeah, I'm an EA: an Estimated-as-Effective-in-Expectation (in Excess of Endeavors with Equivalent Ends I've Evaluated) Agent with an Audaciously Altruistic Agenda.

Replies from: RobbBB

↑ comment by Rob Bensinger (RobbBB) · 2021-03-31T17:44:44.607Z · LW(p) · GW(p)

This is being cute, but I do think parsing 'effective altruist' this way makes a bit more sense than tacking on the word 'aspiring' and saying 'aspiring EA'. (Unless you actually are a non-EA who's aspiring to become one.)

I'm not an 'aspiring effective altruist'. It's not that I'm hoping to effectively optimize altruistic goals someday. It's that I'm already trying to do that, but I'm uncertain about whether I'm succeeding. It's an ongoing bet, not an aspiration to do something in the future.

'Aspiring rationalist' is better, but it feels at least a little bit artificial or faux-modest to me -- I'm not aspiring to be a rationalist, I'm aspiring to be rational. I feel like rationalism is weight-training, and rationality is the goal.

If people are unhealthy, we might use 'health-ism' to refer to a community or a practice for improving health.

If everyone is already healthy, it seems fine to say they're healthy but weird to say 'they're healthists'. Why is it an ism? Isn't it just a fact about their physiology?

comment by Pattern · 2019-05-18T17:03:02.089Z · LW(p) · GW(p)

How would you feel about the creation of a Sequence of Shortform Feeds? (Including this one?) (Not a mod.)

Replies from: RobbBB, Raemon

↑ comment by Rob Bensinger (RobbBB) · 2021-02-20T23:29:36.292Z · LW(p) · GW(p)

Sure

↑ comment by Raemon · 2019-05-18T22:22:58.014Z · LW(p) · GW(p)

I can't speak for Rob but I'd be fine with my own shortform feed [LW · GW] being included.

comment by Rob Bensinger (RobbBB) · 2023-12-14T20:58:40.013Z · LW(p) · GW(p)

In the context of a conversation with Balaji Srinivasan about my AI views snapshot [LW · GW], I asked Nate Soares what sorts of alignment results would impress him, and he said:

example thing that would be relatively impressive to me: specific, comprehensive understanding of models (with the caveat that that knowledge may lend itself more (and sooner) to capabilities before alignment). demonstrated e.g. by the ability to precisely predict the capabilities and quirks of the next generation (before running it)
i'd also still be impressed by simple theories of aimable cognition (i mostly don't expect that sort of thing to have time to play out any more, but if someone was able to come up with one after staring at LLMs for a while, i would at least be impressed)
fwiw i don't myself really know how to answer the question "technical research is more useful than policy research"; like that question sounds to me like it's generated from a place of "enough of either of these will save you" whereas my model is more like "you need both"
tho i'm more like "to get the requisite technical research, aim for uploads" at this juncture
if this was gonna be blasted outwards, i'd maybe also caveat that, while a bunch of this is a type of interpretability work, i also expect a bunch of interpretability work to strike me as fake, shallow, or far short of the bar i consider impressive/hopeful
(which is not itself supposed to be any kind of sideswipe; i applaud interpretability efforts even while thinking it's moving too slowly etc.)

↑ comment by Richard_Kennaway · 2021-07-27T15:49:34.235Z · LW(p) · GW(p)

For being too indistinguishable from GPT-3.
