EA Forum AMA - MIRI's Buck Shlegeris 2019-11-15T23:27:07.238Z · score: 31 (12 votes)
A simple sketch of how realism became unpopular 2019-10-11T22:25:36.357Z · score: 61 (24 votes)
Christiano decision theory excerpt 2019-09-29T02:55:35.542Z · score: 57 (16 votes)
Kohli episode discussion in 80K's Christiano interview 2019-09-29T01:40:33.852Z · score: 14 (4 votes)
Rob B's Shortform Feed 2019-05-10T23:10:14.483Z · score: 19 (3 votes)
Helen Toner on China, CSET, and AI 2019-04-21T04:10:21.457Z · score: 71 (25 votes)
New edition of "Rationality: From AI to Zombies" 2018-12-15T21:33:56.713Z · score: 79 (30 votes)
On MIRI's new research directions 2018-11-22T23:42:06.521Z · score: 57 (16 votes)
Comment on decision theory 2018-09-09T20:13:09.543Z · score: 72 (27 votes)
Ben Hoffman's donor recommendations 2018-06-21T16:02:45.679Z · score: 40 (17 votes)
Critch on career advice for junior AI-x-risk-concerned researchers 2018-05-12T02:13:28.743Z · score: 207 (72 votes)
Two clarifications about "Strategic Background" 2018-04-12T02:11:46.034Z · score: 77 (23 votes)
Karnofsky on forecasting and what science does 2018-03-28T01:55:26.495Z · score: 17 (3 votes)
Quick Nate/Eliezer comments on discontinuity 2018-03-01T22:03:27.094Z · score: 71 (23 votes)
Yudkowsky on AGI ethics 2017-10-19T23:13:59.829Z · score: 92 (40 votes)
MIRI: Decisions are for making bad outcomes inconsistent 2017-04-09T03:42:58.133Z · score: 7 (8 votes)
CHCAI/MIRI research internship in AI safety 2017-02-13T18:34:34.520Z · score: 5 (6 votes)
MIRI AMA plus updates 2016-10-11T23:52:44.410Z · score: 15 (13 votes)
A few misconceptions surrounding Roko's basilisk 2015-10-05T21:23:08.994Z · score: 57 (53 votes)
The Library of Scott Alexandria 2015-09-14T01:38:27.167Z · score: 72 (54 votes)
[Link] Nate Soares is answering questions about MIRI at the EA Forum 2015-06-11T00:27:00.253Z · score: 19 (20 votes)
Rationality: From AI to Zombies 2015-03-13T15:11:20.920Z · score: 85 (84 votes)
Ends: An Introduction 2015-03-11T19:00:44.904Z · score: 4 (4 votes)
Minds: An Introduction 2015-03-11T19:00:32.440Z · score: 6 (8 votes)
Biases: An Introduction 2015-03-11T19:00:31.605Z · score: 84 (128 votes)
Rationality: An Introduction 2015-03-11T19:00:31.162Z · score: 15 (16 votes)
Beginnings: An Introduction 2015-03-11T19:00:25.616Z · score: 9 (6 votes)
The World: An Introduction 2015-03-11T19:00:12.370Z · score: 3 (3 votes)
Announcement: The Sequences eBook will be released in mid-March 2015-03-03T01:58:45.893Z · score: 47 (48 votes)
A forum for researchers to publicly discuss safety issues in advanced AI 2014-12-13T00:33:50.516Z · score: 12 (13 votes)
Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda 2014-11-26T11:02:01.038Z · score: 26 (31 votes)
Groundwork for AGI safety engineering 2014-08-06T21:29:38.767Z · score: 13 (14 votes)
Politics is hard mode 2014-07-21T22:14:33.503Z · score: 43 (74 votes)
The Problem with AIXI 2014-03-18T01:55:38.274Z · score: 29 (29 votes)
Solomonoff Cartesianism 2014-03-02T17:56:23.442Z · score: 34 (31 votes)
Bridge Collapse: Reductionism as Engineering Problem 2014-02-18T22:03:08.008Z · score: 54 (49 votes)
Can We Do Without Bridge Hypotheses? 2014-01-25T00:50:24.991Z · score: 11 (12 votes)
Building Phenomenological Bridges 2013-12-23T19:57:22.555Z · score: 67 (60 votes)
The genie knows, but doesn't care 2013-09-06T06:42:38.780Z · score: 57 (63 votes)
The Up-Goer Five Game: Explaining hard ideas with simple words 2013-09-05T05:54:16.443Z · score: 29 (34 votes)
Reality is weirdly normal 2013-08-25T19:29:42.541Z · score: 33 (48 votes)
Engaging First Introductions to AI Risk 2013-08-19T06:26:26.697Z · score: 20 (27 votes)
What do professional philosophers believe, and why? 2013-05-01T14:40:47.028Z · score: 31 (44 votes)


Comment by robbbb on Raemon's Scratchpad · 2019-12-07T02:25:16.405Z · score: 2 (1 votes) · LW · GW

I think I prefer bolding full lines b/c it makes it easier to see who authored what?

Comment by robbbb on Raemon's Scratchpad · 2019-12-07T01:30:07.655Z · score: 7 (3 votes) · LW · GW

I'd be interested in trying it out. At a glance, it feels too much to me like it's trying to get me to read Everything, when I can tell from the titles and snippets that some posts aren't for me. If anything the posts I've already read are often ones I want emphasized more? (Because I'm curious to see if there are new comments on things I've already read, or I may otherwise want to revisit the post to link others to it, or finish reading it, etc.)

The bold font does look aesthetically fine and breaks things up in an interesting way, so I like the idea of maybe using it for more stuff?

Comment by robbbb on The Devil Made Me Write This Post Explaining Why He Probably Didn't Hide Dinosaur Bones · 2019-12-06T15:02:07.609Z · score: 4 (2 votes) · LW · GW
Comment by robbbb on Misconceptions about continuous takeoff · 2019-12-05T21:12:04.809Z · score: 6 (3 votes) · LW · GW

That part of the interview with Paul was super interesting to me, because the following were previously claims I'd heard from Nate and Eliezer in their explanations of how they think about fast takeoff:

[E]volution [hasn't] been putting a decent amount of effort into optimizing for general intelligence. [...]

'I think if you optimize AI systems for reasoning, it appears much, much earlier.'

Ditto things along the lines of this Paul quote from the same 80K interview:

It’s totally conceivable from our current perspective, I think, that an intelligence that was as smart as a crow, but was actually designed for doing science, actually designed for doing engineering, for advancing technologies rapidly as possible -- it is quite conceivable that such a brain would actually outcompete humans pretty badly at those tasks.

I think that’s another important thing to have in mind, and then when we talk about when stuff goes crazy, I would guess humans are an upper bound for when stuff goes crazy. That is we know that if we had cheap simulated humans, that technological progress would be much, much faster than it is today. But probably stuff goes crazy somewhat before you actually get to humans.

This is part of why I don't talk about "human-level" AI when I write things for MIRI.

If you think humans, corvids, etc. aren't well-optimized for economically/pragmatically interesting feats, this predicts that timelines may be shorter and that "human-level" may be an especially bad way of thinking about the relevant threshold(s).

There still remains the question of whether the technological path to "optimizing messy physical environments" (or "science AI", or whatever we want to call it) looks like a small number of "we didn't know how to do this at all, and now we do know how to do this and can suddenly take much better advantage of available compute" events, vs. looking like a large number of individually low-impact events spread out over time.

If no one event is impactful enough, then a long series of S-curves ends up looking like a smooth slope when you zoom out; and large historical changes are usually made of many small changes that add up to one big effect. We don't invent nuclear weapons, get hit by a super-asteroid, etc. every other day.
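
To make the "many S-curves look smooth when you zoom out" point concrete, here is a toy sketch. Everything in it is an assumption chosen for illustration, not a model of AI progress: the logistic shape, the equal weighting, the width, and the spacing of the midpoints. It compares the largest single-step jump in total progress when all progress comes from one S-curve with the largest jump when the same total progress is split across fifty small ones.

```python
import math

def logistic(t, midpoint, width):
    """One S-curve rising from 0 to 1, centered at `midpoint`."""
    return 1.0 / (1.0 + math.exp(-(t - midpoint) / width))

def total_progress(t, midpoints, width):
    """Average of equally weighted S-curves, so the total also runs from 0 to 1."""
    return sum(logistic(t, m, width) for m in midpoints) / len(midpoints)

def largest_step(midpoints, width, steps=1000, t_max=100.0):
    """Largest increase in total progress across any single time step."""
    dt = t_max / steps
    values = [total_progress(i * dt, midpoints, width) for i in range(steps + 1)]
    return max(b - a for a, b in zip(values, values[1:]))

# Hypothetical scenario 1: one high-impact event carries all of the progress.
one_big_event = largest_step(midpoints=[50.0], width=0.5)

# Hypothetical scenario 2: fifty low-impact events, evenly spread over time.
many_small_events = largest_step(
    midpoints=[2.0 * k + 1.0 for k in range(50)], width=0.5
)

print(one_big_event, many_small_events)
```

Under these assumptions the single-event curve has a much larger worst-case jump per time step than the fifty-event curve, even though each individual S-curve is equally sharp. The open question in the comment above is which of the two pictures the actual technological path resembles.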

Comment by robbbb on A list of good heuristics that the case for AI x-risk fails · 2019-12-04T21:54:38.980Z · score: 8 (3 votes) · LW · GW

This doesn't seem like it belongs on a "list of good heuristics", though!

Comment by robbbb on A list of good heuristics that the case for AI x-risk fails · 2019-12-03T18:30:21.502Z · score: 5 (3 votes) · LW · GW

I helped make this list in 2016 for a post by Nate, partly because I was dissatisfied with Scott's list (which includes people like Richard Sutton, who thinks worrying about AI risk is carbon chauvinism):

Stuart Russell’s Cambridge talk is an excellent introduction to long-term AI risk. Other leading AI researchers who have expressed these kinds of concerns about general AI include Francesca Rossi (IBM), Shane Legg (Google DeepMind), Eric Horvitz (Microsoft), Bart Selman (Cornell), Ilya Sutskever (OpenAI), Andrew Davison (Imperial College London), David McAllester (TTIC), and Jürgen Schmidhuber (IDSIA).

These days I'd probably make a different list, including people like Yoshua Bengio. AI risk stuff is also sufficiently in the Overton window that I care more about researchers' specific views than about "does the alignment problem seem nontrivial to you?". Even if we're just asking the latter question, I think it's more useful to list the specific views and arguments of individuals (e.g., note that Rossi is more optimistic about the alignment problem than Russell), list the views and arguments of the similarly prominent CS people who think worrying about AGI is silly, and let people eyeball which people they think tend to produce better reasons.

Comment by robbbb on Optimization Amplifies · 2019-12-02T06:10:05.382Z · score: 4 (2 votes) · LW · GW

One of the main explanations of the AI alignment problem I link people to.

Comment by robbbb on Useful Does Not Mean Secure · 2019-12-02T03:33:24.439Z · score: 13 (3 votes) · LW · GW

Eliezer also strongly believes that discrete jumps will happen. But the crux for him AFAIK is absolute capability and absolute speed of capability gain in AGI systems, not discontinuity per se (and not particular methods for improving capability, like recursive self-improvement). Hence in So Far: Unfriendly AI Edition Eliezer lists his key claims as:

  • (1) "Orthogonality thesis",
  • (2) "Instrumental convergence",
  • (3) "Rapid capability gain and large capability differences",
  • (A) superhuman intelligence makes things break that don't break at infrahuman levels,
  • (B) "you have to get [important parts of] the design right the first time",
  • (C) "if something goes wrong at any level of abstraction, there may be powerful cognitive processes seeking out flaws and loopholes in your safety measures", and the meta-level
  • (D) "these problems don't show up in qualitatively the same way when people are pursuing their immediate incentives to get today's machine learning systems working today".

From Sam Harris' interview of Eliezer (emphasis added):

Eliezer: [...] I think that artificial general intelligence capabilities, once they exist, are going to scale too fast for that to be a useful way to look at the problem. AlphaZero going from 0 to 120 mph in four hours or a day—that is not out of the question here. And even if it’s a year, a year is still a very short amount of time for things to scale up.

[...] I’d say this is a thesis of capability gain. This is a thesis of how fast artificial general intelligence gains in power once it starts to be around, whether we’re looking at 20 years (in which case this scenario does not happen) or whether we’re looking at something closer to the speed at which Go was developed (in which case it does happen) or the speed at which AlphaZero went from 0 to 120 and better-than-human (in which case there’s a bit of an issue that you better prepare for in advance, because you’re not going to have very long to prepare for it once it starts to happen).

[...] Why do I think that? It’s not that simple. I mean, I think a lot of people who see the power of intelligence will already find that pretty intuitive, but if you don’t, then you should read my paper Intelligence Explosion Microeconomics about returns on cognitive reinvestment. It goes through things like the evolution of human intelligence and how the logic of evolutionary biology tells us that when human brains were increasing in size, there were increasing marginal returns to fitness relative to the previous generations for increasing brain size. Which means that it’s not the case that as you scale intelligence, it gets harder and harder to buy. It’s not the case that as you scale intelligence, you need exponentially larger brains to get linear improvements.

At least something slightly like the opposite of this is true; and we can tell this by looking at the fossil record and using some logic, but that’s not simple.

Sam: Comparing ourselves to chimpanzees works. We don’t have brains that are 40 times the size or 400 times the size of chimpanzees, and yet what we’re doing—I don’t know what measure you would use, but it exceeds what they’re doing by some ridiculous factor.

Eliezer: And I find that convincing, but other people may want additional details.

[...] AlphaZero seems to me like a genuine case in point. That is showing us that capabilities that in humans require a lot of tweaking and that human civilization built up over centuries of masters teaching students how to play Go, and that no individual human could invent in isolation… [...] AlphaZero blew past all of that in less than a day, starting from scratch, without looking at any of the games that humans played, without looking at any of the theories that humans had about Go, without looking at any of the accumulated knowledge that we had, and without very much in the way of special-case code for Go rather than chess—in fact, zero special-case code for Go rather than chess. And that in turn is an example that refutes another thesis about how artificial general intelligence develops slowly and gradually, which is: “Well, it’s just one mind; it can’t beat our whole civilization.”

I would say that there’s a bunch of technical arguments which you walk through, and then after walking through these arguments you assign a bunch of probability, maybe not certainty, to artificial general intelligence that scales in power very fast—a year or less. And in this situation, if alignment is technically difficult, if it is easy to screw up, if it requires a bunch of additional effort—in this scenario, if we have an arms race between people who are trying to get their AGI first by doing a little bit less safety because from their perspective that only drops the probability a little; and then someone else is like, “Oh no, we have to keep up. We need to strip off the safety work too. Let’s strip off a bit more so we can get in the front.”—if you have this scenario, and by a miracle the first people to cross the finish line have actually not screwed up and they actually have a functioning powerful artificial general intelligence that is able to prevent the world from ending, you have to prevent the world from ending. You are in a terrible, terrible situation. You’ve got your one miracle. And this follows from the rapid capability gain thesis and at least the current landscape for how these things are developing.

See also:

The question is simply "Can we do cognition of this quality at all?"[...] The speed and quantity of cognition isn't the big issue, getting to that quality at all is the question. Once you're there, you can solve any problem which can realistically be done with non-exponentially-vast amounts of that exact kind of cognition.

Comment by robbbb on What's been written about the nature of "son-of-CDT"? · 2019-12-01T20:19:37.561Z · score: 4 (3 votes) · LW · GW

The Retro Blackmail Problem in "Toward Idealized Decision Theory" shows that if CDT can self-modify (i.e., build an agent that follows an arbitrary decision rule), it self-modifies to something that still gives in to some forms of blackmail. This is Son-of-CDT, though they don't use the name.

Comment by robbbb on What's been written about the nature of "son-of-CDT"? · 2019-12-01T20:14:37.537Z · score: 5 (2 votes) · LW · GW

My understanding is that the CDT agent would take the choice that causes the highest number of paperclips to be created (in expectation).

This is true if we mean something very specific by "causes". CDT picks the action that would cause the highest number of paperclips to be created, if past predictions were uncorrelated with future events.

I agree that a CDT agent will never agree to precommit to acting like a LDT agent for correlations that have already been created, but I don't think that determines what kind of successor agent they would choose to create.

If an agent can arbitrarily modify its own source code ("precommit" in full generality), then we can model "the agent making choices over time" as "a series of agents that are constantly choosing which successor-agent follows them at the next time-step". If Son-of-CDT were the same as LDT, this would be the same as saying that a self-modifying CDT agent will rewrite itself into an LDT agent, since nothing about CDT or LDT assigns special weight to actions that happen inside the agent's brain vs. outside the agent's brain.
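
As a rough illustration of the "if past predictions were uncorrelated with future events" clause, here is a toy calculation. The Newcomb-style setup, the payoff numbers, and the function names are all hypothetical and are not taken from the thread or from "Toward Idealized Decision Theory". The first function scores an action while holding the predictor's past prediction fixed, which is how the comment describes CDT; the second lets the prediction covary with the action.

```python
# Hypothetical Newcomb-style payoffs, measured in paperclips.
SMALL = 1_000      # always gained by also taking the extra box
BIG = 1_000_000    # present only if the predictor foresaw "one-box"

def payoff(action, predicted_one_box):
    """Paperclips received for an action, given what was predicted."""
    base = BIG if predicted_one_box else 0
    return base + (SMALL if action == "two-box" else 0)

def cdt_value(action, prob_predicted_one_box):
    """Expected paperclips when the prediction is treated as independent of the action."""
    q = prob_predicted_one_box
    return q * payoff(action, True) + (1 - q) * payoff(action, False)

def correlated_value(action, accuracy):
    """Expected paperclips when the prediction matches the action with probability `accuracy`."""
    q = accuracy if action == "one-box" else 1 - accuracy
    return q * payoff(action, True) + (1 - q) * payoff(action, False)

ACTIONS = ("one-box", "two-box")

# Holding the prediction fixed, "two-box" is ahead by SMALL for any belief q.
cdt_choice = max(ACTIONS, key=lambda a: cdt_value(a, 0.5))

# Letting the prediction track the action, "one-box" comes out ahead
# (assuming a 99%-accurate predictor, an arbitrary illustrative figure).
correlated_choice = max(ACTIONS, key=lambda a: correlated_value(a, 0.99))

print(cdt_choice, correlated_choice)
```

The sketch only covers the single-decision case. It says nothing by itself about which successor agent a self-modifying CDT agent would build, which is the question the comment goes on to address.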

Comment by robbbb on Toward a New Technical Explanation of Technical Explanation · 2019-12-01T20:02:14.542Z · score: 8 (4 votes) · LW · GW

When I read this post, it struck me as a remarkably good introduction to logical induction, and the whole discussion seemed very core to the formal-epistemology projects on LW and AIAF.

Comment by robbbb on Useful Does Not Mean Secure · 2019-12-01T07:47:53.962Z · score: 11 (3 votes) · LW · GW

Note that on my model, the kind of paranoia Eliezer is pointing to with "AI safety mindset" or security mindset is something he believes you need in order to prevent adversarialness and the other bad byproducts of "your system devotes large amounts of thought to things and thinks in really weird ways". It's not just (or even primarily) a fallback measure to keep you safe on the off chance your system does generate a powerful adversary. Quoting Nate:

Lastly, alignment looks difficult for the same reason computer security is difficult: systems need to be robust to intelligent searches for loopholes.

Suppose you have a dozen different vulnerabilities in your code, none of which is itself fatal or even really problematic in ordinary settings. Security is difficult because you need to account for intelligent attackers who might find all twelve vulnerabilities and chain them together in a novel way to break into (or just break) your system. Failure modes that would never arise by accident can be sought out and exploited; weird and extreme contexts can be instantiated by an attacker to cause your code to follow some crazy code path that you never considered.

A similar sort of problem arises with AI. The problem I’m highlighting here is not that AI systems might act adversarially: AI alignment as a research program is all about finding ways to prevent adversarial behavior before it can crop up. We don’t want to be in the business of trying to outsmart arbitrarily intelligent adversaries. That’s a losing game.

The parallel to cryptography is that in AI alignment we deal with systems that perform intelligent searches through a very large search space, and which can produce weird contexts that force the code down unexpected paths. This is because the weird edge cases are places of extremes, and places of extremes are often the place where a given objective function is optimized. Like computer security professionals, AI alignment researchers need to be very good at thinking about edge cases.

It’s much easier to make code that works well on the path that you were visualizing than to make code that works on all the paths that you weren’t visualizing. AI alignment needs to work on all the paths you weren’t visualizing.

Scott Garrabrant mentioned to me at one point that he thought Optimization Amplifies distills a (maybe the?) core idea in Security Mindset and Ordinary Paranoia. The problem comes from "lots of weird, extreme-state-instantiating, loophole-finding optimization", not from "lots of adversarial optimization" (even though the latter is a likely consequence of getting things badly wrong with the former).

Eliezer models most of the difficulty (and most of the security-relatedness) of the alignment problem as lying in 'get ourselves to a place where in fact our systems don't end up as powerful adversarial optimizers', rather than (a) treating this as a gimme and focusing on what we should do absent such optimizers, or (b) treating the presence of adversarial optimization as inevitable and asking how to manage it.

I think this idea ("avoiding generating powerful adversarial optimizers is an enormous constraint and requires balancing on a knife's edge between disaster and irrelevance") is also behind the view that system safety largely comes from things like "the system can't think about any topics, or try to solve any cognitive problems, other than the ones we specifically want it to", vs. Rohin's "the system is trying to do what we want".

Comment by robbbb on Useful Does Not Mean Secure · 2019-12-01T07:41:52.224Z · score: 15 (4 votes) · LW · GW


"As you take the system and make it vastly superintelligent, your primary focus needs to be on security from adversarial forces, rather than primarily on making something that's useful."

I agree if you assume a discrete action that simply causes the system to become vastly superintelligent. But we can try not to get to powerful adversarial optimization in the first place; if that never happens then you never need the security.


I certainly agree that in the presence of powerful adversarial optimizers, you need security to get your system to do what you want. However, we can just not build powerful adversarial optimizers. My preferred solution is to make sure our AI systems are trying to do what we want, so that they never become adversarial in the first place. But if for some reason we can't do that, then we could make sure AI systems don't become too powerful, or not build them at all. It seems very weird to instead say "well, the AI system is going to be adversarial and way more powerful, let's figure out how to make it secure" -- that should be the last approach, if none of the other approaches work out.

The latter summary in particular sounds superficially like Eliezer's proposed approach, except that he doesn't think it's easy in the AGI regime to "just not build powerful adversarial optimizers" (and if he suspected this was easy, he wouldn't want to build in the assumption that it's easy as a prerequisite for a safety approach working; he would want a safety approach that's robust to the scenario where it's easy to accidentally end up with vastly more quality-adjusted optimization than intended).

The "do alignment in a way that doesn't break if capability gain suddenly speeds up" approach, or at least Eliezer's version of that approach, similarly emphasizes "you're screwed (in the AGI regime) if you build powerful adversarial optimizers, and it's a silly idea to do that in the first place, so just don't do it, ever, in any context". From AI Safety Mindset:

Niceness as the first line of defense / not relying on defeating a superintelligent adversary

[...] Paraphrasing Schneier, we might say that there's three kinds of security in the world: Security that prevents your little brother from reading your files, security that prevents major governments from reading your files, and security that prevents superintelligences from getting what they want. We can then go on to remark that the third kind of security is unobtainable, and even if we had it, it would be very hard for us to know we had it. Maybe superintelligences can make themselves knowably secure against other superintelligences, but we can't do that and know that we've done it.

[...] The final component of an AI safety mindset is one that doesn't have a strong analogue in traditional computer security, and it is the rule of not ending up facing a transhuman adversary in the first place. The winning move is not to play. Much of the field of value alignment theory is about going to any length necessary to avoid needing to outwit the AI.

In AI safety, the first line of defense is an AI that does not want to hurt you. If you try to put the AI in an explosive-laced concrete bunker, that may or may not be a sensible and cost-effective precaution in case the first line of defense turns out to be flawed. But the first line of defense should always be an AI that doesn't want to hurt you or avert your other safety measures, rather than the first line of defense being a clever plan to prevent a superintelligence from getting what it wants.

A special case of this mindset applied to AI safety is the Omni Test - would this AI hurt us (or want to defeat other safety measures) if it were omniscient and omnipotent? If it would, then we've clearly built the wrong AI, because we are the ones laying down the algorithm and there's no reason to build an algorithm that hurts us period. If an agent design fails the Omni Test desideratum, this means there are scenarios that it prefers over the set of all scenarios we find acceptable, and the agent may go searching for ways to bring about those scenarios.

If the agent is searching for possible ways to bring about undesirable ends, then we, the AI programmers, are already spending computing power in an undesirable way. We shouldn't have the AI running a search that will hurt us if it comes up positive, even if we expect the search to come up empty. We just shouldn't program a computer that way; it's a foolish and self-destructive thing to do with computing power. Building an AI that would hurt us if omnipotent is a bug for the same reason that a NASA probe crashing if all seven other planets line up would be a bug - the system just isn't supposed to behave that way period; we should not rely on our own cleverness to reason about whether it's likely to happen.

Omnipotence Test for AI Safety:

Suppose your AI suddenly became omniscient and omnipotent - suddenly knew all facts and could directly ordain any outcome as a policy option. Would the executing AI code lead to bad outcomes in that case? If so, why did you write a program that in some sense 'wanted' to hurt you and was only held in check by lack of knowledge and capability? Isn't that a bad way for you to configure computing power? Why not write different code instead?

The Omni Test is that an advanced AI should be expected to remain aligned, or not lead to catastrophic outcomes, or fail safely, even if it suddenly knows all facts and can directly ordain any possible outcome as an immediate choice. The policy proposal is that, among agents meant to act in the rich real world, any predicted behavior where the agent might act destructively if given unlimited power (rather than e.g. pausing for a safe user query) should be treated as a bug.

Non-Adversarial Principle:

The 'Non-Adversarial Principle' is a proposed design rule for sufficiently advanced Artificial Intelligence stating that:

By design, the human operators and the AGI should never come into conflict.

Special cases of this principle include Niceness is the first line of defense and The AI wants your safety measures.

[...] No aspect of the AI's design should ever put us in an adversarial position vis-a-vis the AI, or pit the AI's wits against our wits. If a computation starts looking for a way to outwit us, then the design and methodology has already failed. We just shouldn't be putting an AI in a box and then having the AI search for ways to get out of the box. If you're building a toaster, you don't build one element that heats the toast and then add a tiny refrigerator that cools down the toast.

Cf. the "X-and-only-X" problem.

Comment by robbbb on The Correct Contrarian Cluster · 2019-11-30T15:44:56.841Z · score: 8 (4 votes) · LW · GW

Huh? Strong evidence for that would be us all being dead.

I want to insist that "it's unreasonable to strongly update about technological risks until we're all dead" is not a great heuristic for evaluating GCRs.

Comment by robbbb on The Correct Contrarian Cluster · 2019-11-29T21:10:03.982Z · score: 13 (3 votes) · LW · GW

bfinn was discounting Eliezer for being a non-economist, rather than discounting Sumner for being insufficiently mainstream; and bfinn was skeptical in particular that Eliezer understood NGDP targeting well enough to criticize the Bank of Japan. So Sumner seems unusually relevant here, and I'd expect him to pick up on more errors from someone talking at length about his area of specialization.

Comment by robbbb on Getting Ready for the FB Donation Match · 2019-11-28T22:55:20.253Z · score: 3 (3 votes) · LW · GW

Colm put together specific recommendations for people who want to help MIRI get matched on Giving Tuesday:

Other EA orgs that want to get matched might benefit from something similar; I haven't looked at the specific suggestions other orgs are making.

Comment by robbbb on The Correct Contrarian Cluster · 2019-11-28T22:10:36.780Z · score: 21 (5 votes) · LW · GW

You're leaning heavily on the concept "amateur", which (a) doesn't distinguish "What's your level of knowledge and experience with X?" and "Is X your day job?", and (b) treats people as being generically "good" or "bad" at extremely broad and vague categories of proposition like "propositions about quantum physics" or "propositions about macroeconomics".

I think (b) is the main mistake you're making in the quantum physics case. Eliezer isn't claiming "I'm better at quantum physics than professionals". He's claiming that the specific assertion "reifying quantum amplitudes (in the absence of evidence against collapse/agnosticism/nonrealism) violates Ockham's Razor because it adds 'stuff' to the universe" is false, and that a lot of quantum physicists have misunderstood this because their training is in quantum physics, not in algorithmic information theory or formal epistemology.

I think (a) is the main mistake you're making in the economics case. Eliezer is basically claiming to understand macroeconomics better than key decisionmakers at the Bank of Japan, but based on the results, I think he was just correct about that. As far as I can tell, Eliezer is just really good at economic reasoning, even though it's not his day job. Cf. Central banks should have listened to Eliezer Yudkowsky (or 1, 2, 3).

Comment by robbbb on Two clarifications about "Strategic Background" · 2019-11-19T17:52:08.333Z · score: 4 (2 votes) · LW · GW

Oops, I saw your question when you first posted it but forgot to get back to you, Issa. (Issa re-asked here.) My apologies.

I think there are two main kinds of strategic thought we had in mind when we said "details forthcoming":

  • 1. Thoughts on MIRI's organizational plans, deconfusion research, and how we think MIRI can help play a role in improving the future — this is covered by our November 2018 update post,
  • 2. High-level thoughts on things like "what we think AGI developers probably need to do" and "what we think the world probably needs to do" to successfully navigate the acute risk period.

Most of the stuff discussed in "strategic background" is about 2: not MIRI's organizational plan, but our model of some of the things humanity likely needs to do in order for the long-run future to go well. Some of these topics are reasonably sensitive, and we've gone back and forth about how best to talk about them.

Within the macrostrategy / "high-level thoughts" part of the post, the densest part was maybe 7a. The criteria we listed for a strategically adequate AGI project were "strong opsec, research closure, trustworthy command, a commitment to the common good, security mindset, requisite resource levels, and heavy prioritization of alignment work".

With most of these it's reasonably clear what's meant in broad strokes, though there's a lot more I'd like to say about the specifics. "Trustworthy command" and "a commitment to the common good" are maybe the most opaque. By "trustworthy command" we meant things like:

  • The organization's entire command structure is fully aware of the difficulty and danger of alignment.
  • Non-technical leadership can't interfere and won't object if technical leadership needs to delete a code base or abort the project.

By "a commitment to the common good" we meant a commitment to both short-term goodness (the immediate welfare of present-day Earth) and long-term goodness (the achievement of transhumanist astronomical goods), paired with a real commitment to moral humility: not rushing ahead to implement every idea that sounds good to them.

We still plan to produce more long-form macrostrategy exposition, but given how many times we've failed to word our thoughts in a way we felt comfortable publishing, and given how much other stuff we're also juggling, I don't currently expect us to have any big macrostrategy posts in the next 6 months. (Note that I don't plan to give up on trying to get more of our thoughts out sooner than that, if possible. We'll see.)

Comment by robbbb on Raemon's Scratchpad · 2019-11-12T04:32:17.161Z · score: 2 (1 votes) · LW · GW

I haven't noticed a problem with this in my case. Might just not have noticed having this issue.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-28T01:05:57.674Z · score: 2 (1 votes) · LW · GW

I mean "But we should consider that bodies are [...] a mere appearance of who knows what unknown object; that motion is not the effect of this unknown cause, but merely the appearance of its influence on our senses; that consequently neither of these is something outside us, but both are merely representations in us" seems pretty unambiguous to me. Kant isn't saying here that 'we can only know stuff about mind-independent objects by using language and concepts and frameworks'; he's saying 'we can only know stuff about mere representations inside of us'.

Kant passages oscillate between making sense under one of these interpretations or the other (or neither):

  • the "causality interpretation", which says that things-in-themselves are objects that cause appearances, like a mind-independent object causes an experience in someone's head. If noumena are the "true correlates" of phenomena, while phenomena are nothing but subjective experiences, then this implies that we really don't know anything about the world outside our heads. You can try to squirm out of this interpretation by asserting that words like "empirical" and "world" should be redefined to refer to subjective experiences in our heads, but this is just playing with definitions.
  • the "identity interpretation", which says that things-in-themselves are the same objects as phenomena, just construed differently.

Quoting Wood (66-67, 69-70):

Yet the two interpretations appear to yield very different (incompatible) answers to the following three questions:
1. Is an appearance the very same entity as a thing in itself? The causality interpretation says no, the identity interpretation says yes.
2. Are appearances caused by things in themselves? The causality interpretation says yes, the identity interpretation says no.
3. Do the bodies we cognize have an existence in themselves? The causality interpretation says no, the identity interpretation says yes.
[... N]o entity stands to itself in the relation of cause to effect. Transcendental idealism is no intelligible doctrine at all if it cannot give self-consistent answers to the above three questions. [...]
Kant occasionally tries to combine "causality interpretation" talk with "identity interpretation" talk. When he does, the result is simply nonsense and self-contradiction:
"I say that things as objects of our senses existing outside us are given, but we know nothing of what they may be in themselves, cognizing only their appearances, that is, the representations which they cause in us by affecting our senses. Consequently, I grant by all means that there are bodies outside us, that is, things which, though quite unknown to us as to what they are in themselves, we still cognize by the representations which their influence on our sensibility procures us, and which we call bodies, a term signifying merely the appearance of the thing which is unknown to us but not the less actual." (P 4:289)
The first sentence here says that objects of the senses are given to our cognition, but then denies that we cognize these objects, saying instead that we cognize an entirely different set of objects (different from the ones he has just said are given). The second sentence infers from this that there are bodies outside us, but proceeds to say that it is not these bodies (that is, the entities Kant has just introduced to us as 'bodies') that we call 'bodies', but rather bodies are a wholly different set of entities. Such Orwellian doubletalk seems to be the inevitable result of trying to combine the causality interpretation with the identity interpretation while supposing that they are just two ways of saying the same thing. [...]
Kant of course denies that we can ever have cognition of an object as it is in itself, because we can have no sensible intuition of it -- as it is in itself. But he seems to regard it as entirely permissible and even inevitable that we should be able to think the phenomenal objects around us solely through pure concepts of the understanding, hence as they are in themselves. If I arrive at the concept of a chair in the corner first by cognizing it empirically and then by abstracting from those conditions of cognition, so that I think of it existing in itself outside those conditions, then it is obvious that I am thinking of the same object, not of two different objects. It is also clear that when I think of it the second way, I am thinking of it, and not of its cause (if it has one). From this point of view, the causality interpretation seems utterly unmotivated and even nonsensical.
The problem arises, however, because Kant also wants to arrive at the concept of a thing existing in itself in another way. He starts from the fact that our empirical cognition results from the affection of our sensibility by something outside us. This leads him to think that there must be a cause acting on our sensibility from outside, making it possible for us to intuit appearances, which are then conceived as the effects of this cause.
Of course it would be open to him to think of this for each case of sensible intuition as the appearance acting on our sensibility through a wholly empirical causality. But Kant apparently arrived at transcendental idealism in part by thinking of it as a revised version of the metaphysics of physical influence between substances that he derived from Crusius. Thus sensible intuition is sometimes thought of as the affection of our senses by an object not as an appearance but as a thing in itself, and transcendental idealism is thought of as having to claim (inconsistently) that we are to regard ourselves (as things in themselves) as being metaphysically influenced by things in themselves.
Such a metaphysics would of course be illegitimately transcendent by the standards of the Critique, but Kant unfortunately appears sometimes to think that transcendental idealism is committed to it, and many of his followers down to the present day seem addicted to the doctrine that appears to be stated in the letter of those texts that express that thought, despite the patent nonsense they involve from the critical point of view. The thing in itself is then taken to be this transcendent cause affecting our sensibility as a whole, and the appearance is seen as the ensemble of representations resulting from its activity on us.
Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-27T23:11:14.391Z · score: 2 (1 votes) · LW · GW

The reason I'm focusing on this is that I think some of the phrasings you chose in trying to summarize Kant (and translate or steelman his views) are sliding between the three different claims I described above:

[1] "We can't know things about ultimate reality without relying on initially unjustified knowledge/priors/cognitive machinery."
[2] "We can't know things about ultimate reality."
[3] "(We can know that) ultimate reality is wildly different from reality-as-we-conceive-of-it."

E.g., you say

The kind of knowledge he says you can't have is knowledge of the thing in itself, which in modern terms would mean something like knowledge that is not relative to some conceptual framework or way of perceiving

In treating all these claims as equivalent, you're taking a claim that sounds at first glance like 2 ("you can't have knowledge of the thing in itself"), and identifying it with claims that sound like either 1 or 3 ("you can't have knowledge that is not relative to some conceptual framework or way of perceiving," "you can't have knowledge of the real world that exists outside our concepts", "space/time/etc. are things our brains make up, not ultimately real things").

I think dissecting these examples helps make it easier to see how a whole continent could get confused about Berkeleian master-argument-style reasoning for 100-200 years, and get confused about distinctions like 'a thought you aren't thinking' vs. 'an object-of-thought you aren't thinking about'.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-27T23:08:08.635Z · score: 8 (2 votes) · LW · GW

I claim that the most natural interpretation of "[Transcendental] idealism means all specific human perceptions are moulded by the general form of human perception and there is no way to backtrack to a raw form." is that there's no way to backtrack from our beliefs, impressions, and perceptions to ultimate reality. That is, I'm interpreting "backtrack" causally: the world causes our perceptions, and backtracking would mean reconstructing what the ultimate, outside-our-heads, existed-before-humanity reality is like before we perceive or categorize it. (Or perhaps backtracking causally to the initial, relatively unprocessed sense-data our brains receive.)

In those terms, we know a ton about ultimate, outside-our-heads reality (and a decent amount about how the brain processes new sensory inputs), and there's no special obstacle to backtracking from our processed sense data to the raw, unprocessed real world. (Our reasoning faculties do need to be working OK, but that's true for our ability to learn truths about math, about our own experiences, etc. as well. Good conclusions require a good concluder.)

If instead the intended interpretation of "backtrack to a raw form" is "describe something without describing it", "think about something without thinking about it", or "reason about something without reasoning about it", then your original phrasing stops making sense to me.

Take the example of someone standing by a barn. They can see the front side of the barn, but they've never observed the back side. At noon, you ask them to describe their subjective experience of the barn, and they do so. Then you ask them to "backtrack to the raw form" beyond their experience. They proceed to start describing the full quantum state of the front of the barn as it was at noon (taking into account many-worlds: the currently-speaking observer has branched off from the original observer).

Then you go, "No, no, I meant describe something about the barn as it exists outside of your conceptual schemes." And the person repeats their quantum description, which is a true description regardless of the conceptual scheme used; the quantum state is in the world, not in my brain or in my concepts.

Then you go, "No, I meant describe an aspect of the barn that transcends your experiences entirely; not a property of the barn that caused your experience, but a property unconnected to your experience." And the person proceeds to conjecture that the barn has a back side, even though they haven't seen it; and they start speculating about likely properties the back side may have.

Then you go, "No! I meant describe something about the barn without using your concepts in the description." Or: "Describe something that bears no causal relation to your cognition whatsoever, like a causally inert quiddity that in no way interacts with any of the kinds of things you've ever experienced or computed."

And the person might reply: Well, I can say that such a thing would be a causally inert quiddity, as you say; and then perhaps I can't say much more than that, other than to drill down on what the relevant terms mean. Or, if the requirement is to describe a thing without describing it, then obviously I can't do that; but that seems like an even more trivial observation.

Why would the request to "describe something without describing it" ever be phrased as "backtracking to a raw form"? There's no "backtracking" involved, and we aren't returning to an earlier "raw" or unprocessed thing, since we're evidently not talking about an earlier (preconceptual) cognition that was subsequently processed into a proper experience; and since we're evidently not talking about the physical objects outside our heads that are the cause and referent for our thoughts about them.

I claim that there's an important equivocation at work in the idealist tradition between "backtracking" or finding a more "raw" or ultimate version of a thing, and "describing a thing without describing it". I claim that these only sound similar because of the mistake in Berkeley's master argument: confusing the ideas "an electron (i.e., an object) that exists outside of any conceptual framework" and "an 'electron' (i.e., a term or concept) that exists outside of any conceptual framework". I claim that the very temptation to use 'Ineffable-Thingie'-reifying phrasings like "there is no way to backtrack to a raw form" and "what an electron is outside of any conceptual framework" is related to this mistake.

Phrasing it as "We can't conceive of an electron without conceiving of it" makes it sound trivial, whereas the way of speaking that phrases things almost as though there were some object in the world (Kant's 'noumena') that transcends our conceptual frameworks and outstrips our every attempt to describe it, makes it sound novel and important and substantive. (And makes it an appealing Inherently Mysterious Thing to worship.)

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-26T16:51:08.452Z · score: 2 (1 votes) · LW · GW

I agree that Kant thought of himself as trying to save science from skepticism (e.g., Hume) and weird metaphysics (e.g., Berkeley), and I'm happy you're trying to make it easier to pass Kant's Ideological Turing Test.

Transcendental idealism means all specific human perceptions are moulded by the general form of human perception and there is no way to backtrack to a raw form. [...]
The kind of knowledge he says you can't have is knowledge of the thing in itself, which in modern terms would mean something like knowledge that is not relative to some conceptual framework or way of perceiving. Physicalism doesn't refute that in the least, because it is explicitly based on using physical science as its framework.

I have two objections:

(1) Physicalism does contradict the claim "there is no way to backtrack to a raw form", if this is taken to mean we should be agnostic about whether things are (really, truly, mind-independently) physical.

I assert that the "raw form" of an electron, insofar as physics is accurate, is just straightforwardly and correctly described by physics; and unless there's a more fundamental physical account of electrons we have yet to discover, physics is plausibly (though I doubt we can ever prove this) a complete description of electrons. There may not be extra features that we're missing.

(2) Modern anti-realist strains, similar to 19th-century idealism, tend to slide between these three claims:

  • "We can't know things about ultimate reality without relying on initially unjustified knowledge/priors/cognitive machinery."
  • "We can't know things about ultimate reality."
  • "(We can know that) ultimate reality is wildly different from reality-as-we-conceive-of-it."

The first claim is true, but the second and third claims are false.

This sliding is probably the real thing we have Kant to thank for, and the thing that's made anti-realist strains so slippery and hard to root out; Berkeley was lucid enough to unequivocally avoid the above leaps.

Quoting Allen Wood (pp. 63-64, 66-67):

The doctrine can even be stated with apparent simplicity: We can have cognition of appearances but not of things in themselves. But it is far from clear what this doctrine means, and especially unclear what sort of restriction it is supposed to place on our knowledge.
Some readers of Kant have seen the restriction as trivial, so trivial as to be utterly meaningless, even bordering on incoherence. They have criticized Kant not for denying that we can know 'things in themselves' but rather for thinking that the notion of a 'thing in itself' even makes sense. If by a 'thing in itself' we mean a thing standing outside any relation to our cognitive powers, then of course it seems impossible for us to know such things; perhaps it is even self-contradictory to suppose that we could so much as think of them.
Other readers have seen transcendental idealism as a radical departure from common sense, a form of skepticism at least as extreme as any Kant might have been trying to combat. To them it seems that Kant is trying (like Berkeley) to reduce all objects of our knowledge to mere ghostly representations in our minds. He is denying us the capacity to know anything whatever about any genuine (that is, any extra-mental) reality. [...]
I think much of the puzzlement about transcendental idealism arises from the fact that Kant himself formulates transcendental idealism in a variety of ways, and it is not at all clear how, or whether, his statements of it can all be reconciled, or taken as statements of a single, self-consistent doctrine. I think Kant's central formulations suggest two quite distinct and mutually incompatible doctrines. [...]
Some interpreters of Kant, when they become aware of these divergences, respond by saying that there is no significant difference between the two interpretations, that they are only 'two ways of saying the same thing.' These interpreters are probably faithful to Kant's intentions, since it looks as if he thought the two ways of talking about appearances and things in themselves are interchangeable and involve no difference in doctrine. But someone can intend to speak self-consistently and yet fail to do so; and it looks like this is what has happened to Kant in this case.

In particular, here's Wood on why Kant is sometimes saying 'we can't know about the world outside our heads', not just 'we can't have knowledge without relying on some conceptual framework or way of perceiving' (p. 64):

Kant often distinguishes appearances from things in themselves through locutions like the following: "What the objects may be in themselves would still never be known through the most enlightened cognition of their appearance, which alone is given to us" (KrV A43/B60). "Objects in themselves are not known to us at all, and what we call external objects are nothing other than mere representations of our sensibility, whose form is space, but whose true correlate, i.e. the thing in itself, is not and cannot be cognized through them" (KrV A30/B45).
Passages like these suggest that things existing in themselves are entities distinct from 'their appearances' -- which are subjective states caused in us by these things. Real things (things in themselves) cause appearances. Appearances have no existence in themselves, being only representations in us. "Appearances do not exist in themselves, but only relative to the [subject] insofar as it has senses" (KrV B164). "But we should consider that bodies are not objects in themselves that are present to us, but rather a mere appearance of who knows what unknown object; that motion is not the effect of this unknown cause, but merely the appearance of its influence on our senses; that consequently neither of these is something outside us, but both are merely representations in us" (KrV A387).

Whereas (p. 65):

In other passages, transcendental idealism is formulated so as to present us with a very different picture. [...] Here Kant does not distinguish between two separate entities, but rather between the same entity as it appears (considered in relation to our cognitive faculties) and as it exists in itself (considered apart from that relation). [...]
On the identity interpretation, appearances are not merely subjective entities or states in our minds; they do have an existence in themselves. The force of transcendental idealism is not to demote them, so to speak, from reality to ideality, but rather to limit our cognition of real entities to those features of them that stand in determinate relations to our cognitive faculties.
Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-26T15:52:17.254Z · score: 2 (1 votes) · LW · GW

I'd say: definitely nuanced. Definitely very inconsistent on this point. Not consistently asserting an extreme metaphysical view like "the true, mind-independent world is incomprehensibly different from the world we experience", though seeming to flirt with this view (or framing?) constantly, to the extent that all his contemporaries did think he had a view at least this weird. Mainly guilty of (a) muddled and poorly-articulated thoughts and (b) approaching epistemology with the wrong goals and methods.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-20T02:57:39.561Z · score: 2 (1 votes) · LW · GW
When Chalmers claims to have "direct" epistemic access to certain facts, the proper response is to provide the arguments for doubting that claim, not to play a verbal sleight-of-hand like Dennett's (1991, emphasis added):

Chalmers' The Conscious Mind was written in 1996, so this is wrong. The wrongness doesn't seem important to me. (Jackson and Nagel were 1979/1982, and Dennett re-endorsed this passage in 2003.)

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T23:22:04.852Z · score: 2 (1 votes) · LW · GW
It is indisputably the case that Chalmers, for instance, makes arguments along the lines of “there are further facts revealed by introspection that can’t be translated into words”. But it is not only not indisputably the case

What does "indisputably" mean here in Bayesian terms? A Bayesian's epistemology is grounded in what evidence that individual has access to, not in what disputes they can win. When Chalmers claims to have "direct" epistemic access to certain facts, the proper response is to provide the arguments for doubting that claim, not to play a verbal sleight-of-hand like Dennett's (1991, emphasis added):

You are not authoritative about what is happening in you, but only about what seems to be happening in you, and we are giving you total, dictatorial authority over the account of how it seems to you, about what it is like to be you. And if you complain that some parts of how it seems to you are ineffable, we heterophenomenologists will grant that too. What better grounds could we have for believing that you are unable to describe something than that (1) you don’t describe it, and (2) confess that you cannot? Of course you might be lying, but we’ll give you the benefit of the doubt.

It's intellectually dishonest of Dennett to use the word "ineffable" here to slide between the propositions "I'm unable to describe my experience" and "my experience isn't translatable in principle", as it is to slide between Nagel's term of art "what it's like to be you" and "how it seems to you".

Again, I agree with Dennett that Chalmers is factually wrong about his experience (and therefore lacks a certain degree of epistemic "authority" with me, though that's such a terrible way of phrasing it!). There are good Bayesian arguments against trusting autophenomenology enough for Chalmers' view to win the day (though Dennett isn't describing any of them here), and it obviously is possible to take philosophers' verbal propositions as data to study (cf. also the meta-problem of consciousness), but it's logically rude to conceal your cruxes, pretend that your method is perfectly neutral and ecumenical, and let the "scientificness" of your proposed methodology do the rhetorical pushing and pulling.

but indeed can’t ever (without telepathy etc., or maybe not even then) be shown to another person, or perceived by another person, to be the case, that there are further facts revealed by introspection that can’t be translated into words.

There's a version of this claim I agree with (since I'm a physicalist), but the version here is too strong. First, I want to note again that this is equating group epistemology with individual epistemology. But even from a group's perspective, it's perfectly possible for "facts revealed by introspection that can't be translated into words" to be transmitted between people; just provide someone with the verbal prompts (or other environmental stimuli) that will cause them to experience and notice the same introspective data in their own brains.

If that's too vague, consider this scenario as an analogy: Our universe is a (computable) simulation, running in a larger universe that's uncomputable. Humans are "dualistic" in the sense that they're Cartesian agents outside the simulation whose brains contain uncomputable subprocesses, but their sensory experiences and communication with other agents are all via the computable simulation. We could then imagine scenarios where the agents have introspective access to evidence that they're performing computations too powerful to run in the laws of physics (as they know them), but don't have output channels expressive enough to demonstrate this fact to others in-simulation; instead, they prompt the other agents to perform the relevant introspective feat themselves.

The other agents can then infer that their minds are plausibly all running on physics that's stronger than the simulated world's physics, even though they haven't found a way to directly demonstrate this (e.g., via neurosurgery on the in-simulation pseudo-brain).

Indeed it’s not even clear how you’d demonstrate to yourself that what your introspection reveals is real.

You can update upward or downward about the reliability of your introspection (either in general, or in particular respects), in the same way you can update upward or downward about the reliability of your sensory perception. E.g., different introspective experiences or faculties can contradict each other, suggest their own unreliability ("I'm introspecting that this all feels like bullshit..."), or contradict other evidence sources.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T22:13:07.441Z · score: 4 (2 votes) · LW · GW

A simple toy example would be: "You have perfect introspective access to everything about how your brain works, including how your sensory organs work. This allows you to deduce that your external sensory organs provide noise data most of the time, but provide accurate data about the environment anytime you wear blue sunglasses at night."

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T20:23:36.313Z · score: 4 (2 votes) · LW · GW

"Heterophenomenology" might be fine as a meme for encouraging certain kinds of interesting research projects, but there are several things I dislike about how Dennett uses the idea.

Mainly, it's leaning on the social standards of scientific practice, and on a definition of what "real science" or "good science" is, to argue against propositions like "any given scientist studying consciousness should take into account their own introspective data -- e.g., the apparent character of their own visual field -- in addition to verbal descriptions, as an additional fact to explain." This is meant to serve as a cudgel and bulwark against philosophers like David Chalmers, who claim that introspection reveals further facts (/data/explananda) not strictly translatable into verbal reports.

This is framing the issue as one of social-acceptability-to-the-norms-of-scientists or conformity-with-a-definition-of-"science", whereas correct versions of the argument are Bayesian. (And it's logically rude to not make the Bayesianness super explicit and clear, given the opportunity; it obscures your premises while making your argument feel more authoritative via its association with "science".)

We can imagine a weird alien race (or alien AI) that has extremely flawed sensory faculties, and very good introspection. A race like that might be able to bootstrap to good science, via leveraging their introspection to spot systematic ways in which their sensory faculties fail, and sift out the few bits of reliable information about their environments.

Humans are plausibly the opposite: as an accident of evolution, we have much more reliable sensory faculties than introspective faculties. This is a generalization from the history of science and philosophy, and from the psychology literature. Moreover, humans have a track record of being bad at distinguishing cases where their introspection is reliable from cases where it's unreliable; so it's hard to be confident of any lines we could draw between the "good introspection" and the "bad introspection". All of this is good reason to require extra standards of evidence before humanity "takes introspection at face value" and admits it into its canon of Established Knowledge.

Personally, I think consciousness is (in a certain not-clarified-here sense) an illusion, and I'm happy to express confidence that Chalmers' view is wrong. But I think Dennett has been uniquely bad at articulating the reasons Chalmers is probably wrong, often defaulting to dismissing Chalmers' arguments or trying to emphasize their social illegitimacy (as "unscientific").

The "heterophenomenology" meme strikes me as part of that project, whereas a more honest approach would say "yeah, in principle introspective arguments are totally admissible, they just have to do a bit more work than usual because we're giving them a lower prior (for reasons X, Y, Z)" and "here are specific reasons A, B, C that Chalmers' arguments don't meet the evidential bar that's required for us to take the 'autophenomenological' data at face value in this particular case".

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T19:07:08.136Z · score: 2 (1 votes) · LW · GW

Also interesting: "insistence that we be immune to skeptical arguments" and "fascination with the idea of representation/intentionality/'aboutness'" seem to have led the continental philosophers in similar directions, as in Sartre's "Intentionality: A Fundamental Idea of Husserl’s Phenomenology." But that intellectual tradition had less realism, instrumentalism, and love-of-science in its DNA, so there was less resistance to sliding toward an "everything is sort of subjective" position.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T17:41:40.282Z · score: 16 (4 votes) · LW · GW

Upvoted! My discussion of a bunch of these things above is very breezy, and I approve of replacing the vague claims with more specific historical ones. To clarify, here are four things I'm not criticizing:

  • 1. Eliminativism about particular mental states, of the form 'we used to think that this psychological term (e.g., "belief") mapped reasonably well onto reality, but now we understand the brain well enough to see it's really doing [description] instead, and our previous term is a misleading way of gesturing at this (or any other) mental process.'

I'm an eliminativist (or better, an illusionist) about subjectivity and phenomenal consciousness myself. (Though I think the arguments favoring that view are complicated and non-obvious, and there's no remotely intellectually satisfying illusionist account of what the things we call "conscious" really consist in.)

  • 2. In cases where the evidence for an eliminativist hypothesis isn't strong, the practice of having some research communities evaluate eliminativism or try eliminativism out and see if it leads in any productive directions. Importantly, a community doing this should treat the eliminativist view as an interesting hypothesis or an exploratory research program, not in any way as settled science (or pre-scientific axiom!).
  • 3. Demanding evidence for claims, and being relatively skeptical of varieties of evidence that have a poor track record, even if they "feel compelling".
  • 4. Demanding that high-level terms be in principle reducible to lower-level physical terms (given our justified confidence in physicalism and reductionism).

In the case of psychology, I am criticizing (and claiming really happened, though I agree that these views weren't as universal, unquestioned, and extreme as is sometimes suggested):

  • Skinner's and other behaviorists' greedy reductionism; i.e., their tendency to act like they'd reduced or explained more than they actually had. Scientists should go out of their way to emphasize the limitations and holes in their current models, and be very careful (and fully explicit about why they believe this) when it comes to claims of the form 'we can explain literally everything in [domain] using only [method].'
  • Rushing to achieve closure, dismiss open questions, forbid any expressions of confusion or uncertainty, and treat blank parts of your map as though they must correspond to a blank (or unimportant) territory. Quoting Watson (1928):
With the advent of behaviorism in 1913 the mind-body problem disappeared — not because ostrich-like its devotees hid their heads in the sand but because they would take no account of phenomena which they could not observe. The behaviorist finds no mind in his laboratory — sees it nowhere in his subjects. Would he not be unscientific if he lingered by the wayside and idly speculated upon it; just as unscientific as the biologists would be if they lingered over the contemplation of entelechies, engrams and the like. Their world and the world of the behaviorist are filled with facts — with data which can be accumulated and verified by observation — with phenomena which can be predicted and controlled.
If the behaviorists are right in their contention that there is no observable mind-body problem and no observable separate entity called mind — then there can be no such thing as consciousness and its subdivision. Freud's concept borrowed from somatic pathology breaks down. There can be no festering spot in the substratum of the mind — in the unconscious —because there is no mind.
  • More generally: overconfidence in cool new ideas, and exaggeration of what they can do.
  • Over-centralizing around an eliminativist hypothesis or research program in a way that pushes out brainstorming, hypothesis-generation, etc. that isn't easy to fit into that frame. I quote Hempel (1935) here:
[Behaviorism's] principal methodological postulate is that a scientific psychology should limit itself to the study of the bodily behavior with which man and the animals respond to changes in their physical environment, and should proscribe as nonscientific any descriptive or explanatory step which makes use of terms from introspective or 'understanding' psychology, such as 'feeling', 'lived experience', 'idea', 'will', 'intention', 'goal', 'disposition', 'repression'. We find in behaviorism, consequently, an attempt to construct a scientific psychology[.]
  • Simply put: getting the wrong answer. Some errors are more excusable than others, but even if my narrative about why they got it wrong is itself wrong, it would still be important to emphasize that they got it wrong, and could have done much better.
  • The general idea that introspection is never admissible as evidence. It's fine if you want to verbally categorize introspective evidence as 'unscientific' in order to distinguish it from other kinds of evidence, and there are some reasonable grounds for skepticism about how strong many kinds of introspective evidence are. But evidence is still evidence; a Bayesian shouldn't discard evidence just because it's hard to share with other agents.
  • The rejection of folk-psychology language, introspective evidence, or anything else for science-as-attire reasons.

Idealism emphasized some useful truths (like 'our perceptions and thoughts are all shaped by our mind's contingent architecture') but ended up in a 'wow it feels great to make minds more and more important' death spiral.

Behaviorism too emphasized some useful truths (like 'folk psychology presupposes a bunch of falsifiable things about minds that haven't all been demonstrated very well', 'it's possible for introspection to radically mislead us in lots of ways', and 'it might benefit psychology to import and emphasize methods from other scientific fields that have a better track record') but seems to me to have fallen into a 'wow it feels great to more and more fully feel like I'm playing the role of a True Scientist and being properly skeptical and cynical and unromantic about humans' trap.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-18T15:56:48.174Z · score: 4 (2 votes) · LW · GW
The same "that sounds silly" heuristic that helps you reject Berkeley's argument (when it's fringe and 'wears its absurdity on its sleeve') helps you accept 19th-century idealists' versions of the argument (when it's respectable and framed as the modern/scientific/practical/educated/consensus view on the issue).

I should also emphasize that Berkeley's idealism is very different from (e.g.) Hegel's idealism. "Idealism" comes in enough different forms that it's probably more useful for referring to a historical phenomenon than a particular ideology. (Fortunately, the former is the topic I'm interested in here.)

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-18T14:22:49.771Z · score: 2 (1 votes) · LW · GW
Berkeley's argument caused a fair amount of incredulity at the time. Samuel Johnson's Argumentum ad Lapidem was intended as a response.

This seems like incredulity at his conclusion, rather than at his argument. Do you know of good criticisms of the master argument from the time? (Note it wasn't given a standard name until the 1970s.)

To be clear, I think Berkeley was near-universally rejected at the time, because his conclusion ('there's no material world') was so wild. Most people also didn't understand what Berkeley was saying, even though he was pretty clear about it (see: Kant's misunderstanding; and the above fallacious counterargument, assuming it wasn't just a logically rude joke on Johnson's part).

But I don't update positively about people for rejecting silly-sounding conclusions just based on how silly they sound. The same "that sounds silly" heuristic that helps you reject Berkeley's argument (when it's fringe and 'wears its absurdity on its sleeve') helps you accept 19th-century idealists' versions of the argument (when it's respectable and framed as the modern/scientific/practical/educated/consensus view on the issue).

BTW, I notice that a lot of people here are persuaded by Aumann's Agreement Theorem, which is every bit as flawed in my view.

Flawed how?

Comment by robbbb on Is value amendment a convergent instrumental goal? · 2019-10-18T06:22:09.105Z · score: 14 (8 votes) · LW · GW

"Avoiding amending your utility function" is one of the classic convergent instrumental goals in Bostrom and Omohundro, and the reasoning there is sound: almost any goal will be better satisfied if it preserves itself than if it replaces itself with a different goal.

I do think it's plausible that AGI systems will have pretty unstable goals early on, but that's because goal stability seems hard to me and AGI systems probably won't perfectly figure it out very early along their development curve. I'm imagining accidental goal modification (for insufficiently capable systems), whereas you're describing deliberate goal modification (for sufficiently capable systems).

One way of thinking about this is to note that "wanting your goals to not be externally supplied" is itself a goal, and a relatively specific one at that; if you don't have something like that specific goal as part of the core criteria you use to select options, there's no instrumental reason for you to converge upon it. E.g., if your goal is simply "maximize the number of paperclips in your future light cone," then the etiology of your goal doesn't matter (from your perspective).
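The goal-preservation reasoning above can be illustrated with a toy sketch. Everything here is hypothetical and invented for illustration (the function names, the "staples" alternative, the ten-step horizon); it only shows the shape of the argument, namely that an agent choosing between futures scores them with the utility function it has now.

```python
# Toy sketch (hypothetical names throughout): an agent scores possible futures
# with its CURRENT utility function, including futures where its goal changes.

def paperclips_made(goal: str, steps: int) -> int:
    """Paperclips produced by an agent that pursues `goal` for `steps` steps."""
    return steps if goal == "paperclips" else 0

def current_utility(outcome_paperclips: int) -> int:
    """The agent's present goal: more paperclips is better."""
    return outcome_paperclips

def value_of_policy(future_goal: str, steps: int = 10) -> int:
    # The future self acts on `future_goal`, but the choice made today
    # is evaluated by today's utility function.
    return current_utility(paperclips_made(future_goal, steps))

options = {"keep goal": "paperclips", "switch goal": "staples"}
scores = {name: value_of_policy(goal) for name, goal in options.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Under these assumptions the "keep goal" option scores higher, simply because the switched agent stops producing what the current goal counts. Note that nothing in the scoring depends on where the goal originally came from.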

Comment by robbbb on What's going on with "provability"? · 2019-10-14T20:55:44.523Z · score: 4 (3 votes) · LW · GW

Ike is responding to this:

Gödel: What could it mean for a statement to be "true but not provable"? Is this just because there are some statements such that neither P nor not-P can be proven, yet one of them must be true? If so, I would (stubbornly) contest that perhaps P and not-P really are both non-true.

"P and not-P really are both non-true" is classically false, and Gödel holds in classical mathematics, so Evan's response isn't available in that case.

Evan's sense that "perhaps P and not-P really are both non-true" might be a reason for him to endorse intuitionism as "more correct" than classical math in some sense.
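As a rough illustration of the classical side of this point, here is a minimal Lean 4 sketch. It assumes only `Classical.em` from Lean's core library, which is the classical law of the excluded middle; an intuitionistic system does not have this as a theorem, which is one way of locating the disagreement.

```lean
-- Classical logic: for any proposition P, either P holds or ¬P holds.
-- `Classical.em` is Lean core's excluded-middle principle.
example (P : Prop) : P ∨ ¬P := Classical.em P
```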

Comment by robbbb on What's going on with "provability"? · 2019-10-13T20:46:33.539Z · score: 5 (3 votes) · LW · GW

Proofs, Implications, and Models introduces some of these ideas more slowly. Other posts from the Highly Advanced Epistemology 101 for Beginners sequence are relevant too, and include more realism-flavored concerns about choosing between systems.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-13T20:33:18.068Z · score: 2 (1 votes) · LW · GW
My claim is that instrumentalism is the correct metaphysics regardless

What does it mean for instrumentalism to be the correct metaphysics? Normally, I'd interpret "the correct metaphysics" as saying something basic about reality or the universe. (Or, if you're an instrumentalist and you say "X is the correct metaphysics", I'd assume you were saying "it's useful to have a model that treats X as a basic fact about reality or the universe", which also doesn't make sense to me if X is "instrumentalism".)

Although it is also true that if you try interpreting quantum mechanics according to sufficiently strong realist desiderata

Well, sufficiently specific realist desiderata. Adding hidden variables to QM doesn't make the theory any more realist, the way we're using "realist" here.

Comment by robbbb on What's going on with "provability"? · 2019-10-13T12:39:04.807Z · score: 3 (2 votes) · LW · GW

A non-technical summary of how arithmetization is used in this argument:

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-13T04:20:49.488Z · score: 6 (3 votes) · LW · GW

I think my characterization is accurate, but maybe guilty of weak-manning: I'm recounting a salient recent (long) conversation with laypeople, rather than attempting a representative survey of non-realists or trying to find the best proponents.

I had in mind a small social gathering I attended (without any conscious effort to seek out and find non-realists) where most of the people in the room voiced disagreement with my claims that truth is a coherent idea, that some entities aren't social or psychological constructs, that some methods for learning things are more objective/reasonable/justified than others, and so on.

I tried to find common ground on the most basic claims I could think of, like "OK, but we can at least agree that something is real, right? There's, like, stuff actually going on?" I wasn't successful. And I think I'm pretty good at not straw-manning people on these issues; I'm used to drawing pretty fine distinctions between pretty out-there ontological and epistemological views. (E.g., I'm perfectly happy to try to tease apart the nuances of thinkers like Parmenides, Nagarjuna, Zhuangzi, Sextus, William James, Dharmakirti, Schopenhauer, Jonathan Schaffer, Graham Priest, Sartre, Berkeley. This stuff is interesting, even if I put no stock in it.)

To my ear, "it pays to think in terms other than reality/truth sometimes" sounds too weak on its own to count as 'anti-realism'. If I think it's ever (cognitively?) useful to read fiction, or explore fake frameworks, or just take a nap and clear my head, that already seems to qualify. I'm happy to hear more about what you have in mind, though, regardless of what labels fit best.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-12T22:05:24.268Z · score: 4 (2 votes) · LW · GW
This post was educational, however, I want to push back against the implicit criticism of instrumentalism and the Copenhagen interpretation. The metaphilosophical position I will use here is: to solve a philosophical question, we need to rephrase it as a question about AI design

Maybe the problem is that I'm not sufficiently convinced that there's a philosophical question here. Sometimes philosophers (and even physicists) argue about things that aren't open questions. "Do refrigerators exist, or only mental models of refrigerators?" sounds like a straightforward, testable empirical question to me, with all the evidence favoring "refrigerators exist".

I predict I'm missing an implicit premise explaining why "I don't currently understand where the Born rule comes from" is a bigger problem for realism than "I don't currently understand how my refrigerator works", or some other case where realism makes things unnecessarily hard/confusing, like infinite ethics or anthropics or somesuch.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-12T21:43:50.501Z · score: 4 (2 votes) · LW · GW

I should note that Russell was also led in some pretty weird directions by the desire to resist skeptical arguments:

What distinguishes neutral monism from its monistic rivals is the claim that the intrinsic nature of ultimate reality is neither mental nor physical. [...] Following a series of critical engagements with neutral monism (see especially Russell 1914a,b), Russell adopted it in Russell 1919 and remained a neutral monist for the rest of his long career: “I am conscious of no major change in my opinions since the adoption of neutral monism” is what he says in an interview from 1964 (Eames 1969: 108). [...]
For an entity to be neutral is to “have neither the hardness and indestructibility of matter, nor the reference to objects which is supposed to characterize the mind” (Russell 1921: 36; cf. 124). Russell never suspected sensations of being material (in this sense). That sensations are mental (in this sense)—that they consist of a mental act of sensing directed at a non-mental object—was, however, a pivotal part of his earlier view. But then his views changed:
"I formerly believed that my own inspection showed me the distinction between a noise [the object] and my hearing of a noise [the act of sensing], and I am now convinced that it shows me no such thing, and never did." (Russell 1918b: 255)

Also, insofar as Moore and Russell are gesturing at similar issues, Moore's paper provides some support for the claims that master-argument-ish reasoning was central to idealism, that it rested on a simple error concealed by motivated reasoning and obfuscatory language, and that no one noticed (or successfully popularized) the error prior to Moore/Russell:

I am suggesting that the Idealist maintains that object and subject are necessarily connected, mainly because he fails to see that they are distinct, that they are two, at all. When he thinks of 'yellow' and when he thinks of the 'sensation of yellow', he fails to see that there is anything whatever in the latter which is not in the former. [...]
But I am well aware that there are many Idealists who would repel it as an utterly unfounded charge that they fail to distinguish between a sensation or idea and what I will call its object. And there are, I admit, many who not only imply, as we all do, that green is distinct from the sensation of green, but expressly insist upon the distinction as an important part of their system. They would perhaps only assert that the two form an inseparable unity.
But I wish to point out that many, who use this phrase, and who do admit the distinction, are not thereby absolved from the charge that they deny it. For there is a certain doctrine, very prevalent among philosophers nowadays, which by a very simple reduction may be seen to assert that two distinct things both are and are not distinct. A distinction is asserted; but it is also asserted that the things distinguished form an 'organic unity'. But, forming such a unity, it is held, each would not be what it is apart from its relation to the other. Hence to consider either by itself is to make an illegitimate abstraction.
The recognition that there are 'organic unities' and 'illegitimate abstractions' in this sense is regarded as one of the chief conquests of modern philosophy. But what is the sense attached to these terms? An abstraction is illegitimate, when and only when we attempt to assert of a part - of something abstracted - that which is true only of the whole to which it belongs: and it may perhaps be useful to point out that this should not be done. But the application actually made of this principle, and what perhaps would be expressly acknowledged as its meaning, is something much the reverse of useful. The principle is used to assert that certain abstractions are in all cases illegitimate; that whenever you try to assert anything whatever of that which is part of an organic whole, what you assert can only be true of the whole. And this principle, so far from being a useful truth, is necessarily false. For if the whole can, nay must, be substituted for the part in all propositions and for all purposes, this can only be because the whole is absolutely identical with the part.
When, therefore, we are told that green and the sensation of green are certainly distinct but yet are not separable, or that it is an illegitimate abstraction to consider the one apart from the other, what these provisos are used to assert is, that though the two things are distinct yet you not only can but must treat them as if they were not. Many philosophers, therefore, when they admit a distinction, yet (following the lead of Hegel) boldly assert their right, in a slightly more obscure form of words, also to deny it. The principle of organic unities, like that of combined analysis and synthesis, is mainly used to defend the practice of holding both of two contradictory propositions, wherever this may seem convenient.
In this, as in other matters, Hegel's main service to philosophy has consisted in giving a name to and erecting into a principle, a type of fallacy to which experience had shown philosophers, along with the rest of mankind, to be addicted. No wonder that he has followers and admirers. [...]
And at this point I need not conceal my opinion that no philosopher has ever yet succeeded in avoiding this self-contradictory error: that the most striking results both of Idealism and of Agnosticism are only obtained by identifying blue with the sensation of blue: that esse ["existing"] is held to be percipi ["being perceived"], solely because what is experienced is held to be identical with the experience of it. That Berkeley and Mill committed this error will, perhaps, be granted: that modern Idealists make it will, I hope, appear more probable later.

This updates me partway back toward the original claim I made (that Berkeley's master argument was causally important for the rise of idealism and its 20th-century successors).

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-12T21:19:42.796Z · score: 9 (2 votes) · LW · GW
Isn't that the same argument Russell was making?

They're... similar? I find Russell a lot clearer on this point:

We might state the argument by which they support their view in some such way as this: 'Whatever can be thought of is an idea in the mind of the person thinking of it; therefore nothing can be thought of except ideas in minds; therefore anything else is inconceivable, and what is inconceivable cannot exist.'
Such an argument, in my opinion, is fallacious; and of course those who advance it do not put it so shortly or so crudely. But whether valid or not, the argument has been very widely advanced in one form or another; and very many philosophers, perhaps a majority, have held that there is nothing real except minds and their ideas.

I find Moore's way of discussing the issue weirder. Moore is definitely making an argument from 'a thought's object is different from the thought itself' to 'idealism is false', but his argument seems to involve weird steps like 'our experiences don't have contents' (rather than the expected 'the content of our experience is different from its referent'):

What I wish to point out is (1) that we have no reason for supposing that there are such things as mental images at all -- for supposing that blue is part of the content of the sensation of blue, and (2) that even if there are mental images, no mental image and no sensation or idea is merely a thing of this kind; that 'blue', even if it is part of the content of the image or sensation or idea of blue, is always also related to it in quite another way, and that this other relation, omitted in the traditional analysis, is the only one which makes the sensation of blue a mental fact at all. [...]
To have in your mind 'knowledge' of blue, is not to have in your mind a 'thing' or 'image' of which blue is the content. To be aware of the sensation of blue is not to be aware of a mental image - of a 'thing', of which 'blue' and some other element are constituent parts in the same sense in which blue and glass are constituents of a blue bead. It is to be aware of an awareness of blue; awareness being used, in both cases, in exactly the same sense. This element, we have seen, is certainly neglected by the 'content' theory: that theory entirely fails to express the fact that there is, in the sensation of blue, this unique relation between blue and the other constituent.

Baldwin (2004) confirms that this line of reasoning, plus Moore's attempt to resist skeptical hypotheses, led Moore in a very confused direction:

But what is the relationship between sense-data [i.e., the thingies we're directly conscious of] and physical objects? Moore took it that there are three serious candidates to be considered: (i) an indirect realist position, according to which sense-data are non-physical but somehow produced by interactions between physical objects and our senses; (ii) the phenomenalist position, according to which our conception of physical objects is merely one which expresses observed and anticipated uniformities among the sense-data we apprehend; (iii) a direct realist position, according to which sense-data are parts of physical objects — so that, for example, visual sense-data are visible parts of the surfaces of physical objects.
The indirect realist position is that to which he was initially drawn; but he could see that it leaves our beliefs about the physical world exposed to skeptical doubt, since it implies that the observations which constitute evidence for these beliefs concern only the properties of non-physical sense-data, and there is no obvious way for us to obtain further evidence to support a hypothesis about the properties of the physical world and its relationship to our sense-data.
This argument is reminiscent of Berkeley's critique of Locke, and Moore therefore considered carefully Berkeley's phenomenalist alternative. Moore's initial response to this position was that the implied conception of the physical world was just too ‘pickwickian’ to be believable. This may be felt to be too intuitive, like Dr. Johnson's famous objection to Berkeley; but Moore could also see that there were substantive objections to the phenomenalist position, such as the fact that our normal ways of identifying and anticipating significant uniformities among our sense-data draw on our beliefs about our location in physical space and the state of our physical sense-organs, neither of which are available to the consistent phenomenalist.
So far Moore's dialectic is familiar. What is unfamiliar is his direct realist position, according to which sense-data are physical. This position avoids the problems so far encountered, but in order to accommodate false appearances Moore has to allow that sense-data may lack the properties which we apprehend them as having. It may be felt that in so far as sense-data are objects at all, this is inevitable; but Moore now needs to provide an account of the apparent properties of sense-data and it is not clear how he can do this without going back on the initial motivation for the sense-datum theory by construing these apparent properties as properties of our experiences. But what in fact turns Moore against this direct realist position is the difficulty he thinks it leads to concerning the treatment of hallucinations. In such cases, Moore holds, any sense-data we apprehend are not parts of a physical object; so direct realism cannot apply to them, and yet there is no reason to hold that they are intrinsically different from the sense-data which we apprehend in normal experience. This last point might well be disputed, and at one point Moore himself considers the possibility of a distinction between ‘subjective’ and ‘objective’ sense-data; but once one has introduced sense-data in the first place as the primary objects of experience it is not going to be easy to make a distinction here without assuming more about experience than Moore at any rate would have wanted to concede.
Moore wrote more extensively about perception than about any other topic. In these writings he moves between the three alternatives set out here without coming to any firm conclusion.
Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-12T04:17:30.587Z · score: 5 (3 votes) · LW · GW

Re "was Berkeley making such an obvious mistake?", I think this is historians' majority view, but multiple people have tried to come up with more reasonable versions of the argument; see Gallois (1974) and Downing (2011). Note that Berkeley makes the same argument in dialogue form here (starts at "How say you, Hylas, can you see a thing which is at the same time unseen?"), so you can check if you find that version more tenable.

The Bloomsbury Companion to Berkeley says:

[This passage] can be interpreted as making a straightforward howler, arguing that because whenever you think of something it is being thought of and anything being thought of is, ipso facto, 'in the mind' then you cannot think of something that is not in the mind. According to Russell, this was a keystone for idealism and it involves a simple mistake.

"Berkeley's view . . . seems to depend for its plausibility upon confusing the thing apprehended with the act of apprehension. Either of these might be called an 'idea'; probably either would have been called an idea by Berkeley. The act is undoubtedly in the mind; hence, when we are thinking of the act, we readily assent to the view that ideas must be in the mind. Then, forgetting that this was only true when ideas were taken as acts of apprehension, we transfer the proposition that 'ideas are in the mind' to ideas in the other sense, i.e. to the things apprehended by our acts of apprehension. Thus, by an unconscious equivocation, we arrive at the conclusion that whatever we apprehend must be in our minds."

Russell's criticism is in line with Moore's famous 'The Refutation of Idealism' (1903), where he argues that if one recognizes the act-object distinction within conscious states, one can see that the object is independent of the act. This 'discovery', together with the development of a formal logic for relations, was the cornerstone of the rejection of 'British idealism'. If objects can be conceived of as independent of conscious thought, and if it is consistent to think of them as in actually related to each other, the mentalistic holism that was contemporary idealism is demolished.

That said, I put a lot of weight on Allen Wood's view as a leading Kant scholar, and on revisiting his book Kant, I see that he doesn't think Kant accepted the master argument (p. 69). David (2015) asserts a link, but it looks tenuous to me.

Kant's earliest interpreters took him to be saying "trees, oceans, etc. all just exist in your head and have nothing in common with the mysterious ineffable things-in-themselves", and Kant definitely talks like that a great deal, but he also says a lot that contradicts that view. Wood thinks Kant was just really confused and fuzzy about his own view, and didn't have a consistent model here (pp. 63-71).

My new pet theory is that Kant was being pulled in one direction by "wanting to make things as subjective as possible so he can claim more epistemic immediacy and therefore more immunity to skeptical arguments", and in the opposite direction by "not wanting to sound like a crazy person like Berkeley", so we get inconsistencies.

I don't know who, if anyone, noted the obvious fallacy in Berkeley's master argument prior to Russell in 1912, and Russell seems to think the argument was central to idealism's appeal. Regardless, my new view is: philosophy mainly ended up going down an idealist cul-de-sac because Kant shared Berkeley's "try to find ways to treat more things as subjective" approach to defeating skepticism. (Possibly without realizing it; Stang (2016) suggests Kant was pretty confused about what Berkeley believed.) Then Kant and Hegel built sufficiently dense, mysterious, and complicated intellectual edifices that it was easy for them to confuse themselves and others, while still being brilliant, innovative, and internally consistent enough to attract a lot of followers.

Comment by robbbb on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-11T13:44:01.380Z · score: 5 (4 votes) · LW · GW

I downvoted TAG's comment because I found it confusing/misleading. I can't tell which of these things TAG's trying to do:

  • Assert, in a snarky/indirect way, that people agitating about AI safety have no overlap with AI researchers. This seems doubly weird in a conversation with Stuart Russell.
  • Suggest that LeCun believes this. (??)
  • Assert that LeCun doesn't mean to discourage Russell's research. (But the whole conversation seems to be about what kind of research people should be doing when in order to get good outcomes from AI.)
Comment by robbbb on Misconceptions about continuous takeoff · 2019-10-09T16:50:18.826Z · score: 22 (6 votes) · LW · GW
My intuition is that it'd probably be pretty easy to create an aligned superhuman AI if we knew how to create non-singular, mis-aligned superhuman AIs, and had cheap, robust methods to tell if a particular AI was misaligned.

This sounds different from how I model the situation; my views agree here with Nate's (emphasis added):

I would rephrase 3 as “There are many intuitively small mistakes one can make early in the design process that cause resultant systems to be extremely difficult to align with operators’ intentions.” I’d compare these mistakes to the “small” decision in the early 1970s to use null-terminated instead of length-prefixed strings in the C programming language, which continues to be a major source of software vulnerabilities decades later.
I’d also clarify that I expect any large software product to exhibit plenty of actually-trivial flaws, and that I don’t expect that AGI code needs to be literally bug-free or literally proven-safe in order to be worth running. Furthermore, if an AGI design has an actually-serious flaw, the likeliest consequence that I expect is not catastrophe; it’s just that the system doesn’t work. Another likely consequence is that the system is misaligned, but in an obvious way that makes it easy for developers to recognize that deployment is a very bad idea. The end goal is to prevent global catastrophes, but if a safety-conscious AGI team asked how we’d expect their project to fail, the two likeliest scenarios we’d point to are "your team runs into a capabilities roadblock and can't achieve AGI" or "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time."
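For readers unfamiliar with the string-format analogy in the quoted passage, here is a toy Python sketch of the difference. It is not C and not a real exploit; the flat `bytes` "memory", the helper names, and the `SECRET` value are all invented for illustration, and the one-byte length prefix is just one simple way to store a length.

```python
# Toy model of a flat memory region that holds a string next to unrelated data.
SECRET = b"hunter2"  # hypothetical neighbouring data

def read_null_terminated(memory: bytes, start: int) -> bytes:
    """Read bytes from `start` until the first NUL byte (C-string style)."""
    end = memory.find(b"\x00", start)
    if end == -1:
        end = len(memory)
    return memory[start:end]

def read_length_prefixed(memory: bytes, start: int) -> bytes:
    """Read a string whose first byte stores its length."""
    length = memory[start]
    return memory[start + 1 : start + 1 + length]

# The writer forgot the terminating NUL after "hello", so the C-style read
# runs on into the neighbouring bytes.
c_style_memory = b"hello" + SECRET + b"\x00"
prefixed_memory = bytes([5]) + b"hello" + SECRET + b"\x00"

leaked = read_null_terminated(c_style_memory, 0)
bounded = read_length_prefixed(prefixed_memory, 0)
print(leaked, bounded)
```

In this sketch the null-terminated read returns the neighbouring bytes along with the string, while the length-prefixed read stops where the stored length says to. Length prefixes have failure modes of their own (for example, a wrong stored length), so this only illustrates why an early format choice can have long-lived consequences.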

My current model of 'the default outcome if the first project to develop AGI is highly safety-conscious, is focusing on alignment, and has a multi-year lead over less safety-conscious competitors' is that the project still fails, because their systems keep failing their tests but they don't know how to fix the deep underlying problems (and may need to toss out years of work and start from scratch in order to have a real chance at fixing them). Then they either (a) lose their lead, and some other project destroys the world; (b) decide they have to ignore some of their tests, and move ahead anyway; or (c) continue applying local patches without understanding or fixing the underlying generator of the test failures, until they or their system find a loophole in the tests and sneak by.

I don't think any of this is inevitable or impossible to avoid; it's just the default way I currently visualize things going wrong for AGI developers with a strong interest in safety and alignment.

Possibly you'd want to rule out (c) with your stipulation that the tests are "robust"? But I'm not sure you can get tests that robust. Even in the best-case scenario where developers are in a great position to build aligned AGI and successfully do so, I'm not imagining post-hoc tests that are robust to a superintelligence trying to game them. I'm imagining that the developers have a prior confidence from their knowledge of how the system works that every part of the system either lacks the optimization power to game any relevant tests, or will definitely not apply any optimization to trying to game them.

Comment by robbbb on FB/Discord Style Reacts · 2019-10-04T01:14:46.041Z · score: 4 (3 votes) · LW · GW

Another idea, maybe harder to implement: allow users to start a private chat with the anonymous user who left a reaction. I think in general these kinds of issues are often best resolved through one-on-one chat, and even if the anon chooses not to reply, people might feel less helpless/disempowered if they can reply in some fashion and know their critic is likely to see what they think.

If LW or the EA Forum tried something like this (which might also be helpful in some form even for downvotes), you'd probably want to make the expected discourse norms of these chats extra-prominent in the UI, to reduce the risk of bad interactions (and explain why mods may need to read the private messages if there's a worry about e.g. private verbal abuse going on).

Comment by robbbb on AI Alignment Open Thread August 2019 · 2019-10-01T00:04:49.551Z · score: 4 (2 votes) · LW · GW
Or do you think the discontinuity will be more in the realm of embedded agency style concerns (and how does this make it less safe, instead of just dysfunctional?)

This in particular doesn't match my model. Quoting some relevant bits from Embedded Agency:

So I'm not talking about agents who know their own actions because I think there's going to be a big problem with intelligent machines inferring their own actions in the future. Rather, the possibility of knowing your own actions illustrates something confusing about determining the consequences of your actions—a confusion which shows up even in the very simple case where everything about the world is known and you just need to choose the larger pile of money.
But it’s not that I’m imagining real-world embedded systems being “too Bayesian” and this somehow causing problems, if we don’t figure out what’s wrong with current models of rational agency. It’s certainly not that I’m imagining future AI systems being written in second-order logic! In most cases, I’m not trying at all to draw direct lines between research problems and specific AI failure modes.
What I’m instead thinking about is this: We sure do seem to be working with the wrong basic concepts today when we try to think about what agency is, as seen by the fact that these concepts don’t transfer well to the more realistic embedded framework.

This is also the topic of The Rocket Alignment Problem.

Comment by robbbb on Follow-Up to Petrov Day, 2019 · 2019-09-29T18:37:24.222Z · score: 10 (6 votes) · LW · GW

FWIW, I thought the ritual this year was fine and I'm not sure adding a cash prize to the ritual itself will be communicating the right lesson. It then starts to feel like a ritual about 'do we care more about symbolism than about saving lives?', rather than a ritual about coordination.

Comment by robbbb on Kohli episode discussion in 80K's Christiano interview · 2019-09-29T18:00:18.927Z · score: 6 (3 votes) · LW · GW

No. This is maybe clearer given the parenthetical I edited in. Speaking for myself, Critch's recommendations seemed broadly reasonable to me, though I'm uncertain about those too and I don't know of a 'MIRI consensus view' on Critch's suggestions.

I feel pretty confident about "this is a line of thinking that's reasonable and healthy to be able to entertain, alongside lots of other complicated case-by-case factors that all need to be weighed by each actor", and then I don't know how to translate that into concrete recommendations for arbitrary LW users.

Comment by robbbb on Kohli episode discussion in 80K's Christiano interview · 2019-09-29T02:11:40.326Z · score: 2 (1 votes) · LW · GW
It seems to me if you’re someone who has done a PhD in ML or is very good at ML, but you currently can’t get a position that seems especially safety-focused or that is going to disproportionately affect safety more than capabilities, it is probably still good to take a job that just advances AI in general, mostly because you’ll be reaching the cutting edge potentially of what’s going on and improving your career capital a lot and having relevant understanding.

(The following is an off-the-cuff addition that occurred to me while reading this -- it's something I've thought about frequently, but it's intended as something to chew on, not as an endorsement or disavowal of any specific recommendation by Rob W or Paul above.)

The cobbled-together model of Eliezer in my head wants to say something like: 'In the Adequate World, the foremost thing in everyone's heads is "I at least won't destroy the world by my own hands", because that's the bare-minimum policy each individual would want everyone else to follow. This should probably also be in the back of everyone's heads in the real world, at least as a weight on the scale and a thought that's fine and not-arrogant to factor in.'

Comment by robbbb on Kohli episode discussion in 80K's Christiano interview · 2019-09-29T01:49:45.953Z · score: 2 (1 votes) · LW · GW

Note that although my views are much closer to Paul’s than to Pushmeet’s here, I’m posting this because I found it a useful summary of some ML perspectives and disagreements on AI safety, not because I’m endorsing the claims above.

Some disagreements that especially jumped out at me: I'd treat it as a negative update if I learned that AI progress across the board had sped up, and I wouldn't agree with "even absent the actions of the longtermists, there’s a reasonably good chance that everything would just be totally fine".