Posts

A simple sketch of how realism became unpopular 2019-10-11T22:25:36.357Z · score: 57 (20 votes)
Christiano decision theory excerpt 2019-09-29T02:55:35.542Z · score: 54 (14 votes)
Kohli episode discussion in 80K's Christiano interview 2019-09-29T01:40:33.852Z · score: 14 (4 votes)
Rob B's Shortform Feed 2019-05-10T23:10:14.483Z · score: 19 (3 votes)
Helen Toner on China, CSET, and AI 2019-04-21T04:10:21.457Z · score: 71 (25 votes)
New edition of "Rationality: From AI to Zombies" 2018-12-15T21:33:56.713Z · score: 79 (30 votes)
On MIRI's new research directions 2018-11-22T23:42:06.521Z · score: 57 (16 votes)
Comment on decision theory 2018-09-09T20:13:09.543Z · score: 70 (26 votes)
Ben Hoffman's donor recommendations 2018-06-21T16:02:45.679Z · score: 40 (17 votes)
Critch on career advice for junior AI-x-risk-concerned researchers 2018-05-12T02:13:28.743Z · score: 204 (71 votes)
Two clarifications about "Strategic Background" 2018-04-12T02:11:46.034Z · score: 77 (23 votes)
Karnofsky on forecasting and what science does 2018-03-28T01:55:26.495Z · score: 17 (3 votes)
Quick Nate/Eliezer comments on discontinuity 2018-03-01T22:03:27.094Z · score: 71 (23 votes)
Yudkowsky on AGI ethics 2017-10-19T23:13:59.829Z · score: 92 (40 votes)
MIRI: Decisions are for making bad outcomes inconsistent 2017-04-09T03:42:58.133Z · score: 7 (8 votes)
CHCAI/MIRI research internship in AI safety 2017-02-13T18:34:34.520Z · score: 5 (6 votes)
MIRI AMA plus updates 2016-10-11T23:52:44.410Z · score: 15 (13 votes)
A few misconceptions surrounding Roko's basilisk 2015-10-05T21:23:08.994Z · score: 57 (53 votes)
The Library of Scott Alexandria 2015-09-14T01:38:27.167Z · score: 63 (53 votes)
[Link] Nate Soares is answering questions about MIRI at the EA Forum 2015-06-11T00:27:00.253Z · score: 19 (20 votes)
Rationality: From AI to Zombies 2015-03-13T15:11:20.920Z · score: 85 (84 votes)
Ends: An Introduction 2015-03-11T19:00:44.904Z · score: 3 (3 votes)
Minds: An Introduction 2015-03-11T19:00:32.440Z · score: 6 (8 votes)
Biases: An Introduction 2015-03-11T19:00:31.605Z · score: 80 (124 votes)
Rationality: An Introduction 2015-03-11T19:00:31.162Z · score: 15 (16 votes)
Beginnings: An Introduction 2015-03-11T19:00:25.616Z · score: 8 (5 votes)
The World: An Introduction 2015-03-11T19:00:12.370Z · score: 3 (3 votes)
Announcement: The Sequences eBook will be released in mid-March 2015-03-03T01:58:45.893Z · score: 47 (48 votes)
A forum for researchers to publicly discuss safety issues in advanced AI 2014-12-13T00:33:50.516Z · score: 12 (13 votes)
Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda 2014-11-26T11:02:01.038Z · score: 26 (31 votes)
Groundwork for AGI safety engineering 2014-08-06T21:29:38.767Z · score: 13 (14 votes)
Politics is hard mode 2014-07-21T22:14:33.503Z · score: 41 (73 votes)
The Problem with AIXI 2014-03-18T01:55:38.274Z · score: 29 (29 votes)
Solomonoff Cartesianism 2014-03-02T17:56:23.442Z · score: 34 (31 votes)
Bridge Collapse: Reductionism as Engineering Problem 2014-02-18T22:03:08.008Z · score: 54 (49 votes)
Can We Do Without Bridge Hypotheses? 2014-01-25T00:50:24.991Z · score: 11 (12 votes)
Building Phenomenological Bridges 2013-12-23T19:57:22.555Z · score: 67 (60 votes)
The genie knows, but doesn't care 2013-09-06T06:42:38.780Z · score: 57 (63 votes)
The Up-Goer Five Game: Explaining hard ideas with simple words 2013-09-05T05:54:16.443Z · score: 29 (34 votes)
Reality is weirdly normal 2013-08-25T19:29:42.541Z · score: 33 (48 votes)
Engaging First Introductions to AI Risk 2013-08-19T06:26:26.697Z · score: 20 (27 votes)
What do professional philosophers believe, and why? 2013-05-01T14:40:47.028Z · score: 31 (44 votes)

Comments

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-20T02:57:39.561Z · score: 2 (1 votes) · LW · GW
When Chalmers claims to have "direct" epistemic access to certain facts, the proper response is to provide the arguments for doubting that claim, not to play a verbal sleight-of-hand like Dennett's (1991, emphasis added):

Chalmers' The Conscious Mind was written in 1996, so this is wrong. The wrongness doesn't seem important to me. (Jackson and Nagel were 1979/1982, and Dennett re-endorsed this passage in 2003.)

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T23:22:04.852Z · score: 2 (1 votes) · LW · GW
It is indisputably the case that Chalmers, for instance, makes arguments along the lines of “there are further facts revealed by introspection that can’t be translated into words”. But it is not only not indisputably the case

What does "indisputably" mean here in Bayesian terms? A Bayesian's epistemology is grounded in what evidence that individual has access to, not in what disputes they can win. When Chalmers claims to have "direct" epistemic access to certain facts, the proper response is to provide the arguments for doubting that claim, not to play a verbal sleight-of-hand like Dennett's (1991, emphasis added):

You are not authoritative about what is happening in you, but only about what seems to be happening in you, and we are giving you total, dictatorial authority over the account of how it seems to you, about what it is like to be you. And if you complain that some parts of how it seems to you are ineffable, we heterophenomenologists will grant that too. What better grounds could we have for believing that you are unable to describe something than that (1) you don’t describe it, and (2) confess that you cannot? Of course you might be lying, but we’ll give you the benefit of the doubt.

It's intellectually dishonest of Dennett to use the word "ineffable" here to slide between the propositions "I'm unable to describe my experience" and "my experience isn't translatable in principle", as it is to slide between Nagel's term of art "what it's like to be you" and "how it seems to you".

Again, I agree with Dennett that Chalmers is factually wrong about his experience (and therefore lacks a certain degree of epistemic "authority" with me, though that's such a terrible way of phrasing it!). There are good Bayesian arguments against trusting autophenomenology enough for Chalmers' view to win the day (though Dennett isn't describing any of them here), and it obviously is possible to take philosophers' verbal propositions as data to study (cf. also the meta-problem of consciousness), but it's logically rude to conceal your cruxes, pretend that your method is perfectly neutral and ecumenical, and let the "scientificness" of your proposed methodology do the rhetorical pushing and pulling.

but indeed can’t ever (without telepathy etc., or maybe not even then) be shown to another person, or perceived by another person, to be the case, that there are further facts revealed by introspection that can’t be translated into words.

There's a version of this claim I agree with (since I'm a physicalist), but the version here is too strong. First, I want to note again that this is equating group epistemology with individual epistemology. But even from a group's perspective, it's perfectly possible for "facts revealed by introspection that can't be translated into words" to be transmitted between people; just provide someone with the verbal prompts (or other environmental stimuli) that will cause them to experience and notice the same introspective data in their own brains.

If that's too vague, consider this scenario as an analogy: Our universe is a (computable) simulation, running in a larger universe that's uncomputable. Humans are "dualistic" in the sense that they're Cartesian agents outside the simulation whose brains contain uncomputable subprocesses, but their sensory experiences and communication with other agents are all via the computable simulation. We could then imagine scenarios where the agents have introspective access to evidence that they're performing computations too powerful to run in the laws of physics (as they know them), but don't have output channels expressive enough to demonstrate this fact to others in-simulation; instead, they prompt the other agents to perform the relevant introspective feat themselves.

The other agents can then infer that their minds are plausibly all running on physics that's stronger than the simulated world's physics, even though they haven't found a way to directly demonstrate this (e.g., via neurosurgery on the in-simulation pseudo-brain).

Indeed it’s not even clear how you’d demonstrate to yourself that what your introspection reveals is real.

You can update upward or downward about the reliability of your introspection (either in general, or in particular respects), in the same way you can update upward or downward about the reliability of your sensory perception. E.g., different introspective experiences or faculties can contradict each other, suggest their own unreliability ("I'm introspecting that this all feels like bullshit..."), or contradict other evidence sources.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T22:13:07.441Z · score: 4 (2 votes) · LW · GW

A simple toy example would be: "You have perfect introspective access to everything about how your brain works, including how your sensory organs work. This allows you to deduce that your external sensory organs provide noise data most of the time, but provide accurate data about the environment anytime you wear blue sunglasses at night."

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T20:23:36.313Z · score: 4 (2 votes) · LW · GW

"Heterophenomenology" might be fine as a meme for encouraging certain kinds of interesting research projects, but there are several things I dislike about how Dennett uses the idea.

Mainly, it's leaning on the social standards of scientific practice, and on a definition of what "real science" or "good science" is, to argue against propositions like "any given scientist studying consciousness should take into account their own introspective data -- e.g., the apparent character of their own visual field -- in addition to verbal descriptions, as an additional fact to explain." This is meant to serve as a cudgel and bulwark against philosophers like David Chalmers, who claim that introspection reveals further facts (/data/explananda) not strictly translatable into verbal reports.

This is framing the issue as one of social-acceptability-to-the-norms-of-scientists or conformity-with-a-definition-of-"science", whereas correct versions of the argument are Bayesian. (And it's logically rude to not make the Bayesianness super explicit and clear, given the opportunity; it obscures your premises while making your argument feel more authoritative via its association with "science".)

We can imagine a weird alien race (or alien AI) that has extremely flawed sensory faculties, and very good introspection. A race like that might be able to bootstrap to good science, via leveraging their introspection to spot systematic ways in which their sensory faculties fail, and sift out the few bits of reliable information about their environments.

Humans are plausibly the opposite: as an accident of evolution, we have much more reliable sensory faculties than introspective faculties. This is a generalization from the history of science and philosophy, and from the psychology literature. Moreover, humans have a track record of being bad at distinguishing cases where their introspection is reliable from cases where it's unreliable; so it's hard to be confident of any lines we could draw between the "good introspection" and the "bad introspection". All of this is good reason to require extra standards of evidence before humanity "takes introspection at face value" and admits it into its canon of Established Knowledge.

Personally, I think consciousness is (in a certain not-clarified-here sense) an illusion, and I'm happy to express confidence that Chalmers' view is wrong. But I think Dennett has been uniquely bad at articulating the reasons Chalmers is probably wrong, often defaulting to dismissing them or trying to emphasize their social illegitimacy (as "unscientific").

The "heterophenomenology" meme strikes me as part of that project, whereas a more honest approach would say "yeah, in principle introspective arguments are totally admissible, they just have to do a bit more work than usual because we're giving them a lower prior (for reasons X, Y, Z)" and "here are specific reasons A, B, C that Chalmers' arguments don't meet the evidential bar that's required for us to take the 'autophenomenological' data at face value in this particular case".

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T19:07:08.136Z · score: 2 (1 votes) · LW · GW

Also interesting: "insistence that we be immune to skeptical arguments" and "fascination with the idea of representation/intentionality/'aboutness'" seem to have led the continental philosophers in similar directions, as in Sartre's "Intentionality: A Fundamental Idea of Husserl’s Phenomenology." But that intellectual tradition had less realism, instrumentalism, and love-of-science in its DNA, so there was less resistance to sliding toward an "everything is sort of subjective" position.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-19T17:41:40.282Z · score: 16 (4 votes) · LW · GW

Upvoted! My discussion of a bunch of these things above is very breezy, and I approve of replacing the vague claims with more specific historical ones. To clarify, here are four things I'm not criticizing:

  • 1. Eliminativism about particular mental states, of the form 'we used to think that this psychological term (e.g., "belief") mapped reasonably well onto reality, but now we understand the brain well enough to see it's really doing [description] instead, and our previous term is a misleading way of gesturing at this (or any other) mental process.'

I'm an eliminativist (or better, an illusionist) about subjectivity and phenomenal consciousness myself. (Though I think the arguments favoring that view are complicated and non-obvious, and there's no remotely intellectually satisfying illusionist account of what the things we call "conscious" really consist in.)

  • 2. In cases where the evidence for an eliminativist hypothesis isn't strong, the practice of having some research communities evaluate eliminativism or try eliminativism out and see if it leads in any productive directions. Importantly, a community doing this should treat the eliminativist view as an interesting hypothesis or an exploratory research program, not in any way as settled science (or pre-scientific axiom!).
  • 3. Demanding evidence for claims, and being relatively skeptical of varieties of evidence that have a poor track record, even if they "feel compelling".
  • 4. Demanding that high-level terms be in principle reducible to lower-level physical terms (given our justified confidence in physicalism and reductionism).

In the case of psychology, I am criticizing (and claiming really happened, though I agree that these views weren't as universal, unquestioned, and extreme as is sometimes suggested):

  • Skinner's and other behaviorists' greedy reductionism; i.e., their tendency to act like they'd reduced or explained more than they actually had. Scientists should go out of their way to emphasize the limitations and holes in their current models, and be very careful (and fully explicit about why they believe this) when it comes to claims of the form 'we can explain literally everything in [domain] using only [method].'
  • Rushing to achieve closure, dismiss open questions, forbid any expressions of confusion or uncertainty, and treat blank parts of your map as though they must correspond to a blank (or unimportant) territory. Quoting Watson (1928):
With the advent of behaviorism in 1913 the mind-body problem disappeared — not because ostrich-like its devotees hid their heads in the sand but because they would take no account of phenomena which they could not observe. The behaviorist finds no mind in his laboratory — sees it nowhere in his subjects. Would he not be unscientific if he lingered by the wayside and idly speculated upon it; just as unscientific as the biologists would be if they lingered over the contemplation of entelechies, engrams and the like. Their world and the world of the behaviorist are filled with facts — with data which can be accumulated and verified by observation — with phenomena which can be predicted and controlled.
If the behaviorists are right in their contention that there is no observable mind-body problem and no observable separate entity called mind — then there can be no such thing as consciousness and its subdivision. Freud's concept borrowed from somatic pathology breaks down. There can be no festering spot in the substratum of the mind — in the unconscious — because there is no mind.
  • More generally: overconfidence in cool new ideas, and exaggeration of what they can do.
  • Over-centralizing around an eliminativist hypothesis or research program in a way that pushes out brainstorming, hypothesis-generation, etc. that isn't easy to fit into that frame. I quote Hempel (1935) here:
[Behaviorism's] principal methodological postulate is that a scientific psychology should limit itself to the study of the bodily behavior with which man and the animals respond to changes in their physical environment, and should proscribe as nonscientific any descriptive or explanatory step which makes use of terms from introspective or 'understanding' psychology, such as 'feeling', 'lived experience', 'idea', 'will', 'intention', 'goal', 'disposition', 'repression'. We find in behaviorism, consequently, an attempt to construct a scientific psychology[.]
  • Simply put: getting the wrong answer. Some errors are more excusable than others, but even if my narrative about why they got it wrong is itself wrong, it would still be important to emphasize that they got it wrong, and could have done much better.
  • The general idea that introspection is never admissible as evidence. It's fine if you want to verbally categorize introspective evidence as 'unscientific' in order to distinguish it from other kinds of evidence, and there are some reasonable grounds for skepticism about how strong many kinds of introspective evidence are. But evidence is still evidence; a Bayesian shouldn't discard evidence just because it's hard to share with other agents.
  • The rejection of folk-psychology language, introspective evidence, or anything else for science-as-attire reasons.

Idealism emphasized some useful truths (like 'our perceptions and thoughts are all shaped by our mind's contingent architecture') but ended up in a 'wow it feels great to make minds more and more important' death spiral.

Behaviorism too emphasized some useful truths (like 'folk psychology presupposes a bunch of falsifiable things about minds that haven't all been demonstrated very well', 'it's possible for introspection to radically mislead us in lots of ways', and 'it might benefit psychology to import and emphasize methods from other scientific fields that have a better track record') but seems to me to have fallen into a 'wow it feels great to more and more fully feel like I'm playing the role of a True Scientist and being properly skeptical and cynical and unromantic about humans' trap.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-18T15:56:48.174Z · score: 4 (2 votes) · LW · GW
The same "that sounds silly" heuristic that helps you reject Berkeley's argument (when it's fringe and 'wears its absurdity on its sleeve') helps you accept 19th-century idealists' versions of the argument (when it's respectable and framed as the modern/scientific/practical/educated/consensus view on the issue).

I should also emphasize that Berkeley's idealism is very different from (e.g.) Hegel's idealism. "Idealism" comes in enough different forms that it's probably more useful for referring to a historical phenomenon than a particular ideology. (Fortunately, the former is the topic I'm interested in here.)

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-18T14:22:49.771Z · score: 2 (1 votes) · LW · GW
Berkeley's argument caused a fair amount of incredulity at the time. Samuel Johnson's Argumentum ad Lapidem was intended as a response.

This seems like incredulity at his conclusion, rather than at his argument. Do you know of good criticisms of the master argument from the time? (Note it wasn't given a standard name until the 1970s.)

To be clear, I think Berkeley was near-universally rejected at the time, because his conclusion ('there's no material world') was so wild. Most people also didn't understand what Berkeley was saying, even though he was pretty clear about it (see: Kant's misunderstanding; and the above fallacious counterargument, assuming it wasn't just a logically rude joke on Johnson's part).

But I don't update positively about people for rejecting silly-sounding conclusions just based on how silly they sound. The same "that sounds silly" heuristic that helps you reject Berkeley's argument (when it's fringe and 'wears its absurdity on its sleeve') helps you accept 19th-century idealists' versions of the argument (when it's respectable and framed as the modern/scientific/practical/educated/consensus view on the issue).

BTW, I notice that a lot of people here are persuaded by Aumann's Agreement Theorem, which is every bit as flawed in my view.

Flawed how?

Comment by robbbb on Is value amendment a convergent instrumental goal? · 2019-10-18T06:22:09.105Z · score: 11 (6 votes) · LW · GW

"Avoiding amending your utility function" is one of the classic convergent instrumental goals in Bostrom and Omohundro, and the reasoning there is sound: almost any goal will be better satisfied if it preserves itself than if it replaces itself with a different goal.

I do think it's plausible that AGI systems will have pretty unstable goals early on, but that's because goal stability seems hard to me and AGI systems probably won't perfectly figure it out very early along their development curve. I'm imagining accidental goal modification (for insufficiently capable systems), whereas you're describing deliberate goal modification (for sufficiently capable systems).

One way of thinking about this is to note that "wanting your goals to not be externally supplied" is itself a goal, and a relatively specific one at that; if you don't have something like that specific goal as part of the core criteria you use to select options, there's no instrumental reason for you to converge upon it. E.g., if your goal is simply "maximize the number of paperclips in your future light cone," then the etiology of your goal doesn't matter (from your perspective).
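To make the instrumental reasoning concrete, here is a minimal toy sketch in Python (my own illustration; the world model, policies, and numbers are all made up, not anything from Bostrom's or Omohundro's papers). The point is just that an agent which scores the option "amend my goal" using its current utility function will generally rank it below keeping that goal:

```python
# Toy sketch of the goal-preservation argument (hypothetical agent and numbers).
# The agent compares futures by its *current* utility function, so handing the
# future to a successor that optimizes something else scores poorly.

def paperclips_made(policy: str) -> int:
    """Made-up world model: how many paperclips each future policy yields."""
    return {"maximize_paperclips": 100, "maximize_staples": 5}[policy]

def current_utility(n_paperclips: int) -> int:
    """The agent's current goal: more paperclips is strictly better."""
    return n_paperclips

options = {
    "keep current goal": "maximize_paperclips",   # future self still pursues paperclips
    "amend goal to staples": "maximize_staples",  # future self pursues staples instead
}

for name, future_policy in options.items():
    print(name, "->", current_utility(paperclips_made(future_policy)))
# "keep current goal" scores 100 vs. 5, so goal amendment isn't instrumentally
# attractive unless the goal itself cares about how it was produced.
```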

Comment by robbbb on What's going on with "provability"? · 2019-10-14T20:55:44.523Z · score: 4 (3 votes) · LW · GW

Ike is responding to this:

Gödel: What could it mean for a statement to be "true but not provable"? Is this just because there are some statements such that neither P nor not-P can be proven, yet one of them must be true? If so, I would (stubbornly) contest that perhaps P and not-P really are both non-true.

"P and not-P really are both non-true" is classically false, and Gödel holds in classical mathematics, so Evan's response isn't available in that case.

Evan's sense that "perhaps P and not-P really are both non-true" might be a reason for him to endorse intuitionism as "more correct" than classical math in some sense.
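To spell out the classical point (my own formalization, reading "X is non-true" as ¬X under bivalence):

```latex
% Classical rendering of "P and not-P are both non-true", with "non-true"
% collapsing to negation under bivalence:
\[
  \neg P \;\wedge\; \neg\neg P
\]
% Double-negation elimination (classically valid) turns this into an outright
% contradiction:
\[
  \neg P \wedge \neg\neg P \;\vdash\; \neg P \wedge P \;\vdash\; \bot
\]
% An intuitionist escapes at the translation step rather than the logic:
% without bivalence, "not assertable" is weaker than $\neg$, so the claim
% need not be rendered as $\neg P \wedge \neg\neg P$ at all.
```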

Comment by robbbb on What's going on with "provability"? · 2019-10-13T20:46:33.539Z · score: 5 (3 votes) · LW · GW

Proofs, Implications, and Models introduces some of these ideas more slowly. Other stuff from the Highly Advanced Epistemology 101 for Beginners sequence is relevant too, and includes more realism-flavored concerns about choosing between systems.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-13T20:33:18.068Z · score: 2 (1 votes) · LW · GW
My claim is that instrumentalism is the correct metaphysics regardless

What does it mean for instrumentalism to be the correct metaphysics? Normally, I'd interpret "the correct metaphysics" as saying something basic about reality or the universe. (Or, if you're an instrumentalist and you say "X is the correct metaphysics", I'd assume you were saying "it's useful to have a model that treats X as a basic fact about reality or the universe", which also doesn't make sense to me if X is "instrumentalism".)

Although it is also true that if you try interpreting quantum mechanics according to sufficiently strong realist desiderata

Well, sufficiently specific realist desiderata. Adding hidden variables to QM doesn't make the theory any more realist, the way we're using "realist" here.

Comment by robbbb on What's going on with "provability"? · 2019-10-13T12:39:04.807Z · score: 3 (2 votes) · LW · GW

A non-technical summary of how arithmetization is used in this argument: https://plato.stanford.edu/entries/goedel-incompleteness/#AriForLan
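For a toy illustration of the arithmetization trick itself (my own ad-hoc encoding, not the scheme used in Gödel's proof or the SEP entry), the key move is that strings of symbols become single numbers, so syntactic predicates like "x codes an axiom" become arithmetical predicates:

```python
# Toy Gödel numbering (ad-hoc illustrative scheme): assign each symbol a code,
# then pack a formula into one natural number via prime powers. Unique
# factorization makes the encoding reversible, so facts about formulas and
# proofs can be restated as facts about numbers.

SYMBOL_CODES = {"0": 1, "S": 2, "=": 3, "(": 4, ")": 5, "+": 6}
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # enough primes for short formulas

def godel_number(formula: str) -> int:
    """Encode a symbol string as 2^c1 * 3^c2 * 5^c3 * ... (one prime per position)."""
    n = 1
    for prime, symbol in zip(PRIMES, formula):
        n *= prime ** SYMBOL_CODES[symbol]
    return n

print(godel_number("0=0"))  # 2^1 * 3^3 * 5^1 = 270
```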

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-13T04:20:49.488Z · score: 6 (3 votes) · LW · GW

I think my characterization is accurate, but maybe guilty of weak-manning: I'm recounting a salient recent (long) conversation with laypeople, rather than attempting a representative survey of non-realists or trying to find the best proponents.

I had in mind a small social gathering I attended (without any conscious effort to seek out and find non-realists) where most of the people in the room voiced disagreement with my claim that truth is a coherent idea, that some entities aren't social or psychological constructs, that some methods for learning things are more objective/reasonable/justified than others, and so on.

I tried to find common ground on the most basic claims I could think of, like "OK, but we can at least agree that something is real, right? There's, like, stuff actually going on?" I wasn't successful. And I think I'm pretty good at not straw-manning people on these issues; I'm used to drawing pretty fine distinctions between pretty out-there ontological and epistemological views. (E.g., I'm perfectly happy to try to tease apart the nuances of thinkers like Parmenides, Nagarjuna, Zhuangzi, Sextus, William James, Dharmakirti, Schopenhauer, Jonathan Schaffer, Graham Priest, Sartre, Berkeley. This stuff is interesting, even if I put no stock in it.)

To my ear, "it pays to think in terms other than reality/truth sometimes" sound too weak on its own to count as 'anti-realism'. If I think it's ever (cognitively?) useful to read fiction, or explore fake frameworks, or just take a nap and clear my head, that already seems to qualify. I'm happy to hear more about what you have in mind, though, regardless of what labels fit best.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-12T22:05:24.268Z · score: 4 (2 votes) · LW · GW
This post was educational, however, I want to push back against the implicit criticism of instrumentalism and the Copenhagen interpretation. The metaphilosophical position I will use here is: to solve a philosophical question, we need to rephrase it as a question about AI design

Maybe the problem is that I'm not sufficiently convinced that there's a philosophical question here. Sometimes philosophers (and even physicists) argue about things that aren't open questions. "Do refrigerators exist, or only mental models of refrigerators?" sounds like a straightforward, testable empirical question to me, with all the evidence favoring "refrigerators exist".

I predict I'm missing an implicit premise explaining why "I don't currently understand where the Born rule comes from" is a bigger problem for realism than "I don't currently understand how my refrigerator works", or some other case where realism makes things unnecessarily hard/confusing, like infinite ethics or anthropics or somesuch.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-12T21:43:50.501Z · score: 4 (2 votes) · LW · GW

I should note that Russell was also led in some pretty weird directions by the desire to resist skeptical arguments:

What distinguishes neutral monism from its monistic rivals is the claim that the intrinsic nature of ultimate reality is neither mental nor physical. [...] Following a series of critical engagements with neutral monism (see especially Russell 1914a,b), Russell adopted it in Russell 1919 and remained a neutral monist for the rest of his long career: “I am conscious of no major change in my opinions since the adoption of neutral monism” is what he says in an interview from 1964 (Eames 1969: 108). [...]
For an entity to be neutral is to “have neither the hardness and indestructibility of matter, nor the reference to objects which is supposed to characterize the mind” (Russell 1921: 36; cf. 124). Russell never suspected sensations of being material (in this sense). That sensations are mental (in this sense)—that they consist of a mental act of sensing directed at a non-mental object—was, however, a pivotal part of his earlier view. But then his views changed:
"I formerly believed that my own inspection showed me the distinction between a noise [the object] and my hearing of a noise [the act of sensing], and I am now convinced that it shows me no such thing, and never did." (Russell 1918b: 255)

Also, insofar as Moore and Russell are gesturing at similar issues, Moore's paper provides some support for the claim that master-argument-ish reasoning was central to the idealism and rested on a simple error that was concealed by motivated reasoning and obfuscatory language, and that no one noticed (or successfully popularized) the error prior to Moore/Russell:

I am suggesting that the Idealist maintains that object and subject are necessarily connected, mainly because he fails to see that they are distinct, that they are two, at all. When he thinks of 'yellow' and when he thinks of the 'sensation of yellow', he fails to see that there is anything whatever in the latter which is not in the former. [...]
But I am well aware that there are many Idealists who would repel it as an utterly unfounded charge that they fail to distinguish between a sensation or idea and what I will call its object. And there are, I admit, many who not only imply, as we all do, that green is distinct from the sensation of green, but expressly insist upon the distinction as an important part of their system. They would perhaps only assert that the two form an inseparable unity.
But I wish to point out that many, who use this phrase, and who do admit the distinction, are not thereby absolved from the charge that they deny it. For there is a certain doctrine, very prevalent among philosophers nowadays, which by a very simple reduction may be seen to assert that two distinct things both are and are not distinct. A distinction is asserted; but it is also asserted that the things distinguished form an 'organic unity'. But, forming such a unity, it is held, each would not be what it is apart from its relation to the other. Hence to consider either by itself is to make an illegitimate abstraction.
The recognition that there are 'organic unities' and 'illegitimate abstractions' in this sense is regarded as one of the chief conquests of modern philosophy. But what is the sense attached to these terms? An abstraction is illegitimate, when and only when we attempt to assert of a part - of something abstracted - that which is true only of the whole to which it belongs: and it may perhaps be useful to point out that this should not be done. But the application actually made of this principle, and what perhaps would be expressly acknowledged as its meaning, is something much the reverse of useful. The principle is used to assert that certain abstractions are in all cases illegitimate; that whenever you try to assert anything whatever of that which is part of an organic whole, what you assert can only be true of the whole. And this principle, so far from being a useful truth, is necessarily false. For if the whole can, nay must, be substituted for the part in all propositions and for all purposes, this can only be because the whole is absolutely identical with the part.
When, therefore, we are told that green and the sensation of green are certainly distinct but yet are not separable, or that it is an illegitimate abstraction to consider the one apart from the other, what these provisos are used to assert is, that though the two things are distinct yet you not only can but must treat them as if they were not. Many philosophers, therefore, when they admit a distinction, yet (following the lead of Hegel) boldly assert their right, in a slightly more obscure form of words, also to deny it. The principle of organic unities, like that of combined analysis and synthesis, is mainly used to defend the practice of holding both of two contradictory propositions, wherever this may seem convenient.
In this, as in other matters, Hegel's main service to philosophy has consisted in giving a name to and erecting into a principle, a type of fallacy to which experience had shown philosophers, along with the rest of mankind, to be addicted. No wonder that he has followers and admirers. [...]
And at this point I need not conceal my opinion that no philosopher has ever yet succeeded in avoiding this self-contradictory error: that the most striking results both of Idealism and of Agnosticism are only obtained by identifying blue with the sensation of blue: that esse ["existing"] is held to be percipi ["being perceived"], solely because what is experienced is held to be identical with the experience of it. That Berkeley and Mill committed this error will, perhaps, be granted: that modern Idealists make it will, I hope, appear more probable later.

This updates me partway back toward the original claim I made (that Berkeley's master argument was causally important for the rise of idealism and its 20th-century successors).

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-12T21:19:42.796Z · score: 9 (2 votes) · LW · GW
Isn't that the same argument Russell was making?

They're... similar? I find Russell a lot clearer on this point:

We might state the argument by which they support their view in some such way as this: 'Whatever can be thought of is an idea in the mind of the person thinking of it; therefore nothing can be thought of except ideas in minds; therefore anything else is inconceivable, and what is inconceivable cannot exist.'
Such an argument, in my opinion, is fallacious; and of course those who advance it do not put it so shortly or so crudely. But whether valid or not, the argument has been very widely advanced in one form or another; and very many philosophers, perhaps a majority, have held that there is nothing real except minds and their ideas.

I find Moore's way of discussing the issue weirder. Moore is definitely making an argument from 'a thought's object is different from the thought itself' to 'idealism is false', but his argument seems to involve weird steps like 'our experiences don't have contents' (rather than the expected 'the content of our experience is different from its referent'):

What I wish to point out is (1) that we have no reason for supposing that there are such things as mental images at all -- for supposing that blue is part of the content of the sensation of blue, and (2) that even if there are mental images, no mental image and no sensation or idea is merely a thing of this kind; that 'blue', even if it is part of the content of the image or sensation or idea of blue, is always also related to it in quite another way, and that this other relation, omitted in the traditional analysis, is the only one which makes the sensation of blue a mental fact at all. [...]
To have in your mind 'knowledge' of blue, is not to have in your mind a 'thing' or 'image' of which blue is the content. To be aware of the sensation of blue is not to be aware of a mental image - of a 'thing', of which 'blue' and some other element are constituent parts in the same sense in which blue and glass are constituents of a blue bead. It is to be aware of an awareness of blue; awareness being used, in both cases, in exactly the same sense. This element, we have seen, is certainly neglected by the 'content' theory: that theory entirely fails to express the fact that there is, in the sensation of blue, this unique relation between blue and the other constituent.

Baldwin (2004) confirms that this line of reasoning, plus Moore's attempt to resist skeptical hypotheses, led Moore in a very confused direction:

But what is the relationship between sense-data [i.e., the thingies we're directly conscious of] and physical objects? Moore took it that there are three serious candidates to be considered: (i) an indirect realist position, according to which sense-data are non-physical but somehow produced by interactions between physical objects and our senses; (ii) the phenomenalist position, according to which our conception of physical objects is merely one which expresses observed and anticipated uniformities among the sense-data we apprehend; (iii) a direct realist position, according to which sense-data are parts of physical objects — so that, for example, visual sense-data are visible parts of the surfaces of physical objects.
The indirect realist position is that to which he was initially drawn; but he could see that it leaves our beliefs about the physical world exposed to skeptical doubt, since it implies that the observations which constitute evidence for these beliefs concern only the properties of non-physical sense-data, and there is no obvious way for us to obtain further evidence to support a hypothesis about the properties of the physical world and its relationship to our sense-data.
This argument is reminiscent of Berkeley's critique of Locke, and Moore therefore considered carefully Berkeley's phenomenalist alternative. Moore's initial response to this position was that the implied conception of the physical world was just too ‘pickwickian’ to be believable. This may be felt to be too intuitive, like Dr. Johnson's famous objection to Berkeley; but Moore could also see that there were substantive objections to the phenomenalist position, such as the fact that our normal ways of identifying and anticipating significant uniformities among our sense-data draw on our beliefs about our location in physical space and the state of our physical sense-organs, neither of which are available to the consistent phenomenalist.
So far Moore's dialectic is familiar. What is unfamiliar is his direct realist position, according to which sense-data are physical. This position avoids the problems so far encountered, but in order to accommodate false appearances Moore has to allow that sense-data may lack the properties which we apprehend them as having. It may be felt that in so far as sense-data are objects at all, this is inevitable; but Moore now needs to provide an account of the apparent properties of sense-data and it is not clear how he can do this without going back on the initial motivation for the sense-datum theory by construing these apparent properties as properties of our experiences. But what in fact turns Moore against this direct realist position is the difficulty he thinks it leads to concerning the treatment of hallucinations. In such cases, Moore holds, any sense-data we apprehend are not parts of a physical object; so direct realism cannot apply to them, and yet there is no reason to hold that they are intrinsically different from the sense-data which we apprehend in normal experience. This last point might well be disputed, and at one point Moore himself considers the possibility of a distinction between ‘subjective’ and ‘objective’ sense-data; but once one has introduced sense-data in the first place as the primary objects of experience it is not going to be easy to make a distinction here without assuming more about experience than Moore at any rate would have wanted to concede.
Moore wrote more extensively about perception than about any other topic. In these writings he moves between the three alternatives set out here without coming to any firm conclusion.

Comment by robbbb on A simple sketch of how realism became unpopular · 2019-10-12T04:17:30.587Z · score: 5 (3 votes) · LW · GW

Re "was Berkeley making such an obvious mistake?", I think this is historians' majority view, but multiple people have tried to come up with more reasonable versions of the argument; see Gallois (1974) and Downing (2011). Note that Berkeley makes the same argument in dialogue form here (starts at "How say you, Hylas, can you see a thing which is at the same time unseen?"), so you can check if you find that version more tenable.

The Bloomsbury Companion to Berkeley says:

[This passage] can be interpreted as making a straightforward howler, arguing that because whenever you think of something it is being thought of and anything being thought of is, ipso facto, 'in the mind' then you cannot think of something that is not in the mind. According to Russell, this was a keystone for idealism and it involves a simple mistake.

"Berkeley's view . . . seems to depend for its plausibility upon confusing the thing apprehended with the act of apprehension. Either of these might be called an 'idea'; probably either would have been called an idea by Berkeley. The act is undoubtedly in the mind; hence, when we are thinking of the act, we readily assent to the view that ideas must be in the mind. Then, forgetting that this was only true when ideas were taken as acts of apprehension, we transfer the proposition that 'ideas are in the mind' to ideas in the other sense, i.e. to the things apprehended by our acts of apprehension. Thus, by an unconscious equivocation, we arrive at the conclusion that whatever we apprehend must be in our minds."
1912/1967, 22

Russell's criticism is in line with Moore's famous 'The Refutation of Idealism' (1903), where he argues that if one recognizes the act-object distinction within conscious states, one can see that the object is independent of the act. This 'discovery', together with the development of a formal logic for relations, was the cornerstone of the rejection of 'British idealism'. If objects can be conceived of as independent of conscious thought, and if it is consistent to think of them as actually related to each other, the mentalistic holism that was contemporary idealism is demolished.

That said, I put a lot of weight on Allen Wood's view as a leading Kant scholar, and revisiting his book Kant, he doesn't think Kant accepted the master argument (p. 69). David (2015) asserts a link, but it looks tenuous to me.

Kant's earliest interpreters took him to be saying "trees, oceans, etc. all just exist in your head and have nothing in common with the mysterious ineffable things-in-themselves", and Kant definitely talks like that a great deal, but he also says a lot that contradicts that view. Wood thinks Kant was just really confused and fuzzy about his own view, and didn't have a consistent model here (pp. 63-71).

My new pet theory is that Kant was being pulled in one direction by "wanting to make things as subjective as possible so he can claim more epistemic immediacy and therefore more immunity to skeptical arguments", and in the opposite direction by "not wanting to sound like a crazy person like Berkeley", so we get inconsistencies.

I don't know who, if anyone, noted the obvious fallacy in Berkeley's master argument prior to Russell in 1912, and Russell seems to think the argument was central to idealism's appeal. Regardless, my new view is: philosophy mainly ended up going down an idealist cul-de-sac because Kant shared Berkeley's "try to find ways to treat more things as subjective" approach to defeating skepticism. (Possibly without realizing it; Stang (2016) suggests Kant was pretty confused about what Berkeley believed.) Then Kant and Hegel built sufficiently dense, mysterious, and complicated intellectual edifices that it was easy for them to confuse themselves and others, while still being brilliant, innovative, and internally consistent enough to attract a lot of followers.

Comment by robbbb on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-11T13:44:01.380Z · score: 4 (3 votes) · LW · GW

I downvoted TAG's comment because I found it confusing/misleading. I can't tell which of these things TAG's trying to do:

  • Assert, in a snarky/indirect way, that people agitating about AI safety have no overlap with AI researchers. This seems doubly weird in a conversation with Stuart Russell.
  • Suggest that LeCun believes this. (??)
  • Assert that LeCun doesn't mean to discourage Russell's research. (But the whole conversation seems to be about what kind of research people should be doing when in order to get good outcomes from AI.)

Comment by robbbb on Misconceptions about continuous takeoff · 2019-10-09T16:50:18.826Z · score: 15 (5 votes) · LW · GW
My intuition is that it'd probably be pretty easy to create an aligned superhuman AI if we knew how to create non-singular, mis-aligned superhuman AIs, and had cheap, robust methods to tell if a particular AI was misaligned.

This sounds different from how I model the situation; my views agree here with Nate's (emphasis added):

I would rephrase 3 as "There are many intuitively small mistakes one can make early in the design process that cause resultant systems to be extremely difficult to align with operators’ intentions.” I’d compare these mistakes to the “small” decision in the early 1970s to use null-terminated instead of length-prefixed strings in the C programming language, which continues to be a major source of software vulnerabilities decades later.
I’d also clarify that I expect any large software product to exhibit plenty of actually-trivial flaws, and that I don’t expect that AGI code needs to be literally bug-free or literally proven-safe in order to be worth running. Furthermore, if an AGI design has an actually-serious flaw, the likeliest consequence that I expect is not catastrophe; it’s just that the system doesn’t work. Another likely consequence is that the system is misaligned, but in an obvious way that makes it easy for developers to recognize that deployment is a very bad idea. The end goal is to prevent global catastrophes, but if a safety-conscious AGI team asked how we’d expect their project to fail, the two likeliest scenarios we’d point to are "your team runs into a capabilities roadblock and can't achieve AGI" or "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time."

My current model of 'the default outcome if the first project to develop AGI is highly safety-conscious, is focusing on alignment, and has a multi-year lead over less safety-conscious competitors' is that the project still fails, because their systems keep failing their tests but they don't know how to fix the deep underlying problems (and may need to toss out years of work and start from scratch in order to have a real chance at fixing them). Then they either (a) lose their lead, and some other project destroys the world; (b) decide they have to ignore some of their tests, and move ahead anyway; or (c) continue applying local patches without understanding or fixing the underlying generator of the test failures, until they or their system find a loophole in the tests and sneak by.

I don't think any of this is inevitable or impossible to avoid; it's just the default way I currently visualize things going wrong for AGI developers with a strong interest in safety and alignment.

Possibly you'd want to rule out (c) with your stipulation that the tests are "robust"? But I'm not sure you can get tests that robust. Even in the best-case scenario where developers are in a great position to build aligned AGI and successfully do so, I'm not imagining post-hoc tests that are robust to a superintelligence trying to game them. I'm imagining that the developers have a prior confidence from their knowledge of how the system works that every part of the system either lacks the optimization power to game any relevant tests, or will definitely not apply any optimization to trying to game them.

Comment by robbbb on FB/Discord Style Reacts · 2019-10-04T01:14:46.041Z · score: 4 (3 votes) · LW · GW

Another idea, maybe harder to implement: allow users to start a private chat with the anonymous user who left a reaction. I think in general these kinds of issues are often best resolved through one-on-one chat, and even if the anon chooses not to reply, people might feel less helpless/disempowered if they can reply in some fashion and know their critic is likely to see what they think.

If LW or the EA Forum tried something like this (which might also be helpful in some form even for downvotes), you'd probably want to make the expected discourse norms of these chats extra-prominent in the UI, to reduce the risk of bad interactions (and explain why mods may need to read the private messages if there's a worry about e.g. private verbal abuse going on).

Comment by robbbb on AI Alignment Open Thread August 2019 · 2019-10-01T00:04:49.551Z · score: 4 (2 votes) · LW · GW
Or do you think the discontinuity will be more in the realm of embedded agency style concerns (and how does this make it less safe, instead of just dysfunctional?)

This in particular doesn't match my model. Quoting some relevant bits from Embedded Agency:

So I'm not talking about agents who know their own actions because I think there's going to be a big problem with intelligent machines inferring their own actions in the future. Rather, the possibility of knowing your own actions illustrates something confusing about determining the consequences of your actions—a confusion which shows up even in the very simple case where everything about the world is known and you just need to choose the larger pile of money.
[...]
But it’s not that I’m imagining real-world embedded systems being “too Bayesian” and this somehow causing problems, if we don’t figure out what’s wrong with current models of rational agency. It’s certainly not that I’m imagining future AI systems being written in second-order logic! In most cases, I’m not trying at all to draw direct lines between research problems and specific AI failure modes.
What I’m instead thinking about is this: We sure do seem to be working with the wrong basic concepts today when we try to think about what agency is, as seen by the fact that these concepts don’t transfer well to the more realistic embedded framework.

This is also the topic of The Rocket Alignment Problem.

Comment by robbbb on Follow-Up to Petrov Day, 2019 · 2019-09-29T18:37:24.222Z · score: 10 (6 votes) · LW · GW

FWIW, I thought the ritual this year was fine and I'm not sure adding a cash prize to the ritual itself will be communicating the right lesson. It then starts to feel like a ritual about 'do we care more about symbolism than about saving lives?', rather than a ritual about coordination.

Comment by robbbb on Kohli episode discussion in 80K's Christiano interview · 2019-09-29T18:00:18.927Z · score: 6 (3 votes) · LW · GW

No. This is maybe clearer given the parenthetical I edited in. Speaking for myself, Critch's recommendations in https://www.lesswrong.com/posts/7uJnA3XDpTgemRH2c/critch-on-career-advice-for-junior-ai-x-risk-concerned seemed broadly reasonable to me, though I'm uncertain about those too and I don't know of a 'MIRI consensus view' on Critch's suggestions.

I feel pretty confident about "this is a line of thinking that's reasonable and healthy to be able to entertain, alongside lots of other complicated case-by-case factors that all need to be weighed by each actor", and then I don't know how to translate that into concrete recommendations for arbitrary LW users.

Comment by robbbb on Kohli episode discussion in 80K's Christiano interview · 2019-09-29T02:11:40.326Z · score: 2 (1 votes) · LW · GW
It seems to me if you’re someone who has done a PhD in ML or is very good at ML, but you currently can’t get a position that seems especially safety-focused or that is going to disproportionately affect safety more than capabilities, it is probably still good to take a job that just advances AI in general, mostly because you’ll be reaching the cutting edge potentially of what’s going on and improving your career capital a lot and having relevant understanding.

(The following is an off-the-cuff addition that occurred to me while reading this -- it's something I've thought about frequently, but it's intended as something to chew on, not as an endorsement or disavowal of any specific recommendation by Rob W or Paul above.)

The cobbled-together model of Eliezer in my head wants to say something like: 'In the Adequate World, the foremost thing in everyone's heads is "I at least won't destroy the world by my own hands", because that's the bare-minimum policy each individual would want everyone else to follow. This should probably also be in the back of everyone's heads in the real world, at least as a weight on the scale and a thought that's fine and not-arrogant to factor in.'

Comment by robbbb on Kohli episode discussion in 80K's Christiano interview · 2019-09-29T01:49:45.953Z · score: 2 (1 votes) · LW · GW

Note that although my views are much closer to Paul’s than to Pushmeet’s here, I’m posting this because I found it a useful summary of some ML perspectives and disagreements on AI safety, not because I’m endorsing the claims above.

Some disagreements that especially jumped out at me: I'd treat it as a negative update if I learned that AI progress across the board had sped up, and I wouldn't agree with "even absent the actions of the longtermists, there’s a reasonably good chance that everything would just be totally fine".

Comment by robbbb on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-28T23:57:36.962Z · score: 7 (4 votes) · LW · GW

Seems like unilateralism and coordination failure is a good way of summing up humanity's general plight re nuclear weapons, which makes it relevant to a day called "Petrov Day" in a high-level way. Putting the emphasis here makes the holiday feel more like "a holiday about x-risk and a thanksgiving for our not having died to nuclear war", and less like "a holiday about the virtues of Stanislav Petrov and emulating his conduct".

If Petrov's decision was correct, or incorrect-but-reflecting-good-virtues, the relevant virtue is something like "heroic responsibility", not "refusal to be a unilateralist". I could imagine a holiday that focuses instead on heroic responsibility, or that has a dual focus. ('Lord, grant me the humility to cooperate in good equilibria, the audacity to defect from bad ones, and the wisdom to know the difference.') I'm not sure which of these options is most useful.

Comment by robbbb on The Zettelkasten Method · 2019-09-23T19:35:22.493Z · score: 2 (1 votes) · LW · GW

Hmm, now I want to try this with a wiki with a precommitment to stick to a certain word count and hierarchical organization.

Comment by robbbb on Rob B's Shortform Feed · 2019-09-23T16:56:36.999Z · score: 17 (5 votes) · LW · GW

[Epistemic status: Thinking out loud]

If the evolutionary logic here is right, I'd naively also expect non-human animals to suffer more to the extent they're (a) more social, and (b) better at communicating specific, achievable needs and desires.

There are reasons the logic might not generalize, though. Humans have fine-grained language that lets us express very complicated propositions about our internal states. That puts a lot of pressure on individual humans to have a totally ironclad, consistent "story" they can express to others. I'd expect there to be a lot more evolutionary pressure to actually experience suffering, since a human will be better at spotting holes in the narratives of a human who fakes it (compared to, e.g., a bonobo trying to detect whether another bonobo is really in that much pain).

It seems like there should be an arms race across many social species to give increasingly costly signals of distress, up until the costs outweigh the amount of help they can hope to get. But if you don't have the language to actually express concrete propositions like "Bob took care of me the last time I got sick, six months ago, and he can attest that I had a hard time walking that time too", then those costly signals might be mostly or entirely things like "shriek louder in response to percept X", rather than things like "internally represent a hard-to-endure pain-state so I can more convincingly stick to a verbal narrative going forward about how hard-to-endure this was".

Comment by robbbb on Rob B's Shortform Feed · 2019-09-23T16:45:59.284Z · score: 17 (5 votes) · LW · GW

Rolf Degen, summarizing part of Barbara Finlay's "The neuroscience of vision and pain":

Humans may have evolved to experience far greater pain, malaise and suffering than the rest of the animal kingdom, due to their intense sociality giving them a reasonable chance of receiving help.

From the paper:

Several years ago, we proposed the idea that pain, and sickness behaviour had become systematically increased in humans compared with our primate relatives, because human intense sociality allowed that we could ask for help and have a reasonable chance of receiving it. We called this hypothesis ‘the pain of altruism’ [68]. This idea derives from, but is a substantive extension of Wall’s account of the placebo response [43]. Starting from human childbirth as an example (but applying the idea to all kinds of trauma and illness), we hypothesized that labour pains are more painful in humans so that we might get help, an ‘obligatory midwifery’ which most other primates avoid and which improves survival in human childbirth substantially ([67]; see also [69]). Additionally, labour pains do not arise from tissue damage, but rather predict possible tissue damage and a considerable chance of death. Pain and the duration of recovery after trauma are extended, because humans may expect to be provisioned and protected during such periods. The vigour and duration of immune responses after infection, with attendant malaise, are also increased. Noisy expression of pain and malaise, coupled with an unusual responsivity to such requests, was thought to be an adaptation.
We noted that similar effects might have been established in domesticated animals and pets, and addressed issues of ‘honest signalling’ that this kind of petition for help raised. No implication that no other primate ever supplied or asked for help from any other was intended, nor any claim that animals do not feel pain. Rather, animals would experience pain to the degree it was functional, to escape trauma and minimize movement after trauma, insofar as possible.

Finlay's original article on the topic: "The pain of altruism".

Comment by robbbb on The Zettelkasten Method · 2019-09-22T17:10:23.789Z · score: 6 (3 votes) · LW · GW

Notably, people could commit in this thread to trying this method for some length of time and then writing up their experience with it for LW. That would help address some of the obvious selection effect.

Comment by robbbb on Question about a past donor to MIRI. · 2019-09-21T03:44:52.205Z · score: 16 (4 votes) · LW · GW

(I work at MIRI.)

News that SIAI received funds from Epstein actually came as a surprise to us. (We found out about this a few days before OP's question went up.) Epstein had previously approached us in 2016 looking for organizations to donate to, and we decided against pursuing the option; we didn't realize there was any previous interaction between MIRI/SIAI and Epstein or his foundations.

The 2009 donation was brought to our attention when someone sent us a Miami Herald article that included SIAI in a spreadsheet of organizations that received money from one of Epstein's foundations. We couldn't initially find evidence of the donation in our records, so we had to go digging a bit; it was apparently seed money for OpenCog while they were getting up and running rather than money for "real" SIAI stuff, hence current staff being out of the loop.

Comment by robbbb on Realism and Rationality · 2019-09-16T16:16:29.001Z · score: 6 (3 votes) · LW · GW

Sayre-McCord in SEP's "Moral Realism" article:

Moral realists are those who think that [...] moral claims do purport to report facts and are true if they get the facts right. Moreover, they hold, at least some moral claims actually are true. [...]
As a result, those who reject moral realism are usefully divided into (i) those who think moral claims do not purport to report facts in light of which they are true or false (noncognitivists) and (ii) those who think that moral claims do carry this purport but deny that any moral claims are actually true (error theorists).

Joyce in SEP's "Moral Anti-Realism" article:

Traditionally, to hold a realist position with respect to X is to hold that X exists in a mind-independent manner (in the relevant sense of “mind-independence”). On this view, moral anti-realism is the denial of the thesis that moral properties—or facts, objects, relations, events, etc. (whatever categories one is willing to countenance)—exist mind-independently. This could involve either (1) the denial that moral properties exist at all, or (2) the acceptance that they do exist but that existence is (in the relevant sense) mind-dependent. Barring various complications to be discussed below, there are broadly two ways of endorsing (1): moral noncognitivism and moral error theory. Proponents of (2) may be variously thought of as moral non-objectivists, or idealists, or constructivists.

So, everyone defines "non-realism" so as to include error theory and non-cognitivism; some people define it so as to also include all or most views on which moral properties are in some sense "subjective."

These ambiguities seem like good reasons to just avoid the term "realism" and talk about more specific positions, though I guess it works to think about a sliding scale where substantive realism is at one extreme, error theory and non-cognitivism are at the other extreme, and remaining views are somewhere in the middle.

Comment by robbbb on Realism and Rationality · 2019-09-16T16:00:47.936Z · score: 8 (4 votes) · LW · GW

Example: Eliezer's Extrapolated Volition is easy to round off to "constructivism", By Which It May Be Judged to "substantive realism", and Orthogonality Thesis and The Gift We Give To Tomorrow to "subjectivism". I'm guessing it's not a coincidence that those are also the most popular answers in the poll above, and that none of them has majority support.

(Though I don't think I could have made a strong prediction like this a priori. If non-cognitivism or error theory had done better, someone could have said "well, of course!", citing LessWrong's interest in signaling or its general reductionist/eliminativist/anti-supernaturalist tendencies.)

Comment by robbbb on Realism and Rationality · 2019-09-16T15:33:10.101Z · score: 20 (9 votes) · LW · GW

The most popular meta-ethical views on LessWrong seem to be relatively realist ones, with views like non-cognitivism and error theory getting significantly less support. From the 2016 LessWrong diaspora survey (excluding people who didn't pick one of the options):

  • 772 respondents (39.5%) voted for "Constructivism: Some moral statements are true, and the truth of a moral statement is determined by whether an agent would accept it if they were undergoing a process of rational deliberation. 'Murder is wrong' can mean something like 'Societal agreement to the rule "do not murder" is instrumentally rational'."
  • 550 respondents (28.2%) voted for "Subjectivism: Some moral statements are true, but not universally, and the truth of a moral statement is determined by non-universal opinions or prescriptions, and there is no nonattitudinal determinant of rightness and wrongness. 'Murder is wrong' means something like 'My culture has judged murder to be wrong' or 'I've judged murder to be wrong'."
  • 346 respondents (17.7%) voted for "Substantive realism: Some moral statements are true, and the truth of a moral statement is determined by mind-independent moral properties. 'Murder is wrong' means that murder has an objective mind-independent property of wrongness that we discover by empirical investigation, intuition, or some other method."
  • 186 respondents (9.5%) voted for "Non-cognitivism: Moral statements don't express propositions and can neither be true nor false. 'Murder is wrong' means something like 'Boo murder!'."
  • 99 respondents (5.1%) voted for "Error theory: Moral statements have a truth-value, but attempt to describe features of the world that don't exist. 'Murder is wrong' and 'Murder is right' are both false statements because moral rightness and wrongness aren't features that exist."

I suspect that a lot of rationalists would be happy to endorse any of the above five views in different contexts or on different framings, and would say that real-world moral judgment is complicated and doesn't cleanly fit into exactly one of these categories. E.g., I think Luke Muehlhauser's Pluralistic Moral Reductionism is just correct.

Comment by robbbb on Rob B's Shortform Feed · 2019-09-08T20:18:23.896Z · score: 2 (1 votes) · LW · GW

Old discussion of this on LW: https://www.lesswrong.com/s/fqh9TLuoquxpducDb/p/synsRtBKDeAFuo7e3

Comment by robbbb on Rob B's Shortform Feed · 2019-09-08T02:58:32.070Z · score: 23 (7 votes) · LW · GW

Facebook comment I wrote in February, in response to the question 'Why might having beauty in the world matter?':

I assume you're asking about why it might be better for beautiful objects in the world to exist (even if no one experiences them), and not asking about why it might be better for experiences of beauty to exist.

[... S]ome reasons I think this:

1. If it cost me literally nothing, I feel like I'd rather there exist a planet that's beautiful, ornate, and complex than one that's dull and simple -- even if the planet can never be seen or visited by anyone, and has no other impact on anyone's life. This feels like a weak preference, but it helps get a foot in the door for beauty.

(The obvious counterargument here is that my brain might be bad at simulating the scenario where there's literally zero chance I'll ever interact with a thing; or I may be otherwise confused about my values.)

2. Another weak foot-in-the-door argument: People seem to value beauty, and some people claim to value it terminally. Since human value is complicated, messy, and idiosyncratic (compare person-specific ASMR triggers, nostalgia triggers, or culinary preferences), and since terminal and instrumental values are easily altered and interchanged in our brains, our prior should be that at least some people really do have weird preferences like that at least some of the time.

(And if it's just a few other people who value beauty, and not me, I should still value it for the sake of altruism and cooperativeness.)

3. If morality isn't "special" -- if it's just one of many facets of human values, and isn't a particularly natural-kind-ish facet -- then it's likelier that a full understanding of human value would lead us to treat aesthetic and moral preferences as more coextensive, interconnected, and fuzzy. If I can value someone else's happiness inherently, without needing to experience or know about it myself, it then becomes harder to say why I can't value non-conscious states inherently; and "beauty" is an obvious candidate. My preferences aren't all about my own experiences, and they aren't simple, so it's not clear why aesthetic preferences should be an exception to this rule.

4. Similarly, if phenomenal consciousness is fuzzy or fake, then it becomes less likely that our preferences range only and exactly over subjective experiences (or their closest non-fake counterparts). Which removes the main reason to think unexperienced beauty doesn't matter to people.

Combining the latter two points with the literature on emotions like disgust and purity, which have both moral and non-moral aspects, it seems plausible that the extrapolated versions of preferences like "I don't like it when other sentient beings suffer" could turn out to have aesthetic aspects or interpretations like "I find it ugly for brain regions to have suffering-ish configurations".

Even if consciousness is fully a real thing, it seems as though a sufficiently deep reductive understanding of consciousness should lead us to understand and evaluate consciousness similarly whether we're thinking about it in intentional/psychologizing terms or just thinking about the physical structure of the corresponding brain state. We shouldn't be more outraged by a world-state under one description than under an equivalent description, ideally.

But then it seems less obvious that the brain states we care about should exactly correspond to the ones that are conscious, with no other brain states mattering; and aesthetic emotions are one of the main ways we relate to things we're treating as physical systems.

As a concrete example, maybe our ideal selves would find it inherently disgusting for a brain state that sort of almost looks conscious to go through the motions of being tortured, even when we aren't the least bit confused or uncertain about whether it's really conscious, just because our terminal values are associative and symbolic. I use this example because it's an especially easy one to understand from a morality- and consciousness-centered perspective, but I expect our ideal preferences about physical states to end up being very weird and complicated, and not to end up being all that much like our moral intuitions today.

Addendum: As always, this kind of thing is ridiculously speculative and not the kind of thing to put one's weight down on or try to "lock in" for civilization. But it can be useful to keep the range of options in view, so we have them in mind when we figure out how to test them later.

Comment by robbbb on [Link] Book Review: Reframing Superintelligence (SSC) · 2019-08-30T23:35:47.251Z · score: 7 (5 votes) · LW · GW

It might also be worth comparing CAIS and "tool AI" to Paul Christiano's IDA and the desiderata MIRI tends to talk about (task-directed AGI [1,2,3], mild optimization, limited AGI).

At a high level, I tend to think of Christiano and Drexler as both approaching alignment from very much the right angle, in that they're (a) trying to break apart the vague idea of "AGI reasoning" into smaller parts, and (b) shooting for a system that won't optimize harder (or more domain-generally) than we need for a given task. From conversations with Nate, one way I'd summarize MIRI-cluster disagreements with Christiano and Drexler's proposals is that MIRI people don't tend to think these proposals decompose cognitive work enough. Without a lot more decomposition/understanding, either the system as a whole won't be capable enough, or it will be capable by virtue of atomic parts that are smart enough to be dangerous, where safety is a matter of how well we can open those black boxes.

In my experience people use "tool AI" to mean a bunch of different things, including things MIRI considers very important and useful (like "only works on a limited task, rather than putting any cognitive work into more general topics or trying to open-endedly optimize the future") as well as ideas that don't seem relevant or that obscure where the hard parts of the problem probably are.

Comment by robbbb on how should a second version of "rationality: A to Z" look like? · 2019-08-24T12:13:15.639Z · score: 2 (1 votes) · LW · GW

For Facebook, I use FBPurity to block my news feed. Then if there are particular individuals I especially want to follow, I add them to a Facebook List.

Comment by robbbb on Partial summary of debate with Benquo and Jessicata [pt 1] · 2019-08-17T22:44:48.018Z · score: 9 (4 votes) · LW · GW

For 'things that aren't an accident but aren't necessarily conscious or endorsed', another option might be to use language like 'decision', 'action', 'choice', etc. but flagged in a way that makes it clear you're not assuming full consciousness. Like 'quasi-decision', 'quasi-action', 'quasi-conscious'... Applied to Zack's case, that might suggest a term like 'quasi-dissembling' or 'quasi-misleading'. 'Dissonant communication' comes to mind as another idea.

When I want to emphasize that there's optimization going on but it's not necessarily conscious, I sometimes speak impersonally of "Bob's brain is doing X", or "a Bob-part/agent/subagent is doing X".

Comment by robbbb on Partial summary of debate with Benquo and Jessicata [pt 1] · 2019-08-17T13:03:57.623Z · score: 29 (8 votes) · LW · GW

I personally wouldn't point to "When Will AI Exceed Human Performance?" as an exemplar on this dimension, because it isn't clear about the interesting implications of the facts it's reporting. Katja's take-away from the paper was:

In the past, it seemed pretty plausible that what AI researchers think is a decent guide to what’s going to happen. I think we've pretty much demonstrated that that’s not the case. I think there are a variety of different ways we might go about trying to work out what AI timelines are like, and talking to experts is one of them; I think we should weight that one down a lot.

I don't know whether Katja's co-authors agree with her about that summary, but if there's disagreement, I think the paper still could have included more discussion of the question and which findings look relevant to it.

The actual Discussion section makes the opposite argument instead, listing a bunch of reasons to think AI experts are good at foreseeing AI progress. The introduction says "To prepare for these challenges, accurate forecasting of transformative AI would be invaluable. [...] The predictions of AI experts provide crucial additional information." And the paper includes a list of four "key findings", none of which even raise the question of survey respondents' forecasting chops, and all of which are worded in ways that suggest we should in fact put some weight on the respondents' views (sometimes switching between the phrasing 'researchers believe X' and 'X is true').

The abstract mentions the main finding that undermines how believable the responses are, but does so in such a way that someone reading through quickly might come away with the opposite impression. The abstract's structure is:

To adapt public policy, we need to better anticipate [AI advances]. Researchers predict [A, B, C, D, E, and F]. Researchers believe [G and H]. These results will inform discussion amongst researchers and policymakers about anticipating and managing trends in AI.

If it slips past the reader's attention that G and H are massively inconsistent, it's easy to come away thinking the abstract is saying 'Here's a list of credible statements from experts about their area of expertise' as opposed to 'Here's a demonstration that what AI researchers think is not a decent guide to what's going to happen'.

Comment by robbbb on Occam's Razor: In need of sharpening? · 2019-08-06T22:11:52.359Z · score: 8 (4 votes) · LW · GW

Humans might not be low-level atoms, but obviously we have to privilege the hypothesis 'something human-like did this' if we've already observed a lot of human-like things in our environment.

Suppose I'm a member of a prehistoric tribe, and I see a fire in the distance. It's fine for me to say 'I have a low-ish prior on a human starting the fire, because (AFAIK) there are only a few dozen humans in the area'. And it's fine for me to say 'I've never seen a human start a fire, so I don't think a human started this fire'. But it's not fine for me to say 'It's very unlikely a human started that fire, because human brains are more complicated than other phenomena that might start fires', even if I correctly intuit how and why humans are more complicated than other phenomena.

The case of Thor is a bit more complicated, because gods are different from humans. If Eliezer and cousin_it disagree on this point, maybe Eliezer would say 'The complexity of the human brain is the biggest reason why you shouldn't infer that there are other, as-yet-unobserved species of human-brain-ish things that are very different from humans', and maybe cousin_it would say 'No, it's pretty much just the differentness-from-observed-humans (on the "has direct control over elemental forces" dimension) that matters, not the fact that it has a complicated brain.'

If that's a good characterization of the disagreement, then it seems like Eliezer might say 'In ancient societies, it was much more reasonable to posit mindless "supernatural" phenomena (i.e., mindless physical mechanisms wildly different from anything we've observed) than to posit intelligent supernatural phenomena.' Whereas the hypothetical cousin_it might say that ancient people didn't have enough evidence to conclude that gods were any more unlikely than mindless mechanisms that were similarly different from experience. Example question: what probability should ancient people have assigned to

The regular motion of the planets is due to a random process plus a mindless invisible force, like the mindless invisible force that causes recently-cooked food to cool down all on its own.

vs.

The regular motion of the planets is due to deliberate design / intelligent intervention, like the intelligent intervention that arranges and cooks food.

Comment by robbbb on AI Alignment Open Thread August 2019 · 2019-08-06T18:36:35.494Z · score: 3 (2 votes) · LW · GW

Also the discussion of deconfusion research in https://intelligence.org/2018/11/22/2018-update-our-new-research-directions/ and https://www.lesswrong.com/posts/Gg9a4y8reWKtLe3Tn/the-rocket-alignment-problem , and the sketch of 'why this looks like a hard problem in general' in https://www.lesswrong.com/posts/zEvqFtT4AtTztfYC4/optimization-amplifies and https://arbital.com/p/aligning_adds_time/ .

Comment by robbbb on AI Alignment Open Thread August 2019 · 2019-08-06T18:30:47.722Z · score: 13 (4 votes) · LW · GW

MIRIx events are funded by MIRI, but we don't decide the topics or anything. I haven't taken a poll of MIRI researchers to see how enthusiastic different people are about formal verification, but AFAIK Nate and Eliezer don't see it as super relevant. See https://www.lesswrong.com/posts/xCpuSfT5Lt6kkR3po/my-take-on-agent-foundations-formalizing-metaphilosophical#cGuMRFSi224RCNBZi and the idea of a "safety-story" in https://www.lesswrong.com/posts/8gqrbnW758qjHFTrH/security-mindset-and-ordinary-paranoia for better attempts to characterize what MIRI is looking for.

ETA: From the end of the latter dialogue,

In point of fact, the real reason the author is listing out this methodology is that he's currently trying to do something similar on the problem of aligning Artificial General Intelligence, and he would like to move past “I believe my AGI won't want to kill anyone” and into a headspace more like writing down statements such as “Although the space of potential weightings for this recurrent neural net does contain weight combinations that would figure out how to kill the programmers, I believe that gradient descent on loss function L will only access a result inside subspace Q with properties P, and I believe a space with properties P does not include any weight combinations that figure out how to kill the programmer.”
Though this itself is not really a reduced statement and still has too much goal-laden language in it.

Rather than putting the emphasis on being able to machine-verify all important properties of the system, this puts the emphasis on having strong technical insight into the system; I usually think of formal proofs more as a means to that end. (Again caveating that some people at MIRI might think of this differently.)

Comment by robbbb on Feedback Requested! Draft of a New About/Welcome Page for LessWrong · 2019-06-02T01:34:36.448Z · score: 4 (2 votes) · LW · GW

A tricky thing about this is that there's an element of cognitive distortion in how most people evaluate these questions, and play-acting at "this distortion makes sense" can worsen it (at the same time that it helps win more trust from people who have the distortion).

If it turned out to be a good idea to try to speak to this perspective, I'd recommend first meditating on a few reversal tests. Like: "Hmm, I wouldn't feel any need to add a disclaimer here if the text I was recommending were The Brothers Karamazov, though I'd want to briefly say why it's relevant, and I might worry about the length. I'd feel a bit worried about recommending a young adult novel, even an unusually didactic one, because people rightly expect YA novels to be optimized for less useful and edifying things than the "literary classics" reference class. The insights tend to be shallower and less common. YA novels and fanfiction are similar in all those respects, and they provoke basically the same feeling in me, so I can maybe use that reversal test to determine what kinds of disclaimers or added context make sense here."

Comment by robbbb on FB/Discord Style Reacts · 2019-06-01T22:43:33.318Z · score: 2 (1 votes) · LW · GW

(If I want to express stronger gratitude than that, I'd rather write it out.)

Comment by robbbb on FB/Discord Style Reacts · 2019-06-01T22:42:28.296Z · score: 2 (1 votes) · LW · GW

On Slack, the Thumbs Up, OK, and Horns hand signs meet all my minor needs for thanking people.

Comment by robbbb on Drowning children are rare · 2019-05-30T01:28:16.487Z · score: 6 (3 votes) · LW · GW

Can't individuals just list 'Reign of Terror' and then specify in their personalized description that they have a high bar for terror?

Comment by robbbb on Coherent decisions imply consistent utilities · 2019-05-14T19:46:56.205Z · score: 5 (3 votes) · LW · GW

We'd talked about getting a dump out as well, and your plan sounds great to me! The LW team should get back to you with a list at some point (unless they think of a better idea).

Comment by robbbb on Coherent decisions imply consistent utilities · 2019-05-14T03:44:21.316Z · score: 16 (8 votes) · LW · GW

I asked Eliezer if it made sense to cross-post this from Arbital, and did the cross-posting when he approved. I'm sorry it wasn't clear that this was a cross-post! I intended to make this clearer, but my idea was bad (putting the information on the sequence page) and I also implemented it wrong (the sequence didn't previously display on the top of this post).

This post was originally written as a nontechnical introduction to expected utility theory and coherence arguments. Although it begins in medias res stylistically, it doesn't have any prereqs or context beyond "this is part of a collection of introductory resources covering a wide variety of technical and semitechnical topics."

Per the first sentence, the main purpose is for this to be a linkable resource for conversations/inquiry about human rationality and conversations/inquiry about AGI:

So we're talking about how to make good decisions, or the idea of 'bounded rationality', or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of 'expected utility' or 'utility functions'. And before we even ask what those are, we might first ask, Why?

There have been loose plans for a while to cross-post content from Arbital to LW (maybe all of it; maybe just the best or most interesting stuff), but as I mentioned downthread, we're doing more cross-post experiments sooner than we would have because Arbital's been having serious performance issues.