Realism about rationality

post by ricraz · 2018-09-16T10:46:29.239Z · score: 136 (55 votes) · LW · GW · 71 comments

This is a link post for http://thinkingcomplete.blogspot.com/2018/09/rational-and-real.html

Epistemic status: trying to vaguely gesture at vague intuitions. What CFAR calls “purple”. A similar idea was explored here under the heading "the intelligibility of intelligence", although I hadn't seen it before writing this post.

There’s a mindset which is common in the rationalist community, which I call “realism about rationality” (the name being intended as a parallel to moral realism). I feel like my skepticism about agent foundations research is closely tied to my skepticism about this mindset, and so in this essay I try to articulate what it is.

Humans ascribe properties to entities in the world in order to describe and predict them. Here are three such properties: "momentum", "evolutionary fitness", and "intelligence". These are all pretty useful properties for high-level reasoning in the fields of physics, biology and AI, respectively. There's a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn't just because biologists haven't figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated "function" which basically requires you to describe that organism's entire phenotype, genotype and environment.
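To make the contrast concrete, here's a schematic illustration (the "fitness equation" below is a deliberately hand-wavy placeholder, not a real biological formula):

```latex
\begin{align*}
  \text{momentum:} \quad & \mathbf{p} = m\mathbf{v} \\
  \text{fitness:}  \quad & w(\text{organism}) \approx f(\text{genotype},\ \text{phenotype},\ \text{environment},\ \ldots)
\end{align*}
```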

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It's a mindset which makes the following ideas seem natural:

  • The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don't count brute force approaches like AIXI for the same reason I don't consider physics a simple yet powerful description of biology.)
  • The idea that there is an "ideal" decision theory.
  • The idea that AGI will very likely be an "agent".
  • The idea that Turing machines and Kolmogorov complexity are foundational for epistemology.
  • The idea that, given certain evidence for a proposition, there's an "objective" level of subjective credence which you should assign to it, even under computational constraints.
  • The idea that Aumann's agreement theorem is relevant to humans.
  • The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.
  • The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn't depend very much on morally arbitrary factors.
  • The idea that having contradictory preferences or beliefs is really bad, even when there's no clear way that they'll lead to bad consequences (and you're very good at avoiding dutch books and money pumps and so on).

To be clear, I am neither claiming that realism about rationality makes people dogmatic about such ideas, nor claiming that they're all false. In fact, from a historical point of view I’m quite optimistic about using maths to describe things in general. But starting from that historical baseline, I’m inclined to adjust downwards on questions related to formalising rational thought, whereas rationality realism would endorse adjusting upwards. This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks). It's true that "messy" human intelligence is able to generalise to a wide variety of domains it hadn't evolved to deal with, which supports rationality realism, but analogously an animal can be evolutionarily fit in novel environments without implying that fitness is easily formalisable.

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism, CEV), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

Another gesture towards the thing: a popular metaphor for Kahneman and Tversky's dual process theory is a rider trying to control an elephant. Implicit in this metaphor is the localisation of personal identity primarily in the system 2 rider. Imagine reversing that, so that the experience and behaviour you identify with are primarily driven by your system 1, with a system 2 that is mostly a Hansonian rationalisation engine on top (one which occasionally also does useful maths). Does this shift your intuitions about the ideas above, e.g. by making your CEV feel less well-defined? I claim that the latter perspective is just as sensible as the former, and perhaps even more so - see, for example, Paul Christiano's model of the mind, which leads him to conclude that "imagining conscious deliberation as fundamental, rather than a product and input to reflexes that actually drive behavior, seems likely to cause confusion."

These ideas have been stewing in my mind for a while, but the immediate trigger for this post was a conversation about morality which went along these lines:

R (me): Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it’s fine to accept that our moral preferences may contain some contradictions.
O (a friend): You can’t just accept a contradiction! It’s like saying “I have an intuition that 51 is prime, so I’ll just accept that as an axiom.”
R: Morality isn’t like maths. It’s more like having tastes in food, and then having preferences that the tastes have certain consistency properties - but if your tastes are strong enough, you might just ignore some of those preferences.
O: For me, my meta-level preferences about the ways to reason about ethics (e.g. that you shouldn’t allow contradictions) are so much stronger than my object-level preferences that this wouldn’t happen. Maybe you can ignore the fact that your preferences contain a contradiction, but if we scaled you up to be much more intelligent, running on a brain orders of magnitude larger, having such a contradiction would break your thought processes.
R: Actually, I think a much smarter agent could still be weirdly modular like humans are, and work in such a way that describing it as having “beliefs” is still a very lossy approximation. And it’s plausible that there’s no canonical way to “scale me up”.

I had a lot of difficulty in figuring out what I actually meant during that conversation, but I think a quick way to summarise the disagreement is that O is a rationality realist, and I’m not. This is not a problem, per se: I'm happy that some people are already working on AI safety from this mindset, and I can imagine becoming convinced that rationality realism is a more correct mindset than my own. But I think it's a distinction worth keeping in mind, because assumptions baked into underlying worldviews are often difficult to notice, and also because the rationality community has selection effects favouring this particular worldview even though it doesn't necessarily follow from the community's founding thesis (that humans can and should be more rational).

71 comments

Comments sorted by top scores.

comment by Viliam · 2018-09-16T20:51:51.718Z · score: 36 (16 votes) · LW · GW

I have an intuition that the "realism about rationality" approach will lead to success, even if it will have to be dramatically revised on the way.

To explain, imagine that, centuries ago, there are two groups trying to find out how the planets move. Group A says: "Obviously, planets must move according to some simple mathematical rule. The simplest mathematical shape is a circle, therefore planets move in circles. All we have to do is find out the exact diameter of each circle." Group B says: "No, you guys underestimate the complexity of the real world. The planets, just like everything in nature, can only be approximated by a rule, but there are always exceptions and unpredictability. You will never find a simple mathematical model to describe the movement of the planets."

The people who finally find out how the planets move will be spiritual descendants of the group A. Even if, along the way, they have to add epicycles, and then discard the idea of circles entirely - which seems like a total failure of the original group. The problem with the group B is that it has no energy to move forward.

The right moment to discard a simple model is when you have enough data to build a more complex model.

comment by ricraz · 2018-09-16T21:11:56.684Z · score: 41 (20 votes) · LW · GW
The people who finally find out how the planets move will be spiritual descendants of the group A. ... The problem with the group B is that it has no energy to move forward.

In this particular example, it's true that group A was more correct. This is because planetary physics can be formalised relatively easily, and also because it's a field where you can only observe and not experiment. But imagine the same conversation between sociologists who are trying to find out what makes people happy, or between venture capitalists trying to find out what makes startups succeed. In those cases, Group B can move forward using the sort of "energy" that biologists and inventors and entrepreneurs have, driven by an experimental and empirical mindset. Whereas Group A might spend a long time writing increasingly elegant equations which rely on unjustified simplifications.

Instinctively reasoning about intelligence using analogies from physics instead of the other domains I mentioned above is a very good example of rationality realism.

comment by jamii · 2018-09-20T11:15:20.312Z · score: 4 (4 votes) · LW · GW

Uncontrolled argues along similar lines - that the physics/chemistry model of science, where we get to generalize a compact universal theory from a number of small experiments, is simply not applicable to biology/psychology/sociology/economics and that policy-makers should instead rely more on widespread, continuous experiments in real environments to generate many localized partial theories.

A prototypical argument is the paradox-of-choice jam experiment, which has since become solidified in pop psychology. But actual supermarkets run many 1000s of in-situ experiments and find that it actually depends on the product, the nature of the choices, the location of the supermarket, the time of year etc.

comment by Rob Bensinger (RobbBB) · 2018-09-20T20:33:09.746Z · score: 18 (7 votes) · LW · GW
Uncontrolled argues along similar lines - that the physics/chemistry model of science, where we get to generalize a compact universal theory from a number of small experiments, is simply not applicable to biology/psychology/sociology/economics and that policy-makers should instead rely more on widespread, continuous experiments in real environments to generate many localized partial theories.

I'll note that (non-extreme) versions of this position are consistent with ideas like "it's possible to build non-opaque AGI systems." The full answer to "how do birds work?" is incredibly complex, hard to formalize, and dependent on surprisingly detailed local conditions that need to be discovered empirically. But you don't need to understand much of that complexity at all to build flying machines with superavian speed or carrying capacity, or to come up with useful theory and metrics for evaluating "goodness of flying" for various practical purposes; and the resultant machines can be a lot simpler and more reliable than a bird, rather than being "different from birds but equally opaque in their own alien way".

This isn't meant to be a response to the entire "rationality non-realism" suite of ideas, or a strong argument that AGI developers can steer toward less opaque systems than AlphaZero; it's just me noting a particular distinction that I particularly care about.

The relevant realism-v.-antirealism disagreement won't be about "can machines serve particular functions more transparently than biological organs that happen to serve a similar function (alongside many other functions)?". In terms of the airplane analogy, I expect disagreements like "how much can marginal effort today increase transparency once we learn how to build airplanes?", "how much useful understanding are we currently missing about how airplanes work?", and "how much of that understanding will we develop by default on the path toward building airplanes?".

comment by binary_doge · 2018-10-01T16:01:11.323Z · score: 3 (2 votes) · LW · GW

"This is because planetary physics can be formalized relatively easily" - they can now, and could when they were, but not before. One can argue that we thought many "complex" and very "human" abilities could not be algroithmically emulated in the past, and recent advances in AI (with neural nets and all that) have proven otherwise. If a program can do/predict something, there is a set of mechanical rules that explain it. The set might not be as elegant as Newton's laws of motion, but it is still a set of equations nonetheless. The idea behind Villam's comment (I think) is that in the future someone might say, the same way you just did, that "We can formalize how happy people generally are in a given society because that's relatively easy, but what about something truly complex like what an individual might imagine if we read him a specific story?".

In other words, I don't see the essential differentiation between biology and sociology questions and physics questions, that you try to point to. In the post itself you also talk about moral preference, and I tend to agree with you that some people just have very individually strongly valued axioms that might contradict themselves or others, but it doesn't in itself mean that questions about rationality differ from questions about, say, molecular biology, in the sense that they can be hypothetically answered to a satisfactory level of accuracy.

comment by DragonGod · 2018-09-30T19:56:23.323Z · score: 1 (1 votes) · LW · GW

Group A was most successful in the field of computation, so I have high confidence that their approach would be successful in intelligence as well (especially in intelligence of artificial agents).

comment by drossbucket · 2018-09-17T05:51:32.092Z · score: 6 (4 votes) · LW · GW

This is the most compelling argument I've been able to think of too when I've tried before. Feynman has a nice analogue of it within physics in The Character of Physical Law:

... it would have been no use if Newton had simply said, 'I now understand the planets', and for later men to try to compare it with the earth's pull on the moon, and for later men to say 'Maybe what holds the galaxies together is gravitation'. We must try that. You could say 'When you get to the size of the galaxies, since you know nothing about it, anything can happen'. I know, but there is no science in accepting this type of limitation.

I don't think it goes through well in this case, for the reasons ricraz outlines in their reply. Group B already has plenty of energy to move forward, from taking our current qualitative understanding and trying to build more compelling explanatory models and find new experimental tests. It's Group A that seems rather mired in equations that don't easily connect.

Edit: I see I wrote about something similar before, in a rather rambling way.

comment by abramdemski · 2018-09-24T18:38:09.419Z · score: 34 (12 votes) · LW · GW

Rationality realism seems like a good thing to point out which might be a crux for a lot of people, but it doesn't seem to be a crux for me.

I don't think there's a true rationality out there in the world, or a true decision theory out there in the world, or even a true notion of intelligence out there in the world. I work on agent foundations because there's still something I'm confused about even after that, and furthermore, AI safety work seems fairly hopeless while still so radically confused about the-phenomena-which-we-use-intelligence-and-rationality-and-agency-and-decision-theory-to-describe. And, as you say, "from a historical point of view I’m quite optimistic about using maths to describe things in general".

comment by romeostevensit · 2018-09-16T23:04:23.719Z · score: 30 (13 votes) · LW · GW

I really like the compression "There's no canonical way to scale me up."

I think it captures a lot of the important intuitions here.

comment by Benito · 2018-09-17T00:51:29.243Z · score: 1 (2 votes) · LW · GW

+1

comment by Benito · 2018-09-17T02:38:24.532Z · score: 27 (6 votes) · LW · GW

I think I want to split up ricraz's examples in the post into two subclasses, defined by two questions.

The first asks: given that there are many different AGI architectures one could scale up into, are some better than others? (My intuition is both that some are better than others, and also that many are on the Pareto frontier.) And is there any simple way to determine why one is better than another? This leads to the following examples from the OP:

There is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general; there is an “ideal” decision theory; the idea that AGI will very likely be an “agent”; the idea that Turing machines and Kolmogorov complexity are foundational for epistemology; the idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.

The second asks - suppose that some architectures are better than others, and suppose there are some simple explanations about why some are better than others. How practical is it to talk of me in this way today? Here's some concrete examples of things I might do:

Given certain evidence for a proposition, there's an "objective" level of subjective credence which you should assign to it, even under computational constraints; the idea that Aumann's agreement theorem is relevant to humans; the idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn’t depend very much on morally arbitrary factors; the idea that having contradictory preferences or beliefs is really bad, even when there’s no clear way that they’ll lead to bad consequences (and you’re very good at avoiding dutch books and money pumps and so on).

If I am to point to two examples that feel very concrete to me, I might ask:

  • Is the reasoning that Harry is doing in Chapter 86: Multiple Hypothesis Testing [LW · GW] useful or totally insane?
  • When one person says "I guess we'll have to agree to disagree" and the second person says "Actually according to Aumann's Agreement Theorem, we can't" is the second person making a type error?

Certainly the first person is likely mistaken if they're saying "In principle no exchange of evidence could cause us to agree", but perhaps the second person is also mistaken, in implying that it makes any sense to model their disagreement in terms of idealised, scaled-up, rational agents rather than the weird bag of meat and neuroscience that we actually are - for which Aumann's Agreement Theorem certainly has not been proven.

To be clear: the two classes of examples come from roughly the same generator, and advances in our understanding of one can lead to advances in the other. I just often draw from fairly different reference classes of evidence for updating on them (examples: For the former, Jaynes, Shannon, Feynman. For the latter, Kahneman & Tversky and Tooby & Cosmides).

comment by Benquo · 2018-09-17T12:17:13.169Z · score: 21 (5 votes) · LW · GW
When one person says "I guess we'll have to agree to disagree" and the second person says "Actually according to Aumann's Agreement Theorem, we can't" is the second person making a type error?

Making a type error is not easy to distinguish from attempting to shift frame. (If it were, the frame control wouldn't be very effective.) In the example Eliezer gave from the sequences, he was shifting frame from one that implicitly acknowledges interpretive labor as a cost, to one that demands unlimited amounts of interpretive labor by assuming that we're all perfect Bayesians (and therefore have unlimited computational ability, memory, etc).

This is a big part of the dynamic underlying mistake vs conflict theory.

comment by Benquo · 2018-09-17T12:18:04.525Z · score: 9 (4 votes) · LW · GW

Eliezer's behavior in the story you're alluding to only seems "rational" insofar as we think the other side ends up with a better opinion - I can easily imagine a structurally identical interaction where the protagonist manipulates someone into giving up on a genuine but hard to articulate objection, or proceeding down a conversational path they're ill-equipped to navigate, thus "closing the sale."

comment by gjm · 2018-09-18T11:52:47.801Z · score: 11 (7 votes) · LW · GW

It's not at all clear that improving the other person's opinion was really one of Eliezer's goals on this occasion, as opposed to showing up the other person's intellectual inferiority. He called the post "Bayesian Judo", and highlighted how his showing-off impressed someone of the opposite sex.

He does also suggest that in the end he and the other person came to some sort of agreement -- but it seems pretty clear that the thing they agreed on had little to do with the claim the other guy had originally been making, and that the other guy's opinion on that didn't actually change. So I think an accurate, though arguably unkind, summary of "Bayesian Judo" goes like this: "I was at a party, I got into an argument with a religious guy who didn't believe AI was possible, I overwhelmed him with my superior knowledge and intelligence, he submitted to my manifest superiority, and the whole performance impressed a woman". On this occasion, helping the other party to have better opinions doesn't seem to have been a high priority.

comment by Said Achmiz (SaidAchmiz) · 2018-09-17T03:47:45.611Z · score: 18 (7 votes) · LW · GW

When one person says “I guess we’ll have to agree to disagree” and the second person says “Actually according to Aumann’s Agreement Theorem, we can’t” is the second person making a type error?

Note: I confess to being a bit surprised that you picked this example. I’m not quite sure whether you picked a bad example for your point (possible) or whether I’m misunderstanding your point (also possible), but I do think that this question is interesting all on its own, so I’m going to try and answer it.

Here’s a joke that you’ve surely heard before—or have you?

Three mathematicians walk into a bar. The bartender asks them, “Do you all want a beer?”

The first mathematician says, “I don’t know.”

The second mathematician says, “I don’t know.”

The third mathematician says, “I don’t know.”

The lesson of this joke applies to the “according to Aumann’s Agreement Theorem …” case.

When someone says “I guess we’ll have to agree to disagree” and their interlocutor responds with “Actually according to Aumann’s Agreement Theorem, we can’t”, I don’t know if I’d call this a “type error”, precisely (maybe it is; I’d have to think about it more carefully); but the second person is certainly being ridiculous. And if I were the first person in such a situation, my response might be something along these lines:

“Really? We can’t? We can’t what, exactly? For example, I could turn around and walk away. Right? Surely, the AAT doesn’t say that I will be physically unable to do that? Or does it, perhaps, say that either you or I or both of us will be incapable of interacting amicably henceforth, and conversing about all sorts of topics other than this one? But if not, then what on Earth could you have meant by your comment?

“I mean… just what, exactly, did you think I meant, when I suggested that we agree to disagree? Did you take me to be claiming that (a) the both of us are ideal Bayesian reasoners, and (b) we have common knowledge of our posterior probabilities of the clearly expressible proposition the truth of which we are discussing, but (c) our posterior probabilities, after learning this, should nonetheless differ? Is that what you thought I was saying? Really? But why? Why in the world did you interpret my words in such a bizarrely technical way? What would you say is your estimate of the probability that I actually meant to make that specific, precisely technical statement?”

And so on. The questions are rhetorical, of course. Anyone with half an ounce of common sense (not to mention anyone with an actual understanding of the AAT!) understands perfectly well that the Theorem is totally inapplicable to such cases.

(Of course, in some sense this is all moot. The one who says “actually, according to the AAT…” doesn’t really think that his interlocutor meant all of that. He’s not really making any kind of error… except, possibly, a tactical one—but perhaps not even that.)

comment by Benito · 2018-09-17T05:44:02.002Z · score: 16 (5 votes) · LW · GW

Firstly, I hadn't heard the joke before, and it made me chuckle to myself.

Secondly, I loved this comment, for very accurately conveying the perspective I felt like ricraz was trying to defend wrt realism about rationality.

Let me say two (more) things in response:

Firstly, I was taking the example directly from Eliezer [LW · GW].

I said, "So if I make an Artificial Intelligence that, without being deliberately preprogrammed with any sort of script, starts talking about an emotional life that sounds like ours, that means your religion is wrong."
He said, "Well, um, I guess we may have to agree to disagree on this."
I said: "No, we can't, actually. There's a theorem of rationality called Aumann's Agreement Theorem which shows that no two rationalists can agree to disagree. If two people disagree with each other, at least one of them must be doing something wrong."

(Sidenote: I have not yet become sufficiently un-confused about AAT to have a definite opinion about whether EY was using it correctly there. I do expect after further reflection to object to most rationalist uses of the AAT but not this particular one.)

Secondly, and where I think the crux of this matter lies, is that I believe your (quite understandable!) objection applies to most attempts to use bayesian reasoning in the real world.

Suppose one person is trying to ignore a small piece of evidence against a cherished position, and a second person says to them "I know you've ignored this piece of evidence, but you can't do that because it is Bayesian evidence - it is the case that you're more likely to see this occur in worlds where your belief is false than in worlds where it's true, so the correct epistemic move here is to slightly update against your current belief."
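(For concreteness, a minimal numerical sketch of that kind of "slight update", with made-up numbers:)

```python
# Hypothetical numbers: a "slight update" against a cherished belief.
# Posterior odds = prior odds * likelihood ratio (Bayes' rule in odds form).

prior = 0.95              # strong confidence in the cherished position
likelihood_ratio = 0.8    # the evidence is somewhat more likely if the position is false

prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)

print(round(posterior, 3))  # ~0.938: still confident, but slightly less so than before
```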

If I may clumsily attempt to wrangle your example to my own ends, might they not then say:

“I mean… just what, exactly, did you think I meant, when I said this wasn't any evidence at all? Did you take me to be claiming that (a) I am an ideal Bayesian reasoner, and (b) I have observed evidence that occurs in more worlds where my belief is true than if it is false, but (c) my posterior probability, after learning this, should still equal my prior probability? Is that what you thought I was saying? Really? But why? Why in the world did you interpret my words in such a bizarrely technical way? What would you say is your estimate that I actually meant to make that specific, precisely technical statement?”

and further

I am not a rational agent. I am a human, and my mind does not satisfy the axioms of probability theory; therefore it is nonsensical to attempt to have me conform my speech patterns and actions to these logical formalisms.

Bayes' theorem applies if your beliefs update according to very strict axioms, but it's not at all obvious to me that the weird fleshy thing in my head currently conforms to those axioms. Should I nonetheless try to? And if so, why shouldn't I for AAT?

Aumann's Agreement Theorem is true if we are rational (bayesian) agents. There are a large number of other theorems that apply to rational agents too, and it seems that sometimes people want to use these abstract formalisms to guide behaviour and sometimes not, and having a principled stance here about when and when not to use them seems useful and important.

comment by Said Achmiz (SaidAchmiz) · 2018-09-17T06:36:04.803Z · score: 13 (8 votes) · LW · GW

Well, I guess you probably won’t be surprised to hear that I’m very familiar with that particular post of Eliezer’s, and instantly thought of it when I read your example. So, consider my commentary with that in mind!

(Sidenote: I have not yet become sufficiently un-confused about AAT to have a definite opinion about whether EY was using it correctly there. I do expect after further reflection to object to most rationalist uses of the AAT but not this particular one.)

Well, whether Eliezer was using the AAT correctly rather depends on what he meant by “rationalist”. Was he using it as a synonym for “perfect Bayesian reasoner”? (Not an implausible reading, given his insistence elsewhere on the term “aspiring rationalist” for mere mortals like us, and, indeed, like himself.) If so, then certainly what he said about the Theorem was true… but then, of course, it would be wholly inappropriate to apply it in the actual case at hand (especially since his interlocutor was, I surmise, some sort of religious person, and plausibly not even an aspiring rationalist).

If, instead, Eliezer was using “rationalist” to refer to mere actual humans of today, such as himself and the fellow he was conversing with, then his description of the AAT was simply inaccurate.

Secondly, and where I think the crux of this matter lies, is that I believe your (quite understandable!) objection applies to most attempts to use bayesian reasoning in the real world.

Indeed not. The critical point is this: there is a difference between trying to use Bayesian reasoning and interpreting people’s comments to refer to Bayesian reasoning. Whether you do the former is between you and your intellectual conscience, so to speak. Whether you do the latter, on the other hand, is a matter of both pragmatics (is this any kind of a good idea?) and of factual accuracy (are you correctly understanding what someone is saying?).

So the problem with your example, and with your point, is the equivocation between two questions:

  1. “I’m not a perfect Bayesian reasoner, but shouldn’t I try to be?” (And the third-person variant, which is isomorphic to the first-person variant to whatever degree your goals and that of your advisee/victim are aligned.)

  2. “My interlocutor is not speaking with the assumption that we’re perfect Bayesian reasoners, nor is he referring to agreement or belief or anything else in any kind of a strict, technical, Bayesian sense, but shouldn’t I assume that he is, thus ascribing meaning to his words that is totally different than his intended meaning?”

The answer to the first question is somewhere between “Uh, sure, why not, I guess? That’s your business, anyway” and “Yes, totally do that! Tsuyoku naritai, and all that!”.

The answer to the second question is “No, that is obviously a terrible idea. Never do that.”

comment by c0rw1n · 2018-09-17T04:10:49.865Z · score: 2 (4 votes) · LW · GW
  • Aumann's agreement theorem says that two people acting rationally (in a certain precise sense) and with common knowledge of each other's beliefs cannot agree to disagree. More specifically, if two people are genuine Bayesian rationalists with common priors, and if they each have common knowledge of their individual posterior probabilities, then their posteriors must be equal.

With common priors.

This is what does all the work there! If the disagreeers have non-equal priors on one of the points, then of course they'll have different posteriors.

Of course applying Bayes' Theorem with the same inputs is going to give the same outputs, that's not even a theorem, that's an equals sign.

If the disagreeers find a different set of parameters to be relevant, and/or the parameters they both find relevant do not have the same values, the outputs will differ, and they will continue to disagree.
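(A minimal sketch of this point, with toy numbers: two agents applying Bayes' theorem to the same evidence end up with the same posterior exactly when they started from the same prior.)

```python
# Toy illustration (hypothetical numbers): two agents update on the same evidence
# via Bayes' theorem. With a common prior they reach identical posteriors; with
# different priors they can keep disagreeing even after seeing identical data.

def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H | E) computed from P(H), P(E | H) and P(E | not-H)."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)

# Evidence E is twice as likely if the hypothesis is true.
p_e_given_h, p_e_given_not_h = 0.8, 0.4

# Common prior: identical posteriors, so no room to "agree to disagree".
print(posterior(0.5, p_e_given_h, p_e_given_not_h))  # 0.666...
print(posterior(0.5, p_e_given_h, p_e_given_not_h))  # 0.666...

# Different priors: same update rule, same evidence, different posteriors.
print(posterior(0.9, p_e_given_h, p_e_given_not_h))  # ~0.947
print(posterior(0.2, p_e_given_h, p_e_given_not_h))  # ~0.333
```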

comment by Benquo · 2018-09-17T12:21:47.452Z · score: 22 (6 votes) · LW · GW

Relevant: Why Common Priors

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-09-18T11:17:55.720Z · score: 29 (9 votes) · LW · GW

Although I don't necessarily subscribe to the precise set of claims characterized as "realism about rationality", I do think this broad mindset is mostly correct, and the objections outlined in this essay are mostly wrong.

There’s a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn’t just because biologists haven’t figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated “function” which basically requires you to describe that organism’s entire phenotype, genotype and environment.

This seems entirely wrong to me. Evolution definitely should be studied using mathematical models, and although I am not an expert in that, AFAIK this approach is fairly standard. "Fitness" just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly definable modulo the concept of a "descendant". The latter is not as unambiguously defined as "momentum" but under normal conditions it is quite precise. The actual structure and dynamics of biological organisms and their environment is very complicated, but this does not preclude the abstract study of evolution, i.e. understanding which sort of dynamics are possible in principle (for general environments) and in which way they depend on the environment etc. Applying this knowledge to real-life evolution is not trivial (and it does require a lot of complementary empirical research), as is the application of theoretical knowledge in any domain to "messy" real-life examples, but that doesn't mean such knowledge is useless. On the contrary, such knowledge is often essential to progress.

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It’s a mindset which makes the following ideas seem natural: The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/​or intelligence in general. (I don’t count brute force approaches like AIXI for the same reason I don’t consider physics a simple yet powerful description of biology)...

I wonder whether the OP also doesn't count all of computational learning theory? Also, physics is definitely not a sufficient description of biology but on the other hand, physics is still very useful for understanding biology. Indeed, it's hard to imagine we would achieve the modern level of understanding chemistry without understanding at least non-relativistic quantum mechanics, and it's hard to imagine we would make much headway in molecular biology without chemistry, thermodynamics et cetera.

This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks).

Once again, the OP uses the concept of "messiness" in a rather ambiguous way. It is true that human and animal intelligence is "messy" in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence. This is because the latter theory aims to describe mindspace as a whole rather than describing a particular rather arbitrary point inside it.

The disagreement here seems to revolve around the question of, when should we expect to have a simple theory for a given phenomenon (i.e. when does Occam's razor apply)? It seems clear that we should expect to have a simple theory of e.g. fundamental physics, but not a simple equation for the coastline of Africa. The difference is, physics is a unique object that has a fundamental role, whereas Africa is just one arbitrary continent among the set of all continents on all planets in the universe throughout its lifetime and all Everett branches. Therefore, we don't expect a simple description of Africa, but we do expect a relatively simple description of planetary physics that would tell us which continent shapes are possible and which are more likely.

Now, "rationality" and "intelligence" are in some sense even more fundumental than physics. Indeed, rationality is what tells us how to form correct beliefs, i.e. how to find the correct theory of physics. Looking an anthropic paradoxes, it is even arguable that making decisions is even more fundumental than forming beliefs (since anthropic paradoxes are situations in which assigning subjective probabilities seems meaningless but the correct decision is still well-defined via "functional decision theory" or something similar). Therefore, it seems like there has to be a simple theory of intelligence, even if specific instances of intelligence are complex by virtue of their adaptation to specific computational hardware, specific utility function (or maybe some more general concept of "values"), somewhat specific (although still fairly diverse) class of environments, and also by virtue of arbitrary flaws in their design (that are still mild enough to allow for intelligent behavior).

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism, CEV), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

This line of thought would benefit from more clearly delineating descriptive versus prescriptive. The question we are trying to answer is: "if we build a powerful goal-oriented agent, what goal system should we give it?" That is, it is fundamentally a prescriptive rather than descriptive question. It seems rather clear that the best choice of goal system would be in some sense similar to "human goals". Moreover, it seems that if possibilities A and B are such that it is ill-defined whether humans (or at least, those humans that determine the goal system of the powerful agent) prefer A or B, then there is no moral significance to choosing between A and B in the target goal system. Therefore, we only need to determine "human goals" within the precision to which they are actually well-defined, not within absolute precision.

Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it’s fine to accept that our moral preferences may contain some contradictions.

The question is not whether it is "fine". The question is, given a situation in which intuition A demands action X and intuition B demands action Y, what is the morally correct action? The answer might be "X", it might be "Y", it might be "both actions are equally good", or it might be even "Z" for some Z different from both X and Y. But any answer effectively determines a way to remove the contradiction, replacing it by a consistent overarching system. And, if we actually face that situation, we need to actually choose an answer.

comment by cousin_it · 2018-09-18T12:26:39.118Z · score: 11 (7 votes) · LW · GW

It is true that human and animal intelligence is “messy” in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence.

I used to think the same way, but the OP made me have a crisis of faith, and now I think the opposite way.

Sure, an animal brain solving an animal problem is messy. But a general purpose computer solving a simple mathematical problem can be just as messy. The algorithm for multiplying matrices in O(n^2.8) is more complex than the algorithm for doing it in O(n^3), and the algorithm with O(n^2.4) is way more complex than that. As I said in the other comment, "algorithms don't get simpler as they get better".
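(To make "algorithms don't get simpler as they get better" concrete, here's a rough sketch: the O(n^3) schoolbook method is a few lines, a single level of Strassen's O(n^2.81) recursion is already noticeably messier, and the O(n^2.4)-ish methods are not something you'd write by hand at all. The sketch assumes square matrices whose size is a power of two.)

```python
# Illustrative sketch: the asymptotically faster algorithm is the more complicated one.

def naive_matmul(A, B):
    """Schoolbook O(n^3) multiplication: three nested loops."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def _add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def _sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def _split(X):
    n = len(X) // 2
    return ([row[:n] for row in X[:n]], [row[n:] for row in X[:n]],
            [row[:n] for row in X[n:]], [row[n:] for row in X[n:]])

def strassen_matmul(A, B):
    """Strassen's O(n^2.81) multiplication: 7 recursive products instead of 8."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = _split(A)
    B11, B12, B21, B22 = _split(B)
    M1 = strassen_matmul(_add(A11, A22), _add(B11, B22))
    M2 = strassen_matmul(_add(A21, A22), B11)
    M3 = strassen_matmul(A11, _sub(B12, B22))
    M4 = strassen_matmul(A22, _sub(B21, B11))
    M5 = strassen_matmul(_add(A11, A12), B22)
    M6 = strassen_matmul(_sub(A21, A11), _add(B11, B12))
    M7 = strassen_matmul(_sub(A12, A22), _add(B21, B22))
    C11 = _add(_sub(_add(M1, M4), M5), M7)
    C12 = _add(M3, M5)
    C21 = _add(M2, M4)
    C22 = _add(_add(_sub(M1, M2), M3), M6)
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])

A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
assert naive_matmul(A, B) == strassen_matmul(A, B) == [[19, 22], [43, 50]]
```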

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-09-18T13:06:23.072Z · score: 17 (5 votes) · LW · GW

I don't know a lot about the study of matrix multiplication complexity, but I think that one of the following two possibilities is likely to be true:

  1. There is some exponent ω and an algorithm for matrix multiplication of complexity O(n^(ω+ε)) for any ε > 0, s.t. no algorithm of complexity O(n^(ω−ε)) exists (AFAIK, the prevailing conjecture is ω = 2). This algorithm is simple enough for human mathematicians to find it, understand it and analyze its computational complexity. Moreover, there is a mathematical proof of its optimality that is simple enough for human mathematicians to find and understand.
  2. There is a progression of algorithms for lower and lower exponents that increases in description complexity without bound as the exponent approaches ω from above, and the problem of computing a program with a given exponent is computationally intractable or even uncomputable. This fact has a mathematical proof that is simple enough for human mathematicians to find and understand.

Moreover, if we only care about having a polynomial time algorithm with some exponent then the solution is simple (and doesn't require any astronomical coefficients like Levin search; incidentally, the algorithm is also good enough for most real world applications). In either case, the computational complexity of matrix multiplication is understandable in the sense I expect intelligence to be understandable.

So, it is possible that there is a relatively simple and effective algorithm for intelligence (although I still expect a lot of "messy" tweaking to get a good algorithm for any specific hardware architecture; indeed, computational complexity is only defined up to a polynomial if you don't specify a model of computation), or it is possible that there is a progression of increasingly complex and powerful algorithms that are very expensive to find. In the latter case, long AGI timelines become much more probable since biological evolution invested an enormous amount of resources in the search which we cannot easily match. In either case, there should be a theory that (i) defines what intelligence is (ii) predicts how intelligence depends on parameters such as description complexity and computational complexity.

comment by cousin_it · 2018-09-18T14:06:15.913Z · score: 0 (4 votes) · LW · GW

A good algorithm can be easy to find, but not simple in the other senses of the word. Machine learning can output an algorithm that seems to perform well, but has a long description and is hard to prove stuff about. The same is true for human intelligence. So we might not be able to find an algorithm that's as strong as human intelligence but easier to prove stuff about.

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-09-18T14:52:15.333Z · score: 13 (6 votes) · LW · GW

Machine learning uses data samples about an unknown phenomenon to extrapolate and predict the phenomenon in new instances. Such algorithms can have provable guarantees regarding the quality of the generalization: this is exactly what computational learning theory is about. Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks. And even so, there is already some progress. People have been making buildings and cannons before Newtonian mechanics, engines before thermodynamics and ways of using chemical reactions before quantum mechanics or modern atomic theory. The fact you can do something using trial and error doesn't mean trial and error is the only way to do it.
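(As a concrete example of the kind of guarantee computational learning theory provides: for a finite hypothesis class, Hoeffding's inequality plus a union bound gives a uniform bound on the gap between training error and true error. The numbers below are arbitrary.)

```python
# Standard PAC-style guarantee for a finite hypothesis class H: with probability
# at least 1 - delta over m i.i.d. samples, every h in H satisfies
#   |true_error(h) - training_error(h)| <= sqrt(ln(2|H|/delta) / (2m)).
import math

def generalization_gap_bound(num_hypotheses: int, num_samples: int, delta: float) -> float:
    return math.sqrt(math.log(2 * num_hypotheses / delta) / (2 * num_samples))

# With a million hypotheses, 10,000 samples and 99% confidence, the gap is about 3%.
print(generalization_gap_bound(num_hypotheses=10**6, num_samples=10_000, delta=0.01))
```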

comment by cousin_it · 2018-09-18T19:36:20.129Z · score: 2 (2 votes) · LW · GW

Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks.

I think "inherent mysteriousness" is also possible. Some complex things are intractable to prove stuff about.

comment by DragonGod · 2018-09-30T20:26:00.098Z · score: 2 (2 votes) · LW · GW

I don't see why better algorithms being more complex is a problem?

comment by DragonGod · 2018-09-30T20:24:50.308Z · score: 4 (2 votes) · LW · GW

I disagree that intelligence and rationality are more fundamental than physics; the territory itself is physics, and that is all that is really there. Everything else (including the body of our physics knowledge) consists of models for navigating that territory.

Turing formalised computation and established the limits of computation given certain assumptions. However, those limits only apply as long as the assumptions are true. Turing did not prove that no mechanical system is superior to a Universal Turing Machine, and weird physics may enable super Turing computation.

The point I was making is that our models are only as good as their correlation with the territory. The abstract models we have aren't part of the territory itself.

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-10-01T20:55:18.066Z · score: 8 (4 votes) · LW · GW

Physics is not the territory, physics is (quite explicitly) the models we have of the territory. Rationality consists of the rules for formulating these models, and in this sense it is prior to physics and more fundamental. (This might be a disagreement over use of words. If by "physics" you, by definition, refer to the territory, then it seems to miss my point about Occam's razor. Occam's razor says that the map should be parsimonious, not the territory: the latter would be a type error.) In fact, we can adopt the view that Solomonoff induction (which is a model of rationality) is the ultimate physical law: it is a mathematical rule of making predictions that generates all the other rules we can come up with. Such a point of view, although in some sense justified, at present would be impractical: this is because we know how to compute using actual physical models (including running computer simulations), but not so much using models of rationality. But this is just another way of saying we haven't constructed AGI yet.
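(A toy sketch of the simplicity-prior idea, since actual Solomonoff induction is uncomputable: here the "programs" are just repeating bit patterns, weighted by 2^(-length), and we predict the next bit by a posterior-weighted vote among the patterns consistent with the data so far.)

```python
# Toy stand-in for a simplicity prior (real Solomonoff induction is uncomputable).
# Hypotheses: "the observed bits repeat this pattern"; shorter patterns get
# exponentially more prior weight.

from itertools import product

def predict_next_bit(observed: str, max_pattern_len: int = 8) -> float:
    weight_one = weight_total = 0.0
    for length in range(1, max_pattern_len + 1):
        for bits in product("01", repeat=length):
            pattern = "".join(bits)
            prediction = pattern * (len(observed) // length + 2)
            if prediction.startswith(observed):      # hypothesis fits the data so far
                w = 2.0 ** (-length)                  # simplicity prior
                weight_total += w
                if prediction[len(observed)] == "1":
                    weight_one += w
    return weight_one / weight_total                  # P(next bit = 1)

print(predict_next_bit("0101010"))  # ~0.97: the short pattern "01" dominates the prediction
```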

I don't think it's meaningful to say that "weird physics may enable super Turing computation." Hypercomputation is just a mathematical abstraction. We can imagine, to a point, that we live in a universe which contains hypercomputers, but since our own brain is not a hypercomputer, we can never fully test such a theory. This IMO is the most fundamental significance of the Church-Turing thesis: since we only perceive the world through the lens of our own mind, then from our subjective point of view, the world only contains computable processes.

comment by cousin_it · 2018-10-01T21:29:35.479Z · score: 7 (3 votes) · LW · GW

If your mind was computable but the external world had lots of seeming hypercomputation (e.g. boxes for solving the halting problem were sold on every corner and were apparently infallible), would you prefer to build an AI that used a prior over hypercomputable worlds, or an AI that used Solomonoff induction because it's the ultimate physical law?

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-10-01T21:48:17.574Z · score: 4 (3 votes) · LW · GW

What does it mean to have a box for solving the halting problem? How do you know it really solves the halting problem? There are some computable tests we can think of, but they would be incomplete, and you would only verify that the box satisfies those computable tests, not that it is "really" a hypercomputer. There would be a lot of possible boxes that don't solve the halting problem that pass the same computable tests.

If there is some powerful computational hardware available, I would want the AI to use that hardware. If you imagine the hardware as being hypercomputers, then you can think of such an AI as having a "prior over hypercomputable worlds". But you can alternatively think of it as reasoning using computable hypotheses about the correspondence between the output of this hardware and the output of its sensors. The latter point of view is better, I think, because you can never know the hardware is really a hypercomputer.

comment by cousin_it · 2018-10-02T15:37:46.755Z · score: 5 (2 votes) · LW · GW

Hmm, that approach might be ruling out not only hypercomputers, but also sufficiently powerful conventional computers (anything stronger than PSPACE maybe?) because your mind isn't large enough to verify their strength. Is that right?

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-10-03T09:47:20.369Z · score: 1 (1 votes) · LW · GW

In some sense, yes, although for conventional computers you might settle on very slow verification. Unless you mean that your mind has only finite memory/lifespan, and therefore you cannot verify an arbitrary conventional computer to within any given credence, which is also true. Under favorable conditions, you can quickly verify something in PSPACE (using interactive proof protocols), and given extra assumptions you might be able to do better (if you have two provers that cannot communicate you can do NEXP, or if you have a computer whose memory you can reliably delete you can do an EXP-complete language), however it is not clear whether you can be justifiably highly certain of such extra assumptions.

See also my reply to lbThingrb [LW · GW].

comment by lbThingrb · 2018-10-03T01:43:17.370Z · score: 1 (1 votes) · LW · GW

This can’t be right ... Turing machines are assumed to be able to operate for unbounded time, using unbounded memory, without breaking down or making errors. Even finite automata can have any number of states and operate on inputs of unbounded size. By your logic, human minds shouldn’t be modeling physical systems using such automata, since they exceed the capabilities of our brains.

It’s not that hard to imagine hypothetical experimental evidence that would make it reasonable to believe that hypercomputers could exist. For example, suppose someone demonstrated a physical system that somehow emulated a universal Turing machine with infinite tape, using only finite matter and energy, and that this system could somehow run the emulation at an accelerating rate, such that it computed n steps in 1 - 2^(-n) seconds. (Let’s just say that it resets to its initial state in a poof of pixie dust if the TM doesn’t halt after one second.)

You could try to reproduce this experiment and test it on various programs whose long-term behavior is predictable, but you could only test it on a finite (to say nothing of computable) set of such inputs. Still, if no one could come up with a test that stumped it, it would be reasonable to conclude that it worked as advertised. (Of course, some alternative explanation would be more plausible at first, given that the device as described would contradict well established physical principles, but eventually the weight of evidence would compel one to rewrite physics instead.)

One could hypothesize that the device only behaved as advertised on inputs for which human brains have the resources to verify the correctness of its answers, but did something else on other inputs, but you could just as well say that about a normal computer. There’d be no reason to believe such an alternative model, unless it was somehow more parsimonious. I don’t know any reason to think that theories that don’t posit uncomputable behavior can always be found which are at least as simple as a given theory that does.

Having said all that, I’m not sure any of it supports either side of the argument over whether there’s an ideal mathematical model of general intelligence, or whether there’s some sense in which intelligence is more fundamental than physics. I will say that I don’t think the Church-Turing thesis is some sort of metaphysical necessity baked into the concept of rationality. I’d characterize it as an empirical claim about (1) human intuition about what constitutes an algorithm, and (2) contingent limitations imposed on machines by the laws of physics.

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-10-03T09:35:43.902Z · score: 4 (3 votes) · LW · GW

It is true that a human brain is more precisely described as a finite automaton than a Turing machine. And if we take finite lifespan into account, then it's not even a finite automaton. However, these abstractions are useful models since they become accurate in certain asymptotic limits that are sufficiently useful to describe reality. On the other hand, I doubt that there is a useful approximation in which the brain is a hypercomputer (except maybe some weak forms of hypercomputation like non-uniform computation / circuit complexity).

Moreover, one should distinguish between different senses in which we can be "modeling" something. The first sense is the core, unconscious ability of the brain to generate models, and in particular that which we experience as intuition. This ability can (IMO) be thought of as some kind of machine learning algorithm, and, I doubt that hypercomputation is relevant there in any way. The second sense is the "modeling" we do by manipulating linguistic (symbolic) constructs in our conscious mind. These constructs might be formulas in some mathematical theory, including formulas that represent claims about uncomputable objects. However, these symbolic manipulations are just another computable process, and it is only the results of these manipulations that we use to generate predictions and/or test models, since this is the only access we have to those uncomputable objects.

Regarding your hypothetical device, I wonder how would you tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC? (In particular, the latter could tell you that some Turing machine halts when it "really" doesn't, because in the model it halts after some non-standard number of computing steps.) More generally, given an uncomputable function f and a system under test X, there is no sequence of computable tests that will allow you to form some credence about the hypothesis "X computes f" s.t. this credence will converge to 1 when the hypothesis is true and to 0 when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable f) when you can devise such a sequence of tests. (Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify that X is in the class, for example the class of all functions f s.t. it is consistent with ZFC that f is the halting function. But the verification would be extremely slow and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself.)

My point is, the Church-Turing thesis implies (IMO) that the mathematical model of rationality/intelligence should be based on Turing machines at most, and this observation does not strongly depend on assumptions about physics. (Well, if hypercomputation is physically possible, and realized in the brain, and there is some intuitive part of our mind that uses hypercomputation in a crucial way, then this assertion would be wrong. That would contradict my own intuition about what reasoning is (including intuitive reasoning), besides everything we know about physics, but obviously this hypothesis has some positive probability.)

comment by lbThingrb · 2018-10-04T21:12:44.094Z · score: 1 (1 votes) · LW · GW

I didn't mean to suggest that the possibility of hypercomputers should be taken seriously as a physical hypothesis, or at least, any more seriously than time machines, perpetual motion machines, faster-than-light, etc. And I think it's similarly irrelevant to the study of intelligence, machine or human. But in my thought experiment, the way I imagined it working was that, whenever the device's universal-Turing-machine emulator halted, you could then examine its internal state as thoroughly as you liked, to make sure everything was consistent with the hypothesis that it worked as specified (and the non-halting case could be ascertained by the presence of pixie dust 🙂). But since its memory contents upon halting could be arbitrarily large, in practice you wouldn't be able to examine it fully even for individual computations of sufficient complexity. Still, if you did enough consistency checks on enough different kinds of computations, and the cleverest scientists couldn't come up with a test that the machine didn't pass, I think believing that the machine was a true halting-problem oracle would be empirically justified.

It's true that a black box oracle could output a nonstandard "counterfeit" halting function which claimed that some actually non-halting TMs do halt, only for TMs that can't be proved to halt within ZFC or any other plausible axiomatic foundation humans ever come up with, in which case we would never know that it was lying to us. It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases. For example, if it claimed that some actually non-halting TM M halted, we could feed it a program that emulated M and output the number of steps M took to halt. That program would also have to halt, and output some specific number n. In principle, we could then try emulating M for n steps on a regular computer, observe that M hadn't reached a halting state, and conclude that the device was lying to us. If n were large enough, that wouldn't be feasible, but it's a decisive test that a normal computer could execute in principle. I suppose my magical device could instead do something like leave an infinite output string in memory, that a normal computer would never know was infinite, because it could only ever examine finitely much of it. But finite resource bounds already prevent us from completely ruling out far-fetched hypotheses about even normal computers. We'll never be able to test, e.g., an arbitrary-precision integer comparison function on all inputs that could feasibly be written down. Can we be sure it always returns a Boolean value, and never returns the Warner Brothers dancing frog?
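
Here is a minimal sketch of that consistency check (the oracle_claims_halt and oracle_step_count interfaces are hypothetical stand-ins for the imagined device, and simulate is an ordinary step-bounded emulator; none of these names come from the thread):

```python
def consistency_check(machine, oracle_claims_halt, oracle_step_count, simulate, budget):
    """Try to catch the device lying about `machine` halting.

    oracle_claims_halt(m) -> bool : the device's verdict on machine m (hypothetical interface)
    oracle_step_count(m)  -> int  : ask the device to run a program that emulates m and
                                    outputs the number of steps m took to halt
    simulate(m, n)        -> bool : run m for n steps on a normal computer;
                                    True iff m has halted within n steps
    budget                : the largest step count we can afford to re-check directly
    """
    if not oracle_claims_halt(machine):
        return "no claim to check"        # "doesn't halt" claims can't be refuted this way
    n = oracle_step_count(machine)        # the device must commit to a concrete number
    if n > budget:
        return "inconclusive"             # decisive in principle, infeasible in practice
    if simulate(machine, n):
        return "consistent"               # machine really did halt within n steps
    return "caught lying"                 # it didn't: the device is refuted

# Usage with stub stand-ins (a real device would supply the first two callables):
print(consistency_check(
    machine="M",
    oracle_claims_halt=lambda m: True,
    oracle_step_count=lambda m: 42,
    simulate=lambda m, n: True,           # pretend M really does halt within n steps
    budget=10**6,
))
```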

Actually, hypothesizing that my device "computed" a nonstandard version of the halting function would already be sort of self-defeating from a standpoint of skepticism about hypercomputation, because all nonstandard models of Peano arithmetic are known to be uncomputable. A better skeptical hypothesis would be that the device passed off some actually halting TMs as non-halting, but only in cases where the shortest proof that any of those TMs would have halted eventually was too long for humans to have discovered yet. I don't know enough about Solomonoff induction to say whether it would unduly privilege such hypotheses over the hypothesis that the device was a true hypercomputer (if it could even entertain such a hypothesis). Intuitively, though, it seems to me that, if you went long enough without finding proof that the device wasn't a true hypercomputer, continuing to insist that such proof would be found at some future time would start to sound like a God-of-the-gaps argument. I think this reasoning is valid even in a hypothetical universe in which human brains couldn't do anything Turing machines can't do, but other physical systems could. I admit that's a nontrivial, contestable conclusion. I'm just going on intuition here.

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-10-05T09:31:15.447Z · score: 1 (1 votes) · LW · GW

Nearly everything you said here was already addressed in my previous comment. Perhaps I didn't explain myself clearly?

It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases.

I wrote before that "I wonder how you would tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC?"

So, the realization of a particular hypercomputer in a non-standard model of ZFC would pass all of your tests. You could examine its internal state or its output any way you like (i.e. ask any question that can be formulated in the language of ZFC) and everything you see would be consistent with ZFC. The number of steps for a machine that shouldn't halt would be a non-standard number, so it would not fit on any finite storage. You could examine some finite subset of its digits (either from the end or from the beginning), for example, but that would not tell you the number is non-standard. For any question of the form "is this number larger than some known number n?" the answer would always be "yes".

But finite resource bounds already prevent us from completely ruling out far-fetched hypotheses about even normal computers. We’ll never be able to test, e.g., an arbitrary-precision integer comparison function on all inputs that could feasibly be written down. Can we be sure it always returns a Boolean value, and never returns the Warner Brothers dancing frog?

Once again, there is a difference of principle. I wrote before that: "...given an uncomputable function f and a system under test X, there is no sequence of computable tests that will allow you to form some credence about the hypothesis "X computes f" s.t. this credence will converge to 1 when the hypothesis is true and to 0 when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable f), where you can devise such a sequence of tests."

So, with normal computers you can become increasingly certain your hypothesis regarding the computer is true (even if you never become literally 100% certain, except in the limit), whereas with a hypercomputer you cannot.
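
For contrast, here is a toy sketch of the kind of test sequence that does exist for a computable target: compare the black box with a reference implementation on an enumeration of inputs (the update rule below is arbitrary, purely for illustration):

```python
def test_against_reference(black_box, reference, inputs, prior=0.5):
    """Toy credence in "black_box computes reference" under an arbitrary update rule.

    Any disagreement refutes the hypothesis outright; continued agreement on more
    inputs pushes the credence towards 1 (the specific update here is made up).
    """
    credence = prior
    for x in inputs:
        if black_box(x) != reference(x):
            return 0.0                        # decisively refuted
        credence = 1 - (1 - credence) * 0.99  # inch closer to 1 with each passed test
    return credence

# Example: a black box that really does compute squaring.
print(test_against_reference(lambda n: n * n, lambda n: n ** 2, range(1000)))
```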

Actually, hypothesizing that my device “computed” a nonstandard version of the halting function would already be sort of self-defeating from a standpoint of skepticism about hypercomputation, because all nonstandard models of Peano arithmetic are known to be uncomputable.

Yes, I already wrote that: "Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify that f is in the class, for example the class of all functions f s.t. it is consistent with ZFC that f is the halting function. But the verification would be extremely slow, and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself."

So, yes, you could theoretically become certain the device is a hypercomputer (although reaching high certainty would take a very long time), without knowing precisely which hypercomputer it is, but that doesn't mean you need to add non-computable hypotheses to your "prior", since that knowledge would still be expressible as a computable property of the world.

I don’t know enough about Solomonoff induction to say whether it would unduly privilege such hypotheses over the hypothesis that the device was a true hypercomputer (if it could even entertain such a hypothesis).

Literal Solomonoff induction (or even bounded versions of Solomonoff induction) is probably not the ultimate "true" model of induction; I was just using it as a simple example before. The true model will allow expressing hypotheses such as "all the even-numbered bits in the sequence are 0", which involve computable properties of the environment that do not specify it completely. Making this idea precise is somewhat technical.
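
As a toy example, such a hypothesis can be expressed as a computable predicate over finite observation prefixes (the specific property and names below are illustrative only):

```python
def even_bits_are_zero(prefix):
    """Computable property of the environment that never pins it down completely.

    Every even-indexed observed bit must be 0; odd-indexed bits are unconstrained.
    The predicate can be checked against any finite prefix of observations.
    """
    return all(bit == 0 for i, bit in enumerate(prefix) if i % 2 == 0)

print(even_bits_are_zero([0, 1, 0, 0, 0, 1]))  # True: consistent so far
print(even_bits_are_zero([0, 1, 1]))           # False: refuted by this prefix
```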

comment by Chris Hibbert (chris-hibbert-1) · 2018-09-22T16:50:03.657Z · score: 1 (1 votes) · LW · GW
The question is, given a situation in which intuition A demands action X and intuition B demands action Y, what is the morally correct action? The answer might be "X", it might be "Y", it might be "both actions are equally good", or it might be even "Z" for some Z different from both X and Y. But any answer effectively determines a way to remove the contradiction, replacing it by a consistent overarching system. And, if we actually face that situation, we need to actually choose an answer.

This reminds me of my rephrasing of the description of epistemology. The standard description started out as "the science of knowledge" or colloquially, "how do we know what we know". I've maintained, since reading Bartley ("The Retreat to Commitment"), that the right description is "How do we decide what to believe?" So your final sentence seems right to me, but that's different from the rest of your argument, which presumes that there's a "right" answer and our job is finding it. Our job is finding a decision procedure, and studying what differentiates "right" answers from "wrong" answers is useful fodder for that, but it's not the actual goal.

comment by ricraz · 2018-09-18T23:42:19.879Z · score: 1 (5 votes) · LW · GW
"Fitness" just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly definable modulo the concept of a "descendant". The latter is not as unambiguously defined as "momentum" but under normal conditions it is quite precise.

Similarly, you can define intelligence as expected performance on a broad suite of tasks. However, what I was trying to get at with "define its fitness in terms of more basic traits" is being able to build a model of how it can or should actually work, not just specify measurement criteria.
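
For what it's worth, one existing attempt at formalising the "measurement criteria" side (not the mechanistic model I'm asking for) is the Legg–Hutter universal intelligence measure, which scores a policy by its simplicity-weighted expected reward across computable environments:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

where E is a class of computable environments, K(μ) is the Kolmogorov complexity of μ, and V^π_μ is the expected total reward of policy π in μ. It is a measurement criterion, not a mechanism.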

I wonder whether the OP also doesn't count all of computational learning theory? Also, physics is definitely not a sufficient description of biology but on the other hand, physics is still very useful for understanding biology.

I do consider computational learning theory to be evidence for rationality realism. However, I think it's an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents - to my knowledge it hasn't played an important role in the success of deep learning, for instance. It may be analogous to mathematical models of evolution, which are certainly true but don't help you build better birds.

However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence. This is because the latter theory aims to describe mindspace as a whole rather than describing a particular rather arbitrary point inside it.
...
Now, "rationality" and "intelligence" are in some sense even more fundamental than physics... Therefore, it seems like there has to be a simple theory of intelligence.

This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it's not the case. Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns - something we're nowhere near able to formalise. I plan to write a follow-up post which describes my reasons for being skeptical about rationality realism in more detail.

We only need to determine "human goals" within the precision to which they are actually well-defined, not within absolute precision.

I agree, but it's plausible that they are much less well-defined than they seem. The more we learn about neuroscience, the more the illusion of a unified self with coherent desires breaks down. There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-09-19T12:27:46.878Z · score: 11 (5 votes) · LW · GW

...what I was trying to get at with “define its fitness in terms of more basic traits” is being able to build a model of how it can or should actually work, not just specify measurement criteria.

Once again, it seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks are not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!

I do consider computational learning theory to be evidence for rationality realism. However, I think it’s an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents—to my knowledge it hasn’t played an important role in the success of deep learning, for instance.

It plays a minor role in deep learning, in the sense that some "deep" algorithms are adaptations of algorithms that have theoretical guarantees. For example, deep Q-learning is an adaptation of ordinary Q-learning. Obviously I cannot prove that it is possible to create an abstract theory of intelligence without actually creating the theory. However, the same could be said about any endeavor in history.
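
For concreteness, here is a minimal sketch of the tabular Q-learning update that deep Q-learning adapts by replacing the table with a neural-network approximator (standard textbook form, not code from this thread):

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    Under standard conditions (every state-action pair visited infinitely often,
    suitably decaying learning rates) this converges to the optimal Q-function.
    Deep Q-learning keeps the same update but approximates Q with a neural network,
    trading those guarantees for scalability.
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Usage sketch on a made-up transition:
Q = defaultdict(float)
q_learning_update(Q, s=0, a="left", r=1.0, s_next=1, actions=["left", "right"])
print(Q[(0, "left")])  # 0.1 * (1.0 + 0.99 * 0.0 - 0.0) = 0.1
```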

It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.

Mathematical models of evolution might help you to build better evolutions. In order to build better birds, you would need mathematical models of birds, which are going to be much more messy.

This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it’s not the case.

I don't think it's a mere restatement? I am trying to show that "rationality realism" is what you should expect based on Occam's razor, which is a fundamental principle of reason. Possibly I just don't understand your position. In particular, I don't know what epistemology is like in the world you imagine. Maybe it's a subject for your next essay.

Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns

This seems to confuse objects with representations of objects. The assumption that there is some mathematical theory at the core of human reasoning does not mean that a description of this mathematical theory should automatically exist in the conscious, symbol-manipulating part of the mind. You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.

There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).

The response might depend on the framing if you're asked a question and given 10 seconds to answer it. If you're allowed to deliberate on the question, and in particular to consider alternative framings, the answer becomes more well-defined. However, even if it is ill-defined, it doesn't really change anything. We can still ask the question "given the ability to optimize any utility function over the world now, what utility function should we choose?" Perhaps it means that we need to consider our answers to ethical questions under a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.

comment by ricraz · 2018-09-19T20:08:28.646Z · score: -1 (3 votes) · LW · GW
It seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks are not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!
...
It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like "species will evolve faster when there are predators in their environment" and "species which use sexual reproduction will be able to adapt faster to novel environments". The analogous abstract theory of intelligence can tell us things like "agents will be less able to achieve their goals when they are opposed by other agents" and "agents with more compute will perform better in novel environments". These sorts of conclusions are not very useful for safety.

I don't think it's a mere restatement? I am trying to show that "rationality realism" is what you should expect based on Occam's razor, which is a fundamental principle of reason.

Sorry, my response was a little lazy, but at the same time I'm finding it very difficult to figure out how to phrase a counterargument beyond simply saying that although intelligence does allow us to understand physics, it doesn't seem to me that this implies it's simple or fundamental. Maybe one relevant analogy: maths allows us to analyse tic-tac-toe, but maths is much more complex than tic-tac-toe. I understand that this is probably an unsatisfactory intuition from your perspective, but unfortunately don't have time to think too much more about this now; will cover it in a follow-up.

You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.

Agreed. But the fact that the main component of human reasoning is something which we have no idea how to formalise is some evidence against the possibility of formalisation - evidence which might be underweighted if people think of maths proofs as a representative example of reasoning.

We can still ask the question "given the ability to optimize any utility function over the world now, what utility function should we choose?" Perhaps it means that we need to consider our answers to ethical questions under a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.

I'm going to cop out of answering this as well, on the grounds that I have yet another post in the works which deals with it more directly. One relevant claim, though: that extreme optimisation is fundamentally alien to the human psyche, and I'm not sure there's any possible utility function which we'd actually be satisfied with maximising.

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-09-20T11:13:11.137Z · score: 15 (5 votes) · LW · GW

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.

As a matter of fact, I emphatically do not agree. "Birds" are a confusing example, because it speaks of modifying an existing (messy, complicated, poorly designed) system rather than making something from scratch. If we wanted to make something vaguely bird-like from scratch, we might have needed something like a "theory of self-sustaining, self-replicating machines".

Let's consider a clearer example: cars. In order to build a car, it is very useful to have a theory of mechanics, chemistry, thermodynamics, etc. Just doing things by trial and error would be much less effective, especially if you don't want the car to occasionally explode (given that the frequency of explosions might be too low to affordably detect during testing). This is not because a car is "simple": a spaceship or, let's say, a gravitational-wave detector is much more complex than a car, and yet you hardly need less theory to make one.

And another example: cryptography. In fact, cryptography is not so far from AI safety: in the former case, you defend against an external adversary whereas in the latter you defend against perverse incentives and subagents inside the AI. If we had this conversation in the 1960s (say), you might have said that cryptography is obviously a complex, messy domain, and theorizing about it is next to useless, or at least not helpful for designing actual encryption systems (there was Shannon's work, but since it ignored computational complexity you can maybe compare it to algorithmic information theory and statistical learning theory for AI today; if we had this conversation in the 1930s, then there would be next to no theory at all, even though encryption had been practiced since ancient times). And yet, today theory plays an essential role in this field. The domain actually is very hard: most of the theory relies on complexity-theoretic conjectures that we are still far from being able to prove (although I expect that most theoretical computer scientists would agree that eventually we will solve them). However, even without being able to formally prove everything, the ability to reduce the safety of many different protocols to a limited number of interconnected conjectures (some of which have an abundance of both theoretical and empirical evidence) allows us to immensely increase our confidence in those protocols.

Similarly, I expect an abstract theory of intelligence to be immensely useful for AI safety. Even just having precise language to define what "AI safety" means would be very helpful, especially to avoid counter-intuitive failure modes like the malign prior. At the very least, we could have provably safe but impractical machine learning protocols that would be an inspiration to more complex algorithms about which we cannot prove things directly (like in deep learning today). More optimistically (but still realistically IMO) we could have practical algorithms satisfying theoretical guarantees modulo a small number of well-studied conjectures, like in cryptography today. This way, theoretical and empirical research could feed into each other, the whole significantly surpassing the sum of its parts.

comment by Said Achmiz (SaidAchmiz) · 2018-09-16T16:44:36.277Z · score: 28 (12 votes) · LW · GW

Excellent post!

I find myself agreeing with much of what you say, but there are a couple of things which strike me as… not quite fitting (at least, into the way I have thought about these issues), and also I am somewhat skeptical about whether your attempt at conceptually unifying these concerns—i.e., the concept of “rationality realism”—quite works. (My position on this topic is rather tentative, I should note; all that’s clear to me is that there’s much here that’s confusing—which is, however, itself a point of agreement with the OP, and disagreement with “rationality realists”, who seem much more certain of their view than the facts warrant.)

Some specific points:

… suppose that you just were your system 1, and that your system 2 was mostly a Hansonian rationalisation engine on top (one which occasionally also does useful maths)

This seems to me to be a fundamentally confused proposition. Regardless of whether Hanson is right about how our minds work (and I suspect he is right to a large degree, if not quite entirely right), the question of who we are seems to be a matter of choosing which aspect(s) of our minds’ functioning to endorse as ego-syntonic. Under this view, it is nonsensical to speak of a scenario where it “turns out” that I “am just my system 1”.

O: […] Maybe you can ignore the fact that your preferences contain a contradiction, but if we scaled you up to be much more intelligent, running on a brain orders of magnitude larger, having such a contradiction would break your thought processes.

Your quoted reply to this is good, but I just want to note that it’s almost not even necessary. The simpler reply of “you have literally no way of knowing that, and what you just said is completely 100% wild speculation, about a scenario that you don’t even know is possible” would also be sufficient.

(Also, what on earth does “break your thought process” even mean? And what good is being “much more intelligent” if something can “break your thought process” that would leave a “less intelligent” mind unharmed? Etc., etc.)

(That’s all for now, though I may have more to say about this later. For now, I’ll only say again that it’s a startlingly good crystallization of surprisingly many disagreements I’ve had with people on and around Less Wrong, and I’m excited to see this approach to the topic explored further.)

comment by ricraz · 2018-09-16T21:01:33.313Z · score: 17 (8 votes) · LW · GW

Thanks for the helpful comment! I'm glad other people have a sense of the thing I'm describing. Some responses:

I am somewhat skeptical about whether your attempt at conceptually unifying these concerns—i.e., the concept of “rationality realism”—quite works.

I agree that it's a bit of a messy concept. I do suspect, though, that people who see each of the ideas listed above as "natural" do so because of intuitions that are similar both across ideas and across people. So even if I can't conceptually unify those intuitions, I can still identify a clustering.

Regardless of whether Hanson is right about how our minds work (and I suspect he is right to a large degree, if not quite entirely right), the question of who we are seems to be a matter of choosing which aspect(s) of our minds’ functioning to endorse as ego-syntonic [LW · GW]. Under this view, it is nonsensical to speak of a scenario where it “turns out” that I “am just my system 1”.

I was a bit lazy in expressing it, but I think that the underlying idea makes sense (and have edited to clarify a little). There are certain properties we consider key to our identities, like consistency and introspective access. If we find out that system 2 has much less of those than we thought, then that should make us shift towards identifying more with our system 1s. Also, the idea of choosing which aspects to endorse presupposes some sort of identification with the part of your mind that makes the choice. But I could imagine finding out that this part of my brain is basically just driven by signalling, and then it wouldn't even endorse itself. That also seems like a reason to default to identifying more with your system 1.

Also, what on earth does “break your thought process” even mean?

An analogy: in maths, a single contradiction "breaks the system" because it can propagate into any other proofs and lead to contradictory conclusions everywhere. In humans, it doesn't, because we're much more modular and selectively ignore things. So the relevant question is something like "Are much more intelligent systems necessarily also more math-like, in that they can't function well without being internally consistent?"
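
Spelled out, the maths-side phenomenon is the principle of explosion: from a single contradiction you can derive anything, so one inconsistency really does poison the whole system.

```latex
% Principle of explosion: a single contradiction entails any Q.
\frac{P \qquad \neg P}{\bot}\;(\neg\mathrm{E})
\qquad\qquad
\frac{\bot}{Q}\;(\bot\mathrm{E},\ Q\ \text{arbitrary})
```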

comment by Said Achmiz (SaidAchmiz) · 2018-09-16T21:25:48.763Z · score: 15 (6 votes) · LW · GW

I agree that it’s a bit of a messy concept. I do suspect, though, that people who see each of the ideas listed above as “natural” do so because of intuitions that are similar both across ideas and across people. So even if I can’t conceptually unify those intuitions, I can still identify a clustering.

For the record, and in case I didn’t get this across—I very much agree that identifying this clustering is quite valuable.

As for the challenge of conceptual unification, we ought, I think, to treat it as a separate and additional challenge (and, indeed, we must be open to the possibility that a straightforward unification is not, after all, appropriate).

I was a bit lazy in expressing it, but I think that the underlying idea makes sense (and have edited to clarify a little). There are certain properties we consider key to our identities, like consistency and introspective access. If we find out that system 2 has much less of those than we thought, then that should make us shift towards identifying more with our system 1s. Also, the idea of choosing which aspects to endorse presupposes some sort of identification with the part of your mind that makes the choice. But I could imagine finding out that this part of my brain is basically just driven by signalling, and then it wouldn’t even endorse itself. That also seems like a reason to default to identifying more with your system 1.

I don’t want to go too far down this tangent, as it is not really critical to your main point, but I actually don’t agree with the claim “the idea of choosing which aspects to endorse presupposes some sort of identification with the part of your mind that makes the choice”; that is why I was careful to speak of endorsing aspects of our minds’ functioning, rather than identifying with parts of ourselves. I’ve spoken elsewhere of my skepticism toward the notion of conceptually dividing one’s own mind, and then selecting one of the sections to identify with. But this is a complex topic, and deserves dedicated treatment; best to set it aside for now, I think.

So the relevant question is something like “Are much more intelligent systems necessarily also more math-like, in that they can’t function well without being internally consistent?”

I think that this formulation makes sense.

To me, then, it suggests some obvious follow-up questions, which I touched upon in my earlier reply:

In what sense, exactly, are these purportedly “more intelligent” systems actually “more intelligent”, if they lack the flexibility and robustness of being able to hold contradictions in one’s mind? Or is this merely a flaw in human mental architecture? Might it, rather, be the case that these “more intelligent” systems are simply better than human-like minds at accomplishing their goals, in virtue of their intolerance for inconsistency? But it is not clear how such a claim survives the observation that humans are often inconsistent in what our goals are; it is not quite clear what it means to better accomplish inconsistent goals by being more consistent…

To put it another way, there seems to be some manner of sleight of hand (perhaps an unconscious one) being performed with the concept of “intelligence”. I can’t quite put my finger on the nature of the trick, but something, clearly, is up.

comment by cousin_it · 2018-09-20T09:42:58.929Z · score: 18 (6 votes) · LW · GW

The idea that there is an “ideal” decision theory.

There are many classes of decision problems that allow optimal solutions, but none of them can cover all of reality, because in reality an AI can be punished for having any given decision theory. That said, the design space of decision theories has sweet spots. For example, future AIs will likely face an environment where copying and simulation is commonplace, and we've found simple decision theories that allow for copies and simulations. Looking for more such sweet spots is fun and fruitful.

comment by ricraz · 2018-09-20T18:17:43.842Z · score: 1 (1 votes) · LW · GW

Imo we haven't found a simple decision theory that allows for copies and simulations. We've found a simple rule that works in limiting cases, but is only well-defined for identical copies (modulo stochasticity). My expectation that FDT will be rigorously extended from this setting is low, for much the same reason that I don't expect a rigorous definition of CDT. You understand FDT much better than I do, though - would you say that's a fair summary?

comment by cousin_it · 2018-09-20T20:16:35.684Z · score: 8 (4 votes) · LW · GW

If all agents involved in a situation share the same utility function over outcomes, we should be able to make them coordinate despite having different source code. I think that's where one possible boundary will settle, and I expect the resulting theory to be simple. Whereas in case of different utility functions we enter the land of game theory, where I'm pretty sure there can be no theory of unilateral decision making.

comment by ricraz · 2018-09-20T21:23:48.944Z · score: 1 (1 votes) · LW · GW

I'm not convinced by the distinction you draw. Suppose you simulate me at slightly less than perfect fidelity. The simulation is an agent with a (slightly) different utility function to me. Yet this seems like a case where FDT should be able to say relevant things.

In Abram's words [LW · GW],

FDT requires a notion of logical causality, which hasn't appeared yet.

I expect that logical causality will be just as difficult to formalise as normal causality, and in fact that no "correct" formalisation exists for either.

comment by cousin_it · 2018-09-18T09:23:24.994Z · score: 17 (6 votes) · LW · GW

Great post, thank you for writing this! Your list of natural-seeming ideas is very thought provoking.

The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/​or intelligence in general.

I used to think that way, but now I agree with your position more. Something like Bayesian rationality is a small piece that many problems have in common, but any given problem will have lots of other structure to be exploited as well. In many AI problems, like recognizing handwriting or playing board games, that lets you progress faster than if you'd tried to start with the Bayesian angle.

We could still hope that the best algorithm for any given problem will turn out to be simple. But that seems unlikely, judging from both AI tasks like MNIST, where neural nets beat anything hand-coded, and non-AI tasks like matrix multiplication, where asymptotically best algorithms have been getting more and more complex. As a rule, algorithms don't get simpler as they get better.
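
To illustrate that last point: the schoolbook multiply below is a few lines, Strassen's asymptotically faster recursion (sketched here for power-of-two sizes) is already noticeably messier, and the asymptotically best known algorithms are far too complicated to write down here at all.

```python
def naive_matmul(A, B):
    """Schoolbook O(n^3) multiply: short, simple, asymptotically slower."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def _add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def _sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def strassen(A, B):
    """Strassen's ~O(n^2.81) multiply for n-by-n matrices, n a power of two.

    Seven recursive products instead of eight: already messier than the naive
    version, and still nowhere near the (galactic) asymptotic state of the art.
    """
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    split = lambda M: ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                       [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(_add(A11, A22), _add(B11, B22))
    M2 = strassen(_add(A21, A22), B11)
    M3 = strassen(A11, _sub(B12, B22))
    M4 = strassen(A22, _sub(B21, B11))
    M5 = strassen(_add(A11, A12), B22)
    M6 = strassen(_sub(A21, A11), _add(B11, B12))
    M7 = strassen(_sub(A12, A22), _add(B21, B22))
    C11 = _add(_sub(_add(M1, M4), M5), M7)
    C12 = _add(M3, M5)
    C21 = _add(M2, M4)
    C22 = _add(_sub(_add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom

# Both agree on a small example:
A = [[1, 2], [3, 4]]; B = [[5, 6], [7, 8]]
assert naive_matmul(A, B) == strassen(A, B) == [[19, 22], [43, 50]]
```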

comment by Vladimir_Nesov · 2018-09-18T14:38:11.736Z · score: 16 (3 votes) · LW · GW

I'm not sure what you changed your mind about. Some of the examples you give are unconvincing, as they do have simple meta-algorithms that both discover the more complicated better solutions and analyse their behavior. My guess is that the point is that for example looking into nuance of things like decision theory is an endless pursuit, with more and more complicated solutions accounting for more and more unusual aspects of situations (that can no longer be judged as clearly superior), and no simple meta-algorithm that could've found these more complicated solutions, because it wouldn't know what to look for. But that's content of values, the thing you look for in human behavior, and we need at least a poor solution to the problem of making use of that. Perhaps you mean that even this poor solution is too complicated for humans to discover?

comment by ricraz · 2018-09-18T23:48:36.518Z · score: 1 (1 votes) · LW · GW

There's a difference between discovering something and being able to formalise it. We use the simple meta-algorithm of gradient descent to train neural networks, but that doesn't allow us to understand their behaviour.

Also, meta-algorithms which seem simple to us may not in fact be simple, if our own minds are complicated to describe.

comment by TurnTrout · 2018-09-18T13:01:08.302Z · score: 12 (3 votes) · LW · GW

My impression is that an overarching algorithm would allow the agent to develop solutions for the specialized tasks, not that it would directly constitute a perfect solution. I don’t quite understand your position here – would you mind elaborating?

comment by cousin_it · 2018-09-19T23:43:08.765Z · score: 5 (2 votes) · LW · GW

My position goes something like this.

There are many problems to be solved. Each problem may or may not have regularities to be exploited. Some regularities are shared among many problems, like Bayes structure, but others are unique. Solving a problem in reasonable time might require exploiting multiple regularities in it, so Bayes structure alone isn't enough. There's no algorithm for exploiting all regularities in all problems in reasonable time (this is similar to P≠NP). You can combine algorithms for exploiting a bunch of regularities, ending up with a longer algorithm that can't be compressed very much and doesn't have any simple core. Human intelligence could be like that: a laundry list of algorithms that exploit specific regularities in our environment.

comment by romeostevensit · 2018-09-23T02:51:23.948Z · score: 9 (2 votes) · LW · GW

> algorithms don't get simpler as they get better.

Or: as you minimize cost along one dimension, costs get pushed into other dimensions. Aether variables apply at the level of representation too.

comment by Wei_Dai · 2018-11-13T01:44:32.297Z · score: 9 (3 votes) · LW · GW

It’s a mindset which makes the following ideas seem natural

I think within "realism about rationality" there are at least 5 plausible positions one could take on other metaethical issues, some of which do not agree with all the items on your list, so it's not really a single mindset. See this post [LW · GW], where I listed those 5 positions along with the denial of "realism about rationality" as the number 6 position (which I called normative anti-realism), and expressed my uncertainty as to which is the right one.

comment by Kaj_Sotala · 2018-09-20T08:11:17.276Z · score: 9 (5 votes) · LW · GW

Curated this post for:

  • Having a very clear explanation of what feels like a central disagreement in many discussions, which has been implicit in many previous conversations but not explicitly laid out.
  • Having lots of examples of what kinds of ideas this mindset makes seem natural.
  • Generally being the kind of a post which I expect to be frequently referred back to as the canonical explanation of the thing.
comment by Raemon · 2018-09-17T03:56:11.150Z · score: 8 (4 votes) · LW · GW

Although not exactly the central point, seemed like a good time to link back to "Do you identify as the elephant or the rider? [LW · GW]"

comment by TruePath · 2018-09-19T03:04:43.211Z · score: 7 (5 votes) · LW · GW

First, let me say I 100% agree with the idea that there is a problem in the rationality community of viewing rationality as something like momentum or gold (I named my blog rejectingrationality after this phenomenon and tried to deal with it in my first post).

However, I'm not totally sure everything you say falls under that concept. In particular, I'd say that rationality realism is something like the belief that there is a fact of the matter about how best to form beliefs or take actions in response to a particular set of experiences, and that there are many such facts (going far beyond "don't be Dutch booked"), with the frequent additional belief that what is rational to do in response to various kinds of experiences can be inferred from a priori considerations, e.g., thinking about all the ways that rule X might lead you wrong in certain possible situations, so X can't be rational.

When I've raised this issue in the past the response I've gotten from both Yudkowsky and Hanson is: "But of course we can try to be less wrong," i.e., have fewer false beliefs. And of course that is true, but that's a very different notion from the notion of rationality used by rationality realists, and it misses the way that much of the rationality community's talk about rationality isn't about literally being less wrong but about classifying rules for reaching beliefs as rational or irrational even when they don't disagree in the actual world.

In particular, if all I'm doing is analyzing how to be less wrong, I can't criticize people who dogmatically believe things that happen to be true. After all, if god does exist, then dogmatically believing he does makes the people who do less wrong. Similarly, the various critiques of human psychological dispositions as leading us to make wrong choices in some kinds of cases aren't sufficient if those cases are rare and cases where those dispositions yield better results are common. However, those who are rationality realists suggest that there is some fact of the matter which makes these belief-forming strategies irrational and thus appropriate to eschew and criticize. But, ultimately, aside from merely avoiding getting Dutch booked, no belief-forming rule can be assured to be less wrong than another in all possible worlds.

comment by linkhyrule5 · 2018-09-22T18:46:17.526Z · score: 6 (3 votes) · LW · GW

I was kind of iffy about this post until the last point, which immediately stood out to me as something I vehemently disagree with. Whether or not humans naturally have values or are consistent is irrelevant -- that which is not required will happen only at random and thus tends not to happen at all, and so if you aren't very, very careful to actually make sure you're working in a particular coherent direction, you're probably not working nearly as efficiently as you could be and may in fact be running in circles without noticing.

comment by Kaj_Sotala · 2018-09-17T13:56:46.513Z · score: 6 (3 votes) · LW · GW

I like this post and the concept in general, but would prefer slightly different terminology. To me, a mindset being called "realism about rationality" implies that this is the realistic, or correct mindset to have; a more neutral name would feel appropriate. Maybe something like "'rationality is math' mindset" or "'intelligence is intelligible' mindset"?

comment by ricraz · 2018-09-17T18:40:36.302Z · score: 6 (3 votes) · LW · GW

Thanks for the link, I hadn't seen that paper before and it's very interesting.

A mindset being called "realism about rationality" implies that this is the realistic, or correct mindset to have.

I chose "rationality realism" as a parallel to "moral realism", which I don't think carries the connotations you mentioned. I do like "intelligence is intelligible" as an alternative alliteration, and I guess Anna et al. have prior naming rights. I think it would be a bit confusing to retitle my post now, but happy to use either going forward.

comment by Kaj_Sotala · 2018-09-17T20:16:50.524Z · score: 5 (2 votes) · LW · GW
I hadn't seen that paper before and it's very interesting.

Glad you liked it!

I chose "rationality realism" as a parallel to "moral realism", which I don't think carries the connotations you mentioned.

I guess you could infer that "just as moral realism implies that objective morality is real, rationality realism implies that objective rationality is real", but that interpretation didn't even occur to me before reading this comment. And also importantly, "rationality realism" wasn't the term that you used in the post; you used "realism about rationality". "Realism about morality" would also have a different connotation than "moral realism" does.

comment by Benquo · 2018-09-17T21:06:12.110Z · score: 3 (2 votes) · LW · GW

I realized a few paragraphs in that this was meant to be parallel to "moral realism," and I agree that a title of "rationality realism" would have been clearer.

comment by drossbucket · 2018-09-17T05:48:51.070Z · score: 6 (4 votes) · LW · GW

Thanks for writing this, it's a very concise summary of the parts of LW I've never been able to make sense of, and I'd love to have a better understanding of what makes the ideas in your bullet-pointed list appealing to those who tend towards 'rationality realism'. (It's sort of a background assumption in most LW stuff, so it's hard to find places where it's explicitly justified.)

Also:

What CFAR calls “purple”.

Is there any online reference explaining this?

comment by ricraz · 2018-09-19T01:38:05.547Z · score: 6 (4 votes) · LW · GW

I had a quick look for an online reference to link to before posting this, and couldn't find anything. It's not a particularly complicated theory, though: "purple" ideas are vague, intuitive, pre-theoretic; "orange" ones are explicable, describable and model-able. A lot of AI safety ideas are purple, hence why CFAR tells people not just to ignore them like they would in many technical contexts.

I'll publish a follow-up post with arguments for and against realism about rationality.

comment by drossbucket · 2018-09-19T05:08:19.066Z · score: 1 (1 votes) · LW · GW

Thanks for the explanation!

comment by avturchin · 2018-09-20T23:22:17.704Z · score: 4 (3 votes) · LW · GW

Some other ideas for the list of the "rationality realism":

  • Probability actually exists, and there is a correct theory of it.
  • Humans have values.
  • Rationality could be presented as a short set of simple rules.
  • Occam's razor implies that the simplest explanation is the correct one.
  • Intelligence could be measured by a single scalar - IQ.
comment by ricraz · 2018-09-21T09:02:37.044Z · score: 1 (1 votes) · LW · GW

These ideas are definitely pointing in the direction of rationality realism. I think most of them are related to items on my list, although I've tried to phrase them in less ambiguous ways.

comment by DragonGod · 2018-09-30T08:07:14.674Z · score: 2 (2 votes) · LW · GW

I consider myself a rationality realist, but I don't believe some of the things you attribute to rationality realism, particularly concerning morality and consciousness. I don't think there's a true decision theory or true morality, but I do think that you could find systems of reasoning that are provably optimal within certain formal models.

There is no sense in which our formal models are true, but as long as they have high predictive power they are useful, and that, I think, is all that matters.