Posts

What achievements have people claimed will be warning signs for AGI? 2020-04-01T10:24:12.332Z · score: 17 (7 votes)
What information, apart from the connectome, is necessary to simulate a brain? 2020-03-20T02:03:15.494Z · score: 17 (7 votes)
Characterising utopia 2020-01-02T00:00:01.268Z · score: 27 (8 votes)
Technical AGI safety research outside AI 2019-10-18T15:00:22.540Z · score: 36 (13 votes)
Seven habits towards highly effective minds 2019-09-05T23:10:01.020Z · score: 39 (10 votes)
What explanatory power does Kahneman's System 2 possess? 2019-08-12T15:23:20.197Z · score: 33 (16 votes)
Why do humans not have built-in neural i/o channels? 2019-08-08T13:09:54.072Z · score: 26 (12 votes)
Book review: The Technology Trap 2019-07-20T12:40:01.151Z · score: 30 (14 votes)
What are some of Robin Hanson's best posts? 2019-07-02T20:58:01.202Z · score: 36 (10 votes)
On alien science 2019-06-02T14:50:01.437Z · score: 46 (15 votes)
A shift in arguments for AI risk 2019-05-28T13:47:36.486Z · score: 32 (13 votes)
Would an option to publish to AF users only be a useful feature? 2019-05-20T11:04:26.150Z · score: 14 (5 votes)
Which scientific discovery was most ahead of its time? 2019-05-16T12:58:14.628Z · score: 39 (10 votes)
When is rationality useful? 2019-04-24T22:40:01.316Z · score: 29 (7 votes)
Book review: The Sleepwalkers by Arthur Koestler 2019-04-23T00:10:00.972Z · score: 75 (22 votes)
Arguments for moral indefinability 2019-02-12T10:40:01.226Z · score: 54 (18 votes)
Coherent behaviour in the real world is an incoherent concept 2019-02-11T17:00:25.665Z · score: 38 (16 votes)
Vote counting bug? 2019-01-22T15:44:48.154Z · score: 7 (2 votes)
Disentangling arguments for the importance of AI safety 2019-01-21T12:41:43.615Z · score: 124 (45 votes)
Comments on CAIS 2019-01-12T15:20:22.133Z · score: 72 (19 votes)
How democracy ends: a review and reevaluation 2018-11-27T10:50:01.130Z · score: 17 (9 votes)
On first looking into Russell's History 2018-11-08T11:20:00.935Z · score: 35 (11 votes)
Speculations on improving debating 2018-11-05T16:10:02.799Z · score: 26 (10 votes)
Implementations of immortality 2018-11-01T14:20:01.494Z · score: 21 (8 votes)
What will the long-term future of employment look like? 2018-10-24T19:58:09.320Z · score: 11 (4 votes)
Book review: 23 things they don't tell you about capitalism 2018-10-18T23:05:29.465Z · score: 19 (11 votes)
Book review: The Complacent Class 2018-10-13T19:20:05.823Z · score: 21 (9 votes)
Some cruxes on impactful alternatives to AI policy work 2018-10-10T13:35:27.497Z · score: 155 (54 votes)
A compendium of conundrums 2018-10-08T14:20:01.178Z · score: 12 (12 votes)
Thinking of the days that are no more 2018-10-06T17:00:01.208Z · score: 13 (6 votes)
The Unreasonable Effectiveness of Deep Learning 2018-09-30T15:48:46.861Z · score: 87 (27 votes)
Deep learning - deeper flaws? 2018-09-24T18:40:00.705Z · score: 43 (18 votes)
Book review: Happiness by Design 2018-09-23T04:30:00.939Z · score: 14 (6 votes)
Book review: Why we sleep 2018-09-19T22:36:19.608Z · score: 52 (25 votes)
Realism about rationality 2018-09-16T10:46:29.239Z · score: 171 (79 votes)
Is epistemic logic useful for agent foundations? 2018-05-08T23:33:44.266Z · score: 19 (6 votes)
What we talk about when we talk about maximising utility 2018-02-24T22:33:28.390Z · score: 27 (8 votes)
In Defence of Conflict Theory 2018-02-17T03:33:01.970Z · score: 25 (10 votes)
Is death bad? 2018-01-13T04:55:25.788Z · score: 8 (4 votes)

Comments

Comment by ricraz on How special are human brains among animal brains? · 2020-04-06T12:37:57.865Z · score: 2 (1 votes) · LW · GW

I think whether the additional complexity is mundane or not depends on how you're producing the agent. Humans can scale up human-designed engineering products fairly easily, because we have a high-level understanding of how the components all fit together. But if you have a big neural net whose internal composition is mostly determined by the optimiser, then it's much less clear to me. There are some scaling operations which are conceptually very easy for humans, and also hard to do via gradient descent. As a simple example, in a big neural network where the left half is doing subcomputation X and the right half is doing subcomputation Y, it'd be very laborious for the optimiser to swap it so the left half is doing Y and the right half is doing X - since the optimiser can only change the network gradually, and after each gradient update the whole thing needs to still work. This may be true even if swapping X and Y is a crucial step towards scaling up the whole system, which will later allow much better performance.
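To illustrate the kind of difficulty I mean with a toy construction of my own (not something from the post): two settings of a tiny ReLU network's weights compute exactly the same function, differing only in which hidden unit does which subcomputation, yet the straight-line path between them - the kind of gradual change gradient descent is restricted to - passes through weights that compute something much worse.

```python
import numpy as np

def net(x, w1, w2):
    # Tiny one-hidden-layer ReLU network with two hidden units.
    return w2 @ np.maximum(w1 * x, 0.0)

w1_a, w2_a = np.array([1.0, -1.0]), np.array([1.0, -1.0])   # computes f(x) = x
w1_b, w2_b = np.array([-1.0, 1.0]), np.array([-1.0, 1.0])   # same f(x) = x, hidden units swapped

for alpha in [0.0, 0.5, 1.0]:
    w1 = (1 - alpha) * w1_a + alpha * w1_b
    w2 = (1 - alpha) * w2_a + alpha * w2_b
    print(alpha, [round(float(net(x, w1, w2)), 2) for x in [-2.0, 1.0, 3.0]])

# alpha = 0 and alpha = 1 both print [-2.0, 1.0, 3.0]; the halfway point prints
# [0.0, 0.0, 0.0] -- the network stops working mid-swap.
```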

In other words, we're biased towards thinking that scaling is "mundane" because human-designed systems scale easily (and to some extent, because evolution-designed systems also scale easily). It's not clear that AIs also have this property; there's a whole lot of retraining involved in going from a small network to a bigger network (and in fact usually the bigger network is trained from scratch rather than starting from a scaled-up version of the small one).

Comment by ricraz on How special are human brains among animal brains? · 2020-04-05T12:53:01.062Z · score: 2 (1 votes) · LW · GW

A couple of intuitions:

  • Koko the gorilla had partial language competency.
  • The ability to create and understand combinatorially many sentences - not necessarily with fully recursive structure, though. For example, if there's a finite number of sentence templates, and then the animal can substitute arbitrary nouns and verbs into them (including novel ones).
  • The sort of things I imagine animals with partial language saying are:
    • There's a lion behind that tree.
    • Eat the green berries, not the red berries.
    • I'll mate with you if you bring me a rabbit.

"Once one species gets a small amount of language ability, they always quickly master language and become the dominant species" - this seems clearly false to me, because most species just don't have the potential to quickly become dominant. E.g. birds, small mammals, reptiles, short-lived species..

Comment by ricraz on How special are human brains among animal brains? · 2020-04-02T14:21:27.736Z · score: 2 (1 votes) · LW · GW

It's not that we'd wipe out another species which started to demonstrate language. Rather, since the period during which humans have had language is so short, it'd be an unlikely coincidence for another species to undergo the process of mastering language during the period in which we already had language.

Comment by ricraz on How special are human brains among animal brains? · 2020-04-01T13:51:46.048Z · score: 3 (2 votes) · LW · GW

+1. It feels like this argument is surprisingly prominent in the post given that it's a n=1 anecdote, with potential confounders as mentioned above.

Comment by ricraz on How special are human brains among animal brains? · 2020-04-01T13:42:48.519Z · score: 9 (3 votes) · LW · GW

Nice post; I think I agree with most of it. Two points I want to make:

Or is this “qualitative difference” illusory, with the vast majority of human cognitive feats explainable as nothing more than a scaled-up version of the cognitive feats of lower animals?

This seems like a false dichotomy. We shouldn't think of scaling up as "free" from a complexity perspective - usually when scaling up, you need to make quite a few changes just to keep individual components working. This happens in software all the time: in general it's nontrivial to roll out the same service to 1000x users.

One possibility is that the first species that masters language, by virtue of being able to access intellectual superpowers inaccessible to other animals, has a high probability of becoming the dominant species extremely quickly.

I think this explanation makes sense, but it raises the further question of why we don't see other animal species with partial language competency. There may be an anthropic explanation here - i.e. that once one species gets a small amount of language ability, they always quickly master language and become the dominant species. But this seems unlikely: e.g. most birds have such severe brain size limitations that, while they could probably have 1% of human language, I doubt they could become dominant in anywhere near the same way we did.

There's some discussion of this point in Laland's book Darwin's Unfinished Symphony, which I recommend. He argues that the behaviour of deliberate teaching is uncommon amongst animals, and doesn't seem particularly correlated with intelligence - e.g. ants sometimes do it, whereas many apes don't. His explanation is that students from more intelligent species are easier to teach, but would also be more capable of picking up the behaviour by themselves without being taught. So there's not a monotonically increasing payoff to teaching as student intelligence increases - but humans are the exception (via a mechanism I can't remember; maybe due to prolonged infancy?), which is how language evolved. This solves the problem of trustworthiness in language evolution, since you could start off by only using language to teach kin.

A second argument he makes is that the returns from increasing fidelity of cultural transmission start off low, because the amount of degradation is exponential in the number of times a piece of information is transmitted. Combined with the previous paragraph, this may explain why we don't see partial language in any other species, but I'm still fairly uncertain about this.

Comment by ricraz on "No evidence" as a Valley of Bad Rationality · 2020-03-30T13:22:12.438Z · score: 5 (3 votes) · LW · GW

I think the fact that chemotherapy isn't a very good example demonstrates a broader problem with this post: that maybe in general your beliefs will be more accurate if you stick with the null hypothesis until you have significant evidence otherwise. Doing so often protects you from confirmation bias, bias towards doing something, and the more general failure to imagine alternative possibilities. Sure, there are some cases where, on the inside view, you should update before the studies come in, but there are also plenty of cases where your inside view is just wrong.

Comment by ricraz on Can crimes be discussed literally? · 2020-03-24T15:18:04.048Z · score: 6 (4 votes) · LW · GW

Yeah, "built on lies" is far from a straightforward summary - it emphasises the importance of lies far beyond what you've argued for.

The system relies on widespread willingness to falsify records, and would (temporarily) grind to a halt if people were to simply refuse to lie.

The hospital system also relies on widespread willingness to take out the trash, and would (temporarily) grind to a halt if people were to simply refuse to dispose of trash. Does it mean that "the hospital system is built on trash disposal"? (Analogy mostly, but not entirely, serious).

everyone says Y and the system wouldn't work without it, so it's not reasonable to call it fraud.

This seems like a pretty reasonable argument against X being fraudulent. If X are making claims that everyone knows are false, then there's no element of deception, which is important for (at least my layman's understanding of) fraud. Compare: a sports fan proclaiming that their team is the greatest. Is this fraud?

Comment by ricraz on Is the coronavirus the most important thing to be focusing on right now? · 2020-03-19T00:11:10.890Z · score: 8 (4 votes) · LW · GW

On 1: How much time do people need to spend reading & arguing about coronavirus before they hit dramatically diminishing marginal returns? How many LW-ers have already reached that point?

On 3a: I'm pretty skeptical about marginal thought from people who aren't specialists actually doing anything - unless you're planning to organise tests or similar. What reason do you have to think LW posts will be useful?

On 3b: It feels like you could cross-apply this logic pretty straightforwardly to argue that LW should have a lot of political discussion; it has many of the same upsides, and also many of the same downsides. The very fact that LW has so much coronavirus coverage already demonstrates that the addictiveness of discussing this topic is comparable to that of politics.

Comment by ricraz on Is the coronavirus the most important thing to be focusing on right now? · 2020-03-19T00:04:56.237Z · score: 29 (21 votes) · LW · GW

I think LW has way too much coronavirus coverage. It was probably useful for us to marshal information when very few others were focusing on it. That was the "exam" component Raemon mentioned. Now, though, we're stuck in a memetic trap where this high-profile event will massively distract us from things that really matter. I think we should treat this similarly to Slate Star Codex's culture wars, because it seems to have a similar effect: recognise that our brains are built to overengage with this sort of topic, put it in an isolated thread, and quarantine it from the rest of the site as much as possible.

Comment by ricraz on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-02-19T18:29:10.812Z · score: 2 (1 votes) · LW · GW

Paul is implicitly conditioning his actions on being in a world where there's a decent amount of expected value left for his actions to affect. This is technically part of a decision procedure, rather than a statement about epistemic credences, but it's confusing because he frames it as an epistemic credence.

Comment by ricraz on Confirmation Bias As Misfire Of Normal Bayesian Reasoning · 2020-02-13T13:25:06.859Z · score: 14 (8 votes) · LW · GW

Related: Jess Whittlestone's PhD thesis, titled "The importance of making assumptions: why confirmation is not necessarily a bias."

I realised that most of the findings commonly cited as evidence for confirmation bias were much less convincing than they first seemed. In large part, this was because the complex question of what it really means to say that something is a ‘bias’ or ‘irrational’ is unacknowledged by most studies of confirmation bias. Often these studies don’t even state what standard of rationality they were claiming people were ‘irrational’ with respect to, or what better judgements might look like. I started to come across more and more papers suggesting that findings classically thought of demonstrating a confirmation bias might actually be interpreted as rational under slightly different assumptions - and found often these papers had much more convincing arguments, based on more thorough theories of rationality.
...
[I came to] conclusions I would not have expected myself to be sympathetic to a few years ago: that the extent to which our prior beliefs influence reasoning may well be adaptive across a range of scenarios given the various goals we are pursuing, and that it may not always be better to be ‘more open-minded’. It’s easy to say that people should be more willing to consider alternatives and less influenced by what they believe, but much harder to say how one does this. Being a total ‘blank slate’ with no assumptions or preconceptions is not a desirable or realistic starting point, and temporarily ‘setting aside’ one’s beliefs and assumptions whenever it would be useful to consider alternatives is incredibly cognitively demanding, if possible to do at all. There are tradeoffs we have to make, between the benefits of certainty and assumptions, and the benefits of having an ‘open mind’, that I had not acknowledged before.

Comment by ricraz on Demons in Imperfect Search · 2020-02-12T17:10:13.786Z · score: 8 (4 votes) · LW · GW

Oh actually, I now see the explanation, from the same post, that this can arise when the gene causing male bias is itself on the Y-chromosome.

Segregation-distorters subvert the mechanisms that usually guarantee fairness of sexual reproduction. For example, there is a segregation-distorter on the male sex chromosome of some mice which causes only male children to be born, all carrying the segregation-distorter. Then these males impregnate females, who give birth to only male children, and so on. You might cry "This is cheating!" but that's a human perspective; the reproductive fitness of this allele is extremely high, since it produces twice as many copies of itself in the succeeding generation as its nonmutant alternative. Even as females become rarer and rarer, males carrying this gene are no less likely to mate than any other male, and so the segregation-distorter remains twice as fit as its alternative allele. It's speculated that real-world group selection may have played a role in keeping the frequency of this gene as low as it seems to be. In which case, if mice were to evolve the ability to fly and migrate for the winter, they would probably form a single reproductive population, and would evolve to extinction as the segregation-distorter evolved to fixation.

Comment by ricraz on Demons in Imperfect Search · 2020-02-12T13:58:15.201Z · score: 3 (2 votes) · LW · GW

+1, creating a self-reinforcing feedback loop =/= being an optimiser, and so I think any explanation of demons needs to focus on them making deliberate choices to reinforce themselves.

Comment by ricraz on Demons in Imperfect Search · 2020-02-12T13:55:28.428Z · score: 6 (3 votes) · LW · GW
This can kick off an unstable feedback loop, e.g. a gene which biases toward male children can result in a more and more male-skewed population until the species dies out.

I'm suspicious of this mechanism; I'd think that as the number of males increases, there's increasing selection pressure against this gene. Do you have a reference?

Comment by ricraz on Disentangling arguments for the importance of AI safety · 2020-02-10T15:51:25.851Z · score: 3 (2 votes) · LW · GW

I think #3 could occur because of #2 (which I now mostly call "inner misalignment"), but it could also occur because of outer misalignment.

Broadly speaking, though, I think you're right that #2 and #3 are different types of things. Because of that and other issues, I no longer think that this post disentangles the arguments satisfactorily; I'll make a note of this at the top of the document.

Comment by ricraz on Gradient hacking · 2020-01-30T11:31:21.689Z · score: 4 (2 votes) · LW · GW

I wasn't claiming that there'll be an explicit OR gate, just something functionally equivalent to it. To take a simple case, imagine that the two subnetworks output a real number each, which are multiplied together to get a final output, which we can interpret as the agent's reward (there'd need to be some further module which chooses behaviours in order to get that much reward, but let's ignore it for now). Each of the submodules' outputs measures how much that subnetwork thinks the agent's original goal has been preserved. Suppose that normally both subnetworks output 1, and then they switch to outputting 0 when they think they've passed the threshold of corruption, which makes the agent get 0 reward.

I agree that, at this point, there's no gradient signal to change the subnetworks. My points are that:

  1. There's still a gradient signal to change the OR gate (in this case, the implementation of multiplication) - there's a toy sketch of this below.
  2. Consider how they got to the point of outputting 0. They must have been decreasing from 1 as the overall network changed. So as the network changed, and they started producing outputs less than 1, there'd be pressure to modify them.
  3. The point above isn't true if the subnetworks go from 1 to 0 within one gradient step. In that case, the network will likely either bounce back and forth across the threshold (eroding the OR gate every time it does so) or else remain very close to the threshold (since there's no penalty for doing so). But since the transition from 1 to 0 needs to be continuous at *some* resolution, staying very *very* close to the threshold will produce subnetwork output somewhere between 0 and 1, which creates pressure for the subnetworks to be less accurate.
  4. It's non-obvious that agents will have anywhere near enough control over their internal functioning to set up such systems. Have you ever tried implementing two novel independent identical submodules in your brain? (Independence is very tricky because they're part of the same plan, and so a change in your underlying motivation to pursue that plan affects both). Ones which are so sensitive to your motivations that they can go from 1 to 0 within the space of a single gradient update?
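Here's a minimal sketch of points 1 and 3, using a toy construction of my own (two scalar "subnetwork outputs" combined by a gate with a learnable weight - nothing from the original post):

```python
import torch

# Hypothetical outputs of the two subnetworks, each measuring how intact the
# agent's original goal is. Near the threshold they sit strictly between 0 and 1.
a = torch.tensor(0.6, requires_grad=True)
b = torch.tensor(0.6, requires_grad=True)

# The "OR gate" here is just multiplication, with a learnable weight standing
# in for "the implementation of the gate".
gate_weight = torch.tensor(1.0, requires_grad=True)

reward = gate_weight * a * b
(-reward).backward()   # gradient descent on -reward, i.e. maximising reward

print(a.grad, b.grad, gate_weight.grad)
# All three gradients are nonzero: sitting near the threshold exposes both the
# subnetworks and the gate itself to optimisation pressure.
```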

To be honest, this is all incredibly speculative, so please interpret all of the above with the disclaimer that it's probably false or nonsensical for reasons I haven't thought of yet.

An intuition I'm drawing on here: https://lamport.azurewebsites.net/pubs/buridan.pdf

Comment by ricraz on Gradient hacking · 2020-01-30T04:40:52.259Z · score: 4 (2 votes) · LW · GW

In the section you quoted I'm talking about the case in which the extent to which the agent fails is fairly continuous. Also note that the OR function is not differentiable, and so the two subnetworks must be implementing some continuous approximation to it. In that case, it seems likely to me that there's a gradient signal to change the failing-hard mechanism.

I feel like the last sentence was a little insufficient but I'm pretty uncertain about how to think intuitively about this topic. The only thing I'm fairly confident about is that intuitions based on discrete functions are somewhat misleading.

Comment by ricraz on Gradient hacking · 2020-01-29T14:35:57.745Z · score: 4 (2 votes) · LW · GW
The original footnote provides one example of this, which is for the model to check if its objective satisfies some criterion, and fail hard if it doesn't. Now, if the model gets to the point where it's actually just failing because of this, then gradient descent will probably just remove that check—but the trick is never to actually get there. By having such a check in the first place, the model makes it so that gradient descent won't actually change its objective, since any change to its objective (keeping all the other parameters fixed, which is what gradient descent does since it computes partial derivatives) would lead to such a failure.

I don't think this argument works. After the agent has made that commitment, it needs to set some threshold for the amount of goal shift that will cause it to fail hard. But until the agent hits that threshold, the gradient will continue to point in the direction of that threshold. And with a non-infinitesimal learning rate, you'll eventually cross that threshold, and the agent will respond by failing hard.
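A toy version of that dynamic, with made-up numbers of my own (nothing from the post): a 1-D parameter whose loss keeps improving as it moves toward the "fail hard" threshold, updated with a finite learning rate.

```python
threshold = 1.0
lr = 0.3
theta = 0.0

def grad(theta):
    # Below the threshold, increasing theta keeps improving the outer objective,
    # so the gradient of the loss w.r.t. theta stays at -1.
    return -1.0

for step in range(5):
    theta = theta - lr * grad(theta)
    failed_hard = theta >= threshold
    print(step, round(theta, 2), failed_hard)

# theta climbs 0.3, 0.6, 0.9, 1.2, ... -- the threshold is crossed on the fourth
# update, even though nothing in the gradient "warned" the optimiser it was coming.
```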

A possible counterargument: the agent's ability to detect and enforce that threshold is not discrete, but also continuous, and so approaching the threshold will incur a penalty. But if that's the case, then the gradients will point in the direction of removing the penalty by reducing the agent's determination to fail upon detecting goal shift.

The way that this might still work is if modifications to this type of high-level commitment are harder to "detect" in partial derivatives than modifications to the underlying goals - e.g. if it's hard to update away from the commitment without reducing the agent's competence in other ways. And this seems kinda plausible, because high-level thought narrows down the space of outcomes sharply. But this is even more speculative.

Comment by ricraz on Realism about rationality · 2020-01-20T02:25:51.114Z · score: 2 (1 votes) · LW · GW

I'll try to respond properly later this week, but I like the point that embedded agency is about boundedness. Nevertheless, I think we probably disagree about how promising it is "to start with idealized rationality and try to drag it down to Earth rather than the other way around". If the starting point is incoherent, then this approach doesn't seem like it'll go far - if AIXI isn't useful to study, then probably AIXItl isn't either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).

I appreciate that this isn't an argument that I've made in a thorough or compelling way yet - I'm working on a post which does so.

Comment by ricraz on Realism about rationality · 2020-01-14T11:01:48.270Z · score: 2 (1 votes) · LW · GW

Yeah, I should have been much more careful before throwing around words like "real". See the long comment I just posted for more clarification, and in particular this paragraph:

I'm not trying to argue that concepts which we can't formalise "aren't real", but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can't formalise, and that it's those incoherent extrapolations which "aren't real" (I agree that this was quite unclear in the original post).

Comment by ricraz on Realism about rationality · 2020-01-14T10:40:02.315Z · score: 9 (4 votes) · LW · GW

I like this review and think it was very helpful in understanding your (Abram's) perspective, as well as highlighting some flaws in the original post, and ways that I'd been unclear in communicating my intuitions. In the rest of my comment I'll try to write a synthesis of my intentions for the original post with your comments; I'd be interested in the extent to which you agree or disagree.

We can distinguish between two ways to understand a concept X. For lack of better terminology, I'll call them "understanding how X functions" and "understanding the nature of X". I conflated these in the original post in a confusing way.

For example, I'd say that studying how fitness functions would involve looking into the ways in which different components are important for the fitness of existing organisms (e.g. internal organs; circulatory systems; etc). Sometimes you can generalise that knowledge to organisms that don't yet exist, or even prove things about those components (e.g. there's probably useful maths connecting graph theory with optimal nerve wiring), but it's still very grounded in concrete examples. If we thought that we should study how intelligence functions in a similar way as we study how fitness functions, that might look like a combination of cognitive science and machine learning.

By comparison, understanding the nature of X involves performing a conceptual reduction on X by coming up with a theory which is capable of describing X in a more precise or complete way. The pre-theoretic concept of fitness (if it even existed) might have been something like "the number and quality of an organism's offspring". Whereas the evolutionary notion of fitness is much more specific, and uses maths to link fitness with other concepts like allele frequency.

Momentum isn't really a good example to illustrate this distinction, so perhaps we could use another concept from physics, like electricity. We can understand how electricity functions in a lawlike way by understanding the relationship between voltage, resistance and current in a circuit, and so on, even when we don't know what electricity is. If we thought that we should study how intelligence functions in a similar way as the discoverers of electricity studied how it functions, that might involve doing theoretical RL research. But we also want to understand the nature of electricity (which turns out to be the flow of electrons). Using that knowledge, we can extend our theory of how electricity functions to cases which seem puzzling when we think in terms of voltage, current and resistance in circuits (even if we spend almost all our time still thinking in those terms in practice). This illustrates a more general point: you can understand a lot about how something functions without having a reductionist account of its nature - but not everything. And so in the long term, to understand really well how something functions, you need to understand its nature. (Perhaps understanding how CS algorithms work in practice, versus understanding the conceptual reduction of algorithms to Turing Machines, is another useful example).

I had previously thought that MIRI was trying to understand how intelligence functions. What I take from your review is that MIRI is first trying to understand the nature of intelligence. From this perspective, your earlier objection makes much more sense.

However, I still think that there are different ways you might go about understanding the nature of intelligence, and that "something kind of like rationality realism" might be a crux here (as you mention). One way that you might try to understand the nature of intelligence is by doing mathematical analysis of what happens in the limit of increasing intelligence. I interpret work on AIXI, logical inductors, and decision theory as falling into this category. This type of work feels analogous to some of Einstein's thought experiments about the limit of increasing speed. Would it have worked for discovering evolution? That is, would starting with a pre-theoretic concept of fitness and doing mathematical analysis of its limiting cases (e.g. by thinking about organisms that lived for arbitrarily long, or had arbitrarily large numbers of children) have helped people come up with evolution? I'm not sure. There's an argument that Malthus did something like this, by looking at long-term population dynamics. But you could also argue that the key insights leading up to the discovery of evolution were primarily inspired by specific observations about the organisms around us. And in fact, even knowing evolutionary theory, I don't think that the extreme cases of fitness make sense. So I would say that I am not a realist about "perfect fitness", even though the concept of fitness itself seems fine.

So an attempted rephrasing of the point I was originally trying to make, given this new terminology, is something like "if we succeed in finding a theory that tells us the nature of intelligence, it still won't make much sense in the limit, which is the place where MIRI seems to be primarily studying it (with some exceptions, e.g. your Partial Agency sequence). Instead, the best way to get that theory is to study how intelligence functions."

The reason I called it "rationality realism" not "intelligence realism" is that rationality has connotations of this limit or ideal existing, whereas intelligence doesn't. You might say that X is very intelligent, and Y is more intelligent than X, without agreeing that perfect intelligence exists. Whereas when we talk about rationality, there's usually an assumption that "perfect rationality" exists. I'm not trying to argue that concepts which we can't formalise "aren't real", but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can't formalise, and that it's those incoherent extrapolations like "perfect fitness" which "aren't real" (I agree that this was quite unclear in the original post).

My proposed redefinition:

  • The "intelligence is intelligible" hypothesis is about how lawlike the best description of how intelligence functions will turn out to be.
  • The "realism about rationality" hypothesis is about how well-defined intelligence is in the limit (where I think of the limit of intelligence as "perfect rationality", and "well-defined" with respect not to our current understanding, but rather with respect to the best understanding of the nature of intelligence we'll ever discover).

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-04T22:01:34.741Z · score: 6 (3 votes) · LW · GW

Cool, thanks for those clarifications :) In case it didn't come through from the previous comments, I wanted to make clear that this seems like exciting work and I'm looking forward to hearing how follow-ups go.

Comment by ricraz on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-04T20:55:17.852Z · score: 2 (1 votes) · LW · GW

Yes, but the fact that the fragile worlds are much more likely to end in the future is a reason to condition your efforts on being in a robust world.

While I do buy Paul's argument, I think it'd be very helpful if the various summaries of the interviews with him were edited to make it clear that he's talking about value-conditioned probabilities rather than unconditional probabilities - since the claim as originally stated feels misleading. (Even if some decision theories only use the former, most people think in terms of the latter).
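To spell out the kind of calculation involved, with toy numbers of my own (not Paul's): suppose you unconditionally assign 60% to a "fragile" world where the outcome is roughly fixed regardless of anyone's efforts, and 40% to a "robust" world where your efforts shift the outcome by some amount Δ. Then the expected impact of effort is 0.6 × 0 + 0.4 × Δ = 0.4Δ, so the ranking of your possible actions depends only on how they play out in the robust branch. A decision procedure can therefore act as if the robust world obtains, even though the unconditional credence of 40% is unchanged - which is why reporting the former as if it were the latter reads as misleading.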

Comment by ricraz on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-04T20:44:52.912Z · score: 4 (2 votes) · LW · GW

Some abstractions are heavily determined by the territory. The concept of trees is pretty heavily determined by the territory. Whereas the concept of betrayal is determined by the way that human minds function, which is determined by other people's abstractions. So while it seems reasonably likely to me that an AI "naturally thinks" in terms of the same low-level abstractions as humans, it thinking in terms of human high-level abstractions seems much less likely, absent some type of safety intervention. Which is particularly important because most of the key human values are very high-level abstractions.

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-04T20:19:09.091Z · score: 4 (2 votes) · LW · GW

I have four concerns even given that you're using a proper scoring rule, which relate to the link between that scoring rule and actually giving people money. I'm not particularly well-informed on this though, so could be totally wrong.

1. To implement some proper scoring rules, you need the ability to confiscate money from people who predict badly. Even when the score always has the same sign, like you have with log-scoring (or when you add a constant to a quadratic scoring system), if you don't confiscate money for bad predictions, then you're basically just giving money to people for signing up, which makes having an open platform tricky.

2. Even if you restrict signups, you get an analogous problem within a fixed population who's already signed up: the incentives will be skewed when it comes to choosing which questions to answer. In particular, if people expect to get positive amounts of money for answering randomly, they'll do so even when they have no relevant information, adding a lot of noise.

3. If a scoring rule is "very capped", as the log-scoring function is, then the expected reward from answering randomly may be very close to the expected reward from putting in a lot of effort, and so people would be incentivised to answer randomly and spend their time on other things (the toy calculation after this list puts rough numbers on this).

4. Relatedly, people's utilities aren't linear in money, so the score function might not remain a proper one taking that into account. But I don't think this would be a big effect on the scales this is likely to operate on.
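To put rough numbers on points 1-3 (my own toy setup, not anything from the post): with a log score shifted to be non-negative, an ignorant 50/50 forecast on a binary question still earns a positive expected amount, and not all that much less than a well-calibrated forecast.

```python
import math

def shifted_log_score(p_assigned_to_outcome, floor=math.log(0.01)):
    """Log score, shifted so that the worst allowed prediction (1%) earns zero."""
    return math.log(p_assigned_to_outcome) - floor

# Binary question whose true probability of "yes" is 0.8.
true_p = 0.8

def expected_score(forecast):
    return true_p * shifted_log_score(forecast) + (1 - true_p) * shifted_log_score(1 - forecast)

print(expected_score(0.5))   # ~3.9: an uninformed 50/50 forecast still earns a positive amount
print(expected_score(0.8))   # ~4.1: a calibrated forecast earns only somewhat more
```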

Comment by ricraz on Characterising utopia · 2020-01-04T03:07:11.970Z · score: 4 (2 votes) · LW · GW

Apologies for the mischaracterisation. I've changed this to refer to Scott Alexander's post which predicts this pressure.

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-03T19:53:14.660Z · score: 4 (2 votes) · LW · GW

Actually, the key difference between this and prediction markets seems to be that this has no downside risk: you can't lose money for bad predictions. So you could exploit it by only making extreme predictions, which would make a lot of money sometimes, without losing money in the other cases. Or by making fake accounts to drag the average down.

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-03T19:41:42.546Z · score: 3 (2 votes) · LW · GW

Another point: prediction markets allow you to bet more if you're more confident the market is off. This doesn't, except by betting that the market is further off. Which is different. But idk if that matters very much, you could probably recreate that dynamic by letting people weight their own predictions.

Comment by ricraz on [Part 2] Amplifying generalist research via forecasting – results from a preliminary exploration · 2020-01-03T19:34:24.957Z · score: 6 (4 votes) · LW · GW

Okay, so in quite a few cases the forecasters spent more time on a question than Elizabeth did? That seems like an important point to mention.

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-03T19:21:22.078Z · score: 5 (3 votes) · LW · GW

My interpretation: there's no such thing as negative value of information. If the mean of the crowdworkers' estimates were reliably in the wrong direction (compared with Elizabeth's prior) then that would allow you to update Elizabeth's prior to make it more accurate.
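A toy illustration of that point, with made-up numbers of my own: a signal that is reliably wrong is still informative, because you can invert it.

```python
import random

random.seed(0)
truth = [random.random() < 0.5 for _ in range(10_000)]
# A "bad" forecaster who gets the binary answer wrong 80% of the time.
bad_forecast = [(not t) if random.random() < 0.8 else t for t in truth]

accuracy_raw = sum(f == t for f, t in zip(bad_forecast, truth)) / len(truth)
accuracy_flipped = sum((not f) == t for f, t in zip(bad_forecast, truth)) / len(truth)
print(accuracy_raw, accuracy_flipped)   # ~0.2 vs ~0.8
```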

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-03T19:19:31.776Z · score: 14 (6 votes) · LW · GW

So the thing I'm wondering here is what makes this "amplification" in more than a trivial sense. Let me think out loud for a bit. Warning: very rambly.

Let's say you're a competent researcher and you want to find out the answers to 100 questions, which you don't have time to investigate yourself. The obvious strategy here is to hire 10 people, get them to investigate 10 questions each, and then pay them based on how valuable you think their research was. Or, perhaps you don't even need to assign them questions - perhaps they can pick their own questions, and you can factor in how neglected each question was as part of the value-of-research calculation.

This is the standard, "freeform" approach; it's "amplification" in the same sense that having employees is always amplification. What does the forecasting approach change?

  • It gives one specific mechanism for how you (the boss) evaluate the quality of research (by comparison with your own deep dive), and rules out all the others. This has the advantage of simplicity and transparency, but has the disadvantage that you can't directly give rewards for other criteria like "how well is this explained". You also can't reward research on topics that you don't do deep dives on.
    • This mainly seems valuable if you don't trust your own ability to evaluate research in an unbiased way. But evaluating research is usually much easier than doing research! In particular, doing research involves evaluating a whole bunch of previous literature.
    • Further, if one of your subordinates thinks you're systematically biased, then the forecasting approach doesn't give them a mechanism to get rewarded for telling you that. Whereas in the freeform approach to evaluating the quality of research, you can take that into account in your value calculation.
  • It gives one specific mechanism for how you aggregate all the research you receive. But that doesn't matter very much, since you're not bound to that - you can do whatever you like with the research after you've received it. And in the freeform approach, you're also able to ask people to produce probability distributions if you think that'll be useful for you to aggregate their research.
  • It might save you time? But I don't think that's true in general. Sure, if you use the strategy of reading everyone's research then grading it, that might take a long time. But since the forecasting approach is highly stochastic (people only get rewards for questions you randomly choose to do a deep dive on) you can be a little bit stochastic in other ways to save time. And presumably there are lots of other grading strategies you could use if you wanted.

Okay, let's take another tack. What makes prediction markets work?

1. Anyone with relevant information can use that information to make money, if the market is wrong.

2. People can see the current market value.

3. They don't have to reveal their information to make money.

4. They know that there's no bias in the evaluation - if their information is good, it's graded by reality, not by some gatekeeper.

5. They don't actually have to get the whole question right - they can just predict a short-term market movement ("this stock is currently undervalued") and then make money off that.

This forecasting setup also features 1 and 2. Whether or not it features 3 depends on whether you (the boss) manage to find that information by yourself in the deep dive. And 4 also depends on that. I don't know whether 5 holds, but I also don't know whether it's important.

So, for the sort of questions we want to ask, is there significant private or hard-to-communicate information?

  • If yes, then people will worry that you won't find it during your deep dive.
  • If no, then you likely don't have any advantage over others who are betting.
  • If it's in the sweet spot where it's private but the investigator would find it during their deep dive, then people with that private information have the right incentives.

If either of the first two options holds, then the forecasting approach might still have an advantage over a freeform approach, because people can see the current best guess when they make their own predictions. Is that visibility important, for the wisdom of crowds to work - or does it work even if everyone submits their probability distributions independently? I don't know - that seems like a crucial question.
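On that last question, here's the baseline effect in a toy simulation of my own (it doesn't settle whether visibility helps further; it just shows that fully independent submissions aren't hopeless): averaging many independent noisy estimates beats the typical individual estimate.

```python
import random

random.seed(1)
true_value = 70.0
estimates = [random.gauss(true_value, 20.0) for _ in range(100)]

crowd_error = abs(sum(estimates) / len(estimates) - true_value)
mean_individual_error = sum(abs(e - true_value) for e in estimates) / len(estimates)
print(round(crowd_error, 1), round(mean_individual_error, 1))
# The averaged estimate lands within a couple of points of the truth, while the
# average individual is off by roughly fifteen.
```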


Anyway, to summarise, I think it's worth comparing this more explicitly to the most straightforward alternative, which is "ask people to send you information and probability distributions, then use your intuition or expertise or whatever other criteria you like to calculate how valuable their submission is, then send them a proportional amount of money."

Comment by ricraz on [Part 2] Amplifying generalist research via forecasting – results from a preliminary exploration · 2020-01-03T15:48:34.486Z · score: 4 (2 votes) · LW · GW

Perhaps I missed this, but how long were the forecasters expected to spend per claim?

Comment by ricraz on human psycholinguists: a critical appraisal · 2020-01-02T16:30:36.174Z · score: 5 (3 votes) · LW · GW

I broadly agree with the sentiment of this post, that GPT-2 and BERT tell us new things about language. I don't think this claim relies on the fact that they're transformers though - and am skeptical when you say that "the transformer architecture was a real representational advance", and that "You need the right architecture". In your post on transformers, you noted that transformers are supersets of CNNs, but with fewer inductive biases. But I don't think of removing inductive biases as representational advances - or else getting MLPs to work well would be an even bigger representational advance than transformers! Rather, what we're doing is confessing as much ignorance about the correct inductive biases as we can get away with (without running out of compute).

Concretely, I'd predict with ~80% confidence that within 3 years, we'll be able to achieve comparable performance to our current best language models without using transformers - say, by only using something built of CNNs and LSTMs, plus better optimisation and regularisation techniques. Would you agree or disagree with this prediction?

Comment by ricraz on We run the Center for Applied Rationality, AMA · 2019-12-22T18:54:00.520Z · score: 12 (4 votes) · LW · GW

Note that Val's confusion seems to have been because he misunderstood Oli's point.

https://www.lesswrong.com/posts/tMhEv28KJYWsu6Wdo/kensh?commentId=SPouGqiWNiJgMB3KW#SPouGqiWNiJgMB3KW

Comment by ricraz on Coherence arguments do not imply goal-directed behavior · 2019-12-15T21:49:45.746Z · score: 6 (3 votes) · LW · GW

+1, I would have written my own review, but I think I basically just agree with everything in this one (and to the extent I wanted to further elaborate on the post, I've already done so here).

Comment by ricraz on Noticing the Taste of Lotus · 2019-12-02T00:40:18.899Z · score: 2 (1 votes) · LW · GW

This post provides a useful conceptual handle for zooming in on what's actually happening when I get distracted or procrastinate. Noticing this feeling has been a helpful step in preventing it.

Comment by ricraz on Coherence arguments do not imply goal-directed behavior · 2019-12-02T00:36:46.238Z · score: 2 (1 votes) · LW · GW

This post directly addresses what I think is the biggest conceptual hole in our current understanding of AGI: what type of goals will it have, and why? I think it's been important in pushing people away from unhelpful EU-maximisation framings, and towards more nuanced and useful ways of thinking about goals.

Comment by ricraz on Arguments about fast takeoff · 2019-12-02T00:28:23.289Z · score: 4 (2 votes) · LW · GW

I think the arguments in this post have been one of the most important pieces of conceptual progress made in safety within the last few years, and have shifted a lot of people's opinions significantly.

Comment by ricraz on Specification gaming examples in AI · 2019-12-01T23:50:36.620Z · score: 10 (5 votes) · LW · GW

I see this referred to a lot, and also find myself referring to it a lot. Having concrete examples of specification gaming is a valuable shortcut when explaining safety problems, as a "proof of concept" of something going wrong.

Comment by ricraz on Realism about rationality · 2019-11-22T12:02:28.951Z · score: 5 (3 votes) · LW · GW

I think in general, if there's a belief system B that some people have, then it's much easier and more useful to describe B than ~B. It's pretty clear if, say, B = Christianity, or B = Newtonian physics. I think of rationality anti-realism less as a specific hypothesis about intelligence, and more as a default skepticism: why should intelligence be formalisable? Most things aren't!

(I agree that if you think most things are formalisable, so that realism about rationality should be our default hypothesis, then phrasing it this way around might seem a little weird. But the version of realism about rationality that people buy into around here also depends on some of the formalisms that we've actually come up with being useful, which is a much more specific hypothesis, making skepticism again the default position.)

Comment by ricraz on Open question: are minimal circuits daemon-free? · 2019-11-21T14:16:26.259Z · score: 6 (3 votes) · LW · GW

This post grounds a key question in safety in a relatively simple way. It led to the useful distinction between upstream and downstream daemons, which I think is necessary to make conceptual progress on understanding when and how daemons will arise.

Comment by ricraz on Why everything might have taken so long · 2019-11-21T14:13:25.681Z · score: 4 (2 votes) · LW · GW

This post is a pretty comprehensive brainstorm of a crucially important topic; I've found that just reading through it sparks ideas.

Comment by ricraz on Give praise · 2019-11-21T14:11:49.372Z · score: 8 (5 votes) · LW · GW

I think this is a particularly important community norm to spread.

Comment by ricraz on The Rocket Alignment Problem · 2019-11-21T14:08:22.821Z · score: 11 (3 votes) · LW · GW

It's been very helpful for understanding the motivations behind MIRI's "deconfusion" research, in particular through linking it to another hard technical problem.

Comment by ricraz on Thinking of tool AIs · 2019-11-21T12:52:21.933Z · score: 2 (3 votes) · LW · GW
Because of these modifications, humans could spend almost all day on YT. It is worth noting that, even in this semi-catastrophic case

Calling this a semi-catastrophic case illustrates what seems to me to be a common oversight: not thinking about non-technical feedback mechanisms. In particular, I expect that in this case, YouTube would become illegal, and then everything would be fine.

I know there's a lot more complexity to the issue, and I don't want people to have to hedge all their statements, but I think it's worth pointing out that we shouldn't start to think of catastrophes as "easy" to create in general.

Comment by ricraz on Book Review: Design Principles of Biological Circuits · 2019-11-13T14:28:50.527Z · score: 6 (4 votes) · LW · GW
This post really shocked me with the level of principle that apparently can be found in such systems.

If you're interested in this theme, I recommend reading up on convergent evolution, which I find really fascinating. Here's Dawkins in The Blind Watchmaker:

The primitive mammals that happened to be around in the three areas [of Australia, South America and the Old World] when the dinosaurs more or less simultaneously vacated the great life trades, were all rather small and insignificant, probably nocturnal, previously overshadowed and overpowered by the dinosaurs. They could have evolved in radically different directions in the three areas. To some extent this is what happened. … But although the separate continents each produced their unique mammals, the general pattern of evolution in all three areas was the same. In all three areas the mammals that happened to be around at the start fanned out in evolution, and produced a specialist for each trade which, in many cases, came to bear a remarkable resemblance to the corresponding specialist in the other two areas.

Dawkins goes on to describe the many ways in which marsupials in Australia, placentals in the Old World, and a mix of both in South America underwent convergent evolution to fill similar roles in their ecosystems. Some examples are very striking: separate evolutions of moles, anteaters, army ants, etc.

I'm also working my way through Jonathan Losos' Improbable Destinies now, which isn't bad but a bit pop-sciencey. For more detail, Losos mentions https://mitpress.mit.edu/books/convergent-evolution and https://www.amazon.co.uk/Lifes-Solution-Inevitable-Humans-Universe/dp/0521603250.

Comment by ricraz on Rohin Shah on reasons for AI optimism · 2019-11-13T02:00:08.840Z · score: 4 (2 votes) · LW · GW

I predict that Rohin would say something like "the phrase 'approximately optimal for some objective/utility function' is basically meaningless in this context, because for any behaviour, there's some function which it's maximising".
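A minimal way to make that precise (a sketch of my own, not a quote from Rohin): for any policy π, define a utility function U_π which assigns utility 1 to a trajectory if every action in it is the one π would have taken, and 0 otherwise. Then π is exactly optimal for U_π, so "behaves as if (approximately) maximising some utility function" rules out no behaviour at all unless you restrict which utility functions count.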

You might then limit yourself to the set of functions that defines tasks that are interesting or relevant to humans. But then that includes a whole bunch of functions which define safe bounded behaviour as well as a whole bunch which define unsafe unbounded behaviour, and we're back to being very uncertain about which case we'll end up in.


Comment by ricraz on Rohin Shah on reasons for AI optimism · 2019-11-01T11:31:04.977Z · score: 17 (12 votes) · LW · GW
Rohin reported an unusually large (90%) chance that AI systems will be safe without additional intervention.

This sentence makes two claims. Firstly that Rohin reports 90% credence in safe AI by default. Secondly that 90% is unusually large compared with the relevant reference class (which I interpret to be people working full-time on AI safety).

However, as far as I can tell, there's no evidence provided for the second claim. I find this particularly concerning because it's the sort of claim that seems likely to cause (and may already have caused) information cascades, along the lines of "all these high status people think AI x-risk is very likely, so I should too".

It may well be true that Rohin is an outlier in this regard. But it may also be false: a 10% chance of catastrophe is plenty high enough to motivate people to go into the field. Since I don't know of many public statements from safety researchers stating their credence in AI x-risk, I'm curious about whether you have strong private evidence.

Comment by ricraz on In Defence of Conflict Theory · 2019-10-10T10:49:39.389Z · score: 2 (1 votes) · LW · GW
This doesn't make much sense in two of your examples: factory farming and concern for future generations. In those cases it seems that you instead have to convince the "powerful" that they are wrong.

I think it's quite a mistake-theoretic view to think that factory farming persists because powerful people are wrong about it. Instead, the (conflict-theoretic) view which I'd defend here is something like "It doesn't matter what politicians think about the morality of factory farming, very few politicians are moral enough to take the career hit of standing up for what's right when it's unpopular, and many are being bought off by the evil meat/farming lobbies. So we need to muster enough mass popular support that politicians see which way the wind is blowing and switch sides en masse (like they did with gay marriage)."

Then the relevance to "the struggle to rally people without power to keep the powerful in check will be a Red Queen's race that we simply need to keep running for as long as we want prosperity to last" is simply that there's no long-term way to change politicians from being weak-willed and immoral - you just need to keep fighting through all these individual issues as they come up.

I think besides "power corrupts", my main problem with "conflict theorists" is that optimizing for gaining power often requires [ideology], i.e., implicitly or explicitly ignoring certain facts that are inconvenient for building a social movement or gaining power. And then this [ideology] gets embedded into the power structure as unquestionable "truths" once the social movement actually gains power, and subsequently causes massive policy distortions.

(Warning: super simplified, off the cuff thoughts here, from a perspective I only partially endorse): I guess my inner conflict theorist believes that it's okay for there to be significant distortions in policy as long as there are mechanisms by which new ideologies can arise to address them, and that it's worthwhile to have this in exchange for dynamism and less political stagnation.

Like, you know what was one of the biggest policy distortions of all time? World War 2. And yet it had a revitalising effect on the American economy, decreased inequality, and led to a boom period.

Whereas if you don't have new ideologies rising and gaining power, then you can go around fixing individual problems all day, but the core allocation of power in society will become so entrenched that the policy distortions are disastrous.

(Edited to add: this feels relevant.)

Comment by ricraz on Arguments for moral indefinability · 2019-10-10T10:34:06.420Z · score: 2 (1 votes) · LW · GW

I address (something similar to) Yudkowsky's view in the paragraph starting:

I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much - for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection.

Particularism feels relevant and fairly similar to what I'm saying, although maybe with a bit of a different emphasis.