Posts

Characterising utopia 2020-01-02T00:00:01.268Z · score: 26 (7 votes)
Technical AGI safety research outside AI 2019-10-18T15:00:22.540Z · score: 36 (13 votes)
Seven habits towards highly effective minds 2019-09-05T23:10:01.020Z · score: 39 (10 votes)
What explanatory power does Kahneman's System 2 possess? 2019-08-12T15:23:20.197Z · score: 33 (16 votes)
Why do humans not have built-in neural i/o channels? 2019-08-08T13:09:54.072Z · score: 26 (12 votes)
Book review: The Technology Trap 2019-07-20T12:40:01.151Z · score: 30 (14 votes)
What are some of Robin Hanson's best posts? 2019-07-02T20:58:01.202Z · score: 36 (10 votes)
On alien science 2019-06-02T14:50:01.437Z · score: 46 (15 votes)
A shift in arguments for AI risk 2019-05-28T13:47:36.486Z · score: 27 (11 votes)
Would an option to publish to AF users only be a useful feature? 2019-05-20T11:04:26.150Z · score: 14 (5 votes)
Which scientific discovery was most ahead of its time? 2019-05-16T12:58:14.628Z · score: 39 (10 votes)
When is rationality useful? 2019-04-24T22:40:01.316Z · score: 29 (7 votes)
Book review: The Sleepwalkers by Arthur Koestler 2019-04-23T00:10:00.972Z · score: 75 (22 votes)
Arguments for moral indefinability 2019-02-12T10:40:01.226Z · score: 53 (17 votes)
Coherent behaviour in the real world is an incoherent concept 2019-02-11T17:00:25.665Z · score: 38 (16 votes)
Vote counting bug? 2019-01-22T15:44:48.154Z · score: 7 (2 votes)
Disentangling arguments for the importance of AI safety 2019-01-21T12:41:43.615Z · score: 122 (43 votes)
Comments on CAIS 2019-01-12T15:20:22.133Z · score: 72 (19 votes)
How democracy ends: a review and reevaluation 2018-11-27T10:50:01.130Z · score: 17 (9 votes)
On first looking into Russell's History 2018-11-08T11:20:00.935Z · score: 35 (11 votes)
Speculations on improving debating 2018-11-05T16:10:02.799Z · score: 26 (10 votes)
Implementations of immortality 2018-11-01T14:20:01.494Z · score: 21 (8 votes)
What will the long-term future of employment look like? 2018-10-24T19:58:09.320Z · score: 11 (4 votes)
Book review: 23 things they don't tell you about capitalism 2018-10-18T23:05:29.465Z · score: 19 (11 votes)
Book review: The Complacent Class 2018-10-13T19:20:05.823Z · score: 21 (9 votes)
Some cruxes on impactful alternatives to AI policy work 2018-10-10T13:35:27.497Z · score: 151 (53 votes)
A compendium of conundrums 2018-10-08T14:20:01.178Z · score: 12 (12 votes)
Thinking of the days that are no more 2018-10-06T17:00:01.208Z · score: 13 (6 votes)
The Unreasonable Effectiveness of Deep Learning 2018-09-30T15:48:46.861Z · score: 86 (26 votes)
Deep learning - deeper flaws? 2018-09-24T18:40:00.705Z · score: 42 (17 votes)
Book review: Happiness by Design 2018-09-23T04:30:00.939Z · score: 14 (6 votes)
Book review: Why we sleep 2018-09-19T22:36:19.608Z · score: 52 (25 votes)
Realism about rationality 2018-09-16T10:46:29.239Z · score: 168 (76 votes)
Is epistemic logic useful for agent foundations? 2018-05-08T23:33:44.266Z · score: 19 (6 votes)
What we talk about when we talk about maximising utility 2018-02-24T22:33:28.390Z · score: 27 (8 votes)
In Defence of Conflict Theory 2018-02-17T03:33:01.970Z · score: 25 (10 votes)
Is death bad? 2018-01-13T04:55:25.788Z · score: 8 (4 votes)

Comments

Comment by ricraz on Realism about rationality · 2020-01-14T11:01:48.270Z · score: 2 (1 votes) · LW · GW

Yeah, I should have been much more careful before throwing around words like "real". See the long comment I just posted for more clarification, and in particular this paragraph:

I'm not trying to argue that concepts which we can't formalise "aren't real", but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can't formalise, and that it's those incoherent extrapolations which "aren't real" (I agree that this was quite unclear in the original post).
Comment by ricraz on Realism about rationality · 2020-01-14T10:40:02.315Z · score: 9 (4 votes) · LW · GW

I like this review and think it was very helpful in understanding your (Abram's) perspective, as well as in highlighting some flaws in the original post and ways that I'd been unclear in communicating my intuitions. In the rest of my comment I'll try to write a synthesis of my intentions for the original post with your comments; I'd be interested in the extent to which you agree or disagree.

We can distinguish between two ways to understand a concept X. For lack of better terminology, I'll call them "understanding how X functions" and "understanding the nature of X". I conflated these in the original post in a confusing way.

For example, I'd say that studying how fitness functions would involve looking into the ways in which different components are important for the fitness of existing organisms (e.g. internal organs; circulatory systems; etc). Sometimes you can generalise that knowledge to organisms that don't yet exist, or even prove things about those components (e.g. there's probably useful maths connecting graph theory with optimal nerve wiring), but it's still very grounded in concrete examples. If we thought that we should study how intelligence functions in a similar way as we study how fitness functions, that might look like a combination of cognitive science and machine learning.

By comparison, understanding the nature of X involves performing a conceptual reduction on X by coming up with a theory which is capable of describing X in a more precise or complete way. The pre-theoretic concept of fitness (if it even existed) might have been something like "the number and quality of an organism's offspring". Whereas the evolutionary notion of fitness is much more specific, and uses maths to link fitness with other concepts like allele frequency.

Momentum isn't really a good example to illustrate this distinction, so perhaps we could use another concept from physics, like electricity. We can understand how electricity functions in a lawlike way by understanding the relationship between voltage, resistance and current in a circuit, and so on, even when we don't know what electricity is. If we thought that we should study how intelligence functions in a similar way as the discoverers of electricity studied how it functions, that might involve doing theoretical RL research. But we also want to understand the nature of electricity (which turns out to be the flow of electrons). Using that knowledge, we can extend our theory of how electricity functions to cases which seem puzzling when we think in terms of voltage, current and resistance in circuits (even if we spend almost all our time still thinking in those terms in practice). This illustrates a more general point: you can understand a lot about how something functions without having a reductionist account of its nature - but not everything. And so in the long term, to understand really well how something functions, you need to understand its nature. (Perhaps understanding how CS algorithms work in practice, versus understanding the conceptual reduction of algorithms to Turing Machines, is another useful example).

I had previously thought that MIRI was trying to understand how intelligence functions. What I take from your review is that MIRI is first trying to understand the nature of intelligence. From this perspective, your earlier objection makes much more sense.

However, I still think that there are different ways you might go about understanding the nature of intelligence, and that "something kind of like rationality realism" might be a crux here (as you mention). One way that you might try to understand the nature of intelligence is by doing mathematical analysis of what happens in the limit of increasing intelligence. I interpret work on AIXI, logical inductors, and decision theory as falling into this category. This type of work feels analogous to some of Einstein's thought experiments about the limit of increasing speed. Would it have worked for discovering evolution? That is, would starting with a pre-theoretic concept of fitness and doing mathematical analysis of its limiting cases (e.g. by thinking about organisms that lived for arbitrarily long, or had arbitrarily large numbers of children) have helped people come up with evolution? I'm not sure. There's an argument that Malthus did something like this, by looking at long-term population dynamics. But you could also argue that the key insights leading up to the discovery of evolution were primarily inspired by specific observations about the organisms around us. And in fact, even knowing evolutionary theory, I don't think that the extreme cases of fitness even make sense. So I would say that I am not a realist about "perfect fitness", even though the concept of fitness itself seems fine.

So an attempted rephrasing of the point I was originally trying to make, given this new terminology, is something like "if we succeed in finding a theory that tells us the nature of intelligence, it still won't make much sense in the limit, which is the place where MIRI seems to be primarily studying it (with some exceptions, e.g. your Partial Agency sequence). Instead, the best way to get that theory is to study how intelligence functions."

The reason I called it "rationality realism" not "intelligence realism" is that rationality has connotations of this limit or ideal existing, whereas intelligence doesn't. You might say that X is very intelligent, and Y is more intelligent than X, without agreeing that perfect intelligence exists. Whereas when we talk about rationality, there's usually an assumption that "perfect rationality" exists. I'm not trying to argue that concepts which we can't formalise "aren't real", but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can't formalise, and that it's those incoherent extrapolations like "perfect fitness" which "aren't real" (I agree that this was quite unclear in the original post).

My proposed redefinition:

  • The "intelligence is intelligible" hypothesis is about how lawlike the best description of how intelligence functions will turn out to be.
  • The "realism about rationality" hypothesis is about how well-defined intelligence is in the limit (where I think of the limit of intelligence as "perfect rationality", and "well-defined" with respect not to our current understanding, but rather with respect to the best understanding of the nature of intelligence we'll ever discover).
Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-04T22:01:34.741Z · score: 6 (3 votes) · LW · GW

Cool, thanks for those clarifications :) In case it didn't come through from the previous comments, I wanted to make clear that this seems like exciting work and I'm looking forward to hearing how follow-ups go.

Comment by ricraz on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-04T20:55:17.852Z · score: 2 (1 votes) · LW · GW

Yes, but the fact that the fragile worlds are much more likely to end in the future is a reason to condition your efforts on being in a robust world.

While I do buy Paul's argument, I think it'd be very helpful if the various summaries of the interviews with him were edited to make it clear that he's talking about value-conditioned probabilities rather than unconditional probabilities - since the claim as originally stated feels misleading. (Even if some decision theories only use the former, most people think in terms of the latter).

Comment by ricraz on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-04T20:44:52.912Z · score: 4 (2 votes) · LW · GW

Some abstractions are heavily determined by the territory. The concept of trees is pretty heavily determined by the territory. Whereas the concept of betrayal is determined by the way that human minds function, which is determined by other people's abstractions. So while it seems reasonably likely to me that an AI "naturally thinks" in terms of the same low-level abstractions as humans, its thinking in terms of human high-level abstractions seems much less likely, absent some type of safety intervention. Which is particularly important because most of the key human values are very high-level abstractions.

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-04T20:19:09.091Z · score: 4 (2 votes) · LW · GW

I have four concerns even given that you're using a proper scoring rule, which relate to the link between that scoring rule and actually giving people money. I'm not particularly well-informed on this though, so could be totally wrong.

1. To implement some proper scoring rules, you need the ability to confiscate money from people who predict badly. Even when the score always has the same sign (as with log-scoring, or a quadratic scoring rule shifted by a constant), if you don't confiscate money for bad predictions, then you're basically just giving money to people for signing up, which makes having an open platform tricky (see the sketch after this list).

2. Even if you restrict signups, you get an analogous problem within a fixed population who's already signed up: the incentives will be skewed when it comes to choosing which questions to answer. In particular, if people expect to get positive amounts of money for answering randomly, they'll do so even when they have no relevant information, adding a lot of noise.

3. If a scoring rule is "very capped", as the log-scoring function is, then the expected reward from answering randomly may be very close to the expected reward from putting in a lot of effort, and so people would be incentivised to answer randomly and spend their time on other things.

4. Relatedly, people's utilities aren't linear in money, so the score function might not remain a proper one once you take that into account. But I don't think this would have a big effect on the scales this is likely to operate on.
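
To make the first two concerns concrete, here's a minimal sketch (the payout rule and numbers are purely hypothetical, not the scheme actually used): a platform that pays a flat bonus plus log score, and never claws anything back, pays a forecaster who answers 50% on every binary question a sizeable amount for contributing no information at all.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_score(p, outcome):
    """Logarithmic scoring rule for a binary question: always <= 0."""
    return np.log(p if outcome else 1 - p)

# Hypothetical payout: a flat bonus plus the log score, floored at zero
# because the platform never confiscates money for bad predictions.
BONUS = 1.0
def payout(p, outcome):
    return max(0.0, BONUS + log_score(p, outcome))

# 1000 binary questions; the lazy forecaster ignores their true probabilities.
true_probs = rng.uniform(0.1, 0.9, size=1000)
outcomes = rng.random(1000) < true_probs

lazy = np.mean([payout(0.5, o) for o in outcomes])                        # always reports 50%
informed = np.mean([payout(q, o) for q, o in zip(true_probs, outcomes)])  # reports the true probability

print(f"average payout per question, lazy forecaster:     {lazy:.3f}")
print(f"average payout per question, informed forecaster: {informed:.3f}")
# Both averages are positive: the lazy forecaster is paid for pure noise and can
# never lose money, so anyone profits just by signing up and answering everything.
```

Concern 3 is then the related worry that if the per-question payout is also tightly capped, the informed forecaster's edge over random answering shrinks further.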

Comment by ricraz on Characterising utopia · 2020-01-04T03:07:11.970Z · score: 4 (2 votes) · LW · GW

Apologies for the mischaracterisation. I've changed this to refer to Scott Alexander's post which predicts this pressure.

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-03T19:53:14.660Z · score: 4 (2 votes) · LW · GW

Actually, isn't the key difference between this and prediction markets that this has no downside risk - that you can't lose money for bad predictions? If so, you could exploit it by only making extreme predictions, which would make a lot of money sometimes without losing anything in the other cases. Or by making fake accounts to drag the average down.
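
A toy illustration of that exploit (the scoring and payout here are stand-ins I've made up, not the actual mechanism from the post): when losses are floored at zero, an uninformative extreme prediction has positive expected value, whereas under a symmetric, market-like payout it loses on average.

```python
import numpy as np

rng = np.random.default_rng(1)

def relative_log_score(p, aggregate, outcome):
    """How much better (in log score) a report is than the current aggregate."""
    mine = np.log(p if outcome else 1 - p)
    crowd = np.log(aggregate if outcome else 1 - aggregate)
    return mine - crowd

def floored_payout(p, aggregate, outcome):
    # No downside risk: bad predictions cost nothing.
    return max(0.0, relative_log_score(p, aggregate, outcome))

def symmetric_payout(p, aggregate, outcome):
    # Prediction-market-like: losses are real.
    return relative_log_score(p, aggregate, outcome)

# A question where the aggregate of 0.5 is in fact well calibrated.
outcomes = rng.random(50_000) < 0.5

for label, p in [("honest 0.5 ", 0.5), ("extreme 0.99", 0.99)]:
    floored = np.mean([floored_payout(p, 0.5, o) for o in outcomes])
    symmetric = np.mean([symmetric_payout(p, 0.5, o) for o in outcomes])
    print(f"{label}: floored EV = {floored:+.3f}, symmetric EV = {symmetric:+.3f}")
# With the floor, the extreme report profits in expectation despite containing
# no information; with symmetric payouts it loses heavily on average.
```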

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-03T19:41:42.546Z · score: 3 (2 votes) · LW · GW

Another point: prediction markets allow you to bet more if you're more confident that the market is off. This doesn't, except by betting that the market is further off, which is different. But I don't know if that matters very much; you could probably recreate that dynamic by letting people weight their own predictions.

Comment by ricraz on [Part 2] Amplifying generalist research via forecasting – results from a preliminary exploration · 2020-01-03T19:34:24.957Z · score: 5 (3 votes) · LW · GW

Okay, so in quite a few cases the forecasters spent more time on a question than Elizabeth did? That seems like an important point to mention.

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-03T19:21:22.078Z · score: 5 (3 votes) · LW · GW

My interpretation: there's no such thing as negative value of information. If the mean of the crowdworkers' estimates were reliably in the wrong direction (compared with Elizabeth's prior), then that would still allow you to update Elizabeth's prior to make it more accurate.
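
A toy numerical version of that point (entirely made-up numbers, just to illustrate): an estimate source that is reliably biased still carries information, because once you know how it errs you can correct for it and beat the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a quantity of interest, a flat prior guess, and a "crowd"
# whose estimates are reliably 10 points too low but still track the truth.
truth = rng.normal(50, 10, size=10_000)
prior_guess = np.full_like(truth, 50.0)
crowd = truth - 10 + rng.normal(0, 3, size=truth.shape)

debiased = crowd + 10   # knowing the direction of the error lets you correct it

for name, est in [("prior", prior_guess), ("raw crowd", crowd), ("debiased crowd", debiased)]:
    rmse = np.sqrt(np.mean((est - truth) ** 2))
    print(f"{name:>15}: RMSE = {rmse:.1f}")
# The raw crowd looks bad (systematically off), but once you know how it is
# reliably wrong, correcting it beats the prior - the information wasn't negative.
```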

Comment by ricraz on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-03T19:19:31.776Z · score: 14 (6 votes) · LW · GW

So the thing I'm wondering here is what makes this "amplification" in more than a trivial sense. Let me think out loud for a bit. Warning: very rambly.

Let's say you're a competent researcher and you want to find out the answers to 100 questions, which you don't have time to investigate yourself. The obvious strategy here is to hire 10 people, get them to investigate 10 questions each, and then pay them based on how valuable you think their research was. Or, perhaps you don't even need to assign them questions - perhaps they can pick their own questions, and you can factor in how neglected each question was as part of the value-of-research calculation.

This is the standard, "freeform" approach; it's "amplification" in the same sense that having employees is always amplification. What does the forecasting approach change?

  • It gives one specific mechanism for how you (the boss) evaluate the quality of research (by comparison with your own deep dive), and rules out all the others. This has the advantage of simplicity and transparency, but has the disadvantage that you can't directly give rewards for other criteria like "how well is this explained". You also can't reward research on topics that you don't do deep dives on.
    • This mainly seems valuable if you don't trust your own ability to evaluate research in an unbiased way. But evaluating research is usually much easier than doing research! In particular, doing research involves evaluating a whole bunch of previous literature.
    • Further, if one of your subordinates thinks you're systematically biased, then the forecasting approach doesn't give them a mechanism to get rewarded for telling you that. Whereas in the freeform approach to evaluating the quality of research, you can take that into account in your value calculation.
  • It gives one specific mechanism for how you aggregate all the research you receive. But that doesn't matter very much, since you're not bound to that - you can do whatever you like with the research after you've received it. And in the freeform approach, you're also able to ask people to produce probability distributions if you think that'll be useful for you to aggregate their research.
  • It might save you time? But I don't think that's true in general. Sure, if you use the strategy of reading everyone's research then grading it, that might take a long time. But since the forecasting approach is highly stochastic (people only get rewards for questions you randomly choose to do a deep dive on) you can be a little bit stochastic in other ways to save time. And presumably there are lots of other grading strategies you could use if you wanted.

Okay, let's take another tack. What makes prediction markets work?

1. Anyone with relevant information can use that information to make money, if the market is wrong.

2. People can see the current market value.

3. They don't have to reveal their information to make money.

4. They know that there's no bias in the evaluation - if their information is good, it's graded by reality, not by some gatekeeper.

5. They don't actually have to get the whole question right - they can just predict a short-term market movement ("this stock is currently undervalued") and then make money off that.

This forecasting setup also features 1 and 2. Whether or not it features 3 depends on whether you (the boss) manage to find that information by yourself in the deep dive. And 4 also depends on that. I don't know whether 5 holds, but I also don't know whether it's important.

So, for the sort of questions we want to ask, is there significant private or hard-to-communicate information?

  • If yes, then people will worry that you won't find it during your deep dive.
  • If no, then you likely don't have any advantage over others who are betting.
  • If it's in the sweet spot where it's private but the investigator would find it during their deep dive, then people with that private information have the right incentives.

If either of the first two options holds, then the forecasting approach might still have an advantage over a freeform approach, because people can see the current best guess when they make their own predictions. Is that visibility important, for the wisdom of crowds to work - or does it work even if everyone submits their probability distributions independently? I don't know - that seems like a crucial question.


Anyway, to summarise, I think it's worth comparing this more explicitly to the most straightforward alternative, which is "ask people to send you information and probability distributions, then use your intuition or expertise or whatever other criteria you like to calculate how valuable their submission is, then send them a proportional amount of money."

Comment by ricraz on [Part 2] Amplifying generalist research via forecasting – results from a preliminary exploration · 2020-01-03T15:48:34.486Z · score: 4 (2 votes) · LW · GW

Perhaps I missed this, but how long were the forecasters expected to spend per claim?

Comment by ricraz on human psycholinguists: a critical appraisal · 2020-01-02T16:30:36.174Z · score: 5 (3 votes) · LW · GW

I broadly agree with the sentiment of this post, that GPT-2 and BERT tell us new things about language. I don't think this claim relies on the fact that they're transformers though - and am skeptical when you say that "the transformer architecture was a real representational advance", and that "You need the right architecture". In your post on transformers, you noted that transformers are supersets of CNNs, but with fewer inductive biases. But I don't think of removing inductive biases as a representational advance - or else getting MLPs to work well would be an even bigger representational advance than transformers! Rather, what we're doing is confessing as much ignorance about the correct inductive biases as we can get away with (without running out of compute).

Concretely, I'd predict with ~80% confidence that within 3 years, we'll be able to achieve comparable performance to our current best language models without using transformers - say, by only using something built of CNNs and LSTMs, plus better optimisation and regularisation techniques. Would you agree or disagree with this prediction?

Comment by ricraz on We run the Center for Applied Rationality, AMA · 2019-12-22T18:54:00.520Z · score: 12 (4 votes) · LW · GW

Note that Val's confusion seems to have been because he misunderstood Oli's point.

https://www.lesswrong.com/posts/tMhEv28KJYWsu6Wdo/kensh?commentId=SPouGqiWNiJgMB3KW#SPouGqiWNiJgMB3KW

Comment by ricraz on Coherence arguments do not imply goal-directed behavior · 2019-12-15T21:49:45.746Z · score: 6 (3 votes) · LW · GW

+1, I would have written my own review, but I think I basically just agree with everything in this one (and to the extent I wanted to further elaborate on the post, I've already done so here).

Comment by ricraz on Noticing the Taste of Lotus · 2019-12-02T00:40:18.899Z · score: 2 (1 votes) · LW · GW

This post provides a useful conceptual handle for zooming in on what's actually happening when I get distracted, or procrastinate. Noticing this feeling has been a helpful step in preventing it.

Comment by ricraz on Coherence arguments do not imply goal-directed behavior · 2019-12-02T00:36:46.238Z · score: 2 (1 votes) · LW · GW

This post directly addresses what I think is the biggest conceptual hole in our current understanding of AGI: what type of goals will it have, and why? I think it's been important in pushing people away from unhelpful EU-maximisation framings, and towards more nuanced and useful ways of thinking about goals.

Comment by ricraz on Arguments about fast takeoff · 2019-12-02T00:28:23.289Z · score: 4 (2 votes) · LW · GW

I think the arguments in this post have been one of the most important pieces of conceptual progress made in safety within the last few years, and have shifted a lot of people's opinions significantly.

Comment by ricraz on Specification gaming examples in AI · 2019-12-01T23:50:36.620Z · score: 10 (5 votes) · LW · GW

I see this referred to a lot, and also find myself referring to it a lot. Having concrete examples of specification gaming is a valuable shortcut when explaining safety problems, as a "proof of concept" of something going wrong.

Comment by ricraz on Realism about rationality · 2019-11-22T12:02:28.951Z · score: 5 (3 votes) · LW · GW

I think in general, if there's a belief system B that some people have, then it's much easier and more useful to describe B than ~B. It's pretty clear if, say, B = Christianity, or B = Newtonian physics. I think of rationality anti-realism less as a specific hypothesis about intelligence, and more as a default skepticism: why should intelligence be formalisable? Most things aren't!

(I agree that if you think most things are formalisable, so that realism about rationality should be our default hypothesis, then phrasing it this way around might seem a little weird. But the version of realism about rationality that people buy into around here also depends on some of the formalisms that we've actually come up with being useful, which is a much more specific hypothesis, making skepticism again the default position.)

Comment by ricraz on Open question: are minimal circuits daemon-free? · 2019-11-21T14:16:26.259Z · score: 6 (3 votes) · LW · GW

This post grounds a key question in safety in a relatively simple way. It led to the useful distinction between upstream and downstream daemons, which I think is necessary to make conceptual progress on understanding when and how daemons will arise.

Comment by ricraz on Why everything might have taken so long · 2019-11-21T14:13:25.681Z · score: 4 (2 votes) · LW · GW

This post is a pretty comprehensive brainstorm of a crucially important topic; I've found that just reading through it sparks ideas.

Comment by ricraz on Give praise · 2019-11-21T14:11:49.372Z · score: 8 (5 votes) · LW · GW

I think this is a particularly important community norm to spread.

Comment by ricraz on The Rocket Alignment Problem · 2019-11-21T14:08:22.821Z · score: 11 (3 votes) · LW · GW

This post has been very helpful for understanding the motivations behind MIRI's "deconfusion" research, in particular through linking it to another hard technical problem.

Comment by ricraz on Thinking of tool AIs · 2019-11-21T12:52:21.933Z · score: 2 (3 votes) · LW · GW
Because of these modifications, humans could spend almost all day on YT. It is worth noting that, even in this semi-catastrophic case

Calling this a semi-catastrophic case illustrates what seems to me to be a common oversight: not thinking about non-technical feedback mechanisms. In particular, I expect that in this case, YouTube would become illegal, and then everything would be fine.

I know there's a lot more complexity to the issue, and I don't want people to have to hedge all their statements, but I think it's worth pointing out that we shouldn't start to think of catastrophes as "easy" to create in general.

Comment by ricraz on Book Review: Design Principles of Biological Circuits · 2019-11-13T14:28:50.527Z · score: 6 (4 votes) · LW · GW
This post really shocked me with the level of principle that apparently can be found in such systems.

If you're interested in this theme, I recommend reading up on convergent evolution, which I find really fascinating. Here's Dawkins in The Blind Watchmaker:

The primitive mammals that happened to be around in the three areas [of Australia, South America and the Old World] when the dinosaurs more or less simultaneously vacated the great life trades, were all rather small and insignificant, probably nocturnal, previously overshadowed and overpowered by the dinosaurs. They could have evolved in radically different directions in the three areas. To some extent this is what happened. … But although the separate continents each produced their unique mammals, the general pattern of evolution in all three areas was the same. In all three areas the mammals that happened to be around at the start fanned out in evolution, and produced a specialist for each trade which, in many cases, came to bear a remarkable resemblance to the corresponding specialist in the other two areas.

Dawkins goes on to describe the many ways in which marsupials in Australia, placentals in the Old World, and a mix of both in South America underwent convergent evolution to fill similar roles in their ecosystems. Some examples are very striking: separate evolutions of moles, anteaters, army ants, etc.

I'm also working my way through Jonathan Losos' Improbable Destinies now, which isn't bad but a bit pop-sciencey. For more detail, Losos mentions https://mitpress.mit.edu/books/convergent-evolution and https://www.amazon.co.uk/Lifes-Solution-Inevitable-Humans-Universe/dp/0521603250.

Comment by ricraz on Rohin Shah on reasons for AI optimism · 2019-11-13T02:00:08.840Z · score: 4 (2 votes) · LW · GW

I predict that Rohin would say something like "the phrase 'approximately optimal for some objective/utility function' is basically meaningless in this context, because for any behaviour, there's some function which it's maximising".

You might then limit yourself to the set of functions that defines tasks that are interesting or relevant to humans. But then that includes a whole bunch of functions which define safe bounded behaviour as well as a whole bunch which define unsafe unbounded behaviour, and we're back to being very uncertain about which case we'll end up in.


Comment by ricraz on Rohin Shah on reasons for AI optimism · 2019-11-01T11:31:04.977Z · score: 15 (10 votes) · LW · GW
Rohin reported an unusually large (90%) chance that AI systems will be safe without additional intervention.

This sentence makes two claims. Firstly that Rohin reports 90% credence in safe AI by default. Secondly that 90% is unusually large compared with the relevant reference class (which I interpret to be people working full-time on AI safety).

However, as far as I can tell, there's no evidence provided for the second claim. I find this particularly concerning because it's the sort of claim that seems likely to cause (and may already have caused) information cascades, along the lines of "all these high status people think AI x-risk is very likely, so I should too".

It may well be true that Rohin is an outlier in this regard. But it may also be false: a 10% chance of catastrophe is plenty high enough to motivate people to go into the field. Since I don't know of many public statements from safety researchers stating their credence in AI x-risk, I'm curious about whether you have strong private evidence.

Comment by ricraz on In Defence of Conflict Theory · 2019-10-10T10:49:39.389Z · score: 2 (1 votes) · LW · GW
This doesn't make much sense in two of your examples: factory farming and concern for future generations. In those cases it seems that you instead have to convince the "powerful" that they are wrong.

I think it's quite a mistake-theoretic view to think that factory farming persists because powerful people are wrong about it. Instead, the (conflict-theoretic) view which I'd defend here is something like "It doesn't matter what politicians think about the morality of factory farming, very few politicians are moral enough to take the career hit of standing up for what's right when it's unpopular, and many are being bought off by the evil meat/farming lobbies. So we need to muster enough mass popular support that politicians see which way the wind is blowing and switch sides en masse (like they did with gay marriage)."

Then the relevance to "the struggle to rally people without power to keep the powerful in check will be a Red Queen's race that we simply need to keep running for as long as we want prosperity to last" is simply that there's no long-term way to change politicians from being weak-willed and immoral - you just need to keep fighting through all these individual issues as they come up.

I think besides "power corrupts", my main problem with "conflict theorists" is that optimizing for gaining power often requires [ideology], i.e., implicitly or explicitly ignoring certain facts that are inconvenient for building a social movement or gaining power. And then this [ideology] gets embedded into the power structure as unquestionable "truths" once the social movement actually gains power, and subsequently causes massive policy distortions.

(Warning: super simplified, off the cuff thoughts here, from a perspective I only partially endorse): I guess my inner conflict theorist believes that it's okay for there to be significant distortions in policy as long as there are mechanisms by which new ideologies can arise to address them, and that it's worthwhile to have this in exchange for dynamism and less political stagnation.

Like, you know what was one of the biggest policy distortions of all time? World War 2. And yet it had a revitalising effect on the American economy, decreased inequality, and led to a boom period.

Whereas if you don't have new ideologies rising and gaining power, then you can go around fixing individual problems all day, but the core allocation of power in society will become so entrenched that the policy distortions are disastrous.

(Edited to add: this feels relevant.)

Comment by ricraz on Arguments for moral indefinability · 2019-10-10T10:34:06.420Z · score: 2 (1 votes) · LW · GW

I address (something similar to) Yudkowsky's view in the paragraph starting:

I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much - for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection.

Particularism feels relevant and fairly similar to what I'm saying, although maybe with a bit of a different emphasis.

Comment by ricraz on Realism and Rationality · 2019-09-22T07:35:30.331Z · score: 2 (1 votes) · LW · GW
If Alice doesn't mean for her second sentence to be totally redundant -- or if she is able to interpret Bob's response as an intelligible (if incorrect) statement of disagreement with her second sentence -- then that suggests her second sentence actually constitutes a substantively normative claim.

I don't think you can declare a sentence redundant without also considering the pragmatic aspects of meaning. In this example, Alice's second sentence is a stronger claim than the first, because it again contains an implicit clause: "If you want to get protein, and you don't have any other relevant goals, you should eat meat". Or maybe it's more like "If you want to get protein, and your other goals are standard ones, you should eat meat."

Compare: Alice says "Jumping off cliffs without a parachute is a quick way to feel very excited. If you want to feel excited, you should jump off cliffs without a parachute." Bob says "No you shouldn't, because you'll die." Alice's first sentence is true, and her second sentence is false, so they can't be equivalent - but both of them can be interpreted as goal-conditional empirical sentences. It's just the case that when you make broad statements, pragmatically you are assuming a "normal" set of goals.

If she is able to interpret Bob's response as an intelligible (if incorrect) statement of disagreement with her second sentence

It's not entirely unintelligible, because Alice is relying on the implicit premise of "standard goals" that I mentioned above, and the reason people like Bob are so outspoken on this issue is that they're trying to change that norm of what we consider "standard goals". I do think that if Alice really understood normativity, she would tell Bob that she was trying to make a different type of claim to his one, because his was normative and hers wasn't - while conceding that he had reason to find the pragmatics of her sentence objectionable.

Also, though, you've picked a case where the disputed statement is often used both in empirical ways and in normative ways. This is the least clear sort of example (especially since, pragmatically, when you repeat almost the same thing twice, it makes people think you're implying something different). The vast majority of examples of people using "if you want..., then you should..." seem clearly empirical to me - including many that are in morally relevant domains, where the pragmatics make their empirical nature clear:

A: "If you want to murder someone without getting caught, you should plan carefully."

B: "No you shouldn't, because you shouldn't murder people."

A: "Well obviously you shouldn't murder people, but I'm just saying that if you wanted to, planning would make things much easier."

Comment by ricraz on Realism and Rationality · 2019-09-21T19:16:57.754Z · score: 2 (1 votes) · LW · GW
1. "Bayesian updating has a certain asymptoptic convergence property, in the limit of infinite experience and infinite compute. So if you want to understand the world, you should be a Bayesian."
If the first and second sentence were meant to communicate the same thing, then the second would be totally vacuous given the first.

I was a little imprecise in saying that they're exactly equivalent - the second sentence should also have an "in the limit of infinite compute" qualification. Or else we need a hidden assumption like "These asymptotic convergence properties give us reason to believe that even low-compute approximations to Bayesianism are very good ways to understand the world." This is usually left implicit, but it allows us to think of "if you want to understand the world, you should be (approximately) a Bayesian" as an empirical claim, not a normative one. For this to actually be an example of normativity, it needs to be the case that some people consider this hidden assumption unnecessary and would endorse claims like "You should use low-compute approximations to Bayesianism because Bayesianism has certain asymptotic convergence properties, even if those properties don't give us any reason to think that low-compute approximations to Bayesianism help you understand the world better." Do you expect that people would endorse this?
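
As a toy illustration of the kind of asymptotic convergence property being appealed to (a made-up coin-bias example of my own, not anything from the original discussion): a Bayesian with a spread-out prior over a coin's bias concentrates on the true value as evidence accumulates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy version of the convergence claim: a Bayesian with a uniform prior
# over candidate coin biases ends up concentrated on the true bias.
true_bias = 0.7
hypotheses = np.linspace(0.01, 0.99, 99)                        # candidate values of P(heads)
log_prior = np.full(len(hypotheses), -np.log(len(hypotheses)))  # uniform prior

for n in (10, 100, 1000, 10000):
    flips = rng.random(n) < true_bias
    heads = flips.sum()
    tails = n - heads
    # Bayes' rule in log space: log posterior = log prior + log likelihood + const.
    log_post = log_prior + heads * np.log(hypotheses) + tails * np.log(1 - hypotheses)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    print(f"after {n:>5} flips, posterior mean for P(heads) = {post @ hypotheses:.3f}")
# The estimate approaches the true bias of 0.7 as data accumulates, which is
# the "in the limit" sense in which Bayesian updating is claimed to do well.
```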

Comment by ricraz on Realism and Rationality · 2019-09-21T09:25:15.599Z · score: 5 (2 votes) · LW · GW
But I do have the impression that many people would at least endorse this equally normative claim: "If you have the goal of understanding the world, you should be a Bayesian."

Okay, this seems like a crux of our disagreement. This statement seems pretty much equivalent to my statement #1 in almost all practical contexts. Can you point out how you think they differ?

I agree that some statements of that form seem normative: e.g. "You should go to Spain if you want to go to Spain". However, that seems like an exception to me, because it provides no useful information about how to achieve the goal, and so from contextual clues would be interpreted as "I endorse your desire to go to Spain". Consider instead "If you want to murder someone without getting caught, you should plan carefully", which very much lacks endorsement. Or even "If you want to get to the bakery, you should take a left turn here." How do you feel about the normativity of the last statement in particular? How does it practically differ from "The most convenient way to get to the bakery from here is to take a left turn"? Clearly that's something almost everyone, at Less Wrong and elsewhere, is a realist about (assuming a shared understanding of "convenient").

In general -- at least in the context of the concepts/definitions in this post -- the inclusion of an "if" clause doesn't prevent a claim from being normative. So, for example, the claim "You should go to Spain if you want to go to Spain" isn't relevantly different from the claim "You should give money to charity if you have enough money to live comfortably."

I think there's a difference between a moral statement with conditions, and a statement about what is best to do given your goals (roughly corresponding to the difference between Kant's categorical and hypothetical imperatives). "You should give money to charity if you have enough money to live comfortably" is an example of the former - it's the latter that I'm saying isn't normative in any useful sense.

Comment by ricraz on Realism and Rationality · 2019-09-18T10:33:13.156Z · score: 5 (2 votes) · LW · GW

The quote from Eliezer is consistent with #1, since it's bad to undermine people's ability to achieve their goals.

More generally, you might believe that it's morally normative to promote true beliefs (e.g. because they lead to better outcomes) but not believe that it's epistemically normative, in a realist sense, to do so (e.g. the question I asked above, about whether you "should" have true beliefs even when there are no morally relevant consequences and it doesn't further your goals).

Comment by ricraz on Realism and Rationality · 2019-09-17T14:20:54.354Z · score: 2 (1 votes) · LW · GW

Upon further thought, maybe just splitting up #1 and #2 is oversimplifying. There's probably a position #1.5, which is more like "Words like "goals" and "beliefs" only make sense to the extent that they're applied to Bayesians with utility functions - every other approach to understanding agenthood is irredeemably flawed." This gets pretty close to normative realism because you're only left with one possible theory, but it's still not making any realist normative claims (even if you think that goals and beliefs are morally relevant, as long as you're also a moral anti-realist). Maybe a relevant analogy: you might believe that using any axioms except the ZFC axioms will make maths totally incoherent, while not actually holding any opinion on whether the ZFC axioms are "true".

Comment by ricraz on Realism and Rationality · 2019-09-17T14:06:06.931Z · score: 2 (1 votes) · LW · GW
In this case, I feel like there aren't actually that many people who identify as normative anti-realists (i.e., deny that any kind of normative facts exist).

What do you mean by a normative fact here? Could you give some examples?

Comment by ricraz on Realism and Rationality · 2019-09-17T13:45:31.307Z · score: 8 (4 votes) · LW · GW
It seems to me, rather, that people often talk about updating your credences in accordance with Bayes’ rule and maximizing the expected fulfillment of your current desires as the correct things to do.

It's important to disentangle two claims:

1. In general, if you have the goal of understanding the world, or any other goal that relies on doing so, being Bayesian will allow you to achieve it to a greater extent than any other approach (in the limit of infinite compute).

2. Regardless of your goals, you should be Bayesian anyway.

Believing #2 commits you to normative realism as I understand the term, but believing #1 doesn't - #1 is simply an empirical claim about what types of cognition tend to do best towards a broad class of goals. I think that many rationalists would defend #1, and few would defend #2 - if you disagree, I'd be interested in seeing examples of the latter. (One test is by asking "Aside from moral considerations, if someone's only goal is to have false beliefs right now, should they believe true things anyway?") Either way, I agree with Wei that distinguishing between moral normativity and epistemic normativity is crucial for fruitful discussions on this topic.

Another way of framing this distinction: assume there's one true theory of physics, call it T. Then someone might make the claim "Modelling the universe using T is the correct way to do so (in the limit of having infinite compute available)." This is analogous to claim #1, and believing this claim does not commit you to normative realism, because it does not imply that anyone should want to model the universe correctly.

It might also be useful to clarify that in ricraz's recent post criticizing "realism about rationality," several of the attitudes listed aren't directly related to "realism" in the sense of this post.

I would characterise "realism about rationality" as approximately equivalent to claim #1 above (plus a few other similar claims). In particular, it is a belief about whether there is a set of simple ideas which elegantly describe the sort of "agents" who do well at their "goals" - not a belief about the normative force of those ideas. Of course, under most reasonable interpretations of #2, the truth of #2 implies #1, but not vice versa.

Comment by ricraz on The Power to Solve Climate Change · 2019-09-12T20:02:43.399Z · score: 12 (4 votes) · LW · GW

This post says interesting and specific things about climate change, and then suddenly gets very dismissive and non-specific when it comes to individual action. And as you predict in your other posts, this leads to mistakes. You say "your causal model of how your actions will affect greenhouse gas concentrations is missing the concept of an economic equilibrium". But the whole problem of climate change is that the harm of carbon emissions affects the equilibrium point of economic activity so little. You even identify the key point ("our economy lets everyone emit carbon for free") without realizing that this implies replacement effects are very weak. Who will fly more if I fly less? In fact, since many industries have economies of scale, me flying less or eating less meat quite plausibly increases prices and decreases the carbon emissions of others.

And yes, there are complications - farm subsidies, discontinuities in response curves, etc. But decreasing your personal carbon footprint also has effects on cultural norms, which can add up to larger political change. That seems pretty important - even though, in general, it's the type of thing that it's very difficult to be specific about even for historical examples, let alone future ones. Dismissing these sorts of effects feels very much like an example of the "valley of bad rationality".

Comment by ricraz on Concrete experiments in inner alignment · 2019-09-10T13:04:44.999Z · score: 3 (2 votes) · LW · GW
to what extent models tend to learn their goals internally vs. via reference to things in their environment

I'm not sure what this distinction is trying to refer to. Goals are both represented internally, and also refer to things in the agent's environment. Is there a tension there?

Comment by ricraz on Utility ≠ Reward · 2019-09-09T02:53:44.193Z · score: 2 (1 votes) · LW · GW

Yes, I'm assuming cumulatively-calculated reward. In general this is a fairly standard assumption (rewards being defined for every timestep is part of the definition of MDPs and POMDPs, and given that, I don't see much advantage in delaying computing them until the end of the episode). For agents like AlphaGo, observing these rewards obviously won't be very helpful, though, since those rewards are all 0 until the last timestep. But in general I expect rewards to occur multiple times per episode when training advanced agents, especially as episodes get longer.

Comment by ricraz on Utility ≠ Reward · 2019-09-08T18:16:34.897Z · score: 2 (1 votes) · LW · GW

In the context of reinforcement learning, it's literally just the reward provided by the environment, which is currently fed only to the optimiser, not to the agent. How to make those rewards good ones is a separate question being answered by research directions like reward modelling and IDA.

Comment by ricraz on Utility ≠ Reward · 2019-09-06T17:11:21.168Z · score: 8 (4 votes) · LW · GW
So the reward function can’t be the policy’s objective – one cannot be pursuing something one has no direct access to.

One question I've been wondering about recently is what happens if you actually do give an agent access to its reward during training. (Analogy for humans: a little indicator in the corner of our visual field that lights up whenever we do something that increases the number or fitness of our descendants.) Unless the reward is dense and highly shaped, the agent still has to come up with plans to do well on difficult tasks; it can't just delegate those decisions to the reward information. Yet its judgement about which things are promising will presumably be better-tuned because of this extra information (although eventually you'll need to get rid of it in order for the agent to do well unsupervised).
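
To pin down mechanically what "give an agent access to its reward" could mean, here's a minimal sketch (a toy environment and wrapper of my own invention, not anything from the post): the most recent reward is appended to the observation the policy sees, playing the role of the indicator light.

```python
import numpy as np

class ToyEnv:
    """Tiny stand-in environment: reward is 1 when the action matches a hidden target."""
    def __init__(self, n_actions=4, horizon=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_actions, self.horizon = n_actions, horizon

    def reset(self):
        self.target = self.rng.integers(self.n_actions)
        self.t = 0
        return np.zeros(2)                      # the observation itself is uninformative

    def step(self, action):
        self.t += 1
        reward = float(action == self.target)
        done = self.t >= self.horizon
        return np.zeros(2), reward, done

class RewardInObs:
    """Wrapper that appends the last reward to the observation (the 'indicator light')."""
    def __init__(self, env):
        self.env = env

    def reset(self):
        return np.append(self.env.reset(), 0.0)   # no reward observed yet

    def step(self, action):
        obs, reward, done = self.env.step(action)
        return np.append(obs, reward), reward, done

env = RewardInObs(ToyEnv())
obs = env.reset()
policy_rng = np.random.default_rng(1)
done = False
while not done:
    action = policy_rng.integers(4)             # placeholder random policy
    obs, reward, done = env.step(action)
    # obs[-1] now tells the policy how the action it just took was scored,
    # information it could learn to exploit when choosing its next action.
```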

On the other hand, adding reward to the agent's observations also probably makes the agent more likely to tamper with the physical implementation of its reward, since it will be more likely to develop goals aimed at the reward itself, rather than just the things the reward is indicating. (Analogy for humans: because we didn't have a concept of genetic fitness while evolving, it was hard for evolution to make us care about that directly. But if we'd had the indicator light, we might have developed motivations specifically directed towards it, and then later found out that the light was "actually" the output of some physical reward calculation).

Comment by ricraz on The Power to Judge Startup Ideas · 2019-09-06T14:11:16.497Z · score: 2 (1 votes) · LW · GW

I don't think I'm claiming that the value prop stories of bad startups will be low-delta overall, just that the delta will be more spread out and less specific. Because the delta of the cryobacterium article, multiplied by a million articles, is quite big, and Golden can say that this is what they'll achieve regardless of how bad they actually are. And more generally, the delta to any given consumer of a product that's better than all its competitors on several of the dimensions I listed above can be pretty big.

Rather, I'm claiming that there are a bunch of startups which will succeed because they do well on the types of things I listed above, and that the Value Prop Story sanity check can't distinguish between startups that will and won't do well on those things in advance. Consider a startup which claims that they will succeed over their competitors because they'll win at advertising. This just isn't the type of thing which we can evaluate well using the Value Prop Story test as you described it:

1. Winning at advertising isn't about providing more value for any given consumer - indeed, to the extent that advertising hijacks our attention, it plausibly provides much less value.

2. The explanation for why that startup thinks they will win on advertising might be arbitrarily non-specific. Maybe the founder has spent decades observing the world and building up strong intuitions about how advertising works, which it would take hours to explain. Maybe the advertising team is a strongly-bonded cohesive unit which the founder trusts deeply.

3. Startups which are going to win at advertising (or other aspects of high-quality non-customer-facing execution) might not even know anything about how well their competitors are doing on those tasks. E.g. I expect someone who's generically incredibly competent to beat their competitors in a bunch of ways even if they have no idea how good their competitors are. The value prop sanity check would reject this person. And if, as I argued above, being "generically incredibly competent" is one of the most important contributors to startup success, then rejecting this type of person gives the sanity check a lot of false negatives, and therefore makes it much less useful.

Comment by ricraz on Seven habits towards highly effective minds · 2019-09-06T13:39:32.183Z · score: 2 (1 votes) · LW · GW

Hmm, could you say more? I tend to think of social influences as good for propagating ideas - as opposed to generating new ones, which seems to depend more on the creativity of individuals or small groups.

Comment by ricraz on The Power to Judge Startup Ideas · 2019-09-05T19:15:50.387Z · score: 2 (1 votes) · LW · GW

I guess I want there to be a minimum standard for a Value Prop Story. If you are allowed to say things like "our product will look better and it will be cooler and customers will like our support experience more", then every startup ever has a value prop story. If we're allowing value prop stories of that low quality, then Golden's story could be "our articles will be better than Wikipedia's". Whereas when Liron said that 80% of startups don't have a value prop story, they seemed to be talking about a higher bar than that.

Comment by ricraz on The Power to Judge Startup Ideas · 2019-09-05T14:14:43.876Z · score: 8 (4 votes) · LW · GW

Intuitively I like this criterion, but it conflicts with another belief I have about startups, which is that the quality of execution is absolutely crucial. And high-quality execution is the sort of thing it's hard to tell a Value Prop Story about, because it looks like "a breadboard full of little bumps of value" rather than "a single sharp spike of value".

To be more specific, if a startup A has already created a MVP, and someone else wants to found startup B that does exactly the same thing because their team is better at:

  • UX design
  • Hiring
  • Coding
  • Minimising downtime
  • Advertising and publicity
  • Sales and partnerships
  • Fundraising
  • Budgeting
  • Customer support
  • Expanding internationally
  • Being cool

then I expect B to beat A despite not having a convincing Value Prop Story that can be explained in advance (or even in hindsight). And it seems like rather than being a rare exception, it's quite common for multiple startups to be competing in the same space and cloning each other's features, with success going to whoever executes best (more concretely: the many bike-sharing companies; food delivery companies; a bunch of banking startups in the UK; maybe FB vs MySpace?). In those cases, the lack of a Value Prop Story is a false negative and will lead you to underestimate the success of whichever company ends up winning.

Comment by ricraz on Problems in AI Alignment that philosophers could potentially contribute to · 2019-08-18T14:40:49.621Z · score: 7 (4 votes) · LW · GW

On 1: I think there's a huge amount for philosophers to do. I think of Dennett as laying some of the groundwork which will make the rest of that work easier (such as identifying that the key question is when it's useful to use an intentional stance, rather than trying to figure out which things are objectively "agents") but the key details are still very vague. Maybe the crux of our disagreement here is how well-specified "treating something as if it's a rational agent" actually is. I think that definitions in terms of utility functions just aren't very helpful, and so we need more conceptual analysis of what phrases like this actually mean, which philosophers are best-suited to provide.

On 2: you're right, as written it does subsume parts of your list. I guess when I wrote that I was thinking that most of the value would come from clarification of the most well-known arguments (i.e. the ones laid out in Superintelligence and What Failure Looks Like). I endorse philosophers pursuing all the items on your list, but from my perspective the disjoint items on my list are much higher priorities.

Comment by ricraz on Problems in AI Alignment that philosophers could potentially contribute to · 2019-08-17T21:45:46.559Z · score: 13 (14 votes) · LW · GW

Interestingly, I agree with you that philosophers could make important contributions to AI safety, but my list of things that I'd want them to investigate is almost entirely disjoint from yours.

The most important points on my list:

1. Investigating how to think about agency and goal-directed behaviour, along the lines of Dennett’s work on the intentional stance. How do they relate to intelligence and the ability to generalise across widely different domains? These are crucial concepts which are still very vague.

2. Laying out the arguments for AGI being dangerous as rigorously and comprehensively as possible, noting the assumptions which are being made and how plausible they are.

3. Evaluating the assumptions about the decomposability of cognitive work which underlie debate and IDA (in particular: the universality of humans, and the role of language).

Comment by ricraz on Why do humans not have built-in neural i/o channels? · 2019-08-09T15:59:19.790Z · score: 3 (2 votes) · LW · GW
But human nervous systems do have much higher bandwidth communication channels. We share them with the other mammals. It's the limbic system.

I'm quite uncertain about how high-bandwidth this actually is. I agree that in the first second of meeting someone, it's much more informative than language could be. Once the initial "first impression" has occurred, though, the rate of communication drops off sharply, and I think that language could overtake it after a few minutes. For example, it takes half a second to say "I'm nervous", and you can keep saying similarly-informative things for a long time: do you think you could get a new piece of similar information every half second for ten minutes via the limbic system?

(Note that I'm not necessarily saying people do communicate information about their emotions, personality and social status faster via language, just that they could).