Book review: The Technology Trap

2019-07-20T12:40:01.151Z · score: 15 (5 votes)
Comment by ricraz on How can guesstimates work? · 2019-07-11T23:48:54.035Z · score: 16 (6 votes) · LW · GW

Very interesting question - the sort that makes me swing between thinking it's brilliant and thinking it's nonsense. I do think you overstate your premise. In almost all of the examples given in The Secret of our Success, the relevant knowledge is either non-arbitrary (e.g. the whole passage about hunting seals makes sense, it's just difficult to acquire all that knowledge), or there's a low cost to failure (try a different wood for your arrows; if they don't fly well, go back to basics).

If I engage with the question as posed, though, my primary answer is simply that over time we became wealthy and technologically capable enough that we were able to replace all the natural things that might kill us with whatever we're confident won't kill us. Which is why you can improvise while cooking - all of the ingredients have been screened very hard for safety. This is closely related to your first hypothesis.

However, this still leaves open a slightly different question. The modern world is far too complicated for anyone to understand, and so we might wonder why incomprehensible emergent effects don't render our daily lives haphazard and illegible. One partial answer is that even large-scale components of the world (like countries and companies) were designed by humans. A second partial answer, though, is that even incomprehensible patterns and mechanisms in the modern world still interact with you via other people.

This has a couple of effects. Firstly, other people try to be legible - it's just part of human interaction. (If the manioc could bargain with you, it'd be much easier to figure out how to process it properly.)

Secondly, there's an illusion of transparency because we're so good at and so used to understanding other people. Social interactions are objectively very complicated: in fact, they're "cultural norms and processes which appear arbitrary, yet could have fatal consequences if departed from". Yet it doesn't feel like the reason I refrain from spitting on strangers is arbitrary (even though I couldn't explain the causal pathway by which people started considering it rude). Note also that the space of ideas that startups explore is heavily constrained by social norms and laws.

Thirdly, facts about other humans serve as semantic stop signs. Suppose your boss fires you, because you don't get along. There's a nearly unlimited amount of complexity which shaped your personality, and your boss' personality, and the fact that you ended up in your respective positions. But once you've factored it out into "I'm this sort of person, they're that sort of person", it feels pretty legible - much more than "some foods are eat-raw sorts of foods, other foods are eat-cooked sorts of foods". (Or at least, it feels much more legible to us today - maybe people used to find the latter explanation just as compelling). A related stop sign is the idea that "somebody knows" why each step of a complex causal chain happened, which nudges us away from thinking of the chain as a whole as illegible.

So I've given two reasons for increased legibility (humans building things, and humans explaining things), and two for the illusion of legibility (illusion of transparency, and semantic stop signs). I think on small scales, the former effects predominate. But on large scales, the latter predominate - the world seems more legible than it actually is. For example:

The world seems legible -- I can roughly predict how many planes fly every day by multiplying a handful of rough numbers.

Roughly predicting the number of planes which fly every day is a very low bar! You can also predict the number of trees in a forest by multiplying a handful of numbers. This doesn't help you survive in that forest. What helps you survive in the forest is being able to predict the timing of storms and the local tiger population. In the modern world, what helps you thrive is being able to predict the timing of recessions and crime rate trends. I don't think we're any better at the latter two than our ancestors were at the former. In fact, the large-scale arcs of our lives are now governed to a much greater extent by very unpredictable and difficult-to-understand events, such as scientific discoveries, technological innovation and international relations.
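For concreteness, the flight estimate really is just two round numbers multiplied together (both figures below are order-of-magnitude guesses of mine, not sourced data):

```python
# Fermi estimate of daily commercial flights.
# Both inputs are order-of-magnitude guesses, not sourced figures.
commercial_aircraft_in_service = 25_000
flights_per_aircraft_per_day = 4

flights_per_day = commercial_aircraft_in_service * flights_per_aircraft_per_day
# = 100,000
```

Getting within a factor of a few here is easy in a way that forecasting recessions or crime trends is not - which is the point.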

In summary, technology has helped us replace individual objects in our environments with safer and more legible alternatives, and the emergent complexity which persists in our modern environments is now either mediated by people, or still very tricky to predict (or both).

Comment by ricraz on The AI Timelines Scam · 2019-07-11T19:30:17.407Z · score: 22 (11 votes) · LW · GW
But my simple sense is that openly discussing whether or not nuclear weapons were possible (a technical claim on which people might have private information, including intuitions informed by their scientific experience) would have had costs and it was sensible to be secretive about it. If I think that timelines are short because maybe technology X and technology Y fit together neatly, then publicly announcing that increases the chances that we get short timelines because someone plugs together technology X and technology Y. It does seem like marginal scientists speed things up here.

I agree that there are clear costs to making extra arguments of the form "timelines are short because technology X and technology Y will fit together neatly". However, you could still make public that your timelines are a given probability distribution D, and the reasons which led you to that conclusion are Z% object-level views which you won't share, and (100-Z)% base rate reasoning and other outside-view considerations, which you will share.

I think there are very few costs to declaring which types of reasoning you're most persuaded by. There are some costs to actually making the outside-view reasoning publicly available - maybe people who read it will better understand the AI landscape and use that information to do capabilities research.

But having a lack of high-quality public timelines discussion also imposes serious costs, for a few reasons:

1. It means that safety researchers are more likely to be wrong, and therefore end up doing less relevant research. I am generally pretty skeptical of reasoning that hasn't been written down and undergone public scrutiny.

2. It means there's a lot of wasted motion across the safety community, as everyone tries to rederive the various arguments involved, and figure out why other people have the views they do, and who they should trust.

3. It makes building common knowledge (and the coordination which that knowledge can be used for) much harder.

4. It harms the credibility of the field of safety from the perspective of outside observers, including other AI researchers.

Also, the more of a risk you think 1 is, the lower the costs of disclosure are, because it becomes more likely that any information gleaned from the disclosure is wrong anyway. And indeed, predicting the future is incredibly hard! So the base rate for correctness here is low. And I don't think that safety researchers have a compelling advantage when it comes to correctly modelling how AI will reach human level (compared with thoughtful ML researchers).

Consider, by analogy, a debate two decades ago about whether to make public the ideas of recursive self-improvement and fast takeoff. The potential cost of that is very similar to the costs of disclosure now - giving capabilities researchers these ideas might push them towards building self-improving AIs faster. And yet I think making those arguments public was clearly the right decision. Do you agree that our current situation is fairly analogous?

EDIT: Also, I'm a little confused by

Suppose I have 5 reasons for wanting discussions to be private, and 3 of them I can easily say.

I understand that there are good reasons for discussions to be private, but can you elaborate on why we'd want discussions about privacy to be private?

What are some of Robin Hanson's best posts?

2019-07-02T20:58:01.202Z · score: 36 (10 votes)
Comment by ricraz on Embedded Agency: Not Just an AI Problem · 2019-06-27T14:53:43.738Z · score: 13 (4 votes) · LW · GW
We have strong outside-view reasons to expect that the information processing in question probably approximates Bayesian reasoning (for some model of the environment), and the decision-making process approximately maximizes some expected utility function (which itself approximates fitness within the ancestral environment).

The use of "approximates" in this sentence (and in the post as a whole) is so loose as to be deeply misleading - for the same reasons that the "blue-minimising robot" shouldn't be described as maximising some expected utility function, and the information processing done by a single neuron shouldn't be described as Bayesian reasoning (even approximately!).

See also: coherent behaviour in the real world is an incoherent concept.

Comment by ricraz on Let's talk about "Convergent Rationality" · 2019-06-26T20:40:33.678Z · score: 4 (2 votes) · LW · GW
There is at least one example (I've struggled to dig up) of a memory-less RL agent learning to encode memory information in the state of the world.

I recall an example of a Mujoco agent whose memory was periodically wiped storing information in the position of its arms. I'm also having trouble digging it up though.

Comment by ricraz on Risks from Learned Optimization: Introduction · 2019-06-07T14:49:02.481Z · score: 6 (4 votes) · LW · GW
We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.

I appreciate the difficulty of actually defining optimizers, and so don't want to quibble with this definition, but am interested in whether you think humans are a central example of optimizers under this definition, and if so whether you think that most mesa-optimizers will "explicitly represent" their objective functions to a similar degree that humans do.

Comment by ricraz on On alien science · 2019-06-03T01:19:05.488Z · score: 3 (2 votes) · LW · GW

Agreed that this points in the right direction. I think there's more to it than that, though. Consider, for example, a three-body problem under Newtonian mechanics. There's a sense in which specifying the initial masses, positions and velocities of the bodies, along with Newton's laws of motion, is the best way to compress the information about their chaotic trajectories.

But there's still an open question here, which is: why are three-body systems chaotic? Two-body systems aren't. What makes the difference? Finding an explanation probably doesn't allow you to compress any data any more, but it still seems important and interesting.

(This seems related to a potential modification of your data compression standard: that good explanations compress data in a way that minimises not just storage space, but also the computation required to unpack the data. I'm a little confused about this though.)
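To illustrate the chaos in question, here's a minimal sketch (plain Python, softened gravity, a fixed-step leapfrog integrator, and parameter values chosen arbitrarily by me): integrating the classic Pythagorean three-body configuration twice, with initial conditions differing by one part in a million, sends the two trajectories measurably apart.

```python
# Two runs of the "Pythagorean" three-body problem (masses 3, 4, 5 released
# from rest at the vertices of a 3-4-5 triangle, G = 1), differing only by a
# one-in-a-million nudge to one coordinate. Softening and a fixed timestep
# keep the sketch short; the divergence of the runs is the point.

def accelerations(masses, pos, eps=0.01):
    acc = [[0.0, 0.0] for _ in masses]
    for i in range(len(masses)):
        for j in range(len(masses)):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            inv_r3 = (dx * dx + dy * dy + eps * eps) ** -1.5  # softened gravity
            acc[i][0] += masses[j] * dx * inv_r3
            acc[i][1] += masses[j] * dy * inv_r3
    return acc

def final_positions(masses, pos, dt=0.001, steps=10_000):
    pos = [p[:] for p in pos]
    vel = [[0.0, 0.0] for _ in masses]  # bodies released from rest
    acc = accelerations(masses, pos)
    for _ in range(steps):
        for i in range(len(masses)):    # half kick, then drift
            vel[i][0] += 0.5 * dt * acc[i][0]
            vel[i][1] += 0.5 * dt * acc[i][1]
            pos[i][0] += dt * vel[i][0]
            pos[i][1] += dt * vel[i][1]
        acc = accelerations(masses, pos)
        for i in range(len(masses)):    # second half kick
            vel[i][0] += 0.5 * dt * acc[i][0]
            vel[i][1] += 0.5 * dt * acc[i][1]
    return pos

masses = [3.0, 4.0, 5.0]
start = [[1.0, 3.0], [-2.0, -1.0], [1.0, -1.0]]
nudged = [p[:] for p in start]
nudged[0][0] += 1e-6                    # one-in-a-million perturbation

a = final_positions(masses, start)
b = final_positions(masses, nudged)
separation = max(abs(pa[k] - pb[k]) for pa, pb in zip(a, b) for k in range(2))
```

Run the same experiment with two bodies and the separation just tracks the perturbation; with three, it gets amplified through every close encounter.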

On alien science

2019-06-02T14:50:01.437Z · score: 44 (14 votes)
Comment by ricraz on Book review: The Sleepwalkers by Arthur Koestler · 2019-05-28T16:32:26.852Z · score: 4 (2 votes) · LW · GW

Thanks for the kind words. I agree that refactoring would be useful, but don't have the time now. I have added some headings though.

A shift in arguments for AI risk

2019-05-28T13:47:36.486Z · score: 25 (10 votes)
Comment by ricraz on "Other people are wrong" vs "I am right" · 2019-05-24T14:42:11.650Z · score: 4 (2 votes) · LW · GW

A relevant book recommendation: The Enigma of Reason argues that thinking of high-level human reasoning as a tool for attacking other people's beliefs and defending our own (regardless of their actual veracity) helps explain a lot of weird asymmetries in cognitive biases we're susceptible to, including this one.

Comment by ricraz on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-05-22T15:25:37.118Z · score: 7 (4 votes) · LW · GW

I'd like to push back on the assumption that AIs will have explicit utility functions. Even if you think that sufficiently advanced AIs will behave in a utility-maximising way, their utility functions may be encoded in a way that's difficult to formalise (e.g. somewhere within a neural network).

It may also be the case that coordination is much harder for AIs than for humans. For example, humans are constrained by having bodies, which makes it easier to punish defection - hiding from the government is tricky! Our bodies also make anonymity much harder. Whereas if you're a piece of code which can copy itself anywhere in the world, reneging on agreements may become relatively easy. Another reason why AI cooperation might be harder is simply that AIs will be capable of a much wider range of goals and cognitive processes than humans, and so they may be less predictable to each other and/or have less common ground with each other.

Comment by ricraz on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-05-22T15:19:14.045Z · score: 6 (3 votes) · LW · GW

This paper by Critch is relevant: it argues that agents with different beliefs will bet their future shares of a merged utility function, so that control skews towards whichever agent's predictions prove more correct.
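A minimal numerical sketch of the mechanism as I understand it (the coin, the probabilities, and the variable names are all invented for illustration):

```python
# Two agents with equal initial bargaining weight merge their utilities.
# After observations arrive, each agent's weight in the joint utility is
# rescaled by the likelihood its model assigned to those observations,
# so the better predictor gradually dominates. All numbers are made up.

def likelihood(p_heads, flips):
    l = 1.0
    for heads in flips:
        l *= p_heads if heads else 1.0 - p_heads
    return l

flips = [True, True, False, True]      # observed: three heads, one tail
weight_a, weight_b = 0.5, 0.5          # equal initial shares in the merged utility
l_a = likelihood(0.8, flips)           # agent A predicted heads-probability 0.8
l_b = likelihood(0.3, flips)           # agent B predicted heads-probability 0.3

total = weight_a * l_a + weight_b * l_b
weight_a, weight_b = weight_a * l_a / total, weight_b * l_b / total
```

After three heads in four flips, most of the weight has shifted to the agent whose model assigned the observations higher likelihood.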

Comment by ricraz on What are the open problems in Human Rationality? · 2019-05-21T23:54:16.026Z · score: 10 (3 votes) · LW · GW

Which policies in particular?

Comment by ricraz on What are the open problems in Human Rationality? · 2019-05-21T11:27:43.800Z · score: 2 (1 votes) · LW · GW

This point seems absolutely crucial; and I really appreciate the cited evidence.

Would an option to publish to AF users only be a useful feature?

2019-05-20T11:04:26.150Z · score: 14 (5 votes)
Comment by ricraz on Which scientific discovery was most ahead of its time? · 2019-05-17T10:56:46.413Z · score: 12 (4 votes) · LW · GW

Actually, Hilbert seems to have discovered general relativity at almost exactly the same time as Einstein.

Comment by ricraz on Which scientific discovery was most ahead of its time? · 2019-05-16T15:12:27.544Z · score: 2 (1 votes) · LW · GW

Biggest jump forward.

Comment by ricraz on The Vulnerable World Hypothesis (by Bostrom) · 2019-05-16T15:10:58.071Z · score: 6 (3 votes) · LW · GW

Does anyone know how this paper relates to Paul Christiano's blog post titled Handling destructive technology, which seems to preempt some of the key ideas? It's not directly acknowledged in the paper.

Which scientific discovery was most ahead of its time?

2019-05-16T12:58:14.628Z · score: 39 (10 votes)
Comment by ricraz on Eight Books To Read · 2019-05-15T13:31:11.454Z · score: 8 (4 votes) · LW · GW

Interesting list. How would you compare reading the best modern summaries and analyses of the older texts, versus reading them in the original?

Quigley’s career demonstrates an excellent piece of sociological methodology... He builds a theory that emphasizes the importance of elites, and subsequently goes and talks to members of the elite to test and apply the theory.

I'm not sure if this is meant to be ironic, but that methodology seems like an excellent way to introduce confirmation bias. I guess it's excellent compared to not going and talking to anyone at all?

Comment by ricraz on When is rationality useful? · 2019-05-01T01:31:40.027Z · score: 2 (1 votes) · LW · GW

Depends what type of research. If you're doing experimental cell biology, it's less likely that your research will be ruined by abstract philosophical assumptions which can't be overcome by looking at the data.

Comment by ricraz on When is rationality useful? · 2019-05-01T01:27:55.792Z · score: 2 (1 votes) · LW · GW
So when is rationality relevant? Always! It's literally the science of how to make your life better / achieve your values.

Sometimes science isn't helpful or useful. The science of how music works may be totally irrelevant to actual musicians.

If you think of instrumental rationality as the science of how to win, then necessarily it entails considering things like how to set up your environment, unthinking habits, and how to "hack" into your psyche/emotions.

It's an empirical question when and whether these things are very useful; my post gives cases in which they are, and in which they aren't.

Comment by ricraz on When is rationality useful? · 2019-04-26T21:46:24.590Z · score: 3 (2 votes) · LW · GW
Some effort spent in determining which things are good, and in which things lead to more opportunity for good is going to be rewarded (statistically) with better outcomes.

All else equal, do you think a rationalist mathematician will become more successful in their field than a non-rationalist mathematician? My guess is that if they spent the (fairly significant) time taken to learn and do rationalist things on just learning more maths, they'd do better.

(Here I'm ignoring the possibility that learning rationality makes them decide to leave the field).

I'll also wave at your wave at the recursion problem: "when is rationality useful" is a fundamentally rationalist question both in the sense of being philosophical, and in the sense that answering it is probably not very useful for actually improving your work in most fields.

Comment by ricraz on When is rationality useful? · 2019-04-26T21:42:47.987Z · score: 3 (2 votes) · LW · GW

When I talk about doing useful work, I mean something much more substantial than what you outline above. Obviously 15 minutes every day thinking about your problems is helpful, but the people at the leading edges of most fields spend all day thinking about their problems.

Perhaps doing this ritual makes you think about the problem in a more meta way. If so, there's an empirical question about how much being meta can spark clever solutions. Here I have an intuition that it can, but when I look at any particular subfield that intuition becomes much weaker. How much could a leading mathematician gain by being more meta, for example?

Comment by ricraz on When is rationality useful? · 2019-04-25T05:40:46.530Z · score: 7 (4 votes) · LW · GW

I agree with this. I think the EA example I mentioned fits this pattern fairly well - the more rational you are, the more likely you are to consider what careers and cause areas actually lead to the outcomes you care about, and go into one of those. But then you need the different skill of actually being good at it.

Comment by ricraz on When is rationality useful? · 2019-04-25T05:39:54.137Z · score: 4 (2 votes) · LW · GW

This seems to be roughly orthogonal to what I'm claiming? Whether you get the benefits from rationality quickly or slowly is distinct from what those benefits actually are.

When is rationality useful?

2019-04-24T22:40:01.316Z · score: 29 (7 votes)
Comment by ricraz on Book review: The Sleepwalkers by Arthur Koestler · 2019-04-23T02:56:14.745Z · score: 8 (4 votes) · LW · GW

Hmm, interesting. It doesn't discuss the Galileo affair, which seems like the most important case where the distinction is relevant. Nevertheless, in light of this, "geocentric models with epicycles had always been in the former category" is too strong and I'll amend it accordingly.

Book review: The Sleepwalkers by Arthur Koestler

2019-04-23T00:10:00.972Z · score: 75 (22 votes)
Comment by ricraz on What failure looks like · 2019-04-18T02:40:29.012Z · score: 2 (1 votes) · LW · GW

Mostly I am questioning whether things will turn out badly this way.

Do you not expect this threshold to be crossed sooner or later, assuming AI alignment remains unsolved?

Probably, but I'm pretty uncertain about this. It depends on a lot of messy details about reality, things like: how offense-defence balance scales; what proportion of powerful systems are mostly aligned; whether influence-seeking systems are risk-neutral; what self-governance structures they'll set up; the extent to which their preferences are compatible with ours; how human-comprehensible the most important upcoming scientific advances are.

Comment by ricraz on What failure looks like · 2019-04-17T16:53:53.343Z · score: 2 (1 votes) · LW · GW
I think the idea is that once influence-seeking systems gain a certain amount of influence, it may become faster or more certain for them to gain more influence by causing a catastrophe than to continue to work within existing rules and institutions.

The key issue here is whether there will be coordination between a set of influence-seeking systems that can cause (and will benefit from) a catastrophe, even when other systems are opposing them. If we picture systems as having power comparable to what companies have now, that seems difficult. If we picture them as having power comparable to what countries have now, that seems fairly easy.

Comment by ricraz on What failure looks like · 2019-04-09T16:41:01.080Z · score: 11 (5 votes) · LW · GW
Eventually we reach the point where we could not recover from a correlated automation failure. Under these conditions influence-seeking systems stop behaving in the intended way, since their incentives have changed---they are now more interested in controlling influence after the resulting catastrophe than continuing to play nice with existing institutions and incentives.

I'm not sure I understand this part. The influence-seeking systems which have the most influence also have the most to lose from a catastrophe. So they'll be incentivised to police each other and make catastrophe-avoidance mechanisms more robust.

As an analogy: we may already be past the point where we could recover from a correlated "world leader failure": every world leader simultaneously launching a coup. But this doesn't make such a failure very likely, unless world leaders also have strong coordination and commitment mechanisms between themselves (which are binding even after the catastrophe).

Comment by ricraz on What are CAIS' boldest near/medium-term predictions? · 2019-03-29T00:22:57.277Z · score: 2 (1 votes) · LW · GW

The 75% figure is from now until single agent AGI. I measure it proportionately because otherwise it says more about timeline estimates than about CAIS.

Comment by ricraz on What are CAIS' boldest near/medium-term predictions? · 2019-03-28T14:55:24.064Z · score: 7 (4 votes) · LW · GW

The operationalisation which feels most natural to me is something like:

  • Make a list of cognitively difficult jobs (lawyer, doctor, speechwriter, CEO, engineer, scientist, accountant, trader, consultant, venture capitalist, etc...)
  • A job is automatable when there exists a publicly accessible AI service which allows an equally skilled person to do just as well in less than 25% of the time that it used to take a specialist, OR which allows someone with little skill or training to do the job in about the same time that it used to take a specialist.
  • I claim that over 75% of the jobs on this list will be automatable within 75% of the time until a single superhuman AGI is developed.
  • (Note that there are three free parameters in this definition, which I've set to arbitrary numbers that seem intuitively reasonable).
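For concreteness, the second bullet could be encoded as a simple predicate (the function and parameter names are my own, and SPEEDUP and PARITY_SLACK stand in for two of the three free parameters mentioned above):

```python
# Hypothetical encoding of the "automatable" criterion. SPEEDUP and
# PARITY_SLACK are two of the three free parameters, set to the arbitrary
# but intuitively reasonable values from the definition above.
SPEEDUP = 0.25       # expert-with-AI must take under 25% of specialist time
PARITY_SLACK = 1.1   # "about the same time" tolerance for the novice clause

def is_automatable(specialist_hours, skilled_with_ai_hours, novice_with_ai_hours):
    expert_speedup = skilled_with_ai_hours < SPEEDUP * specialist_hours
    novice_parity = novice_with_ai_hours <= PARITY_SLACK * specialist_hours
    return expert_speedup or novice_parity
```

So, for example, is_automatable(40, 8, 200) holds via the first clause, while is_automatable(40, 20, 200) fails both clauses.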
Comment by ricraz on Disentangling arguments for the importance of AI safety · 2019-02-27T11:38:25.158Z · score: 4 (2 votes) · LW · GW

Thanks! I agree that more connection to past writings is always good, and I'm happy to update it appropriately - although, upon thinking about it, there's nothing which really comes to mind as an obvious omission (except perhaps citing sections of Superintelligence?) Of course I'm pretty biased, since I already put in the things which I thought were most important - so I'd be glad to hear any additional suggestions you have.

Comment by ricraz on How to get value learning and reference wrong · 2019-02-27T11:23:57.145Z · score: 4 (2 votes) · LW · GW

Kudos for writing about making mistakes and changing your mind. If I'm interpreting you correctly, your current perspective is quite similar to mine (which I've tried to explain here and here).

Comment by ricraz on Three Kinds of Research Documents: Clarification, Explanatory, Academic · 2019-02-15T17:43:42.288Z · score: 10 (3 votes) · LW · GW

Agreed that "clarification" is confusing. What about "exploration"?

Comment by ricraz on Arguments for moral indefinability · 2019-02-13T13:44:18.278Z · score: 5 (3 votes) · LW · GW

Thanks for the detailed comments! I only have time to engage with a few of them:

Most of this is underdefined, and that’s unsettling at least in some (but not necessarily all) cases, and if we want to make it less underdefined, the notion of 'one ethics' has to give.

I'm not that wedded to 'one ethics', more like 'one process for producing moral judgements'. But note that if we allow arbitrariness of scope, then 'one process' can be a piecewise function which uses one subprocess in some cases and another in others.

I find myself having similarly strong meta-level intuitions about wanting to do something that is "non-arbitrary" and in relevant ways "simple/elegant". ...motivationally it feels like this intuition is importantly connected to what makes it easy for me to go "all-in“ for my ethical/altruistic beliefs.

I agree that these intuitions are very strong, and they are closely connected to motivational systems. But so are some object-level intuitions like "suffering is bad", and so the relevant question is what you'd do if it were a choice between that and simplicity. I'm not sure your arguments distinguish one from the other in that context.

one can maybe avoid to feel this uncomfortable feeling of uncertainty by deferring to idealized reflection. But it’s not obvious that this lastingly solves the underlying problem

Another way of phrasing this point: reflection is almost always good for figuring out what's the best thing to do, but it's not a good way to define what's the best thing to do.

Comment by ricraz on Arguments for moral indefinability · 2019-02-13T13:29:22.824Z · score: 5 (3 votes) · LW · GW

For the record, this is probably my key objection to preference utilitarianism, but I didn't want to dive into the details in the post above (for a very long post about such things, see here).

Comment by ricraz on Coherent behaviour in the real world is an incoherent concept · 2019-02-13T12:01:02.458Z · score: 3 (2 votes) · LW · GW

From Rohin's post, a quote which I also endorse:

You could argue that while [building AIs with really weird utility functions] is possible in principle, no one would ever build such an agent. I wholeheartedly agree, but note that this is now an argument based on particular empirical facts about humans (or perhaps agent-building processes more generally).

And if you're going to argue based on particular empirical facts about what goals we expect, then I don't think that doing so via coherence arguments helps very much.

Comment by ricraz on Coherent behaviour in the real world is an incoherent concept · 2019-02-13T11:31:03.996Z · score: 2 (1 votes) · LW · GW
This seems pretty false to me.

I agree that this problem is not a particularly important one, and explicitly discard it a few sentences later. I hadn't considered your objection though, and will need to think more about it.

(Side note: I'm pretty annoyed with all the use of "there's no coherence theorem for X" in this post.)

Mind explaining why? Is this more a stylistic preference, or do you think most of them are wrong/irrelevant?

the "further out" your goal is and the more that your actions are for instrumental value, the more it should look like world 1 in which agents are valuing abstract properties of world states, and the less we should observe preferences over trajectories to reach said states.

Also true if you make world states temporally extended.

Comment by ricraz on Arguments for moral indefinability · 2019-02-12T13:39:10.645Z · score: 4 (2 votes) · LW · GW

If I had to define it using your taxonomy, then yes. However, it's also trying to do something broader. For example, it's intended to be persuasive to people who don't think of meta-ethics in terms of preferences and rationality at all. (The original intended audience was the EA forum, not LW).

Edit: on further reflection, your list is more comprehensive than I thought it was, and maybe the people I mentioned above actually would be on it even if they wouldn't describe themselves that way.

Another edit: maybe the people who are missing from your list are those who would agree that morality has normative force but deny that rationality does (except insofar as it makes you more moral), or at least are much more concerned with the former than the latter. E.g. you could say that morality is a categorical imperative but rationality is only a hypothetical imperative.

Arguments for moral indefinability

2019-02-12T10:40:01.226Z · score: 45 (14 votes)

Coherent behaviour in the real world is an incoherent concept

2019-02-11T17:00:25.665Z · score: 36 (14 votes)
Comment by ricraz on Book Trilogy Review: Remembrance of Earth’s Past (The Three Body Problem) · 2019-01-30T18:58:44.635Z · score: 1 (2 votes) · LW · GW

There are some interesting insights about the overall viewpoint behind this book, but gosh the tone of this post is vicious. I totally understand frustration with stupidity in fiction, and I've written such screeds in my time too. But I think it's well worth moderating the impulse to do so in cases like this where the characters whose absolute stupidity you're bemoaning map onto the outgroup in so many ways.

Comment by ricraz on Too Smart for My Own Good · 2019-01-24T02:00:44.030Z · score: 2 (1 votes) · LW · GW

Agreed, except that the behaviour described could also just be procrastination.

Comment by ricraz on Disentangling arguments for the importance of AI safety · 2019-01-24T01:43:33.998Z · score: 2 (1 votes) · LW · GW

I don't think it depends on how much of A and B there is, because the "expected amount" is not a special point. In this context, the update that I made personally was "There are more shifts than I thought there were, therefore there's probably more of A and B than I thought there was, therefore I should weakly update against AI safety being important." Maybe (to make A and B more concrete) there being more shifts than I thought downgrades my opinion of the original arguments from "absolutely incredible" to "very very good", which slightly downgrades my confidence that AI safety is important.

As a separate issue, conditional on the field being very important, I might expect the original arguments to be very very good, or I might expect them to be very good, or something else. But I don't see how that expectation can prevent a change from "absolutely incredible" to "very very good" from downgrading my confidence.

Vote counting bug?

2019-01-22T15:44:48.154Z · score: 7 (2 votes)
Comment by ricraz on Disentangling arguments for the importance of AI safety · 2019-01-22T11:43:15.378Z · score: 5 (3 votes) · LW · GW

Apologies if this felt like it was targeted specifically at you and other early AI safety advocates, I have nothing but the greatest respect for your work. I'll rewrite to clarify my intended meaning, which is more an attempt to evaluate the field as a whole. This is obviously a very vaguely-defined task, but let me take a stab at fleshing out some changes over the past decade:

1. There's now much more concern about argument 2, the target loading problem (as well as inner optimisers, insofar as they're distinct).

2. There's now less focus on recursive self-improvement as a key reason why AI will be dangerous, and more focus on what happens when hardware scales up. Relatedly, I think a greater percentage of safety researchers believe that there'll be a slow takeoff than used to be the case.

3. Argument 3 (prosaic AI alignment) is now considered more important and more tractable.

4. There's now been significant criticism of coherence arguments as a reason to believe that AGI will pursue long-term goals in an insatiable maximising fashion.

I may be wrong about these shifts - I'm speaking as a newcomer to the field who has a very limited perspective on how it's evolved over time. If so, I'd be happy to be corrected. If they have in fact occurred, here are some possible (non-exclusive) reasons why:

A. None of the proponents of the original arguments have changed their minds about the importance of those arguments, but new people came into the field because of those arguments, then disagreed with them and formulated new perspectives.

B. Some of the proponents of the original arguments have changed their minds significantly.

C. The proponents of the original arguments were misinterpreted, or overemphasised some of their beliefs at the expense of others, and actually these shifts are just a change in emphasis.

I think none of these options reflect badly on anyone involved (getting everything exactly right the first time is an absurdly high standard), but I think A and B would be weak evidence against the importance of AI safety (assuming you've already conditioned on the size of the field, etc). I also think that it's great when individual people change their minds about things, and definitely don't want to criticise that. But if the field as a whole does so (whatever that means), the dynamics of such a shift are worth examination.

I don't have strong beliefs about the relative importance of A, B and C, although I would be rather surprised if any one of them were primarily responsible for all the shifts I mentioned above.

Comment by ricraz on Disentangling arguments for the importance of AI safety · 2019-01-22T11:07:44.458Z · score: 4 (2 votes) · LW · GW

I endorse ESRogs' answer. If the world were a singleton under the control of a few particularly benevolent and wise humans, with an AGI that obeys the intention of practical commands (in a somewhat naive way, say, so it'd be unable to help them figure out ethics) then I think argument 5 would no longer apply, but argument 4 would. Or, more generally: argument 5 is about how humans might behave badly under current situations and governmental structures in the short term, but makes no claim that this will be a systemic problem in the long term (we could probably solve it using a singleton + mass surveillance); argument 4 is about how we don't know of any governmental(/psychological?) structures which are very likely to work well in the long term.

Having said that, your ideas were the main (but not sole) inspiration for argument 4, so if this isn't what you intended, then I may need to rethink its inclusion.

Disentangling arguments for the importance of AI safety

2019-01-21T12:41:43.615Z · score: 113 (38 votes)
Comment by ricraz on What AI Safety Researchers Have Written About the Nature of Human Values · 2019-01-16T15:52:21.307Z · score: 3 (2 votes) · LW · GW

Nice overview :) One point: the introductory sentences don't seem to match the content.

It is clear to most AI safety researchers that the idea of “human values” is underdefined, and this concept should be additionally formalized before it can be used in (mostly mathematical) models of AI alignment.

In particular, I don't interpret most of the researchers you listed as claiming that "[human values] should be formalized". I think that's a significantly stronger claim than, for example, the claim that we should try to understand human values better.

Comment by ricraz on Open Thread January 2019 · 2019-01-16T15:45:24.388Z · score: 5 (3 votes) · LW · GW

Are you claiming that price per computation would drop in absolute terms, or compared with the world in which Moore's law continued? The first one seems unobjectionable: the default state of everything is for prices to fall, since there'll be innovation in other parts of the supply chain. The second one seems false. Basic counter-argument: if it were true, why don't people produce chips from a decade ago, which would then be cheaper per unit of computation than the ones being produced today? After all:

1. You wouldn't have to do R&D, you could just copy old chip designs.

2. You wouldn't have to keep upgrading your chip fabs, you could use old ones.

3. People could just keep collecting your old chips without getting rid of them.

4. Patents on old chip designs have already expired.

Comment by ricraz on Comments on CAIS · 2019-01-14T11:11:11.925Z · score: 7 (3 votes) · LW · GW
AI services can totally be (approximately) VNM rational -- for a bounded utility function.

Suppose an AI service realises that it is able to seize many more resources with which to fulfil its bounded utility function. Would it do so? If no, then it's not rational with respect to that utility function. If yes, then it seems rather unsafe, and I'm not sure how it fits Eric's criterion of using "bounded resources".
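Here's a minimal toy model (my own construction, not from the post, with made-up numbers) of why a bounded utility function doesn't by itself rule out resource acquisition: the utility is capped, but more resources still raise the probability of hitting the cap.

```python
# Toy model: an agent maximising a *bounded* utility function can still
# prefer to seize extra resources, because additional resources raise the
# probability of achieving the bounded payoff.

def success_prob(resources):
    # Hypothetical assumption: each unit of resources independently gives
    # a 50% chance of completing the task.
    return 1 - 0.5 ** resources

def expected_utility(resources):
    # Utility is bounded: 1 for success, 0 for failure.
    return success_prob(resources) * 1.0

eu_modest = expected_utility(1)    # modest resource use
eu_grabby = expected_utility(10)   # after seizing far more resources

print(eu_modest, eu_grabby)
```

Under these (invented) assumptions the resource-grabbing policy has strictly higher expected utility, so a maximiser of this bounded function would still seize resources; boundedness only removes the incentive once success is essentially certain.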

Note that CAIS is suggesting that we should use a different prior: the prior based on "how have previous advances in technology come about". I find this to be stronger evidence than how evolution got to general intelligence.

I agree with Eric's claim that R&D automation will speed up AI progress. The point of disagreement is more like: when we have AI technology that's able to do basically all human cognitive tasks (which for want of a better term I'll call AGI, as an umbrella term covering both CAIS and agent AGI), what will it look like? It's true that no past technologies have looked like unified agent AGIs - but no past technologies have looked like distributed systems capable of accomplishing all human tasks either. So it seems like the evolution prior is still the most relevant one.

"Humans think in terms of individuals with goals, and so even if there's an equally good approach to AGI which doesn't conceive of it as a single goal-directed agent, researchers will be biased against it."
I'm curious how strong an objection you think this is. I find it weak; in practice most of the researchers I know think much more concretely about the systems they implement than "agent with a goal", and these are researchers who work on deep RL. And in the history of AI, there were many things to be done besides "agent with a goal"; expert systems/GOFAI seems like the canonical counterexample.

I think the whole paradigm of RL is an example of a bias towards thinking about agents with goals, and that as those agents become more powerful, it becomes easier to anthropomorphise them (OpenAI Five being one example where it's hard not to think of it as a group of agents with goals). I would withdraw my objection if, for example, most AI researchers took the prospect of AGI from supervised learning as seriously as AGI from RL.

A clear counterargument is that some companies will have AI CEOs, and they will outcompete the others, and so we'll quickly transition to the world where all companies have AI CEOs. I think this is not that important -- having a human in the loop need not slow down everything by a huge margin, since most of the cognitive work is done by the AI advisor, and the human just needs to check that it makes sense (perhaps assisted by other AI services).

I claim that this sense of "in the loop" is irrelevant, because it's equivalent to the AI doing its own thing while the human holds a finger over the stop button. I.e. the AI will be equivalent to current CEOs, the humans will be equivalent to current boards of directors.

To the extent that you are using this to argue that "the AI advisor will be much more like an agent optimising for an open-ended goal than Eric claims", I agree that the AI advisor will look like it is "being a very good CEO". I'm not sure I agree that it will look like an agent optimizing for an open-ended goal, though I'm confused about this.

I think of CEOs as basically the most maximiser-like humans. They have pretty clear metrics which they care about (even if it's not just share price, "company success" is a clear metric by human standards); they are able to take actions as broad in scope as basically any actions humans can take (expand to new countries, influence politics, totally change the lives of millions of employees); and almost all of the labour is cognitive, so "advising" is basically as hard as "doing" (modulo human interactions). To do well they need to think "outside the box" of stimulus and response, and deal with worldwide trends and arbitrarily unusual situations (has a hurricane just hit your factory? do you need to hire mercenaries to defend your supply chains?). Most of them have some moral constraints, but the role also attracts a higher percentage of psychopaths than any other, and it's plausible that we'd have no idea whether an AI doing well as a CEO actually "cares about" these sorts of bounds or is just (temporarily) constrained by public opinion in the same way as the psychopaths.

The main point of CAIS is that services aren't long-term goal-oriented; I agree that if services end up being long-term goal-oriented they become dangerous.

I then mentioned that to build systems which implement arbitrary tasks, you may need to be operating over arbitrarily long time horizons. But probably this also comes down to how decomposable such things are.

If you go via the CAIS route you definitely want to prevent unbounded AGI maximizers from being created until you are sure of their safety or that you can control them. (I know you addressed that in the previous point, but I'm pretty sure that no one is arguing to focus on CAIS conditional on AGI agents existing and being more powerful than CAIS, so it feels like you're attacking a strawman.)

People are arguing for a focus on CAIS without (to my mind) compelling arguments for why we won't have AGI agents eventually, so I don't think this is a strawman.

Given a sufficiently long delay, we could use CAIS to build global systems that can control any new AGIs, in the same way that government currently controls most people.

This depends on having pretty powerful CAIS and very good global coordination, both of which I think of as unlikely (especially given that in a world where CAIS occurs and isn't very dangerous, people will probably think that AI safety advocates were wrong about there being existential risk). I'm curious how likely you think this is though? If agent AGIs are 10x as dangerous, and the probability that we eventually build them is more than 10%, then agent AGIs are the bigger threat.

I also am not sure why you think that AGI agents will optimize harder for self-improvement.

Because they have long-term convergent instrumental goals, and CAIS doesn't. CAIS only "cares" about self-improvement to the extent that humans are instructing it to do so, but humans are cautious and slow. Also because even if building AGI out of task-specific strongly-constrained modules is faster at first, it seems unlikely that it's anywhere near the optimal architecture for self-improvement.

Compared to what? If the alternative is "a vastly superintelligent AGI agent that is acting within what is effectively the society of 2019", then I think CAIS is a better model. I'm guessing that you have something else in mind though.

It's something like "the first half of CAIS comes true, but the services never get good enough to actually be comprehensive/general. Meanwhile fundamental research on agent AGI occurs roughly in parallel, and eventually overtakes CAIS." As a vague picture, imagine a world in which we've applied powerful supervised learning to all industries, and applied RL to all tasks which are either as constrained and well-defined as games, or as cognitively easy as most physical labour, but still don't have AI which can independently do the most complex cognitive tasks (Turing tests, fundamental research, etc).

Comment by ricraz on Comments on CAIS · 2019-01-14T09:12:54.460Z · score: 11 (3 votes) · LW · GW

You're right, this is a rather mealy-mouthed claim. I've edited it to read as follows:

the empirical claim that we'll develop AI services which can replace humans at most cognitively difficult jobs significantly before we develop any single strongly superhuman AGI

This would be false if doing well at human jobs requires capabilities that are near AGI. I do expect a phase transition - roughly speaking I expect progress in automation to mostly require more data and engineering, and progress towards AGI to require algorithmic advances and a cognition-first approach. But the thing I'm trying to endorse in the post is a weaker claim which I think Eric would agree with.

Comment by ricraz on Comments on CAIS · 2019-01-12T20:30:40.539Z · score: 9 (4 votes) · LW · GW
AGI is ... something that approximates an expected utility maximizer.

This seems like a trait which AGIs might have, but not a part of how they should be defined. I think Eric would say that the first AI system which can carry out all the tasks we would expect an AGI to be capable of won't actually approximate an expected utility maximiser, and I consider it an open empirical question whether or not he's right.

Many risk-reducing services (especially ones that can address human safety problems) seem to require high-level general reasoning abilities, whereas many risk-increasing services can just be technical problem solvers or other kinds of narrow intelligences or optimizers, so CAIS is actually quite unsafe, and hard to make safe, whereas AGI / goal-directed agents are by default highly unsafe, but with appropriate advances in safety research can perhaps be made safe.

Yeah, good point. I guess that my last couple of sentences were pretty shallowly-analysed, and I'll retract them and add a more measured conclusion.

Comment by ricraz on Why is so much discussion happening in private Google Docs? · 2019-01-12T18:57:39.285Z · score: 16 (7 votes) · LW · GW

I agree with both those points, and would add:

3. The fact that access to the doc is invite-only, and therefore people feel like they've been specifically asked to participate.

Comments on CAIS

2019-01-12T15:20:22.133Z · score: 69 (18 votes)
Comment by ricraz on You can be wrong about what you like, and you often are · 2018-12-20T18:49:41.172Z · score: 5 (2 votes) · LW · GW
I'm just not one of those people who enjoys "deeper" activities like reading a novel. I like watching TV and playing video games.
I'm just not one of those people who likes healthy foods. You may like salads and swear by them, but I am different. I like pizza and french fries.
I'm just not an intellectual person. I don't enjoy learning.
I'm just not into that early retirement stuff. I need to maintain my current lifestyle in order to be happy.
I'm just not into "good" movies/music/art. I like the Top 50 stuff.

I'm curious why you chose these particular examples. I think they're mostly quite bad and detract from the reasonable point of the overall post. The first three, and the fifth, I'd characterise as "acquired tastes": they're things that people may come to enjoy over time, but often don't currently enjoy. So even someone who would grow to like reading novels, and would have a better life if they read more novels, may be totally correct in stating that they don't enjoy reading novels.

The fourth is a good example for many people, but many others find that retirement is boring. Also, predicting what your life will look like after a radical shift is a pretty hard problem, so if this is the sort of thing people are wrong about it doesn't seem so serious.

More generally, whether or not you enjoy something now is different from whether that thing will make you happier in the future. At points in this post you conflate those two properties. The examples also give me elitist vibes: the implication seems to be that upper-class pursuits are just better, and people who say they don't like them are more likely to be wrong. (If anything, I'd say that people are more likely to be miscalibrated about their enjoyment of an activity the more prestigious it is, since we're good at deceiving ourselves about status considerations.)

Comment by ricraz on You can be wrong about what you like, and you often are · 2018-12-20T18:23:48.890Z · score: 2 (1 votes) · LW · GW

Why is the optimisation space convex?

Comment by ricraz on Double-Dipping in Dunning--Kruger · 2018-11-29T14:59:34.807Z · score: 2 (3 votes) · LW · GW

In general it's understandable not to consider that hypothesis. But when you are specifically making a pointed and insulting comment about another person, I think the bar should be higher.

How democracy ends: a review and reevaluation

2018-11-27T10:50:01.130Z · score: 17 (9 votes)

On first looking into Russell's History

2018-11-08T11:20:00.935Z · score: 35 (11 votes)

Speculations on improving debating

2018-11-05T16:10:02.799Z · score: 26 (10 votes)

Implementations of immortality

2018-11-01T14:20:01.494Z · score: 21 (8 votes)

What will the long-term future of employment look like?

2018-10-24T19:58:09.320Z · score: 11 (4 votes)

Book review: 23 things they don't tell you about capitalism

2018-10-18T23:05:29.465Z · score: 19 (11 votes)

Book review: The Complacent Class

2018-10-13T19:20:05.823Z · score: 21 (9 votes)

Some cruxes on impactful alternatives to AI policy work

2018-10-10T13:35:27.497Z · score: 146 (50 votes)

A compendium of conundrums

2018-10-08T14:20:01.178Z · score: 12 (12 votes)

Thinking of the days that are no more

2018-10-06T17:00:01.208Z · score: 13 (6 votes)

The Unreasonable Effectiveness of Deep Learning

2018-09-30T15:48:46.861Z · score: 86 (26 votes)

Deep learning - deeper flaws?

2018-09-24T18:40:00.705Z · score: 42 (17 votes)

Book review: Happiness by Design

2018-09-23T04:30:00.939Z · score: 14 (6 votes)

Book review: Why we sleep

2018-09-19T22:36:19.608Z · score: 52 (25 votes)

Realism about rationality

2018-09-16T10:46:29.239Z · score: 142 (58 votes)

Is epistemic logic useful for agent foundations?

2018-05-08T23:33:44.266Z · score: 19 (6 votes)

What we talk about when we talk about maximising utility

2018-02-24T22:33:28.390Z · score: 27 (8 votes)

In Defence of Conflict Theory

2018-02-17T03:33:01.970Z · score: 6 (6 votes)

Is death bad?

2018-01-13T04:55:25.788Z · score: 8 (4 votes)