Posts

Seven habits towards highly effective minds 2019-09-05T23:10:01.020Z · score: 39 (10 votes)
What explanatory power does Kahneman's System 2 possess? 2019-08-12T15:23:20.197Z · score: 33 (16 votes)
Why do humans not have built-in neural i/o channels? 2019-08-08T13:09:54.072Z · score: 26 (12 votes)
Book review: The Technology Trap 2019-07-20T12:40:01.151Z · score: 30 (14 votes)
What are some of Robin Hanson's best posts? 2019-07-02T20:58:01.202Z · score: 36 (10 votes)
On alien science 2019-06-02T14:50:01.437Z · score: 46 (15 votes)
A shift in arguments for AI risk 2019-05-28T13:47:36.486Z · score: 27 (11 votes)
Would an option to publish to AF users only be a useful feature? 2019-05-20T11:04:26.150Z · score: 14 (5 votes)
Which scientific discovery was most ahead of its time? 2019-05-16T12:58:14.628Z · score: 39 (10 votes)
When is rationality useful? 2019-04-24T22:40:01.316Z · score: 29 (7 votes)
Book review: The Sleepwalkers by Arthur Koestler 2019-04-23T00:10:00.972Z · score: 75 (22 votes)
Arguments for moral indefinability 2019-02-12T10:40:01.226Z · score: 45 (14 votes)
Coherent behaviour in the real world is an incoherent concept 2019-02-11T17:00:25.665Z · score: 36 (14 votes)
Vote counting bug? 2019-01-22T15:44:48.154Z · score: 7 (2 votes)
Disentangling arguments for the importance of AI safety 2019-01-21T12:41:43.615Z · score: 120 (42 votes)
Comments on CAIS 2019-01-12T15:20:22.133Z · score: 69 (18 votes)
How democracy ends: a review and reevaluation 2018-11-27T10:50:01.130Z · score: 17 (9 votes)
On first looking into Russell's History 2018-11-08T11:20:00.935Z · score: 35 (11 votes)
Speculations on improving debating 2018-11-05T16:10:02.799Z · score: 26 (10 votes)
Implementations of immortality 2018-11-01T14:20:01.494Z · score: 21 (8 votes)
What will the long-term future of employment look like? 2018-10-24T19:58:09.320Z · score: 11 (4 votes)
Book review: 23 things they don't tell you about capitalism 2018-10-18T23:05:29.465Z · score: 19 (11 votes)
Book review: The Complacent Class 2018-10-13T19:20:05.823Z · score: 21 (9 votes)
Some cruxes on impactful alternatives to AI policy work 2018-10-10T13:35:27.497Z · score: 150 (52 votes)
A compendium of conundrums 2018-10-08T14:20:01.178Z · score: 12 (12 votes)
Thinking of the days that are no more 2018-10-06T17:00:01.208Z · score: 13 (6 votes)
The Unreasonable Effectiveness of Deep Learning 2018-09-30T15:48:46.861Z · score: 86 (26 votes)
Deep learning - deeper flaws? 2018-09-24T18:40:00.705Z · score: 42 (17 votes)
Book review: Happiness by Design 2018-09-23T04:30:00.939Z · score: 14 (6 votes)
Book review: Why we sleep 2018-09-19T22:36:19.608Z · score: 52 (25 votes)
Realism about rationality 2018-09-16T10:46:29.239Z · score: 153 (66 votes)
Is epistemic logic useful for agent foundations? 2018-05-08T23:33:44.266Z · score: 19 (6 votes)
What we talk about when we talk about maximising utility 2018-02-24T22:33:28.390Z · score: 27 (8 votes)
In Defence of Conflict Theory 2018-02-17T03:33:01.970Z · score: 6 (6 votes)
Is death bad? 2018-01-13T04:55:25.788Z · score: 8 (4 votes)

Comments

Comment by ricraz on The Power to Solve Climate Change · 2019-09-12T20:02:43.399Z · score: 3 (2 votes) · LW · GW

This post says interesting and specific things about climate change, and then suddenly gets very dismissive and non-specific when it comes to individual action. And as you predict in your other posts, this leads to mistakes. You say "your causal model of how your actions will affect greenhouse gas concentrations is missing the concept of an economic equilibrium". But the whole problem of climate change is that the harm of carbon emissions affects the equilibrium point of economic activity so little. You even identify the key point ("our economy lets everyone emit carbon for free") without realizing that this implies replacement effects are very weak. Who will fly more if I fly less? In fact, since many industries have economies of scale, me flying less or eating less meat quite plausibly increases prices and decreases the carbon emissions of others.

And yes, there are complications - farm subsidies, discontinuities in response curves, etc. But decreasing personal carbon footprint also has effects on cultural norms which can add up to larger political change. That seems pretty important - even though, in general, it's the type of thing that it's very difficult to be specific about even for historical examples, let alone future ones. Dismissing these sorts of effects feels very much like an example of the "valley of bad rationality".

Comment by ricraz on Concrete experiments in inner alignment · 2019-09-10T13:04:44.999Z · score: 3 (2 votes) · LW · GW
to what extent models tend to learn their goals internally vs. via reference to things in their environment

I'm not sure what this distinction is trying to refer to. Goals are represented internally, and they also refer to things in the agent's environment. Is there a tension there?

Comment by ricraz on Utility ≠ Reward · 2019-09-09T02:53:44.193Z · score: 2 (1 votes) · LW · GW

Yes, I'm assuming cumulatively-calculated reward. In general this is a fairly standard assumption (rewards being defined at every timestep is part of the definition of MDPs and POMDPs, and given that, I don't see much advantage in delaying the computation until the end of the episode). For agents like AlphaGo, observing these rewards obviously won't be very helpful, since those rewards are all 0 until the last timestep. But in general I expect rewards to occur multiple times per episode when training advanced agents, especially as episodes get longer.

Comment by ricraz on Utility ≠ Reward · 2019-09-08T18:16:34.897Z · score: 2 (1 votes) · LW · GW

In the context of reinforcement learning, it's literally just the reward provided by the environment, which is currently fed only to the optimiser, not to the agent. How to make those rewards good ones is a separate question being answered by research directions like reward modelling and IDA.

Comment by ricraz on Utility ≠ Reward · 2019-09-06T17:11:21.168Z · score: 8 (4 votes) · LW · GW
So the reward function can’t be the policy’s objective – one cannot be pursuing something one has no direct access to.

One question I've been wondering about recently is what happens if you actually do give an agent access to its reward during training. (Analogy for humans: a little indicator in the corner of our visual field that lights up whenever we do something that increases the number or fitness of our descendants). Unless the reward is dense and highly shaped, the agent still has to come up with plans to do well on difficult tasks, it can't just delegate those decisions to the reward information. Yet its judgement about which things are promising will presumably be better-tuned because of this extra information (although eventually you'll need to get rid of it in order for the agent to do well unsupervised).
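
A minimal sketch of the setup I have in mind (the environment interface here is an assumption for illustration, not any particular RL library's API):

    # Expose the most recent reward to the agent as part of its observation.
    # The env object and its reset/step interface are assumed for this sketch.
    class RewardInObservation:
        def __init__(self, env):
            self.env = env

        def reset(self):
            obs = self.env.reset()
            return {"obs": obs, "prev_reward": 0.0}

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            # The agent now conditions directly on the reward signal, like the
            # hypothetical "indicator light" in the analogy above.
            return {"obs": obs, "prev_reward": reward}, reward, done, info

(At deployment time you'd replace the prev_reward entry with a constant, which is the "getting rid of it" step mentioned above.)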

On the other hand, adding reward to the agent's observations also probably makes the agent more likely to tamper with the physical implementation of its reward, since it will be more likely to develop goals aimed at the reward itself, rather than just the things the reward is indicating. (Analogy for humans: because we didn't have a concept of genetic fitness while evolving, it was hard for evolution to make us care about that directly. But if we'd had the indicator light, we might have developed motivations specifically directed towards it, and then later found out that the light was "actually" the output of some physical reward calculation).

Comment by ricraz on The Power to Judge Startup Ideas · 2019-09-06T14:11:16.497Z · score: 2 (1 votes) · LW · GW

I don't think I'm claiming that the value prop stories of bad startups will be low-delta overall, just that the delta will be more spread out and less specific. After all, the delta of the cryobacterium article, multiplied by a million articles, is quite big, and Golden can say that this is what they'll achieve regardless of how bad they actually are. And more generally, the delta to any given consumer of a product that's better than all its competitors on several of the dimensions I listed above can be pretty big.

Rather, I'm claiming that there are a bunch of startups which will succeed because they do well on the types of things I listed above, and that the Value Prop Story sanity check can't distinguish between startups that will and won't do well on those things in advance. Consider a startup which claims that they will succeed over their competitors because they'll win at advertising. This just isn't the type of thing which we can evaluate well using the Value Prop Story test as you described it:

1. Winning at advertising isn't about providing more value for any given consumer - indeed, to the extent that advertising hijacks our attention, it plausibly provides much less value.

2. The explanation for why that startup thinks they will win on advertising might be arbitrarily non-specific. Maybe the founder has spent decades observing the world and building up strong intuitions about how advertising works, which it would take hours to explain. Maybe the advertising team is a strongly-bonded cohesive unit which the founder trusts deeply.

3. Startups which are going to win at advertising (or other aspects of high-quality non-customer-facing execution) might not even know anything about how well their competitors are doing on those tasks. E.g. I expect someone who's generically incredibly competent to beat their competitors in a bunch of ways even if they have no idea how good their competitors are. The value prop sanity check would reject this person. And if, as I argued above, being "generically incredibly competent" is one of the most important contributors to startup success, then rejecting this type of person gives the sanity check a lot of false negatives, and therefore makes it much less useful.

Comment by ricraz on Seven habits towards highly effective minds · 2019-09-06T13:39:32.183Z · score: 2 (1 votes) · LW · GW

Hmm, could you say more? I tend to think of social influences as good for propagating ideas - as opposed to generating new ones, which seems to depend more on the creativity of individuals or small groups.

Comment by ricraz on The Power to Judge Startup Ideas · 2019-09-05T19:15:50.387Z · score: 2 (1 votes) · LW · GW

I guess I want there to be a minimum standard for a Value Prop Story. If you are allowed to say things like "our product will look better and it will be cooler and customers will like our support experience more", then every startup ever has a value prop story. If we're allowing value prop stories of that low quality, then Golden's story could be "our articles will be better than Wikipedia's". Whereas when Liron said that 80% of startups don't have a value prop story, they seemed to be talking about a higher bar than that.

Comment by ricraz on The Power to Judge Startup Ideas · 2019-09-05T14:14:43.876Z · score: 8 (4 votes) · LW · GW

Intuitively I like this criterion, but it conflicts with another belief I have about startups, which is that the quality of execution is absolutely crucial. And high-quality execution is the sort of thing it's hard to tell a Value Prop Story about, because it looks like "a breadboard full of little bumps of value" rather than "a single sharp spike of value".

To be more specific: if startup A has already created an MVP, and someone else wants to found startup B that does exactly the same thing because their team is better at:

  • UX design
  • Hiring
  • Coding
  • Minimising downtime
  • Advertising and publicity
  • Sales and partnerships
  • Fundraising
  • Budgeting
  • Customer support
  • Expanding internationally
  • Being cool

then I expect B to beat A despite not having a convincing Value Prop Story that can be explained in advance (or even in hindsight). And it seems like rather than being a rare exception, it's quite common for multiple startups to be competing in the same space and cloning each other's features, with success going to whoever executes best (more concretely: the many bike-sharing companies; food delivery companies; a bunch of banking startups in the UK; maybe FB vs MySpace?). In those cases, the lack of a Value Prop Story is a false negative and will lead you to underestimate the success of whichever company ends up winning.

Comment by ricraz on Problems in AI Alignment that philosophers could potentially contribute to · 2019-08-18T14:40:49.621Z · score: 7 (4 votes) · LW · GW

On 1: I think there's a huge amount for philosophers to do. I think of Dennett as laying some of the groundwork which will make the rest of that work easier (such as identifying that the key question is when it's useful to use an intentional stance, rather than trying to figure out which things are objectively "agents"), but the key details are still very vague. Maybe the crux of our disagreement here is how well-specified "treating something as if it's a rational agent" actually is. I think that definitions in terms of utility functions just aren't very helpful, and so we need more conceptual analysis of what phrases like this actually mean, which philosophers are best-suited to provide.

On 2: you're right, as written it does subsume parts of your list. I guess when I wrote that I was thinking that most of the value would come from clarification of the most well-known arguments (i.e. the ones laid out in Superintelligence and What Failure Looks Like). I endorse philosophers pursuing all the items on your list, but from my perspective the disjoint items on my list are much higher priorities.

Comment by ricraz on Problems in AI Alignment that philosophers could potentially contribute to · 2019-08-17T21:45:46.559Z · score: 13 (14 votes) · LW · GW

Interestingly, I agree with you that philosophers could make important contributions to AI safety, but my list of things that I'd want them to investigate is almost entirely disjoint from yours.

The most important points on my list:

1. Investigating how to think about agency and goal-directed behaviour, along the lines of Dennett’s work on the intentional stance. How do they relate to intelligence and the ability to generalise across widely different domains? These are crucial concepts which are still very vague.

2. Laying out the arguments for AGI being dangerous as rigorously and comprehensively as possible, noting the assumptions which are being made and how plausible they are.

3. Evaluating the assumptions about the decomposability of cognitive work which underlie debate and IDA (in particular: the universality of humans, and the role of language).

Comment by ricraz on Why do humans not have built-in neural i/o channels? · 2019-08-09T15:59:19.790Z · score: 3 (2 votes) · LW · GW
But human nervous systems do have much higher bandwidth communication channels. We share them with the other mammals. It's the limbic system.

I'm quite uncertain about how high-bandwidth this actually is. I agree that in the first second of meeting someone, it's much more informative than language could be. Once the initial "first impression" has occurred, though, the rate of communication drops off sharply, and I think that language could overtake it after a few minutes. For example, it takes half a second to say "I'm nervous", and you can keep saying similarly-informative things for a long time: do you think you could get a new piece of similar information every half second for ten minutes via the limbic system?

(Note that I'm not necessarily saying people do communicate information about their emotions, personality and social status faster via language, just that they could).

Comment by ricraz on Why do humans not have built-in neural i/o channels? · 2019-08-09T15:52:45.345Z · score: 2 (1 votes) · LW · GW
The success of human society is a good demonstration of how very low complexity systems and behaviours can drive your competition extinct, magnify available resources, and more

On what basis are you calling human societies "very low complexity systems"? The individual units are humans, whose brains are immensely complex; and then the interactions between humans are often complicated enough that nobody has a good understanding of the system as a whole.

Ultimately there seems to be no impetus for a half-baked neuron tentacle, and a lot of cost and risk, so that will probably never be the path to such organisms.

This seems somewhat intuitive, but note that this statement is a universal negative: it's saying there is no plausible path to this outcome. In general I think we should be pretty cautious about such statements, since a lot of evolutionary innovations would have seemed deeply implausible before they happened. For example, the elephant's trunk is a massive neuron-rich tentacle which is heavily used for communication.

There are lots of simple things that organisms could do to make them wildly more successful.

I guess this is the crux of our disagreement - could you provide some examples?

Comment by ricraz on Why do humans not have built-in neural i/o channels? · 2019-08-09T15:43:51.177Z · score: 6 (3 votes) · LW · GW
Anyone who's done any infosec or network protocol work will laugh at the idea that trial and error (evolution) can make a safe high-bandwidth connection.

There are hundreds of things that people have laughed at the idea of trial and error (evolution) doing, which evolution in fact did. Thinking that evolution is dumb is generally not a good heuristic.

Also, I'm not sure what counts as safe in this context. Is language safe? Is sight?

Comment by ricraz on Mistake Versus Conflict Theory of Against Billionaire Philanthropy · 2019-08-01T18:33:02.956Z · score: 12 (9 votes) · LW · GW

Downvoted for being very hyperbolic ("less than zero", "his opponents here are all pure conflict theorists"), uncharitable/making personal attacks ("There are people opposed to nerds or to thinking", "There are people who are opposed to action of any kind"), and not substantiating these extreme views ("Because, come on. Read their quotes. Consider their arguments.")

As a more object-level comment, suppose I accept the hypothetical that all attacks on billionaire philanthropy are entirely aimed at reducing the power of billionaires. Yet if we build a societal consensus that this particular attack is very misguided and causes undesirable collateral damage, then even people who are just as anti-billionaire as before will be less likely to use it. E.g. it's much harder to criticise billionaires who only spend their money on helping kids with cancer.

Comment by ricraz on How can guesstimates work? · 2019-07-11T23:48:54.035Z · score: 16 (6 votes) · LW · GW

Very interesting question - the sort that makes me swing between thinking it's brilliant and thinking it's nonsense. I do think you overstate your premise. In almost all of the examples given in The Secret of our Success, the relevant knowledge is either non-arbitrary (e.g. the whole passage about hunting seals makes sense, it's just difficult to acquire all that knowledge), or there's a low cost to failure (try a different wood for your arrows; if they don't fly well, go back to basics).

If I engage with the question as posed, though, my primary answer is simply that over time we became wealthy and technologically capable enough that we were able to replace all the natural things that might kill us with whatever we're confident won't kill us. Which is why you can improvise while cooking - all of the ingredients have been screened very hard for safety. This is closely related to your first hypothesis.

However, this still leaves open a slightly different question. The modern world is far too complicated for anyone to understand, and so we might wonder why incomprehensible emergent effects don't render our daily lives haphazard and illegible. One partial answer is that even large-scale components of the world (like countries and companies) were designed by humans. A second partial answer, though, is that even incomprehensible patterns and mechanisms in the modern world still interact with you via other people.

This has a few effects. Firstly, other people try to be legible; it's just part of human interaction. (If the manioc could bargain with you, it'd be much easier to figure out how to process it properly.)

Secondly, there's an illusion of transparency because we're so good at and so used to understanding other people. Social interactions are objectively very complicated: in fact, they're "cultural norms and processes which appear arbitrary, yet could have fatal consequences if departed from". Yet it doesn't feel like the reason I refrain from spitting on strangers is arbitrary (even though I couldn't explain the causal pathway by which people started considering it rude). Note also that the space of ideas that startups explore is heavily constrained by social norms and laws.

Thirdly, facts about other humans serve as semantic stop signs. Suppose your boss fires you, because you don't get along. There's a nearly unlimited amount of complexity which shaped your personality, and your boss' personality, and the fact that you ended up in your respective positions. But once you've factored it out into "I'm this sort of person, they're that sort of person", it feels pretty legible - much more than "some foods are eat-raw sorts of foods, other foods are eat-cooked sorts of foods". (Or at least, it feels much more legible to us today - maybe people used to find the latter explanation just as compelling). A related stop sign is the idea that "somebody knows" why each step of a complex causal chain happened, which nudges us away from thinking of the chain as a whole as illegible.

So I've given two reasons for increased legibility (humans building things, and humans explaining things), and two for the illusion of legibility (illusion of transparency, and semantic stop signs). I think on small scales, the former effects predominate. But on large scales, the latter predominate - the world seems more legible than it actually is. For example:

The world seems legible -- I can roughly predict how many planes fly every day by multiplying a handful rough numbers.

Roughly predicting the number of planes which fly every day is a very low bar! You can also predict the number of trees in a forest by multiplying a handful of numbers. This doesn't help you survive in that forest. What helps you survive in the forest is being able to predict the timing of storms and the local tiger population. In the modern world, what helps you thrive is being able to predict the timing of recessions and crime rate trends. I don't think we're any better at the latter two than our ancestors were at the former two. In fact, the large-scale arcs of our lives are now governed to a much greater extent by very unpredictable and difficult-to-understand events, such as scientific discoveries, technological innovation and international relations.
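
For concreteness, the kind of estimate being discussed here is just a couple of multiplications of rough numbers - a minimal sketch, where every input is an illustrative guess rather than a researched figure:

    commercial_aircraft = 25_000          # rough guess at the global commercial fleet
    flights_per_aircraft_per_day = 4
    flights_per_day = commercial_aircraft * flights_per_aircraft_per_day   # ~100,000

    forest_area_km2 = 1_000
    trees_per_km2 = 50_000                # order-of-magnitude guess for a dense forest
    trees_in_forest = forest_area_km2 * trees_per_km2                      # ~50 million

Which is exactly why it's such a low bar.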

In summary, technology has helped us replace individual objects in our environments with safer and more legible alternatives, and the emergent complexity which persists in our modern environments is now either mediated by people, or still very tricky to predict (or both).

Comment by ricraz on The AI Timelines Scam · 2019-07-11T19:30:17.407Z · score: 22 (11 votes) · LW · GW
But my simple sense is that openly discussing whether or not nuclear weapons were possible (a technical claim on which people might have private information, including intuitions informed by their scientific experience) would have had costs and it was sensible to be secretive about it. If I think that timelines are short because maybe technology X and technology Y fit together neatly, then publicly announcing that increases the chances that we get short timelines because someone plugs together technology X and technology Y. It does seem like marginal scientists speed things up here.

I agree that there are clear costs to making extra arguments of the form "timelines are short because technology X and technology Y will fit together neatly". However, you could still make public that your timelines are a given probability distribution D, and the reasons which led you to that conclusion are Z% object-level views which you won't share, and (100-Z)% base rate reasoning and other outside-view considerations, which you will share.

I think there are very few costs to declaring which types of reasoning you're most persuaded by. There are some costs to actually making the outside-view reasoning publicly available - maybe people who read it will better understand the AI landscape and use that information to do capabilities research.

But having a lack of high-quality public timelines discussion also imposes serious costs, for a few reasons:

1. It means that safety researchers are more likely to be wrong, and therefore end up doing less relevant research. I am generally pretty skeptical of reasoning that hasn't been written down and undergone public scrutiny.

2. It means there's a lot of wasted motion across the safety community, as everyone tries to rederive the various arguments involved, and figure out why other people have the views they do, and who they should trust.

3. It makes building common knowledge (and the coordination which that knowledge can be used for) much harder.

4. It harms the credibility of the field of safety from the perspective of outside observers, including other AI researchers.

Also, the more of a risk you think 1 is, the lower the costs of disclosure are, because it becomes more likely that any information gleaned from the disclosure is wrong anyway. Yet predicting the future is incredibly hard! So the base rate for correctness here is low. And I don't think that safety researchers have a compelling advantage when it comes to correctly modelling how AI will reach human level (compared with thoughtful ML researchers).

Consider, by analogy, a debate two decades ago about whether to make public the ideas of recursive self-improvement and fast takeoff. The potential cost of that is very similar to the costs of disclosure now - giving capabilities researchers these ideas might push them towards building self-improving AIs faster. And yet I think making those arguments public was clearly the right decision. Do you agree that our current situation is fairly analogous?

EDIT: Also, I'm a little confused by

Suppose I have 5 reasons for wanting discussions to be private, and 3 of them I can easily say.

I understand that there are good reasons for discussions to be private, but can you elaborate on why we'd want discussions about privacy to be private?

Comment by ricraz on Embedded Agency: Not Just an AI Problem · 2019-06-27T14:53:43.738Z · score: 13 (4 votes) · LW · GW
We have strong outside-view reasons to expect that the information processing in question probably approximates Bayesian reasoning (for some model of the environment), and the decision-making process approximately maximizes some expected utility function (which itself approximates fitness within the ancestral environment).

The use of "approximates" in this sentence (and in the post as a whole) is so loose as to be deeply misleading - for the same reasons that the "blue-minimising robot" shouldn't be described as maximising some expected utility function, and the information processing done by a single neuron shouldn't be described as Bayesian reasoning (even approximately!)

See also: coherent behaviour in the real world is an incoherent concept.

Comment by ricraz on Let's talk about "Convergent Rationality" · 2019-06-26T20:40:33.678Z · score: 4 (2 votes) · LW · GW
There is at least one example (I've struggled to dig up) of a memory-less RL agent learning to encode memory information in the state of the world.

I recall an example of a MuJoCo agent, whose memory was periodically wiped, storing information in the position of its arms. I'm also having trouble digging it up, though.

Comment by ricraz on Risks from Learned Optimization: Introduction · 2019-06-07T14:49:02.481Z · score: 6 (4 votes) · LW · GW
We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.

I appreciate the difficulty of actually defining optimizers, and so don't want to quibble with this definition, but am interested in whether you think humans are a central example of optimizers under this definition, and if so whether you think that most mesa-optimizers will "explicitly represent" their objective functions to a similar degree as humans do.

Comment by ricraz on On alien science · 2019-06-03T01:19:05.488Z · score: 3 (2 votes) · LW · GW

Agreed that this points in the right direction. I think there's more to it than that, though. Consider, for example, a three-body problem under Newtonian mechanics. Then there's a sense in which specifying the initial masses, positions and velocities of the bodies, along with Newton's laws of motion, is the best way to compress the information about the resulting chaotic trajectories.

But there's still an open question here: why are three-body systems chaotic? Two-body systems aren't. What makes the difference? Finding an explanation probably doesn't allow you to compress any data any more, but it still seems important and interesting.
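
To make this concrete, here's a minimal sketch of the compression-plus-sensitivity point (the masses, initial conditions, step size and softening below are arbitrary illustrative choices): a handful of numbers plus the force law generate the whole trajectory, and a tiny perturbation of those numbers gets amplified.

    import numpy as np

    def accelerations(pos, masses, softening=1e-2):
        # Newtonian gravitational accelerations (G = 1, arbitrary units).
        # Softening avoids numerical blow-up at close encounters.
        acc = np.zeros_like(pos)
        for i in range(len(masses)):
            for j in range(len(masses)):
                if i != j:
                    diff = pos[j] - pos[i]
                    dist = np.sqrt(np.sum(diff ** 2) + softening ** 2)
                    acc[i] += masses[j] * diff / dist ** 3
        return acc

    def simulate(pos, vel, masses, dt=1e-3, steps=20_000):
        pos, vel = pos.copy(), vel.copy()
        for _ in range(steps):  # leapfrog (kick-drift-kick) integration
            vel += 0.5 * dt * accelerations(pos, masses)
            pos += dt * vel
            vel += 0.5 * dt * accelerations(pos, masses)
        return pos

    masses = np.array([1.0, 1.0, 1.0])
    pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    vel = np.array([[0.0, 0.1], [0.0, -0.1], [0.1, 0.0]])

    end_a = simulate(pos, vel, masses)
    end_b = simulate(pos + 1e-9, vel, masses)  # perturb the initial positions slightly
    print(np.linalg.norm(end_a - end_b))       # the tiny perturbation gets amplified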

(This seems related to a potential modification of your data compression standard: that good explanations compress data in a way that minimises not just storage space, but also the computation required to unpack the data. I'm a little confused about this though.)

Comment by ricraz on Book review: The Sleepwalkers by Arthur Koestler · 2019-05-28T16:32:26.852Z · score: 4 (2 votes) · LW · GW

Thanks for the kind words. I agree that refactoring would be useful, but don't have the time now. I have added some headings though.

Comment by ricraz on "Other people are wrong" vs "I am right" · 2019-05-24T14:42:11.650Z · score: 4 (2 votes) · LW · GW

A relevant book recommendation: The Enigma of Reason argues that thinking of high-level human reasoning as a tool for attacking other people's beliefs and defending our own (regardless of their actual veracity) helps explain a lot of weird asymmetries in cognitive biases we're susceptible to, including this one.

Comment by ricraz on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-05-22T15:25:37.118Z · score: 7 (4 votes) · LW · GW

I'd like to push back on the assumption that AIs will have explicit utility functions. Even if you think that sufficiently advanced AIs will behave in a utility-maximising way, their utility functions may be encoded in a way that's difficult to formalise (e.g. somewhere within a neural network).

It may also be the case that coordination is much harder for AIs than for humans. For example, humans are constrained by having bodies, which makes it easier to punish defection - hiding from the government is tricky! Our bodies also make anonymity much harder. Whereas if you're a piece of code which can copy itself anywhere in the world, reneging on agreements may become relatively easy. Another reason why AI cooperation might be harder is simply that AIs will be capable of a much wider range of goals and cognitive processes than humans, and so they may be less predictable to each other and/or have less common ground with each other.

Comment by ricraz on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-05-22T15:19:14.045Z · score: 6 (3 votes) · LW · GW

This paper by Critch is relevant: it argues that agents with different beliefs will bet their future share of a merged utility function, such that it skews towards whoever's predictions are more correct.

Comment by ricraz on What are the open problems in Human Rationality? · 2019-05-21T23:54:16.026Z · score: 10 (3 votes) · LW · GW

Which policies in particular?

Comment by ricraz on What are the open problems in Human Rationality? · 2019-05-21T11:27:43.800Z · score: 2 (1 votes) · LW · GW

This point seems absolutely crucial; and I really appreciate the cited evidence.

Comment by ricraz on Which scientific discovery was most ahead of its time? · 2019-05-17T10:56:46.413Z · score: 12 (4 votes) · LW · GW

Actually, general relativity seems to have been discovered by Hilbert at almost exactly the same time as Einstein.

https://en.wikipedia.org/wiki/Relativity_priority_dispute#General_relativity_2

Comment by ricraz on Which scientific discovery was most ahead of its time? · 2019-05-16T15:12:27.544Z · score: 2 (1 votes) · LW · GW

Biggest jump forward.

Comment by ricraz on The Vulnerable World Hypothesis (by Bostrom) · 2019-05-16T15:10:58.071Z · score: 6 (3 votes) · LW · GW

Does anyone know how this paper relates to Paul Christiano's blog post titled Handling destructive technology, which seems to preempt some of the key ideas? It's not directly acknowledged in the paper.

Comment by ricraz on Eight Books To Read · 2019-05-15T13:31:11.454Z · score: 8 (4 votes) · LW · GW

Interesting list. How would you compare reading the best modern summaries and analyses of the older texts, versus reading them in the original?

Quigley’s career demonstrates an excellent piece of sociological methodology... He builds a theory that emphasizes the importance of elites, and subsequently goes and talks to members of the elite to test and apply the theory.

I'm not sure if this is meant to be ironic, but that methodology seems like an excellent way to introduce confirmation bias. I guess it's excellent compared to not going and talking to anyone at all?

Comment by ricraz on When is rationality useful? · 2019-05-01T01:31:40.027Z · score: 2 (1 votes) · LW · GW

Depends what type of research. If you're doing experimental cell biology, it's less likely that your research will be ruined by abstract philosophical assumptions which can't be overcome by looking at the data.

Comment by ricraz on When is rationality useful? · 2019-05-01T01:27:55.792Z · score: 2 (1 votes) · LW · GW
So when is rationality relevant? Always! It's literally the science of how to make your life better / achieving your values.

Sometimes science isn't helpful or useful. The science of how music works may be totally irrelevant to actual musicians.

If you think of instrumental rationality of the science of how to win, then necessarily it entails considering things like how to setup your environment, unthinking habits, how to "hack" into your psyche/emotions.

It's an empirical question when and whether these things are very useful; my post gives cases in which they are, and in which they aren't.

Comment by ricraz on When is rationality useful? · 2019-04-26T21:46:24.590Z · score: 3 (2 votes) · LW · GW
Some effort spent in determining which things are good, and in which things lead to more opportunity for good is going to be rewarded (statistically) with better outcomes.

All else equal, do you think a rationalist mathematician will become more successful in their field than a non-rationalist mathematician? My guess is that if they spent the (fairly significant) time taken to learn and do rationalist things on just learning more maths, they'd do better.

(Here I'm ignoring the possibility that learning rationality makes them decide to leave the field).

I'll also wave at your wave at the recursion problem: "when is rationality useful" is a fundamentally rationalist question both in the sense of being philosophical, and in the sense that answering it is probably not very useful for actually improving your work in most fields.

Comment by ricraz on When is rationality useful? · 2019-04-26T21:42:47.987Z · score: 3 (2 votes) · LW · GW

When I talk about doing useful work, I mean something much more substantial than what you outline above. Obviously spending 15 minutes every day thinking about your problems is helpful, but the people at the leading edges of most fields spend all day thinking about their problems.

Perhaps doing this ritual makes you think about the problem in a more meta way. If so, there's an empirical question about how much being meta can spark clever solutions. Here I have an intuition that it can, but when I look at any particular subfield that intuition becomes much weaker. How much could a leading mathematician gain by being more meta, for example?

Comment by ricraz on When is rationality useful? · 2019-04-25T05:40:46.530Z · score: 7 (4 votes) · LW · GW

I agree with this. I think the EA example I mentioned fits this pattern fairly well - the more rational you are, the more likely you are to consider what careers and cause areas actually lead to the outcomes you care about, and go into one of those. But then you need the different skill of actually being good at it.

Comment by ricraz on When is rationality useful? · 2019-04-25T05:39:54.137Z · score: 4 (2 votes) · LW · GW

This seems to be roughly orthogonal to what I'm claiming? Whether you get the benefits from rationality quickly or slowly is distinct from what those benefits actually are.

Comment by ricraz on Book review: The Sleepwalkers by Arthur Koestler · 2019-04-23T02:56:14.745Z · score: 8 (4 votes) · LW · GW

Hmm, interesting. It doesn't discuss the Galileo affair, which seems like the most important case where the distinction is relevant. Nevertheless, in light of this, "geocentric models with epicycles had always been in the former category" is too strong and I'll amend it accordingly.

Comment by ricraz on What failure looks like · 2019-04-18T02:40:29.012Z · score: 2 (1 votes) · LW · GW

Mostly I am questioning whether things will turn out badly this way.

Do you not expect this threshold to be crossed sooner or later, assuming AI alignment remains unsolved?

Probably, but I'm pretty uncertain about this. It depends on a lot of messy details about reality, things like: how offense-defence balance scales; what proportion of powerful systems are mostly aligned; whether influence-seeking systems are risk-neutral; what self-governance structures they'll set up; the extent to which their preferences are compatible with ours; how human-comprehensible the most important upcoming scientific advances are.

Comment by ricraz on What failure looks like · 2019-04-17T16:53:53.343Z · score: 2 (1 votes) · LW · GW
I think the idea is that once influence-seeking systems gain a certain amount of influence, it may become faster or more certain for them to gain more influence by causing a catastrophe than to continue to work within existing rules and institutions.

The key issue here is whether there will be coordination between a set of influence-seeking systems that can cause (and will benefit from) a catastrophe, even when other systems are opposing them. If we picture systems as having power comparable to what companies have now, that seems difficult. If we picture them as having power comparable to what countries have now, that seems fairly easy.

Comment by ricraz on What failure looks like · 2019-04-09T16:41:01.080Z · score: 11 (5 votes) · LW · GW
Eventually we reach the point where we could not recover from a correlated automation failure. Under these conditions influence-seeking systems stop behaving in the intended way, since their incentives have changed---they are now more interested in controlling influence after the resulting catastrophe then continuing to play nice with existing institutions and incentives.

I'm not sure I understand this part. The influence-seeking systems which have the most influence also have the most to lose from a catastrophe. So they'll be incentivised to police each other and make catastrophe-avoidance mechanisms more robust.

As an analogy: we may already be past the point where we could recover from a correlated "world leader failure": every world leader simultaneously launching a coup. But this doesn't make such a failure very likely, unless world leaders also have strong coordination and commitment mechanisms between themselves (which are binding even after the catastrophe).

Comment by ricraz on What are CAIS' boldest near/medium-term predictions? · 2019-03-29T00:22:57.277Z · score: 2 (1 votes) · LW · GW

The 75% figure is from now until single agent AGI. I measure it proportionately because otherwise it says more about timeline estimates than about CAIS.

Comment by ricraz on What are CAIS' boldest near/medium-term predictions? · 2019-03-28T14:55:24.064Z · score: 7 (4 votes) · LW · GW

The operationalisation which feels most natural to me is something like:

  • Make a list of cognitively difficult jobs (lawyer, doctor, speechwriter, CEO, engineer, scientist, accountant, trader, consultant, venture capitalist, etc...)
  • A job is automatable when there exists a publicly accessible AI service which allows an equally skilled person to do the job just as well in less than 25% of the time that it used to take a specialist, OR which allows someone with little skill or training to do the job in about the same time that it used to take a specialist.
  • I claim that over 75% of the jobs on this list will be automatable within 75% of the time until a single superhuman AGI is developed.
  • (Note that there are three free parameters in this definition, which I've set to arbitrary numbers that seem intuitively reasonable).

Comment by ricraz on Disentangling arguments for the importance of AI safety · 2019-02-27T11:38:25.158Z · score: 4 (2 votes) · LW · GW

Thanks! I agree that more connection to past writings is always good, and I'm happy to update it appropriately - although, upon thinking about it, there's nothing which really comes to mind as an obvious omission (except perhaps citing sections of Superintelligence?). Of course I'm pretty biased, since I already put in the things which I thought were most important - so I'd be glad to hear any additional suggestions you have.

Comment by ricraz on How to get value learning and reference wrong · 2019-02-27T11:23:57.145Z · score: 4 (2 votes) · LW · GW

Kudos for writing about making mistakes and changing your mind. If I'm interpreting you correctly, your current perspective is quite similar to mine (which I've tried to explain here and here).

Comment by ricraz on Three Kinds of Research Documents: Clarification, Explanatory, Academic · 2019-02-15T17:43:42.288Z · score: 10 (3 votes) · LW · GW

Agreed that "clarification" is confusing. What about "exploration"?

Comment by ricraz on Arguments for moral indefinability · 2019-02-13T13:44:18.278Z · score: 5 (3 votes) · LW · GW

Thanks for the detailed comments! I only have time to engage with a few of them:

Most of this is underdefined, and that’s unsettling at least in some (but not necessarily all) cases, and if we want to make it less underdefined, the notion of 'one ethics' has to give.

I'm not that wedded to 'one ethics', more like 'one process for producing moral judgements'. But note that if we allow arbitrariness of scope, then 'one process' can be a piecewise function which uses one subprocess in some cases and another in others.

I find myself having similarly strong meta-level intuitions about wanting to do something that is "non-arbitrary" and in relevant ways "simple/elegant". ...motivationally it feels like this intuition is importantly connected to what makes it easy for me to go "all-in“ for my ethical/altruistic beliefs.

I agree that these intuitions are very strong, and they are closely connected to motivational systems. But so are some object-level intuitions like "suffering is bad", and so the relevant question is what you'd do if it were a choice between that and simplicity. I'm not sure your arguments distinguish one from the other in that context.

one can maybe avoid to feel this uncomfortable feeling of uncertainty by deferring to idealized reflection. But it’s not obvious that this lastingly solves the underlying problem

Another way of phrasing this point: reflection is almost always good for figuring out what's the best thing to do, but it's not a good way to define what's the best thing to do.

Comment by ricraz on Arguments for moral indefinability · 2019-02-13T13:29:22.824Z · score: 5 (3 votes) · LW · GW

For the record, this is probably my key objection to preference utilitarianism, but I didn't want to dive into the details in the post above (for a very long post about such things, see here).

Comment by ricraz on Coherent behaviour in the real world is an incoherent concept · 2019-02-13T12:01:02.458Z · score: 3 (2 votes) · LW · GW

From Rohin's post, a quote which I also endorse:

You could argue that while [building AIs with really weird utility functions] is possible in principle, no one would ever build such an agent. I wholeheartedly agree, but note that this is now an argument based on particular empirical facts about humans (or perhaps agent-building processes more generally).

And if you're going to argue based on particular empirical facts about what goals we expect, then I don't think that doing so via coherence arguments helps very much.

Comment by ricraz on Coherent behaviour in the real world is an incoherent concept · 2019-02-13T11:31:03.996Z · score: 2 (1 votes) · LW · GW
This seems pretty false to me.

I agree that this problem is not a particularly important one, and explicitly discard it a few sentences later. I hadn't considered your objection though, and will need to think more about it.

(Side note: I'm pretty annoyed with all the use of "there's no coherence theorem for X" in this post.)

Mind explaining why? Is this more a stylistic preference, or do you think most of them are wrong/irrelevant?

the "further out" your goal is and the more that your actions are for instrumental value, the more it should look like world 1 in which agents are valuing abstract properties of world states, and the less we should observe preferences over trajectories to reach said states.

Also true if you make world states temporally extended.