Technical AGI safety research outside AI 2019-10-18T15:00:22.540Z · score: 33 (11 votes)
Seven habits towards highly effective minds 2019-09-05T23:10:01.020Z · score: 39 (10 votes)
What explanatory power does Kahneman's System 2 possess? 2019-08-12T15:23:20.197Z · score: 33 (16 votes)
Why do humans not have built-in neural i/o channels? 2019-08-08T13:09:54.072Z · score: 26 (12 votes)
Book review: The Technology Trap 2019-07-20T12:40:01.151Z · score: 30 (14 votes)
What are some of Robin Hanson's best posts? 2019-07-02T20:58:01.202Z · score: 36 (10 votes)
On alien science 2019-06-02T14:50:01.437Z · score: 46 (15 votes)
A shift in arguments for AI risk 2019-05-28T13:47:36.486Z · score: 27 (11 votes)
Would an option to publish to AF users only be a useful feature? 2019-05-20T11:04:26.150Z · score: 14 (5 votes)
Which scientific discovery was most ahead of its time? 2019-05-16T12:58:14.628Z · score: 39 (10 votes)
When is rationality useful? 2019-04-24T22:40:01.316Z · score: 29 (7 votes)
Book review: The Sleepwalkers by Arthur Koestler 2019-04-23T00:10:00.972Z · score: 75 (22 votes)
Arguments for moral indefinability 2019-02-12T10:40:01.226Z · score: 53 (17 votes)
Coherent behaviour in the real world is an incoherent concept 2019-02-11T17:00:25.665Z · score: 38 (16 votes)
Vote counting bug? 2019-01-22T15:44:48.154Z · score: 7 (2 votes)
Disentangling arguments for the importance of AI safety 2019-01-21T12:41:43.615Z · score: 120 (42 votes)
Comments on CAIS 2019-01-12T15:20:22.133Z · score: 69 (18 votes)
How democracy ends: a review and reevaluation 2018-11-27T10:50:01.130Z · score: 17 (9 votes)
On first looking into Russell's History 2018-11-08T11:20:00.935Z · score: 35 (11 votes)
Speculations on improving debating 2018-11-05T16:10:02.799Z · score: 26 (10 votes)
Implementations of immortality 2018-11-01T14:20:01.494Z · score: 21 (8 votes)
What will the long-term future of employment look like? 2018-10-24T19:58:09.320Z · score: 11 (4 votes)
Book review: 23 things they don't tell you about capitalism 2018-10-18T23:05:29.465Z · score: 19 (11 votes)
Book review: The Complacent Class 2018-10-13T19:20:05.823Z · score: 21 (9 votes)
Some cruxes on impactful alternatives to AI policy work 2018-10-10T13:35:27.497Z · score: 151 (53 votes)
A compendium of conundrums 2018-10-08T14:20:01.178Z · score: 12 (12 votes)
Thinking of the days that are no more 2018-10-06T17:00:01.208Z · score: 13 (6 votes)
The Unreasonable Effectiveness of Deep Learning 2018-09-30T15:48:46.861Z · score: 86 (26 votes)
Deep learning - deeper flaws? 2018-09-24T18:40:00.705Z · score: 42 (17 votes)
Book review: Happiness by Design 2018-09-23T04:30:00.939Z · score: 14 (6 votes)
Book review: Why we sleep 2018-09-19T22:36:19.608Z · score: 52 (25 votes)
Realism about rationality 2018-09-16T10:46:29.239Z · score: 154 (67 votes)
Is epistemic logic useful for agent foundations? 2018-05-08T23:33:44.266Z · score: 19 (6 votes)
What we talk about when we talk about maximising utility 2018-02-24T22:33:28.390Z · score: 27 (8 votes)
In Defence of Conflict Theory 2018-02-17T03:33:01.970Z · score: 25 (10 votes)
Is death bad? 2018-01-13T04:55:25.788Z · score: 8 (4 votes)


Comment by ricraz on Rohin Shah on reasons for AI optimism · 2019-11-01T11:31:04.977Z · score: 15 (10 votes) · LW · GW
Rohin reported an unusually large (90%) chance that AI systems will be safe without additional intervention.

This sentence makes two claims. Firstly that Rohin reports 90% credence in safe AI by default. Secondly that 90% is unusually large compared with the relevant reference class (which I interpret to be people working full-time on AI safety).

However, as far as I can tell, there's no evidence provided for the second claim. I find this particularly concerning because it's the sort of claim that seems likely to cause (and may already have caused) information cascades, along the lines of "all these high status people think AI x-risk is very likely, so I should too".

It may well be true that Rohin is an outlier in this regard. But it may also be false: a 10% chance of catastrophe is plenty high enough to motivate people to go into the field. Since I don't know of many public statements from safety researchers stating their credence in AI x-risk, I'm curious about whether you have strong private evidence.

Comment by ricraz on In Defence of Conflict Theory · 2019-10-10T10:49:39.389Z · score: 2 (1 votes) · LW · GW
This doesn't make much sense in two of your examples: factory farming and concern for future generations. In those cases it seems that you instead have to convince the "powerful" that they are wrong.

I think it's quite a mistake-theoretic view to think that factory farming persists because powerful people are wrong about it. Instead, the (conflict-theoretic) view which I'd defend here is something like "It doesn't matter what politicians think about the morality of factory farming, very few politicians are moral enough to take the career hit of standing up for what's right when it's unpopular, and many are being bought off by the evil meat/farming lobbies. So we need to muster enough mass popular support that politicians see which way the wind is blowing and switch sides en masse (like they did with gay marriage)."

Then the relevance to "the struggle to rally people without power to keep the powerful in check will be a Red Queen's race that we simply need to keep running for as long as we want prosperity to last" is simply that there's no long-term way to change politicians from being weak-willed and immoral - you just need to keep fighting through all these individual issues as they come up.

I think besides "power corrupts", my main problem with "conflict theorists" is that optimizing for gaining power often requires [ideology], i.e., implicitly or explicitly ignoring certain facts that are inconvenient for building a social movement or gaining power. And then this [ideology] gets embedded into the power structure as unquestionable "truths" once the social movement actually gains power, and subsequently causes massive policy distortions.

(Warning: super simplified, off the cuff thoughts here, from a perspective I only partially endorse): I guess my inner conflict theorist believes that it's okay for there to be significant distortions in policy as long as there are mechanisms by which new ideologies can arise to address them, and that it's worthwhile to have this in exchange for dynamism and less political stagnation.

Like, you know what was one of the biggest policy distortions of all time? World War 2. And yet it had a revitalising effect on the American economy, decreased inequality, and led to a boom period.

Whereas if you don't have new ideologies rising and gaining power, then you can go around fixing individual problems all day, but the core allocation of power in society will become so entrenched that the policy distortions are disastrous.

(Edited to add: this feels relevant.)

Comment by ricraz on Arguments for moral indefinability · 2019-10-10T10:34:06.420Z · score: 2 (1 votes) · LW · GW

I address (something similar to) Yudkowsky's view in the paragraph starting:

I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much - for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection.

Particularism feels relevant and fairly similar to what I'm saying, although maybe with a bit of a different emphasis.

Comment by ricraz on Realism and Rationality · 2019-09-22T07:35:30.331Z · score: 2 (1 votes) · LW · GW
If Alice doesn't mean for her second sentence to be totally redundant -- or if she is able to interpret Bob's response as an intelligible (if incorrect) statement of disagreement with her second sentence -- then that suggests her second sentence actually constitutes a substantively normative claim.

I don't think you can declare a sentence redundant without also considering the pragmatic aspects of meaning. In this example, Alice's second sentence is a stronger claim than the first, because it again contains an implicit clause: "If you want to get protein, and you don't have any other relevant goals, you should eat meat". Or maybe it's more like "If you want to get protein, and your other goals are standard ones, you should eat meat."

Compare: Alice says "Jumping off cliffs without a parachute is a quick way to feel very excited. If you want to feel excited, you should jump off cliffs without a parachute." Bob says "No you shouldn't, because you'll die." Alice's first sentence is true, and her second sentence is false, so they can't be equivalent - but both of them can be interpreted as goal-conditional empirical sentences. It's just the case that when you make broad statements, pragmatically you are assuming a "normal" set of goals.

If she is able to interpret Bob's response as an intelligible (if incorrect) statement of disagreement with her second sentence

It's not entirely unintelligible, because Alice is relying on an implicit premise of "standard goals" I mentioned above, and the reason people like Bob are so outspoken on this issue is because they're trying to change that norm of what we consider "standard goals". I do think that if Alice really understood normativity, she would tell Bob that she was trying to make a different type of claim to his one, because his was normative and hers wasn't - while conceding that he had reason to find the pragmatics of her sentence objectionable.

Also, though, you've picked a case where the disputed statement is often used both in empirical ways and in normative ways. This is the least clear sort of example (especially since, pragmatically, when you repeat almost the same thing twice, it makes people think you're implying something different). The vast majority of examples of people using "if you want..., then you should..." seem clearly empirical to me - including many that are in morally relevant domains, where the pragmatics make their empirical nature clear:

A: "If you want to murder someone without getting caught, you should plan carefully."

B: "No you shouldn't, because you shouldn't murder people."

A: "Well obviously you shouldn't murder people, but I'm just saying that if you wanted to, planning would make things much easier."

Comment by ricraz on Realism and Rationality · 2019-09-21T19:16:57.754Z · score: 2 (1 votes) · LW · GW
1. "Bayesian updating has a certain asymptotic convergence property, in the limit of infinite experience and infinite compute. So if you want to understand the world, you should be a Bayesian."
If the first and second sentence were meant to communicate the same thing, then the second would be totally vacuous given the first.

I was a little imprecise in saying that they're exactly equivalent - the second sentence should also have an "in the limit of infinite compute" qualification. Or else we need a hidden assumption like "These asymptotic convergence properties give us reason to believe that even low-compute approximations to Bayesianism are very good ways to understand the world." This is usually left implicit, but it allows us to think of "if you want to understand the world, you should be (approximately) a Bayesian" as an empirical claim not a normative one. For this to actually be an example of normativity, it needs to be the case that some people consider this hidden assumption unnecessary and would endorse claims like "You should use low-compute approximations to Bayesianism because Bayesianism has certain asymptotic convergence properties, even if those properties don't give us any reason to think that low-compute approximations to Bayesianism help you understand the world better." Do you expect that people would endorse this?
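To make the asymptotic claim concrete, here's a toy sketch (my own illustration, using a conjugate Beta prior for simplicity): a Bayesian observer with a uniform Beta(1, 1) prior over a coin's bias has a posterior mean that converges on the true bias as flips accumulate.

```python
import random

# Toy illustration of asymptotic convergence: with a uniform Beta(1, 1)
# prior over a coin's bias, the posterior after n flips is
# Beta(1 + heads, 1 + tails), whose mean approaches the true bias as n grows.
def posterior_mean(true_bias, n_flips, seed=0):
    rng = random.Random(seed)
    heads = sum(rng.random() < true_bias for _ in range(n_flips))
    return (1 + heads) / (2 + n_flips)

# With enough flips, the posterior mean hugs the true bias:
estimate = posterior_mean(0.7, 100_000)
```

Of course this only demonstrates convergence in a toy model class with trivial compute costs - it says nothing by itself about whether low-compute approximations to Bayesianism are good ways to understand the world, which is exactly the hidden assumption at issue.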

Comment by ricraz on Realism and Rationality · 2019-09-21T09:25:15.599Z · score: 2 (1 votes) · LW · GW
But I do have the impression that many people would at least endorse this equally normative claim: "If you have the goal of understanding the world, you should be a Bayesian."

Okay, this seems like a crux of our disagreement. This statement seems pretty much equivalent to my statement #1 in almost all practical contexts. Can you point out how you think they differ?

I agree that some statements of that form seem normative: e.g. "You should go to Spain if you want to go to Spain". However, that seems like an exception to me, because it provides no useful information about how to achieve the goal, and so from contextual clues would be interpreted as "I endorse your desire to go to Spain". Consider instead "If you want to murder someone without getting caught, you should plan carefully", which very much lacks endorsement. Or even "If you want to get to the bakery, you should take a left turn here." How do you feel about the normativity of the last statement in particular? How does it practically differ from "The most convenient way to get to the bakery from here is to take a left turn"? Clearly that's something almost everyone is a realist about (assuming a shared understanding of "convenient") at Less Wrong and elsewhere.

In general -- at least in the context of the concepts/definitions in this post -- the inclusion of an "if" clause doesn't prevent a claim from being normative. So, for example, the claim "You should go to Spain if you want to go to Spain" isn't relevantly different from the claim "You should give money to charity if you have enough money to live comfortably."

I think there's a difference between a moral statement with conditions, and a statement about what is best to do given your goals (roughly corresponding to the difference between Kant's categorical and hypothetical imperatives). "You should give money to charity if you have enough money to live comfortably" is an example of the former - it's the latter which I'm saying aren't normative in any useful sense.

Comment by ricraz on Realism and Rationality · 2019-09-18T10:33:13.156Z · score: 2 (1 votes) · LW · GW

The quote from Eliezer is consistent with #1, since it's bad to undermine people's ability to achieve their goals.

More generally, you might believe that it's morally normative to promote true beliefs (e.g. because they lead to better outcomes) but not believe that it's epistemically normative, in a realist sense, to do so (e.g. the question I asked above, about whether you "should" have true beliefs even when there are no morally relevant consequences and it doesn't further your goals).

Comment by ricraz on Realism and Rationality · 2019-09-17T14:20:54.354Z · score: 2 (1 votes) · LW · GW

Upon further thought, maybe just splitting up #1 and #2 is oversimplifying. There's probably a position #1.5, which is more like "Words like "goals" and "beliefs" only make sense to the extent that they're applied to Bayesians with utility functions - every other approach to understanding agenthood is irredeemably flawed." This gets pretty close to normative realism because you're only left with one possible theory, but it's still not making any realist normative claims (even if you think that goals and beliefs are morally relevant, as long as you're also a moral anti-realist). Maybe a relevant analogy: you might believe that using any axioms except the ZFC axioms will make maths totally incoherent, while not actually holding any opinion on whether the ZFC axioms are "true".

Comment by ricraz on Realism and Rationality · 2019-09-17T14:06:06.931Z · score: 2 (1 votes) · LW · GW
In this case, I feel like there aren't actually that many people who identify as normative anti-realists (i.e., deny that any kind of normative facts exist).

What do you mean by a normative fact here? Could you give some examples?

Comment by ricraz on Realism and Rationality · 2019-09-17T13:45:31.307Z · score: 3 (2 votes) · LW · GW
It seems to me, rather, that people often talk about updating your credences in accordance with Bayes’ rule and maximizing the expected fulfillment of your current desires as the correct things to do.

It's important to disentangle two claims:

1. In general, if you have the goal of understanding the world, or any other goal that relies on doing so, being Bayesian will allow you to achieve it to a greater extent than any other approach (in the limit of infinite compute).

2. Regardless of your goals, you should be Bayesian anyway.

Believing #2 commits you to normative realism as I understand the term, but believing #1 doesn't - #1 is simply an empirical claim about what types of cognition tend to do best towards a broad class of goals. I think that many rationalists would defend #1, and few would defend #2 - if you disagree, I'd be interested in seeing examples of the latter. (One test is by asking "Aside from moral considerations, if someone's only goal is to have false beliefs right now, should they believe true things anyway?") Either way, I agree with Wei that distinguishing between moral normativity and epistemic normativity is crucial for fruitful discussions on this topic.

Another way of framing this distinction: assume there's one true theory of physics, call it T. Then someone might make the claim "Modelling the universe using T is the correct way to do so (in the limit of having infinite compute available)." This is analogous to claim #1, and believing this claim does not commit you to normative realism, because it does not imply that anyone should want to model the universe correctly.

It might also be useful to clarify that in ricraz's recent post criticizing "realism about rationality," several of the attitudes listed aren't directly related to "realism" in the sense of this post.

I would characterise "realism about rationality" as approximately equivalent to claim #1 above (plus a few other similar claims). In particular, it is a belief about whether there is a set of simple ideas which elegantly describe the sort of "agents" who do well at their "goals" - not a belief about the normative force of those ideas. Of course, under most reasonable interpretations of #2, the truth of #2 implies #1, but not vice versa.

Comment by ricraz on The Power to Solve Climate Change · 2019-09-12T20:02:43.399Z · score: 9 (3 votes) · LW · GW

This post says interesting and specific things about climate change, and then suddenly gets very dismissive and non-specific when it comes to individual action. And as you predict in your other posts, this leads to mistakes. You say "your causal model of how your actions will affect greenhouse gas concentrations is missing the concept of an economic equilibrium". But the whole problem of climate change is that the harm of carbon emissions affects the equilibrium point of economic activity so little. You even identify the key point ("our economy lets everyone emit carbon for free") without realizing that this implies replacement effects are very weak. Who will fly more if I fly less? In fact, since many industries have economies of scale, me flying less or eating less meat quite plausibly increases prices and decreases the carbon emissions of others.

And yes, there are complications - farm subsidies, discontinuities in response curves, etc. But decreasing personal carbon footprint also has effects on cultural norms which can add up to larger political change. That seems pretty important - even though, in general, it's the type of thing that it's very difficult to be specific about even for historical examples, let alone future ones. Dismissing these sorts of effects feels very much like an example of the "valley of bad rationality".

Comment by ricraz on Concrete experiments in inner alignment · 2019-09-10T13:04:44.999Z · score: 3 (2 votes) · LW · GW
to what extent models tend to learn their goals internally vs. via reference to things in their environment

I'm not sure what this distinction is trying to refer to. Goals are both represented internally, and also refer to things in the agent's environments. Is there a tension there?

Comment by ricraz on Utility ≠ Reward · 2019-09-09T02:53:44.193Z · score: 2 (1 votes) · LW · GW

Yes, I'm assuming cumulatively-calculated reward. In general this is a fairly standard assumption (rewards being defined for every timestep is part of the definition of MDPs and POMDPs, and I don't see much advantage in delaying computing them until the end of the episode). For agents like AlphaGo observing these rewards obviously won't be very helpful though, since those rewards are all 0 until the last timestep. But in general I expect rewards to occur multiple times per episode when training advanced agents, especially as episodes get longer.

Comment by ricraz on Utility ≠ Reward · 2019-09-08T18:16:34.897Z · score: 2 (1 votes) · LW · GW

In the context of reinforcement learning, it's literally just the reward provided by the environment, which is currently fed only to the optimiser, not to the agent. How to make those rewards good ones is a separate question being answered by research directions like reward modelling and IDA.

Comment by ricraz on Utility ≠ Reward · 2019-09-06T17:11:21.168Z · score: 8 (4 votes) · LW · GW
So the reward function can’t be the policy’s objective – one cannot be pursuing something one has no direct access to.

One question I've been wondering about recently is what happens if you actually do give an agent access to its reward during training. (Analogy for humans: a little indicator in the corner of our visual field that lights up whenever we do something that increases the number or fitness of our descendants). Unless the reward is dense and highly shaped, the agent still has to come up with plans to do well on difficult tasks, it can't just delegate those decisions to the reward information. Yet its judgement about which things are promising will presumably be better-tuned because of this extra information (although eventually you'll need to get rid of it in order for the agent to do well unsupervised).

On the other hand, adding reward to the agent's observations also probably makes the agent more likely to tamper with the physical implementation of its reward, since it will be more likely to develop goals aimed at the reward itself, rather than just the things the reward is indicating. (Analogy for humans: because we didn't have a concept of genetic fitness while evolving, it was hard for evolution to make us care about that directly. But if we'd had the indicator light, we might have developed motivations specifically directed towards it, and then later found out that the light was "actually" the output of some physical reward calculation).
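To make the setup concrete, here's a minimal sketch of what "adding reward to the agent's observations" might look like: an environment wrapper that appends the previous step's reward to each observation, so the policy can condition on it directly during training. The class and method names here are illustrative, not taken from any particular RL library.

```python
# Illustrative sketch: a wrapper that exposes the previous step's reward
# to the agent by bundling it into the observation. At deployment you'd
# strip this channel out (or feed a constant), matching the point above
# that the agent must eventually do well unsupervised.
class RewardInObservation:
    def __init__(self, env):
        self.env = env
        self.last_reward = 0.0

    def reset(self):
        self.last_reward = 0.0
        return (self.env.reset(), self.last_reward)

    def step(self, action):
        obs, reward, done = self.env.step(action)
        augmented = (obs, self.last_reward)  # agent sees last step's reward
        self.last_reward = reward
        return augmented, reward, done
```

Note that the agent sees the reward with a one-step delay, like the hypothetical indicator light: it's extra feedback for tuning judgement, not a substitute for planning on sparse tasks.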

Comment by ricraz on The Power to Judge Startup Ideas · 2019-09-06T14:11:16.497Z · score: 2 (1 votes) · LW · GW

I don't think I'm claiming that the value prop stories of bad startups will be low-delta overall, just that the delta will be more spread out and less specific. Because the delta of the cryobacterium article, multiplied by a million articles, is quite big, and Golden can say that this is what they'll achieve regardless of how bad they actually are. And more generally, the delta to any given consumer of a product that's better than all its competitors on several of the dimensions I listed above can be pretty big.

Rather, I'm claiming that there are a bunch of startups which will succeed because they do well on the types of things I listed above, and that the Value Prop Story sanity check can't distinguish between startups that will and won't do well on those things in advance. Consider a startup which claims that they will succeed over their competitors because they'll win at advertising. This just isn't the type of thing which we can evaluate well using the Value Prop Story test as you described it:

1. Winning at advertising isn't about providing more value for any given consumer - indeed, to the extent that advertising hijacks our attention, it plausibly provides much less value.

2. The explanation for why that startup thinks they will win on advertising might be arbitrarily non-specific. Maybe the founder has spent decades observing the world and building up strong intuitions about how advertising works, which it would take hours to explain. Maybe the advertising team is a strongly-bonded cohesive unit which the founder trusts deeply.

3. Startups which are going to win at advertising (or other aspects of high-quality non-customer-facing execution) might not even know anything about how well their competitors are doing on those tasks. E.g. I expect someone who's generically incredibly competent to beat their competitors in a bunch of ways even if they have no idea how good their competitors are. The value prop sanity check would reject this person. And if, as I argued above, being "generically incredibly competent" is one of the most important contributors to startup success, then rejecting this type of person gives the sanity check a lot of false negatives, making it much less useful.

Comment by ricraz on Seven habits towards highly effective minds · 2019-09-06T13:39:32.183Z · score: 2 (1 votes) · LW · GW

Hmm, could you say more? I tend to think of social influences as good for propagating ideas - as opposed to generating new ones, which seems to depend more on the creativity of individuals or small groups.

Comment by ricraz on The Power to Judge Startup Ideas · 2019-09-05T19:15:50.387Z · score: 2 (1 votes) · LW · GW

I guess I want there to be a minimum lower standard for a Value Prop Story. If you are allowed to say things like "our product will look better and it will be cooler and customers will like our support experience more", then every startup ever has a value prop story. If we're allowing value prop stories of that low quality, then Golden's story could be "our articles will be better than Wikipedia's". Whereas when Liron said that 80% of startups don't have a value prop story, they seemed to be talking about a higher bar than that.

Comment by ricraz on The Power to Judge Startup Ideas · 2019-09-05T14:14:43.876Z · score: 8 (4 votes) · LW · GW

Intuitively I like this criterion, but it conflicts with another belief I have about startups, which is that the quality of execution is absolutely crucial. And high-quality execution is the sort of thing it's hard to tell a Value Prop Story about, because it looks like "a breadboard full of little bumps of value" rather than "a single sharp spike of value".

To be more specific, if a startup A has already created a MVP, and someone else wants to found startup B that does exactly the same thing because their team is better at:

  • UX design
  • Hiring
  • Coding
  • Minimising downtime
  • Advertising and publicity
  • Sales and partnerships
  • Fundraising
  • Budgeting
  • Customer support
  • Expanding internationally
  • Being cool

then I expect B to beat A despite not having a convincing Value Prop Story that can be explained in advance (or even in hindsight). And it seems like rather than being a rare exception, it's quite common for multiple startups to be competing in the same space and cloning each other's features, with success going to whoever executes best (more concretely: the many bike-sharing companies; food delivery companies; a bunch of banking startups in the UK; maybe FB vs MySpace?). In those cases, the lack of a Value Prop Story is a false negative and will lead you to underestimate the success of whichever company ends up winning.

Comment by ricraz on Problems in AI Alignment that philosophers could potentially contribute to · 2019-08-18T14:40:49.621Z · score: 7 (4 votes) · LW · GW

On 1: I think there's a huge amount for philosophers to do. I think of Dennett as laying some of the groundwork which will make the rest of that work easier (such as identifying that the key question is when it's useful to use an intentional stance, rather than trying to figure out which things are objectively "agents") but the key details are still very vague. Maybe the crux of our disagreement here is how well-specified "treating something as if it's a rational agent" actually is. I think that definitions in terms of utility functions just aren't very helpful, and so we need more conceptual analysis of what phrases like this actually mean, which philosophers are best-suited to provide.

On 2: you're right, as written it does subsume parts of your list. I guess when I wrote that I was thinking that most of the value would come from clarification of the most well-known arguments (i.e. the ones laid out in Superintelligence and What Failure Looks Like). I endorse philosophers pursuing all the items on your list, but from my perspective the disjoint items on my list are much higher priorities.

Comment by ricraz on Problems in AI Alignment that philosophers could potentially contribute to · 2019-08-17T21:45:46.559Z · score: 13 (14 votes) · LW · GW

Interestingly, I agree with you that philosophers could make important contributions to AI safety, but my list of things that I'd want them to investigate is almost entirely disjoint from yours.

The most important points on my list:

1. Investigating how to think about agency and goal-directed behaviour, along the lines of Dennett’s work on the intentional stance. How do they relate to intelligence and the ability to generalise across widely different domains? These are crucial concepts which are still very vague.

2. Lay out the arguments for AGI being dangerous as rigorously and comprehensively as possible, noting the assumptions which are being made and how plausible they are.

3. Evaluating the assumptions about the decomposability of cognitive work which underlie debate and IDA (in particular: the universality of humans, and the role of language).

Comment by ricraz on Why do humans not have built-in neural i/o channels? · 2019-08-09T15:59:19.790Z · score: 3 (2 votes) · LW · GW
But human nervous systems do have much higher bandwidth communication channels. We share them with the other mammals. It's the limbic system.

I'm quite uncertain about how high-bandwidth this actually is. I agree that in the first second of meeting someone, it's much more informative than language could be. Once the initial "first impression" has occurred, though, the rate of communication drops off sharply, and I think that language could overtake it after a few minutes. For example, it takes half a second to say "I'm nervous", and you can keep saying similarly-informative things for a long time: do you think you could get a new piece of similar information every half second for ten minutes via the limbic system?

(Note that I'm not necessarily saying people do communicate information about their emotions, personality and social status faster via language, just that they could).

Comment by ricraz on Why do humans not have built-in neural i/o channels? · 2019-08-09T15:52:45.345Z · score: 2 (1 votes) · LW · GW
The success of human society is a good demonstration of how very low complexity systems and behaviours can drive your competition extinct, magnify available resources, and more

On what basis are you calling human societies "very low complexity systems"? The individual units are humans, whose brains are immensely complex; and then the interactions between humans are often complicated enough that nobody has a good understanding of the system as a whole.

Ultimately there seems to be no impetus for a half-baked neuron tentacle, and a lot of cost and risk, so that will probably never be the path to such organisms.

This seems somewhat intuitive, but note that this statement is a universal negative: it's saying there is no plausible path to this outcome. In general I think we should be pretty cautious about such statements, since a lot of evolutionary innovations would have seemed deeply implausible before they happened. For example, the elephant's trunk is a massive neuron-rich tentacle which is heavily used for communication.

There are lots of simple things that organisms could do to make them wildly more successful.

I guess this is the crux of our disagreement - could you provide some examples?

Comment by ricraz on Why do humans not have built-in neural i/o channels? · 2019-08-09T15:43:51.177Z · score: 6 (3 votes) · LW · GW
Anyone who's done any infosec or network protocol work will laugh at the idea that trial and error (evolution) can make a safe high-bandwidth connection.

There are hundreds of things that people have laughed at the idea of trial and error (evolution) doing, which evolution in fact did. Thinking that evolution is dumb is generally not a good heuristic.

Also, I'm not sure what counts as safe in this context. Is language safe? Is sight?

Comment by ricraz on Mistake Versus Conflict Theory of Against Billionaire Philanthropy · 2019-08-01T18:33:02.956Z · score: 12 (9 votes) · LW · GW

Downvoted for being very hyperbolic ("less than zero", "his opponents here are all pure conflict theorists"), uncharitable/making personal attacks ("There are people opposed to nerds or to thinking", "There are people who are opposed to action of any kind"), and not substantiating these extreme views ("Because, come on. Read their quotes. Consider their arguments.").

As a more object-level comment, suppose I accept the hypothetical that all attacks on billionaire philanthropy are entirely aimed at reducing the power of billionaires. Yet if we build a societal consensus that this particular attack is very misguided and causes undesirable collateral damage, then even people who are just as anti-billionaire as before will be less likely to use it. E.g. it's much harder to criticise billionaires who only spend their money on helping kids with cancer.

Comment by ricraz on How can guesstimates work? · 2019-07-11T23:48:54.035Z · score: 16 (6 votes) · LW · GW

Very interesting question - the sort that makes me swing between thinking it's brilliant and thinking it's nonsense. I do think you overstate your premise. In almost all of the examples given in The Secret of our Success, the relevant knowledge is either non-arbitrary (e.g. the whole passage about hunting seals makes sense, it's just difficult to acquire all that knowledge), or there's a low cost to failure (try a different wood for your arrows; if they don't fly well, go back to basics).

If I engage with the question as posed, though, my primary answer is simply that over time we became wealthy and technologically capable enough that we were able to replace all the natural things that might kill us with whatever we're confident won't kill us. Which is why you can improvise while cooking - all of the ingredients have been screened very hard for safety. This is closely related to your first hypothesis.

However, this still leaves open a slightly different question. The modern world is far too complicated for anyone to understand, and so we might wonder why incomprehensible emergent effects don't render our daily lives haphazard and illegible. One partial answer is that even large-scale components of the world (like countries and companies) were designed by humans. A second partial answer, though, is that even incomprehensible patterns and mechanisms in the modern world still interact with you via other people.

This has a few effects. Firstly, other people try to be legible; it's just part of human interaction. (If the manioc could bargain with you, it'd be much easier to figure out how to process it properly.)

Secondly, there's an illusion of transparency because we're so good at and so used to understanding other people. Social interactions are objectively very complicated: in fact, they're "cultural norms and processes which appear arbitrary, yet could have fatal consequences if departed from". Yet it doesn't feel like the reason I refrain from spitting on strangers is arbitrary (even though I couldn't explain the causal pathway by which people started considering it rude). Note also that the space of ideas that startups explore is heavily constrained by social norms and laws.

Thirdly, facts about other humans serve as semantic stop signs. Suppose your boss fires you, because you don't get along. There's a nearly unlimited amount of complexity which shaped your personality, and your boss' personality, and the fact that you ended up in your respective positions. But once you've factored it out into "I'm this sort of person, they're that sort of person", it feels pretty legible - much more than "some foods are eat-raw sorts of foods, other foods are eat-cooked sorts of foods". (Or at least, it feels much more legible to us today - maybe people used to find the latter explanation just as compelling). A related stop sign is the idea that "somebody knows" why each step of a complex causal chain happened, which nudges us away from thinking of the chain as a whole as illegible.

So I've given two reasons for increased legibility (humans building things, and humans explaining things), and two for the illusion of legibility (illusion of transparency, and semantic stop signs). I think on small scales, the former effects predominate. But on large scales, the latter predominate - the world seems more legible than it actually is. For example:

The world seems legible -- I can roughly predict how many planes fly every day by multiplying a handful of rough numbers.

Roughly predicting the number of planes which fly every day is a very low bar! You can also predict the number of trees in a forest by multiplying a handful of numbers. This doesn't help you survive in that forest. What helps you survive in the forest is being able to predict the timing of storms and the local tiger population. In the modern world, what helps you thrive is being able to predict the timing of recessions and crime rate trends. I don't think we're any better at the latter two than our ancestors were at the former. In fact, the large-scale arcs of our lives are now governed to a much greater extent by very unpredictable and difficult-to-understand events, such as scientific discoveries, technological innovation and international relations.
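For illustration, here's the kind of multiplication I mean, as a tiny Python sketch - the fleet size and utilisation figures are rough assumed numbers of mine, not sourced data:

```python
# Illustrative Fermi estimate of daily commercial flights,
# multiplying a handful of rough assumed numbers.

commercial_aircraft_in_service = 25_000   # assumed rough fleet size
flights_per_aircraft_per_day = 4          # assumed rough utilisation

flights_per_day = commercial_aircraft_in_service * flights_per_aircraft_per_day
print(f"~{flights_per_day:,} flights/day")  # ~100,000: right order of magnitude
```

Getting the order of magnitude right with two guessed numbers is exactly the low bar I mean - it tells you nothing about the timing of the next recession.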

In summary, technology has helped us replace individual objects in our environments with safer and more legible alternatives, and the emergent complexity which persists in our modern environments is now either mediated by people, or still very tricky to predict (or both).

Comment by ricraz on The AI Timelines Scam · 2019-07-11T19:30:17.407Z · score: 22 (11 votes) · LW · GW
But my simple sense is that openly discussing whether or not nuclear weapons were possible (a technical claim on which people might have private information, including intuitions informed by their scientific experience) would have had costs and it was sensible to be secretive about it. If I think that timelines are short because maybe technology X and technology Y fit together neatly, then publicly announcing that increases the chances that we get short timelines because someone plugs together technology X and technology Y. It does seem like marginal scientists speed things up here.

I agree that there are clear costs to making extra arguments of the form "timelines are short because technology X and technology Y will fit together neatly". However, you could still make public that your timelines are a given probability distribution D, and the reasons which led you to that conclusion are Z% object-level views which you won't share, and (100-Z)% base rate reasoning and other outside-view considerations, which you will share.

I think there are very few costs to declaring which types of reasoning you're most persuaded by. There are some costs to actually making the outside-view reasoning publicly available - maybe people who read it will better understand the AI landscape and use that information to do capabilities research.

But having a lack of high-quality public timelines discussion also imposes serious costs, for a few reasons:

1. It means that safety researchers are more likely to be wrong, and therefore end up doing less relevant research. I am generally pretty skeptical of reasoning that hasn't been written down and undergone public scrutiny.

2. It means there's a lot of wasted motion across the safety community, as everyone tries to rederive the various arguments involved, and figure out why other people have the views they do, and who they should trust.

3. It makes building common knowledge (and the coordination which that knowledge can be used for) much harder.

4. It harms the credibility of the field of safety from the perspective of outside observers, including other AI researchers.

Also, the more of a risk you think 1 is, the lower the costs of disclosure are, because it becomes more likely that any information gleaned from the disclosure is wrong anyway. Yet predicting the future is incredibly hard! So the base rate for correctness here is low. And I don't think that safety researchers have a compelling advantage when it comes to correctly modelling how AI will reach human level (compared with thoughtful ML researchers).

Consider, by analogy, a debate two decades ago about whether to make public the ideas of recursive self-improvement and fast takeoff. The potential cost of that is very similar to the costs of disclosure now - giving capabilities researchers these ideas might push them towards building self-improving AIs faster. And yet I think making those arguments public was clearly the right decision. Do you agree that our current situation is fairly analogous?

EDIT: Also, I'm a little confused by

Suppose I have 5 reasons for wanting discussions to be private, and 3 of them I can easily say.

I understand that there are good reasons for discussions to be private, but can you elaborate on why we'd want discussions about privacy to be private?

Comment by ricraz on Embedded Agency: Not Just an AI Problem · 2019-06-27T14:53:43.738Z · score: 13 (4 votes) · LW · GW
We have strong outside-view reasons to expect that the information processing in question probably approximates Bayesian reasoning (for some model of the environment), and the decision-making process approximately maximizes some expected utility function (which itself approximates fitness within the ancestral environment).

The use of "approximates" in this sentence (and in the post as a whole) is so loose as to be deeply misleading - for the same reasons that the "blue-minimising robot" shouldn't be described as maximising some expected utility function, and the information processing done by a single neuron shouldn't be described as Bayesian reasoning (even approximately!).

See also: coherent behaviour in the real world is an incoherent concept.

Comment by ricraz on Let's talk about "Convergent Rationality" · 2019-06-26T20:40:33.678Z · score: 4 (2 votes) · LW · GW
There is at least one example (I've struggled to dig up) of a memory-less RL agent learning to encode memory information in the state of the world.

I recall an example of a Mujoco agent whose memory was periodically wiped storing information in the position of its arms. I'm also having trouble digging it up though.
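To illustrate the general phenomenon (this is my own toy construction, not the Mujoco example): a policy that is a pure function of the current observation can solve a task that seems to require memory, by toggling a marker in the environment and reading it back.

```python
# Toy illustration: a memoryless agent on a 1-D line must visit A, then B.
# Its policy has no internal state, but it can toggle a marker in the world;
# since the marker appears in its observation, it serves as external memory.

A, B = 2, 8

def policy(obs):
    """Pure function of observation: no internal state."""
    pos, marker = obs
    if not marker:  # haven't recorded visiting A yet
        return "toggle" if pos == A else ("right" if pos < A else "left")
    return "right" if pos < B else ("left" if pos > B else "done")

def run(start=5, max_steps=50):
    pos, marker, visited_a = start, False, False
    for _ in range(max_steps):
        action = policy((pos, marker))
        if action == "done":
            return visited_a and pos == B
        if action == "toggle":
            marker, visited_a = True, (pos == A)
        elif action == "left":
            pos -= 1
        else:
            pos += 1
    return False
```

Running `run()` succeeds even though no function here carries state across timesteps - the "memory" lives entirely in the marker.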

Comment by ricraz on Risks from Learned Optimization: Introduction · 2019-06-07T14:49:02.481Z · score: 12 (4 votes) · LW · GW
We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.

I appreciate the difficulty of actually defining optimizers, and so don't want to quibble with this definition, but am interested in whether you think humans are a central example of optimizers under this definition, and if so whether you think that most mesa-optimizers will "explicitly represent" their objective functions to a similar degree that humans do.

Comment by ricraz on On alien science · 2019-06-03T01:19:05.488Z · score: 3 (2 votes) · LW · GW

Agreed that this points in the right direction. I think there's more to it than that though. Consider for example a three-body problem under Newtonian mechanics. Then there's a sense in which specifying the initial masses and velocities of the bodies, along with Newton's laws of motion, is the best way to compress the information about these chaotic trajectories.

But there's still an open question here: why are three-body systems chaotic, when two-body systems aren't? What makes the difference? Finding an explanation probably wouldn't allow you to compress any data further, but it still seems important and interesting.

(This seems related to a potential modification of your data compression standard: that good explanations compress data in a way that minimises not just storage space, but also the computation required to unpack the data. I'm a little confused about this though.)
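As a concrete illustration of the chaos point, here's a minimal Python sketch that integrates the same Newtonian three-body system twice from initial conditions differing by one part in a million, and measures how far apart the trajectories end up. The masses, softening term, and step size are all illustrative choices of mine:

```python
# Sensitive dependence in the three-body problem: two runs of the same
# system from almost-identical starts. Gravitational softening (eps)
# avoids the collision singularity; forward-Euler keeps the sketch short.

def accelerations(pos, masses, eps=1e-3):
    acc = [[0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps  # softening term
            inv_r3 = r2 ** -1.5
            acc[i][0] += masses[j] * dx * inv_r3
            acc[i][1] += masses[j] * dy * inv_r3
    return acc

def simulate(pos, vel, masses, dt=1e-3, steps=5000):
    pos = [p[:] for p in pos]
    vel = [v[:] for v in vel]
    for _ in range(steps):
        acc = accelerations(pos, masses)
        for i in range(len(pos)):
            vel[i][0] += acc[i][0] * dt
            vel[i][1] += acc[i][1] * dt
            pos[i][0] += vel[i][0] * dt
            pos[i][1] += vel[i][1] * dt
    return pos

masses = [1.0, 1.0, 1.0]
start = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]]
rest = [[0.0, 0.0] for _ in start]

a = simulate(start, rest, masses)
perturbed = [p[:] for p in start]
perturbed[0][0] += 1e-6  # a one-part-in-a-million nudge
b = simulate(perturbed, rest, masses)

divergence = max(abs(a[i][k] - b[i][k]) for i in range(3) for k in range(2))
print(divergence)  # the tiny nudge has been amplified
```

Run the same experiment with two bodies and the perturbation grows only tamely - which is exactly the "what makes the difference?" question, and no amount of data compression answers it.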

Comment by ricraz on Book review: The Sleepwalkers by Arthur Koestler · 2019-05-28T16:32:26.852Z · score: 4 (2 votes) · LW · GW

Thanks for the kind words. I agree that refactoring would be useful, but don't have the time now. I have added some headings though.

Comment by ricraz on "Other people are wrong" vs "I am right" · 2019-05-24T14:42:11.650Z · score: 4 (2 votes) · LW · GW

A relevant book recommendation: The Enigma of Reason argues that thinking of high-level human reasoning as a tool for attacking other people's beliefs and defending our own (regardless of their actual veracity) helps explain a lot of weird asymmetries in cognitive biases we're susceptible to, including this one.

Comment by ricraz on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-05-22T15:25:37.118Z · score: 7 (4 votes) · LW · GW

I'd like to push back on the assumption that AIs will have explicit utility functions. Even if you think that sufficiently advanced AIs will behave in a utility-maximising way, their utility functions may be encoded in a way that's difficult to formalise (e.g. somewhere within a neural network).

It may also be the case that coordination is much harder for AIs than for humans. For example, humans are constrained by having bodies, which makes it easier to punish defection - hiding from the government is tricky! Our bodies also make anonymity much harder. Whereas if you're a piece of code which can copy itself anywhere in the world, reneging on agreements may become relatively easy. Another reason why AI cooperation might be harder is simply that AIs will be capable of a much wider range of goals and cognitive processes than humans, and so they may be less predictable to each other and/or have less common ground with each other.

Comment by ricraz on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-05-22T15:19:14.045Z · score: 6 (3 votes) · LW · GW

This paper by Critch is relevant: it argues that agents with different beliefs will effectively bet their future shares of a merged utility function on their predictions, so that the merged utility skews towards whichever agent's predictions turn out more correct.
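Here's a toy sketch of the mechanism as I understand it (my own simplification, not the construction from the paper): each agent's weight in the merged utility is multiplied by the probability it assigned to each observed outcome, so weight flows to the better predictor, like a repeated bet.

```python
# Two agents with different beliefs about a coin merge into one agent.
# After each observation, weights are rescaled by each agent's likelihood
# for that outcome, then renormalised.

def update_weights(weights, beliefs, outcome):
    """Multiply each weight by the probability that agent assigned to the outcome."""
    likelihoods = [b if outcome == 1 else 1 - b for b in beliefs]
    new = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(new)
    return [w / total for w in new]

beliefs = [0.9, 0.5]   # agent 0 thinks heads is very likely; agent 1 is 50/50
weights = [0.5, 0.5]   # equal initial shares of the merged utility

# The coin actually lands heads most of the time.
for outcome in [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]:
    weights = update_weights(weights, beliefs, outcome)

print(weights)  # agent 0's share of the merged utility has grown
```

The weight dynamics here are just Bayesian updating over "which agent's world-model is right", which is why the merge behaves like a wager on beliefs.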

Comment by ricraz on What are the open problems in Human Rationality? · 2019-05-21T23:54:16.026Z · score: 10 (3 votes) · LW · GW

Which policies in particular?

Comment by ricraz on What are the open problems in Human Rationality? · 2019-05-21T11:27:43.800Z · score: 2 (1 votes) · LW · GW

This point seems absolutely crucial; and I really appreciate the cited evidence.

Comment by ricraz on Which scientific discovery was most ahead of its time? · 2019-05-17T10:56:46.413Z · score: 12 (4 votes) · LW · GW

Actually, general relativity seems to have been discovered by Hilbert at almost exactly the same time that Einstein did.

Comment by ricraz on Which scientific discovery was most ahead of its time? · 2019-05-16T15:12:27.544Z · score: 2 (1 votes) · LW · GW

Biggest jump forward.

Comment by ricraz on The Vulnerable World Hypothesis (by Bostrom) · 2019-05-16T15:10:58.071Z · score: 6 (3 votes) · LW · GW

Does anyone know how this paper relates to Paul Christiano's blog post titled Handling destructive technology, which seems to preempt some of the key ideas? It's not directly acknowledged in the paper.

Comment by ricraz on Eight Books To Read · 2019-05-15T13:31:11.454Z · score: 8 (4 votes) · LW · GW

Interesting list. How would you compare reading the best modern summaries and analyses of the older texts, versus reading them in the original?

Quigley’s career demonstrates an excellent piece of sociological methodology... He builds a theory that emphasizes the importance of elites, and subsequently goes and talks to members of the elite to test and apply the theory.

I'm not sure if this is meant to be ironic, but that methodology seems like an excellent way to introduce confirmation bias. I guess it's excellent compared to not going and talking to anyone at all?

Comment by ricraz on When is rationality useful? · 2019-05-01T01:31:40.027Z · score: 2 (1 votes) · LW · GW

Depends what type of research. If you're doing experimental cell biology, it's less likely that your research will be ruined by abstract philosophical assumptions which can't be overcome by looking at the data.

Comment by ricraz on When is rationality useful? · 2019-05-01T01:27:55.792Z · score: 2 (1 votes) · LW · GW
So when is rationality relevant? Always! It's literally the science of how to make your life better / achieving your values.

Sometimes science isn't helpful or useful. The science of how music works may be totally irrelevant to actual musicians.

If you think of instrumental rationality as the science of how to win, then necessarily it entails considering things like how to set up your environment, unthinking habits, how to "hack" into your psyche/emotions.

It's an empirical question when and whether these things are very useful; my post gives cases in which they are, and in which they aren't.

Comment by ricraz on When is rationality useful? · 2019-04-26T21:46:24.590Z · score: 3 (2 votes) · LW · GW
Some effort spent in determining which things are good, and in which things lead to more opportunity for good is going to be rewarded (statistically) with better outcomes.

All else equal, do you think a rationalist mathematician will become more successful in their field than a non-rationalist mathematician? My guess is that if they spent the (fairly significant) time taken to learn and do rationalist things on just learning more maths, they'd do better.

(Here I'm ignoring the possibility that learning rationality makes them decide to leave the field).

I'll also wave at your wave at the recursion problem: "when is rationality useful" is a fundamentally rationalist question both in the sense of being philosophical, and in the sense that answering it is probably not very useful for actually improving your work in most fields.

Comment by ricraz on When is rationality useful? · 2019-04-26T21:42:47.987Z · score: 3 (2 votes) · LW · GW

When I talk about doing useful work, I mean something much more substantial than what you outline above. Obviously 15 minutes every day thinking about your problems is helpful, but the people at the leading edges of most fields spend all day thinking about their problems.

Perhaps doing this ritual makes you think about the problem in a more meta way. If so, there's an empirical question about how much being meta can spark clever solutions. Here I have an intuition that it can, but when I look at any particular subfield that intuition becomes much weaker. How much could a leading mathematician gain by being more meta, for example?

Comment by ricraz on When is rationality useful? · 2019-04-25T05:40:46.530Z · score: 7 (4 votes) · LW · GW

I agree with this. I think the EA example I mentioned fits this pattern fairly well - the more rational you are, the more likely you are to consider what careers and cause areas actually lead to the outcomes you care about, and go into one of those. But then you need the different skill of actually being good at it.

Comment by ricraz on When is rationality useful? · 2019-04-25T05:39:54.137Z · score: 4 (2 votes) · LW · GW

This seems to be roughly orthogonal to what I'm claiming? Whether you get the benefits from rationality quickly or slowly is distinct from what those benefits actually are.

Comment by ricraz on Book review: The Sleepwalkers by Arthur Koestler · 2019-04-23T02:56:14.745Z · score: 8 (4 votes) · LW · GW

Hmm, interesting. It doesn't discuss the Galileo affair, which seems like the most important case where the distinction is relevant. Nevertheless, in light of this, "geocentric models with epicycles had always been in the former category" is too strong and I'll amend it accordingly.

Comment by ricraz on What failure looks like · 2019-04-18T02:40:29.012Z · score: 2 (1 votes) · LW · GW

Mostly I am questioning whether things will turn out badly this way.

Do you not expect this threshold to be crossed sooner or later, assuming AI alignment remains unsolved?

Probably, but I'm pretty uncertain about this. It depends on a lot of messy details about reality, things like: how offense-defence balance scales; what proportion of powerful systems are mostly aligned; whether influence-seeking systems are risk-neutral; what self-governance structures they'll set up; the extent to which their preferences are compatible with ours; how human-comprehensible the most important upcoming scientific advances are.

Comment by ricraz on What failure looks like · 2019-04-17T16:53:53.343Z · score: 2 (1 votes) · LW · GW
I think the idea is that once influence-seeking systems gain a certain amount of influence, it may become faster or more certain for them to gain more influence by causing a catastrophe than to continue to work within existing rules and institutions.

The key issue here is whether there will be coordination between a set of influence-seeking systems that can cause (and will benefit from) a catastrophe, even when other systems are opposing them. If we picture systems as having power comparable to what companies have now, that seems difficult. If we picture them as having power comparable to what countries have now, that seems fairly easy.