# A model I use when making plans to reduce AI x-risk

post by Benito · 2018-01-19T00:21:45.460Z · score: 136 (48 votes) · LW · GW · 40 comments

## Contents

  A model of AI x-risk in four parts
1. Alignment is hard.
2. Getting alignment right accounts for most of the variance in whether an AGI system will be positive for humanity.
3. Our current epistemic state regarding AGI timelines will continue until we're close to (<2 years from) having AGI.
4. Given timeline uncertainty, it's best to spend marginal effort on plans that assume / work in shorter timelines.
Concrete implications


I've been thinking about what implicit model of the world I use to make plans that reduce x-risk from AI. I list four main gears below (with quotes to illustrate), and then discuss concrete heuristics I take from it.

## A model of AI x-risk in four parts

1. Alignment is hard.

Quoting "Security Mindset and the Logistic Success Curve" (link)

Coral: YES. Given that this is a novel project entering new territory, expect it to take at least two years more time, or 50% more development time—whichever is less—compared to a security-incautious project that otherwise has identical tools, insights, people, and resources. And that is a very, very optimistic lower bound.
Amber: This story seems to be heading in a worrying direction.
Coral: Well, I'm sorry, but creating robust systems takes longer than creating non-robust systems even in cases where it would be really, extraordinarily bad if creating robust systems took longer than creating non-robust systems.

2. Getting alignment right accounts for most of the variance in whether an AGI system will be positive for humanity.

Quoting "The Hidden Complexity of Wishes" (link)

There are three kinds of genies: Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.
[...]
There is no safe wish smaller than an entire human morality. There are too many possible paths through Time. You can't visualize all the roads that lead to the destination you give the genie... any more than you can program a chess-playing machine by hardcoding a move for every possible board position.
And real life is far more complicated than chess. You cannot predict, in advance, which of your values will be needed to judge the path through time that the genie takes. Especially if you wish for something longer-term or wider-range than rescuing your mother from a burning building.

3. Our current epistemic state regarding AGI timelines will continue until we're close to (<2 years from) having AGI.

Quoting "There is No Fire Alarm for AGI" (link)

It's not that whenever somebody says "fifty years" the thing always happens in two years. It's that this confident prediction of things being far away corresponds to an epistemic state about the technology that feels the same way internally until you are very very close to the big development. It's the epistemic state of "Well, I don't see how to do the thing" and sometimes you say that fifty years off from the big development, and sometimes you say it two years away, and sometimes you say it while the Wright Flyer is flying somewhere out of your sight.
[...]
So far as I can presently estimate, now that we've had AlphaGo and a couple of other maybe/maybe-not shots across the bow, and seen a huge explosion of effort invested into machine learning and an enormous flood of papers, we are probably going to occupy our present epistemic state until very near the end.
By saying we're probably going to be in roughly this epistemic state until almost the end, I don't mean to say we know that AGI is imminent, or that there won't be important new breakthroughs in AI in the intervening time. I mean that it's hard to guess how many further insights are needed for AGI, or how long it will take to reach those insights. After the next breakthrough, we still won't know how many more breakthroughs are needed, leaving us in pretty much the same epistemic state as before. Whatever discoveries and milestones come next, it will probably continue to be hard to guess how many further insights are needed, and timelines will continue to be similarly murky.

4. Given timeline uncertainty, it's best to spend marginal effort on plans that assume / work in shorter timelines.

Stated simply: If you don't know when AGI is coming, you should make sure alignment gets solved in worlds where AGI comes soon.

Quoting "Allocating Risk-Mitigation Across Time" (link)

Suppose we are also unsure about when we may need the problem solved by. In scenarios where the solution is needed earlier, there is less time for us to collectively work on a solution, so there is less work on the problem than in scenarios where the solution is needed later. Given the diminishing returns on work, that means that a marginal unit of work has a bigger expected value in the case where the solution is needed earlier. This should update us towards working to address the early scenarios more than would be justified by looking purely at their impact and likelihood.
[...]
There are two major factors which seem to push towards preferring more work which focuses on scenarios where AI comes soon. The first is nearsightedness: we simply have a better idea of what will be useful in these scenarios. The second is diminishing marginal returns: the expected effect of an extra year of work on a problem tends to decline when it is being added to a larger total. And because there is a much larger time horizon in which to solve it (and in a wealthier world), the problem of AI safety when AI comes later may receive many times as much work as the problem of AI safety for AI that comes soon. On the other hand one more factor preferring work on scenarios where AI comes later is the ability to pursue more leveraged strategies which eschew object-level work today in favour of generating (hopefully) more object-level work later.

The above is a slightly misrepresentative quote; the paper is largely undecided as to whether shorter term strategies or longer term strategies are more valuable (given uncertainty over timelines), and recommends a portfolio approach (running multiple strategies, that each apply to different timelines). Nonetheless when reading it I did update toward short-term strategies as being especially neglected, both by myself and the x-risk community at large.
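To make the diminishing-returns step in the quote concrete, here is a minimal sketch. It is not taken from the paper: the logarithmic value curve, the two scenarios, their probabilities, and the work rate are all hypothetical numbers chosen for illustration. It only shows how a less likely "soon" scenario can still offer the higher probability-weighted marginal value when less total work will have been done by the deadline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    probability: float      # hypothetical credence in this scenario
    years_until_agi: float  # hypothetical time available for alignment work


def marginal_value(total_work: float) -> float:
    """Value of one more unit of work, assuming value(w) = log(1 + w).

    The log curve is an assumed stand-in for "diminishing returns";
    its derivative is 1 / (1 + w).
    """
    return 1.0 / (1.0 + total_work)


def weighted_marginal_value(s: Scenario, work_per_year: float = 1.0) -> float:
    """Probability-weighted value of an extra unit of work aimed at scenario s."""
    baseline_work = s.years_until_agi * work_per_year
    return s.probability * marginal_value(baseline_work)


# Illustrative numbers only: "soon" is four times less likely than "late".
scenarios = [
    Scenario("soon", probability=0.2, years_until_agi=10),
    Scenario("late", probability=0.8, years_until_agi=50),
]

for s in scenarios:
    print(f"{s.name}: {weighted_marginal_value(s):.4f}")
```

With these made-up inputs the "soon" scenario comes out slightly ahead, but the ranking is sensitive to the assumed curve and probabilities, and the sketch ignores the other factors the quote mentions (nearsightedness, and leveraged strategies that favour later scenarios). That sensitivity is consistent with the paper's portfolio recommendation.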

## Concrete implications

Informed by the model above, here are heuristics I use for making plans.

• Solve alignment! Aaargh! Solve it! Solve it now!
  • I nearly forgot to say it explicitly, but it's the most important: if you have a clear avenue to do good work on alignment, or field-building in alignment, do it.
• Find ways to contribute to intellectual progress on alignment
  • I think that intellectual progress is very tractable.
  • A central example of a small project I'd love to see more people attempt is writing up (in their own words) analyses and summaries of core disagreements in alignment research.
  • A broader category of things that can be done to push discourse forward can be found in this talk Oliver and I have given in the past, about how to write good comments on LessWrong.
  • It seems to me that people I talk to think earning-to-give is easy and doable, but pushing forward intellectual progress (especially on alignment) is impossible, or at least only 'geniuses' can do it. I disagree; there is a lot of low hanging fruit.
• Build infrastructure for the alignment research community
  • The Berkeley Existential Risk Initiative (BERI) is a great example of this - many orgs (FHI, CHAI, etc) have ridiculous university constraints upon their actions, and so one of BERI's goals is to help them outsource this (to BERI) and remove the bureaucratic mess. This is ridiculously helpful. (FYI they're hiring.)
  • I personally have been chatting recently with various alignment researchers about what online infrastructure could be helpful, and have found surprisingly good opportunities to improve things (will write up more on this in a future post).
  • What other infrastructure could you build for better communication between key researchers?
• Avoid/reduce direct government involvement (in the long run)
  • It's important that those running AGI projects are capable of understanding the alignment problem and why it's necessary to solve alignment before implementing an AGI. There's a better chance of this when the person running the project has a strong technical understanding of how AI works.
  • A government-run AI project is analogous to a tech company with non-technical founders. Sure, the founders can employ a CTO, but then you have Paul Graham's design problem - how are they supposed to figure out who a good CTO is? They don't know what to test for. They will likely just pick whoever comes with the strongest recommendation, and given their info channels that will probably just be whoever has the most status.
• Focus on technical solutions to x-risk rather than political or societal
  • I have an impression that humanity has a better track record of finding technical than political/social solutions to problems, and this means we should focus even more on things like alignment.
  • As one datapoint, fields like computer science, engineering and mathematics seem to make a lot more progress than ones like macroeconomics, political theory, and international relations. If you can frame something as either a math problem or a political problem, do the former.
  • I don't have something strong to back this up with, so will do some research/reading.
• Avoid things that (because they're social) are fun to argue about
  • For example, ethics is a very sexy subject that can easily attract public outrage and attention while not in fact being useful (cf. bioethics). If we expect alignment to not be solved, the question of "whose values do we get to put into the AI?" is an enticing distraction.
  • Another candidate for a sexy subject that is basically a distraction, is discussion of the high status people in AI e.g. "Did you hear what Elon Musk said to Demis Hassabis?" Too many of my late-night conversations fall into patterns like this, and I actively push back against it (both in myself and others).
  • This recommendation is a negative one ("Don't do this"). If you have any ideas for positive things to do instead, please write them down. What norms/TAPs push away from social distractions?

I wrote this post to make explicit some of the thinking that goes into my plans. While the heuristics are informed by the model, they likely hide other assumptions that I didn’t notice.

To folks who have tended to agree with my object level suggestions, I expect you to have a sense of having read obvious things, stated explicitly. To everyone else, I’d love to read about the core models that inform your views on AI, and I’d encourage you to read more on those of mine that are new to you.

My thanks and appreciation to Jacob Lagerros for help editing.

[Edit: On 01/26/18, I made slight edits to this post body and title. It used to say there were four models in part I, and instead now says that part I lists four parts of a single model. Some of the comments were a response to the original, and thus may read a little funny.]

comment by Kaj_Sotala · 2018-01-23T10:07:10.276Z · score: 42 (13 votes) · LW · GW
Focus on technical solutions to x-risk rather than political or societal

One consideration that points against this is that focusing on technical solutions will make you only think about technical problems, but if you don't also look at the societal problems, you might not realize that your proposed technical solution is unworkable due to a societal problem.

One good example is Oracle AI. People have debated the question of whether we could use a pure question-answering or "tool" AI as a way to create safe agent AI. There has been a bunch of discussion about the technical challenge of creating it, where the objections have typically focused on something like "you can't box in a superintelligent AI that wants to escape", and then sought to define ways to make the AI want to stay in the box.

But this neglects the fact that even if you manage to build an AI that wants to stay in the box, this is useless if there are others who have reasons to let their AI out of the box. (My paper "Disjunctive Scenarios of Catastrophic AI Risk" goes into detail about the various reasons that would cause people to let their AI out, in section 5.2.) Solving the technical problem of keeping the AI contained does nothing for the societal problem of making people want to keep their AIs contained.

Similarly, Seth Baum has pointed out that the challenge of creating beneficial AI is a social challenge because it seeks to motivate AI developers to choose beneficial AI designs. This is the general form of the specific example I gave above: it's not enough to create an aligned technical design, one also needs to get people to implement your aligned designs.

Of course, you can try to just be the first one to build an aligned superintelligence that takes over the world... but that's super-risky for obvious reasons, such as the fact that it involves a race to be the first one to build the superintelligence, meaning that you don't have the time to make the superintelligence safely aligned. To avoid that, you'll want to try to avoid arms races... which is again a societal problem.

In order to have a good understanding of what would work for solving the AI problem, you need to have an understanding of the whole problem, and the societal dimension represents a big part of the problem. I'm not saying that you couldn't still focus primarily on the technical aspects - after all, a single person can only do so much and we all need to specialize - but you should keep in mind what kinds of technical solutions look feasible given the societal landscape, and properly understanding the nature of the societal landscape requires spending some effort on also thinking about the societal problems and their possible solutions.

comment by orthonormal · 2018-03-31T16:32:04.175Z · score: 2 (2 votes) · LW · GW

I'm pretty sure that, without exception, anyone who's made a useful contribution on Oracle AI recognizes that "let several organizations have an Oracle AI for a significant amount of time" is a world-ending failure, and that their work is instead progress on questions like "if you can have the only Oracle AI for six months, can you save the world rather than end it?"

Correct me if I'm wrong.

comment by Kaj_Sotala · 2018-03-31T17:57:29.712Z · score: 6 (1 votes) · LW · GW

If so, that doesn't seem to be reflected in their papers: none of e.g. Chalmers 2010, Yampolskiy 2012, Armstrong, Sandberg & Bostrom 2012 or Babcock, Kramar & Yampolskiy 2016 mention that as far as I could find, instead only discussing the feasibility of containment. This leaves the impression that successful containment would be sufficient for a safe outcome. E.g. the conclusion section of Armstrong et al., despite being generally pessimistic and summarizing lots of problems they identified, still seems to suggest that if only the technical problems on Oracle AI could be overcome, then we might be safe:

Analysing the different putative solutions to the OAI-control problem has been a generally discouraging exercise. The physical methods of control, which should be implemented in all cases, are not enough to ensure safe OAI. The other methods of control have been variously insufficient, problematic, or even dangerous.
But these methods are still in their infancy. Control methods used in the real world have been the subject of extensive theoretical analysis or long practical refinement. The lack of intensive study in AI safety leaves methods in this field very underdeveloped. But this is an opportunity: much progress can be expected at relatively little effort. For instance, there is no reason that a few good ideas would not be enough to put the concepts of space and time restrictions on a sufficiently firm basis for rigorous coding.
But the conclusion is not simply that more study is needed. This paper has made some progress in analysing the contours of the problem, and identifying those areas most amenable to useful study, what is important and what is dispensable, and some of the dangers and pitfalls to avoid. The danger of naively relying on confining the OAI to a virtual sub-world should be clear, while sensible boxing methods should be universally applicable. Motivational control appears potentially promising, but it requires more understanding of AI motivation systems before it can be used.
Even the negative results are of use, insofar as they inoculate us against false confidence: the problem of AI control is genuinely hard, and it is important to recognise this. A list of approaches to avoid is valuable as it can help narrow the search.
On the other hand, there are reasons to believe the oracle AI approach is safer than the general AI approach. The accuracy and containment problems are strictly simpler than the general AI safety problem, and many more tools are available to us: physical and epistemic capability control mainly rely on having the AI boxed, while many motivational control methods are enhanced by this fact. Hence there are grounds to direct high-intelligence AI research to explore the oracle AI model.
The creation of super-human artificial intelligence may turn out to be potentially survivable.

Also, in just about every informal discussion about AI safety that I recall seeing, when someone unfamiliar with existing work in the field suggests something like AI boxing, the standard response has always been "you can't box an AI that's smarter than you" (sometimes citing Eliezer's AI box experiments) - which then frequently leads to digressions about whether intelligence is magic, on how trustworthy the evidence from the AI box experiments is, etc.

comment by orthonormal · 2018-03-31T19:51:25.352Z · score: 10 (2 votes) · LW · GW

To be clear, I am making the claim that, of the people who have made useful advances on Oracle AI safety research (Armstrong counts here; I don't think Yampolskiy does), all of them believe that the goal of having a safe Oracle AI is to achieve a decisive strategic advantage quickly and get to an aligned future. I recognize that this is a hard claim to evaluate (e.g. because this isn't a statement one could put in a Serious Academic Journal Article in the 2010s, it would have to be discussed on their blog or in private correspondence), but if anyone has a clear counterexample, I'd be interested in seeing it.

comment by Kaj_Sotala · 2018-03-31T20:27:34.227Z · score: 10 (2 votes) · LW · GW

My only evidence for this being a neglected consideration was what I wrote above: that the only place where I recall having seen this discussed in any detail is in my own papers. (I do believe that Eliezer has briefly mentioned something similar too, but even he has mostly just used the "well you can't contain a superintelligence" line in response to Oracle AI arguments in general.)

You're certainly in a position to know the actual thoughts of researchers working on this better than I do, and the thing about confinement being insufficient on its own is rather obvious if you think about it at all. So if you say that "everyone worth mentioning already thinks this", then that sounds plausible to me and I don't see a point in trying to go look for counterexamples. But in that case I feel even more frustrated that the "obvious" thing hasn't really filtered into public discussion, and that e.g. popular takes on the subject still seem to treat the "can't box a superintelligence" thing as the main argument against OAI, when you could instead give arguments that were much more compelling.

comment by orthonormal · 2018-03-31T22:47:45.460Z · score: 10 (2 votes) · LW · GW

That's a legit thing to be frustrated by, but I think you know the reason why AI safety researchers don't want "we don't see a way to get to a good outcome except for an aligned project to grab a decisive strategic advantage" to filter into public discourse: it pattern-matches too well to "trust us, you need to let us run the universe".

comment by gsastry · 2018-01-19T17:43:40.871Z · score: 27 (11 votes) · LW · GW

I don't understand the point about avoiding government involvement in the long run. It seems like your argument is that government projects are incompetent at managing tech projects (maybe because of structural reasons). This seems like a very strong claim to me, and seems only accurate when there's bad incentive compatibility. For example, are you excluding things like the Manhattan Project?

comment by orthonormal · 2018-03-31T16:35:35.079Z · score: 12 (3 votes) · LW · GW

I'd be interested in a list of well-managed government science and engineering projects if one exists. The Manhattan Project and the Apollo Project both belong on that list (despite both having their flaws - leaks to the USSR from the former, and the Apollo 1 disaster from the latter); what are other examples?

comment by LawChan · 2018-01-19T22:37:45.699Z · score: 26 (11 votes) · LW · GW

Minor nitpick - these don't seem to be models (at least not gears-level models) so much as background assumptions or heuristics.

comment by Benito · 2018-01-27T03:57:06.261Z · score: 13 (3 votes) · LW · GW

This seems correct, and I am worried about words losing their precise meaning.

When I wrote this post I thought the points in 'background models' feel qualitatively different to the others.

• They feel like the cruxes that most affect the subsequent heuristics - you might change my mind on one of the heuristics, but if you changed my mind on any of the 'background models' I'd change most of the heuristics.
• They're the ones I feel most confident about, and where (internally) my beliefs feel like they have the most structure. For example, I'm currently writing a post expanding on my intuitions behind the first one / my model that says 'alignment is hard'.

But I still feel like pushing against 'model' being used for anything other than 'thing with multiple moving parts that makes predictions', so I'll change the wording.

Right now I feel like the first section is more accurately described as being a single model of four parts, and have made slight edits to the post accordingly. Happy to hear of better suggestions, but don't want to significantly change the post and title.

comment by ryan_b · 2018-01-19T21:19:44.195Z · score: 23 (8 votes) · LW · GW

I wonder if you could flesh out your intuitions for the "avoid political solutions to problems" point a bit more. The first x-risk was nuclear war, which has a technical dimension but is fundamentally a problem of international relations. It also appears to be the most successfully managed x-risk so far, in the sense that it has been an extant threat for 50 years without going off.

I certainly agree we should not content ourselves with an AI ban in lieu of technical progress, but if we use politics in the broad sense of including institutional behavior and conflict I feel this dimension of the problem is currently neglected.

comment by jacobjacob · 2018-01-22T22:56:04.449Z · score: 18 (5 votes) · LW · GW

The mere fact that an x-risk hasn't occurred is not evidence that it has been well managed, because that's the only possible state you could observe (if it wasn't true you wouldn't be around). Then again nuclear war is a GCR, so the anthropics might not be that bad.

On another note, if the nuclear situation is what it looks like when humanity "manages" an x-risk, I think we're in a pretty dire state...

comment by Qiaochu_Yuan · 2018-01-23T20:37:59.272Z · score: 7 (2 votes) · LW · GW

Is there consensus on this? My opinion is also that anthropic effects imply that nuclear war hasn't necessarily been well-managed (reading stories like Petrov's it seems like dumb luck has been more important than good institutional management) but my impression is that people are far from universally accepting enough of anthropic reasoning to buy this in general.

comment by ryan_b · 2018-01-25T18:30:42.254Z · score: 5 (2 votes) · LW · GW

Well-managed and best-managed aren't necessarily the same thing. The fact remains that the nuclear problem absorbed huge amounts of intellectual effort, spurred the development of whole fields expressly for its control, and has been a consistent, global institutional priority.

The trouble is that no other x-risk has really been managed at all, although we are clearly moving in that direction for the climate. Any management vs. zero management -> best management.

comment by Cheibriados · 2018-01-20T23:41:43.136Z · score: 6 (2 votes) · LW · GW

It is worth mentioning that, unlike nuclear war, where the nature of the threat was visible to politicians and the public alike, alignment seems to be a problem which not even all AI researchers understand. That in itself probably excludes the possibility of a direct political solution. But even politics in the narrow sense can be utilized with a bit of creativity (e.g. by providing politicians a motivation more direct than saving the world, grounded on things they can understand without believing weird-sounding claims of cultish-looking folks).

comment by AlexMennen · 2018-01-22T06:14:21.827Z · score: 4 (1 votes) · LW · GW
I certainly agree we should not content ourselves with an AI ban in lieu of technical progress

Why not? An AI ban isn't politically possible, but if it was enacted and enforced, I'd expect it to be effective at preventing risks from unaligned AI.

comment by Matthew Barnett (matthew-barnett) · 2018-01-23T08:47:35.888Z · score: 7 (3 votes) · LW · GW

I've heard before that an argument against banning AI research (even if you could do such a thing) is that hardware will continue to improve. This is bad because it enables less technically able parties to wield supercomputer-level AI developments. It's better that a single company stays ahead in the race than the remote possibility that someone can create a seed-AI in their basement.

comment by ryan_b · 2018-01-23T19:41:58.852Z · score: 4 (1 votes) · LW · GW

I argue this is not enforceable in any meaningful sense. Returning to the nuclear weapons example, there are large industrial facilities and logistical footprints which are required. These can be tracked and targeted for enforcement. By contrast, computers and mathematics are cheap, ubiquitous, and you cannot have a modern civilization without them. As secret projects go, AI would be trivial to conceal. The best we could do is enforce a publishing ban - but stopping the flow of any kind of information is a very expensive task, and we could not confidently say the risk is mitigated, only delayed. Further, voluntary compliance would only mean ceding the first-mover advantage to institutions which are already less concerned with issues like ethics.

I would expect attempts to ban AI research to make it marginally less likely to appear, and much less likely to be aligned if it does. Not a net gain.

comment by Qiaochu_Yuan · 2018-01-21T00:39:37.215Z · score: 18 (6 votes) · LW · GW

Thanks for writing this. I want to push back against a couple of things:

First, I think it's not at all clear that intellectual progress on alignment is easy, and I think describing the situation in terms of low-hanging fruit is misleading. I think it's more like there are lots of low-hanging thingies some of which are probably fruit but it's unclear which ones, and the other ones are poisonous in unknown ways. More concretely, I worry a lot about people making implicit assumptions or adopting frames that seem reasonable but actually actively make it harder to have the right thoughts and go in the right direction. (I have this same kind of worry about bad LW posts.)

Second, I think it's not at all clear that talking about what Elon Musk said to Demis Hassabis is unimportant; this is closely related to the disagreement I had with Ray about demon threads. Put bluntly, Elon Musk and Demis Hassabis are very powerful people whose actions matter a lot, and it makes sense to keep track of what's going on with them. The gossip intuitions we evolved for doing this, if anything, underestimate the impact of doing so, since those intuitions are tied to gossip affecting the fate of a few people in a tribe whereas this kind of gossip potentially affects the long-run future of humanity.

comment by Benito · 2018-01-21T21:53:26.382Z · score: 15 (3 votes) · LW · GW

If you work at SpaceX, the decisions Elon Musk makes are very important to you. However, it would be bad if you thus spent a great deal of time discussing Musk's motives, goals, and personal preferences, instead of building a space rocket. It's best for the company as a whole to spend most resources building the actual products, and the best way to reliably rise is to reliably create value. It would be really bad if each worker at the company spent 10% of their time gossiping about Musk, rather than building a space rocket.

To give another example, in online forums, there's a common phenomenon whereby a forum that is nominally about something (e.g. podcasting equipment) becomes about the forum itself - who should run it, who gets what rights, etc.

The general point is that there is a constant force to discuss the meta, discuss the community, discuss status, that negatively affects any community that is trying to do something real, be it discuss podcasting gear, anime, or research. The x-risk community is no different.

(I was somewhat sloppy in my words; while I did say "Avoid things that (because they're social) are fun to argue about", I also said "Another candidate for a sexy subject that is basically a distraction, is discussion of the high status people in AI e.g. "Did you hear what Elon Musk said to Demis Hassabis?"". It is actually important, but I want to suggest that most of the time we discuss it we're likely being motivated by other reasons, and on net we should push against that.)

comment by Raemon · 2018-01-22T01:29:47.754Z · score: 14 (3 votes) · LW · GW

Ironic disclaimer: Arguing about whether politics is useful is temptingly distracting in the same way that politics is. I'll do one more public response clarifying something if need be, but if this seems like it warrants further discussion I would prefer to do so in a private channel.

You persuaded me elsewhere that politics/gossip/etc are (at least sometimes, at least reasonably) important to most people. However, the phrasing of this comment feels exactly like what I was trying to caution against.

Ben's point about "just build the space rocket" is one key point. Another is that if you aren't in particular circles, Demis and Elon can't hear you. And the set of things you can do that influence them meaningfully is very different from what your intuitions will push you towards. (on average, for most values of "you")

Yes, there's something important that needs to be dealt with here. But not the way everyone will do by default as if Elon and Demis were people in the tribe a few hundred feet away.

AI alignment technical progress feels (to the average AI Alignment enthusiast) like something they don't understand well enough to comment on. So they instead comment on something they think they can comment on, which is what Elon and Demis et al. seem to be doing.

I think (this is a bit of an exaggeration but I think close enough to true) that productively engaging in politics should feel about as intimidatingly-opaque as AI Technical Progress does. If it doesn't feel like you're solving a complicated problem that your brain didn't evolve to handle, you probably aren't doing it right.

comment by Benito · 2018-01-22T02:43:35.352Z · score: 5 (1 votes) · LW · GW
Arguing about whether politics is useful is temptingly distracting in the same way that politics is.

Yeah, this is why I decided that the next post I write on this topic will be more fleshing out of the 'background models' section than the heuristic section.

comment by Qiaochu_Yuan · 2018-01-23T20:42:17.701Z · score: 4 (1 votes) · LW · GW

You and Ben make fair points. I think I didn't have a good sense of what level of gossip Ben was pushing back against; I had a sense he was pushing back against "occasionally gossip at parties" which seemed too strong to me (that's the level of gossip I get exposed to by default), but if Ben is pushing against something more like "15% of my conversations are dominated by gossip by default" that would make more sense to me.

comment by Benito · 2018-01-21T21:59:10.020Z · score: 5 (1 votes) · LW · GW
I think it's not at all clear that intellectual progress on alignment is easy, and I think describing the situation in terms of low-hanging fruit is misleading

For the most part I was just stating my belief rather than arguing for it (though I pointed at two examples). I'll think some more about this and maybe write up a post (though I think Ray is also planning such a post).

comment by Caspar42 · 2018-01-20T10:14:15.591Z · score: 18 (8 votes) · LW · GW

I think this is a good overview, but most of the views proposed here seem contentious and the arguments given in support shouldn't suffice to change the mind of anyone who has thought about these questions for a bit or who is aware of the disagreements about them within the community.

Getting alignment right accounts for most of the variance in whether an AGI system will be positive for humanity.

If your values differ from those of the average human, then this may not be true/relevant. E.g., I would guess that for a utilitarian current average human values are worse than, e.g., 90% "paperclipping values" and 10% classical utilitarianism.

Also, if gains from trade between value systems are big, then a lot of value may come from ensuring that the AI engages in acausal trade (https://wiki.lesswrong.com/wiki/Acausal_trade). This is doubly persuasive if you already see your own policies as determining what agents with similar decision theories but different values do elsewhere in the universe. (See, e.g., section 4.6.3 of "Multiverse-wide Cooperation via Correlated Decision Making".)

Given timeline uncertainty, it's best to spend marginal effort on plans that assume / work in shorter timelines.
Stated simply: If you don't know when AGI is coming, you should make sure alignment gets solved in worlds where AGI comes soon.

I guess the question is what "soon" means. I agree with the argument provided in the quote. But there are also some arguments to work on longer timelines, e.g.:

• If it's hard and most value comes from full alignment, then why even try to optimize for very short timelines?
• Similarly, there is a "social" difficulty of getting people in AI to notice your (or the AI safety community's) work. Even if you think you could write down within a month a recipe for increasing the probability of AI being aligned by a significant amount, you would probably need much more than a month to make it significantly more likely to get people to consider applying your recipe.

It seems obvious that most people shouldn't think too much about extremely short timelines (<2 years) or the longest plausible timelines (>300 years). So, these arguments together probably point to something in the middle of these and the question is where. Of course, it also depends on one's beliefs about AI timelines.

To me it seems that the concrete recommendations (aside from the "do AI safety things") don't have anything to do with the background assumptions.

As one datapoint, fields like computer science, engineering and mathematics seem to make a lot more progress than ones like macroeconomics, political theory, and international relations.

For one, "citation needed". But also: the alternative to doing technical AI safety work isn't to do research in politics but to do political activism (or lobbying or whatever), i.e. to influence government policy.

As your "technical rather than political" point currently stands, it's applicable to any problem, but it is obviously invalid at this level of generality. To argue plausibly that technical work on AI safety is more important than AI strategy (which is plausibly true), you'd have to refer to some specifics of the problems related to AI.

comment by Benito · 2018-01-21T22:04:29.337Z · score: 11 (3 votes) · LW · GW
if gains from trade between value systems are big, then a lot of value may come from ensuring that the AI engages in acausal trade

Yeah, that sounds right to me. Most of the value is probably spread between that and breaking out of our simulation, but I haven't put much thought into it. There are other crucial considerations too (e.g. how to deal with an infinite universe). Thanks for pointing out the nuanced ways that what I said was wrong, and I'll reflect more on what true sentiment my intuitions are pointing to (if the sentiment is indeed true at all).

comment by AlexMennen · 2018-01-22T06:08:46.183Z · score: 9 (2 votes) · LW · GW
90% "paperclipping values" and 10% classical utilitarianism.

Are those probabilities, or weightings for taking a weighted average? And if the latter, what does that even mean?

comment by Elizabeth (pktechgirl) · 2018-01-19T17:43:52.393Z · score: 14 (3 votes) · LW · GW

Another vote for ideas on how to steer away from sticky, low-productivity conversations.

comment by Benito · 2018-01-21T20:37:42.081Z · score: 10 (4 votes) · LW · GW

Some thoughts on how to turn conversations about social reality into conversations about reality:

• The main skill I use (which I still need to practice more) is to find something about the discussion that I'm honestly uncertain or confused about - somewhere that my models stop - and get curious.
• My description of this looks like: Start proposing mechanisms that would predict the phenomena, notice which parts of the mechanism your System 1 feels iffy about, and attempt to improve/modify the mechanism. Then iterate.
• A high-effort way to go straight to this skill is to run a simulation of a person you know who is endlessly curious about the object level details of the world. That person who is always asking the physics/economics/astronomy/etc questions. Simulate them in your situation and see what they'd start talking about.
• A low-effort way to imitate this skill (to help you bootstrap up) might be to pick some part of the conversational topic, and go through the process of a 5 Whys analysis. Practising the process can help your System 1 learn the purpose of the process.
• (Jacobjacob suggested 5 Whys to me in person, inspiring me to write this whole comment)

comment by TurnTrout · 2018-01-19T02:47:17.706Z · score: 13 (6 votes) · LW · GW

Regarding 3), both of the AI-minded professors I spoke to at my university dismissed AI alignment work due to this epistemic issue.

Nothing related to AI safety taught here, but I’ll spend my free time at my PhD program going through MIRI’s reading list.

comment by Benito · 2018-01-19T02:49:30.265Z · score: 11 (3 votes) · LW · GW

If you write up thoughts along the way, framings you find useful for understanding the reading list concepts, and any new ideas that come to mind, I think that would be a great submission for the AI Alignment prize :-)

comment by LawChan · 2018-01-19T22:43:50.807Z · score: 12 (3 votes) · LW · GW

Regarding the "fun to argue about" point - maybe a positive recommendation would be "focus on hitting the target"? Or "focus on technical issues"? I don't think there is a nice, concise phrase that captures what to do.

From Yudkowsky's Twelve Virtues of Rationality:

Before these eleven virtues is a virtue which is nameless.
Miyamoto Musashi wrote, in The Book of Five Rings:
“The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy’s cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him. More than anything, you must be thinking of carrying your movement through to cutting him.”
Every step of your reasoning must cut through to the correct answer in the same movement. More than anything, you must think of carrying your map through to reflecting the territory.

Musashi calls this virtue "the way of the void"; but I think that this name is sufficiently counterintuitive that we should not try to adopt it.

I'm also not sure that this is something informed by your models of AI per se; getting nerd-sniped is a common issue for intellectual communities, and being able to actually do things that contribute to your goals is a super important skill.

comment by Benito · 2018-01-21T20:57:07.290Z · score: 5 (1 votes) · LW · GW

Focus on technical issues is also a good commandment. See my reply to Elizabeth elsewhere on this page (link) for other thoughts I have on positive recommendations.

Regarding avoiding being nerd-sniped; I didn't actually say avoid being nerd-sniped - I think learning to just enjoy the technical problems for their own sake can be a great thing for doing valuable research. I specifically meant to avoid getting pulled into e.g. gossip, and other things that steal your attention because they're social. The same way that forums about podcasts can just become forums that discuss who gets to be president of the podcast forum, it's important to fight against the forces that would cause an AI x-risk community to largely be about the people in charge of the AI community, rather than understanding AI, alignment, and making intellectual progress.

comment by whpearson · 2018-01-19T19:29:10.916Z · score: 12 (3 votes) · LW · GW
2. Getting alignment right accounts for most of the variance in whether an AGI system will be positive for humanity.

MIRI, judging from their fundraiser, seems to think it is important that the first people who develop AI use it to develop other technologies to get to a safe period. This suggests to me that they care what is done with the first AGIs.

From their fundraiser:

and if early AGI systems can be used safely at all, then we expect it to be possible for an AI-empowered project to safely automate a reasonably small set of concrete science and engineering tasks that are sufficient for ending the risk period.

Most alignment research seems to be about aligning to one person. Things like corrigibility seem to be about that, rather than about aligning to the whole of humanity.

I also think you can get this type of "alignment" right, in that it aligns to one person or group, and still have a bad outcome. It depends a lot on the sanity of that group. A suicide cult that wants to destroy the earth would not be a good group for the AI to be aligned to, for the rest of us.

comment by Benito · 2018-01-21T22:13:03.233Z · score: 5 (1 votes) · LW · GW

Actually, I might just be wrong about this. There are very important questions regarding what goals to set up in the early AGIs.

I still think that there's something important that's close to this though, and I may write a separate post trying to say this more accurately.

comment by Elizabeth (pktechgirl) · 2018-01-23T06:02:25.972Z · score: 10 (4 votes) · LW · GW

I promoted this to curated for:

• Being very specific and actionable.
• Clear, concise writing.
• Explaining both the model and the implications.

comment by avturchin · 2018-01-21T13:34:00.820Z · score: 10 (3 votes) · LW · GW

I think that focusing only on the technical solutions has two shortcomings:

1) Technical solutions are local, so they will work for just one AI, but not for any other AI in the world. To deliver your technical solution to other AI teams you need a social or political instrument, which you suggest ignoring. As a result, the only way to make your local solution global is to use the first AI to forcefully take over the world. This produces a convergence of goals toward the first AI being a military AI.

2) A technical solution should not include technical work on AI itself, as that may increase the probability of its creation. This limits the field of technical work on AI safety to decision theory, utility theory, etc.

These two properties produce a very narrow field of work on technical AI safety, and the most useful fruits are probably outside it.

comment by Trent Fowler (trent-fowler) · 2018-02-04T20:01:31.847Z · score: 2 (3 votes) · LW · GW

For years I've been wanting to put together a research or reading group to work on value alignment. I started a meetup group, gave a number of talks on x-risk and machine ethics, and even kicked around the idea of founding my own futurist institute.

None of this went anywhere.

So: if someone in the Denver/Boulder, Colorado area happens to read this comment and wants to help with some of these goals, get in touch with me on Facebook or at fowlertm9@gmail.com.

Also, I am putting together a futurist speaker series on behalf of the Da Vinci Institute, and if you'd like to talk about the value alignment problem please drop me a line.

(Unrelated: the speaker series is in part meant to advertise the Da Vinci Institute's outstanding office spaces. If you have a startup and need space, let me know.)