Reframing Superintelligence: Comprehensive AI Services as General Intelligence

post by Rohin Shah (rohinmshah) · 2019-01-08T07:12:29.534Z · LW · GW · 77 comments

This is a link post for https://www.fhi.ox.ac.uk/wp-content/uploads/Reframing_Superintelligence_FHI-TR-2019-1.1-1.pdf?asd=sa

Contents

  The Model
  Why Comprehensive?
  Isn't this just as dangerous as AGI?
  What happens when we create AGI?
  Safety in the CAIS world
  Summary
77 comments

Since the CAIS technical report is a gargantuan 210-page document, I figured I'd write a post to summarize it. I have focused on the earlier chapters, because I found those to be more important for understanding the core model. Later chapters speculate about more concrete details of how AI might develop, as well as the implications of the CAIS model for strategy. ETA: This comment [AF(p) · GW(p)] provides updates based on more discussion with Eric.

The Model

The core idea is to look at the pathway by which we will develop general intelligence, rather than assuming that at some point we will get a superintelligent AGI agent. To predict how AI will progress in the future, we can look at how AI progresses currently -- through research and development (R&D) processes. AI researchers consider a problem, define a search space, formulate an objective, and use an optimization technique in order to obtain an AI system, called a service, that performs the task.
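
As a toy illustration of that R&D loop (not taken from the report), the sketch below picks a search space, an objective, and an optimizer, and returns the result as a callable service. The name `develop_service`, the two-parameter linear search space, and the use of plain random search are all assumptions chosen for brevity.

```python
import random
from typing import Callable, List, Tuple

Params = Tuple[float, float]        # hypothetical search space: (slope, intercept)
Service = Callable[[float], float]  # a service maps a task input to an output


def objective(params: Params, data: List[Tuple[float, float]]) -> float:
    """Mean squared error of a candidate on the task data (lower is better)."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)


def develop_service(data: List[Tuple[float, float]],
                    n_candidates: int = 2000, seed: int = 0) -> Service:
    """Random search over the search space; the best candidate becomes the service."""
    rng = random.Random(seed)
    best: Params = (0.0, 0.0)
    best_loss = objective(best, data)
    for _ in range(n_candidates):
        cand = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        loss = objective(cand, data)
        if loss < best_loss:
            best, best_loss = cand, loss
    a, b = best
    return lambda x: a * x + b


# Toy task: recover y = 2x + 1 from examples.
data = [(float(x), 2.0 * x + 1.0) for x in range(-3, 4)]
service = develop_service(data)
```

Each piece a human supplies here (the search space, the objective, the choice of optimizer) is one of the R&D subtasks that the model expects to be automated over time.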

A service is an AI system that delivers bounded results for some task using bounded resources in bounded time. Superintelligent language translation would count as a service, even though it requires a very detailed understanding of the world, including engineering, history, science, etc. Episodic RL agents also count as services.

While each of the AI R&D subtasks is currently performed by a human, as AI progresses we should expect that we will automate these tasks as well. At that point, we will have automated R&D, leading to recursive technological improvement. This is not recursive self-improvement, because the improvement comes from R&D services creating improvements in basic AI building blocks, and those improvements feed back into the R&D services. All of this should happen before we get any powerful AGI agents that can do arbitrary general reasoning.

Why Comprehensive?

Since services are focused on particular tasks, you might think that they aren't general intelligence, since there would be some tasks for which there is no service. However, pretty much everything we do can be thought of as a task -- including the task of creating a new service. When we have a new task that we would like automated, our service-creating-service can create a new service for that task, perhaps by training a new AI system, or by taking a bunch of existing services and putting them together, etc. In this way, the collection of services can perform any task, and so as an aggregate is generally intelligent. As a result, we can call this Comprehensive AI Services, or CAIS. The "Comprehensive" in CAIS is the analog of the "General" in AGI. So, we'll have the capabilities of an AGI agent, before we can actually make a monolithic AGI agent.
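
A minimal sketch of the "service-creating-service" idea, assuming a hypothetical `ServiceRegistry` in which new services are built only by chaining existing ones (training a new system from scratch is left out). None of these names come from the report.

```python
from typing import Callable, Dict, List

Service = Callable[[str], str]


class ServiceRegistry:
    """Hypothetical registry: the collection of services, plus a way to add more."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, task: str, service: Service) -> None:
        self._services[task] = service

    def compose(self, task: str, subtasks: List[str]) -> None:
        """Service-creating service: build a new service by chaining existing ones."""
        parts = [self._services[name] for name in subtasks]  # KeyError if a part is missing

        def pipeline(x: str) -> str:
            for part in parts:
                x = part(x)
            return x

        self.register(task, pipeline)

    def perform(self, task: str, x: str) -> str:
        return self._services[task](x)


registry = ServiceRegistry()
registry.register("strip", str.strip)
registry.register("upper", str.upper)
registry.compose("normalize", ["strip", "upper"])  # a new task gets a new service
result = registry.perform("normalize", "  cais ")
```

The point of the sketch is only that generality lives in the aggregate: no single entry in the registry can do everything, but the registry can grow to cover a new task.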

Isn't this just as dangerous as AGI?

You might argue that each individual service must be dangerous, since it is superintelligent at its particular task. However, since the service is optimizing for some bounded task, it is not going to run a long-term planning process, and so it will not have any of the standard convergent instrumental subgoals (unless the subgoals are helpful for the task before reaching the bound).

In addition, all of the optimization pressure on the service is pushing it towards a particular narrow task. This sort of strong optimization tends to focus behavior. Any long term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task. Think of how a racecar is optimized for speed, while a bus is optimized for carrying passengers, rather than having a "generally capable vehicle".

It's also worth noting what we mean by superintelligent here. In this case, we mean that the service is extremely competent at its assigned task. It need not be learning at all. We see this distinction with RL agents -- when they are trained using something like PPO, they are learning, but at test time you can simply execute them without any PPO and they will perform the behavior they previously learned and won't change that behavior at all.
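
The training/test distinction can be sketched with something far simpler than PPO. The example below swaps in an epsilon-greedy bandit learner (an assumption made purely for brevity): `train` updates value estimates, while the frozen policy returned by `make_frozen_policy` only executes what was learned. The function names and arm rewards are hypothetical.

```python
import random
from typing import Callable, List


def train(arm_means: List[float], steps: int = 2000,
          eps: float = 0.1, seed: int = 0) -> List[float]:
    """Learning phase: epsilon-greedy updates of per-arm value estimates."""
    rng = random.Random(seed)
    q = [0.0] * len(arm_means)
    counts = [0] * len(arm_means)
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(len(q))                   # explore
        else:
            arm = max(range(len(q)), key=q.__getitem__)   # exploit
        reward = rng.gauss(arm_means[arm], 1.0)
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]         # running mean update
    return q


def make_frozen_policy(q: List[float]) -> Callable[[], int]:
    """Test-time policy: the best arm is computed once and never updated."""
    best = max(range(len(q)), key=q.__getitem__)
    return lambda: best


q = train([0.0, 1.0, 3.0])
q_before = list(q)
policy = make_frozen_policy(q)
actions = [policy() for _ in range(5)]  # executing the policy does no learning
```

Nothing in the test-time loop touches `q`, which is the sense in which the deployed system "need not be learning at all".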

(My opinion: I think this isn't engaging with the worry with RL agents -- typically, we're worried about the setting where the RL agent is learning or planning at test time, which can happen in learn-to-learn and online learning settings, or even with vanilla RL if the learned policy has access to external memory and can implement a planning process separately from the training procedure.)

On a different note, you might argue that if we analyze the system of services as a whole, then it certainly looks generally intelligent, and so should be regarded as an AGI agent. However, "AGI agent" usually carries the anthropomorphic connotation of VNM rationality / expected utility maximization / goal-directedness. While it seems possible and even likely that each individual service can be well-modeled as VNM rational (albeit with a bounded utility function), it is not the case that a system of VNM rational agents will itself look VNM rational -- in fact, game theory is all about how systems of rational agents have weird behavior.
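
One standard illustration of this point (from social choice theory rather than from the report) is the Condorcet cycle: three agents each hold a perfectly transitive ranking, yet the group's majority preference is cyclic, so it violates transitivity, one of the VNM axioms. The rankings and function names below are assumptions chosen for the example.

```python
from itertools import permutations
from typing import List

# Each agent's ranking is transitive by construction (best option first).
rankings: List[List[str]] = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_prefers(x: str, y: str) -> bool:
    """True if a strict majority of agents rank x above y."""
    votes = sum(r.index(x) < r.index(y) for r in rankings)
    return votes > len(rankings) / 2


def is_transitive(options: List[str]) -> bool:
    """Check x > y and y > z implies x > z for the group's majority preference."""
    return all(
        not (majority_prefers(x, y) and majority_prefers(y, z))
        or majority_prefers(x, z)
        for x, y, z in permutations(options, 3)
    )


group_is_transitive = is_transitive(["A", "B", "C"])
```

Here the group prefers A to B, B to C, and C to A, so no single utility function over the three options can represent the system's choices even though each member has one.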

In addition, there are several aspects of CAIS that make it safer than a classic monolithic AGI agent. Under CAIS, each service interacts with other services via clearly defined channels of communication, so that the system is interpretable and transparent, even though each service may be opaque. We can reason about what information is present in the inputs to infer what the service could possibly know. We could also provide access to some capability through an external resource during training, so that the service doesn't develop that capability itself.

This interpretability allows us to monitor the service -- for example, we could look at which subservices it accesses in order to make sure it isn't doing anything crazy. But what if having a human in the loop leads to unacceptable delays? Well, this would only happen for deployed applications, where having a human in the loop seems expected, and should also be economically incentivized because it leads to better behavior. Basic AI R&D can continue to be improved autonomously without a human in the loop, so you could still see an intelligence explosion. Note that tactical tasks requiring quick reaction times probably would be delegated to AI services, but the important strategic decisions could still be left in human hands (assisted by AI services, of course).

What happens when we create AGI?

Well, it might not be valuable to create an AGI. We want to perform many different tasks, and it makes sense for these to be done by diverse services. It would not be competitive to include all capabilities in a single monolithic agent. This is analogous to how specialization of labor is a good idea for us humans.

(My opinion: It seems like the lesson of deep learning is that if you can do something end-to-end, that will work better than a structured approach. This has happened with computer vision, natural language processing, and seems to be in the process of happening with robotics. So I don't buy this -- while it seems true that we will get CAIS before AGI since structured approaches tend to be available sooner and to work with less compute, I expect that a monolithic AGI agent would outperform CAIS at most tasks once we can make one.)

That said, if we ever do build AGI, we can leverage the services from our CAIS-world in order to make it safe. We could use superintelligent security services to constrain any AGI agent that we build. For example, we could have services trained to identify long-term planning processes and to perform adversarial testing and red teaming.

Safety in the CAIS world

While CAIS suggests that we will not have AGI agents, this does not mean that we automatically get safety. We will still have AI systems that take high impact actions, and if they take even one wrong action of this sort it could be catastrophic. One way this could happen is if the system of services starts to show agentic behavior -- our standard AI safety work could apply to this scenario.

In order to ensure safety, we should have AI safety researchers figure out and codify the best development practices that need to be followed. For example, we could try to always use predictive models of human (dis)approval as a sanity check on any plan that is being enacted. We could also train AI services that can adversarially check new services to make sure they are safe.

Summary

The CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D. This reframes the problems of AI safety and has implications for what technical safety researchers should be doing.

77 comments

comment by Wei Dai (Wei_Dai) · 2019-01-08T17:23:05.321Z · LW(p) · GW(p)

This is one of the documents I was responding to when I wrote A general model of safety-oriented AI development [LW · GW], Three AI Safety Related Ideas [LW · GW], and Two Neglected Problems in Human-AI Safety [LW · GW]. (I didn't cite it because it was circulating semi-privately in draft form, and Eric apparently didn't want its existence to be publicly known.) I'm disappointed that although Eric wrote to me "I think that your two neglected problems are critically important", the perspectives in those posts didn't get incorporated more into the final document, which spends only 3 short paragraphs out of hundreds of pages to talk about what I think of as "human safety problems". (I think those paragraphs were in the draft even before I wrote my posts.)

I worry about the framing adopted in this document that the main problem in human-AI safety is "questions of what humans might choose to do with their capabilities", as opposed to my preferred framing of "how can we design human-AI systems to minimize total risk". (To be fair to Eric, a lot of other AI safety people also only talk about "misuse risk" and not about how AI is by default likely to exacerbate human safety problems, e.g., by causing rapid distributional shifts for humans.) I worry that this gives AI researchers and developers license to think, "I'm just developing an AI service. AI services will be comprehensive anyway so there's no reason for me to hold back or think more about what I'm doing. It's someone else's job to worry about what humans might choose to do with these capabilities."

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-08T17:38:17.991Z · LW(p) · GW(p)

I actually think the CAIS model gives me optimism for these sorts of problems. As long as we acknowledge that the problems exist and can be an issue, we could develop services that help us mitigate them. Safety in the CAIS world already depends on having services that are in charge of good engineering, testing, red teaming, monitoring, etc., as well as services that evaluate objectives and make sure humans would approve of them. It seems fairly easy to expand this to include services that consider how disruptive new technologies will be, how underdetermined human values are, whether a proposed plan reduces option value, what risk aversion implies about a particular plan of action, what blind spots people have, etc.

I'd be interested in a list of services that you think would be helpful for addressing human safety problems. You might think of this as "our best current guess at metaphilosophy and metaphilosophy research".

(I know you were mainly talking about the document's framing; I don't have much to say about that.)

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2019-01-10T12:28:42.979Z · LW(p) · GW(p)

It seems fairly easy to expand this to include services that consider how disruptive new technologies will be, how underdetermined human values are, whether a proposed plan reduces option value, what risk aversion implies about a particular plan of action, what blind spots people have, etc.

Can you explain how you'd implement these services? Take "how disruptive new technologies will be" for example. I imagine you can't just apply ML given the paucity of training data and how difficult it would be to generalize from historical data to new technologies and new social situations. And it seems to me that if you base it on any kind of narrow AI technology, it would be easy to miss some of the novel implications/consequences of the new technologies and social situations and end up with a wrong answer. Maybe you could instead base it on a general purpose reasoner or question-answerer, but if something like that exists, AI would already have created a lot of new technologies that are risky for humans to face. Plus, the general purpose AI could replace a lot of discrete/narrow AI services, so I feel like we would already have moved past the CAIS world at that point. BTW, if the service is not just a thin wrapper on top of a general purpose AI which is generally trustworthy, I also don't know how you'd know whether you can trust the answers that it gives.

I’d be interested in a list of services that you think would be helpful for addressing human safety problems. You might think of this as “our best current guess at metaphilosophy and metaphilosophy research”.

I could try to think in that direction after I get a better sense of what kinds of services might be both feasible and trustworthy in the CAIS world. It seems easy to become too optimistic/complacent under the CAIS model if I just try to imagine what safety-enhancing services might be helpful without worrying about whether those services would be feasible or how well they'd work at the time when they're needed.

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-10T17:35:59.166Z · LW(p) · GW(p)

Can you explain how you'd implement these services?

Not really. I think of CAIS as suggesting that we take an outside view that says "looking at how AI has been progressing, and how humans generally do things, we'll probably be able to do more and more complex tasks as time goes on". But the emphasis that CAIS places is that the things we'll be able to do will be domain-specific tasks, rather than getting a general-purpose reasoner. I don't have a detailed enough inside view to say how complex tasks might be implemented in practice.

I agree with the rest of what you said, which feels to me like considering a few possible inside-view scenarios and showing that they don't work.

One way to think about this is through the lens of iterated amplification. With iterated amplification, we also get the property that our AI systems will be able to do more and more complex tasks as time goes on. The key piece that enables this is the ability to decompose problems, so that iterated amplification always bottoms out into a tree of questions and subquestions down to leaves which the base agent can answer. You could think of (my conception of) CAIS as a claim that a similar process will happen in a decentralized way for all of ML by default, and at any point the things we can do will look like an explicit iterated amplification deliberation tree of depth one or two, where the leaves are individual services and the top level question will be some task that is accomplished through a combination of individual services.

I could try to think in that direction after I get a better sense of what kinds of services might be both feasible and trustworthy in the CAIS world. It seems easy to become too optimistic/complacent under the CAIS model if I just try to imagine what safety-enhancing services might be helpful without worrying about whether those services would be feasible or how well they'd work at the time when they're needed.

Agreed, I'm making a bid for generating ideas without worrying about feasibility and trustworthiness, but not spending too much time on this and not taking the results too seriously.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2019-01-10T21:57:38.378Z · LW(p) · GW(p)

You could think of (my conception of) CAIS as a claim that a similar process will happen in a decentralized way for all of ML by default, and at any point the things we can do will look like an explicit iterated amplification deliberation tree of depth one or two, where the leaves are individual services and the top level question will be some task that is accomplished through a combination of individual services.

This seems like a sensible way of looking at things, and in this framing I'd say that my worry is that crucial safety-enhancing services may only appear fairly high in the overall tree of services, or outside the tree altogether (see also #3 in Three AI Safety Related Ideas which makes a similar point), and in the CAIS world it would be hard to limit access to the lower-level services (as a risk-reduction measure).

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-10T22:50:46.047Z · LW(p) · GW(p)

Yeah, that seems right, I don't think anyone is arguing against that claim.

comment by Wei Dai (Wei_Dai) · 2019-01-09T06:04:02.365Z · LW(p) · GW(p)

I have a problem with section 32, "Unaligned superintelligent agents need not threaten world stability". Here's the summary of that section from the paper:

  • Powerful SI-level capabilities can precede AGI agents.
  • SI-level capabilities could be applied to strengthen defensive stability.
  • Unopposed preparation enables strong defensive capabilities.
  • Strong defensive capabilities can constrain problematic agents.

So the key idea here seems to be that good actors will have a period of time to use superintelligent AI services to prepare some sort of ubiquitous defense that will constrain any subsequent AGI agents. But I don't understand where this period of "unopposed preparation" comes from. Why wouldn't someone create an AGI by cobbling together a bunch of AI services, or hire a bunch of AI services to help them design an AGI, as soon as they could? If they did that, then superintelligent AGI agents would arise nearly simultaneously with SI-level capabilities, and there would be no such period of unopposed preparation. In section 32.2, Eric only argues that SI-level capabilities can precede AGI agents. Since I think they wouldn't, at least not by a significant margin, the whole argument seems to fall apart or has to be interpreted in a way that makes it strategically irrelevant.

Eric seems to think that no one would bother to create AGI because "AGI agents offer no compelling value", by which he means "Because general AI-development capabilities can provide stable, comprehensive AI services, there is no compelling, practical motivation for undertaking the more difficult and potentially risky implementation of self-modifying AGI agents." But if quickly building an AGI can potentially allow someone to take over the world before "unopposed preparation" can take place, isn't that a compelling motivation by itself for many people?

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-09T09:55:27.631Z · LW(p) · GW(p)

Why wouldn't someone create an AGI by cobbling together a bunch of AI services, or hire a bunch of AI services to help them design an AGI, as soon as they could?

Because any task that an AGI could do, CAIS could do as well. (Though I don't agree with this -- unified agents seem to work better.)

But if quickly building an AGI can potentially allow someone to take over the world before "unopposed preparation" can take place, isn't that a compelling motivation by itself for many people?

I suspect he would claim that quickly building an AGI would not allow you to take over the world, because the AGI would not be that much more capable than the CAIS service cluster.

It may be the case that people try to take over the world just with CAIS, and maybe that could succeed. I think he's arguing only against AGI accident risk here, not against malicious uses of AI. (I think you already knew that, but it wasn't fully clear on reading your comment.)

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2019-01-09T11:47:40.732Z · LW(p) · GW(p)

I suspect he would claim that quickly building an AGI would not allow you to take over the world, because the AGI would not be that much more capable than the CAIS service cluster.

That does not seem to be his position though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32. If that was his position, he could just talk about how ordinary policing and military defense would work in a CAIS world (i.e., against human adversaries wielding CAIS) and say that the same policing/defense would also work against AGI because AGI is not much more capable than CAIS.

Instead it seems clear that he thinks AGI requires special effort to defend against, which is made possible by a delay between SI-level CAIS and AGI, which he proposes that we use to do a very extensive "unopposed preparation". I've been trying to figure out why he thinks there will be such a delay and my current best guess is "Implementation of the AGI model is widely regarded as requiring conceptual breakthroughs." (page 75) which he repeats on page 77, "AGI (but not CAIS) calls for conceptual breakthroughs to enable both implementation and subsequent safe application." I don't understand why he thinks such conceptual breakthroughs will be required though. Why couldn't someone just take some appropriate AI services, connect them together in a straightforward way, and end up with an AGI? Do you get it? Or am I on the wrong track here?

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-09T23:09:16.450Z · LW(p) · GW(p)

Do you get it?

I doubt I will ever be able to confidently answer yes to that question.

That does not seem to be his position though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32.

My model is that he does think AGI won't be much more capable than CAIS (see sections 12 and 13 in particular, and 10, 11 and 16 also touch on the topic), but lots of people (including me) kept making the argument that end-to-end training tends to improve performance and so AGI would outperform CAIS, and so he decided to write a response to that.

In general, my impression from talking to him and reading earlier drafts is that the earlier chapters are representative of his core models, while the later chapters are more like responses to particular arguments, or specific implications of those models.

I can give one positive argument for AGI being harder to make than SI-level CAIS. All of our current techniques for building AI systems create things that are bounded in the time horizon they are optimizing over. It's actually quite unclear how we would use current techniques to get something that does very-long-term-planning. (This could be the "conceptual breakthroughs" point.) Seems a lot easier to get a bunch of bounded services and hook them up together in such a way that they can do the sorts of things that AGI agents could do.

The one scenario that is both concrete and somewhat plausible to me is that we run powerful deep RL on a very complex environment, and this finds an agent that does very-long-term-planning, because that's what it takes to do well on the environment. I don't know what Eric thinks about this scenario, but it doesn't seem to influence his thinking very much (and in fact in the OP I argued that CAIS isn't engaging enough with this scenario).

Why couldn't someone just take some appropriate AI services, connect them together in a straightforward way, and end up with an AGI?

If you take a bunch of bounded services and connect them together in some straightforward way, you wouldn't get something that is optimizing over the long term. Where did the long term optimization come from?

For example, you could take any long term task and break it down into the "plan maker" which thinks for an hour and gives a plan for the task, and the "plan executor" which takes an in-progress plan and executes the next step. Both of these are bounded and so could be services, and their combination is generally intelligent, but the combination wouldn't have convergent instrumental subgoals.
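
The structure of that decomposition can be sketched as two bounded calls joined by an outer loop. This is an illustration under stated assumptions only: `plan_maker` and `plan_executor` are hypothetical stand-ins, the "planning" is a placeholder, and the sketch says nothing about whether the resulting plans are safe.

```python
from typing import List


def plan_maker(task: str, max_steps: int = 3) -> List[str]:
    """Bounded service: returns a plan of at most max_steps steps, then stops."""
    candidate_steps = [f"{task}: step {i}" for i in range(1, 10)]  # placeholder planning
    return candidate_steps[:max_steps]


def plan_executor(plan: List[str], progress: int) -> int:
    """Bounded service: executes exactly one step of an in-progress plan."""
    if progress < len(plan):
        print("executing", plan[progress])
        progress += 1
    return progress


# The long-horizon behavior lives in this outer loop, not inside either service.
plan = plan_maker("write report")
progress = 0
while progress < len(plan):
    progress = plan_executor(plan, progress)
```

Each call terminates after a fixed amount of work, which is what makes both pieces count as services in the sense used in the post.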

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2019-01-10T00:19:33.817Z · LW(p) · GW(p)

Thanks, I think this is helpful for me to understand Eric's model better, but I'm still pretty confused.

It’s actually quite unclear how we would use current techniques to get something that does very-long-term-planning. (This could be the “conceptual breakthroughs” point.)

But it's quite unclear how to use current techniques to do a lot of things. Why should we expect that this conceptual breakthrough would come later than other conceptual breakthroughs needed to achieve CAIS? (Given your disagreement with Eric on this, I guess this is more a question for him than for you.)

Where did the long term optimization come from?

I was assuming that long term strategic planners (as described in section 27) are available as an AIS, and would be one of the components of the hypothetical AGI.

For example, you could take any long term task and break it down into the “plan maker” which thinks for an hour and gives a plan for the task, and the “plan executor” which takes an in-progress plan and executes the next step. Both of these are bounded and so could be services, and their combination is generally intelligent, but the combination wouldn’t have convergent instrumental subgoals.

I don't see why it wouldn't, unless these services are specifically designed to be corrigible (in which case the "corrigible" part seems much more important than the "service" part). For example, suppose you asked the plan maker to create a plan to cure cancer. Why would the mere fact that it's a bounded service prevent it from coming up with a plan that involves causing human extinction (and a bunch of convergent instrumental subgoals like deceiving humans who might stop it)? (If there was a human in the loop, then you could look at the plan and reject it, but I'm imagining that someone, in order to build an AGI as quickly and efficiently as possible, stripped off the "optimize for human consumption" part of the strategic planner and instead optimized it to produce plans for direct machine consumption.)

Replies from: rohinmshah, PeterMcCluskey
comment by Rohin Shah (rohinmshah) · 2019-01-10T17:22:37.130Z · LW(p) · GW(p)

Why should we expect that this conceptual breakthrough would come later than other conceptual breakthroughs needed to achieve CAIS?

I think I share Eric's intuition that this problem is hard in a more fundamental way than other things, but I don't really know why I have this intuition. Some potential generators:

  • ML systems seem to be really good at learning tasks, but really bad at learning explicit reasoning. I think of CAIS as being on the side of "we never figure out explicit reasoning at the level that humans do it", and making up for this deficit by having good simulators that allow us to learn from experience, or by collecting much more data across multiple instances of AI systems, or by trying out many different AI designs and choosing the one which performs best.
  • It seems like humans tend to build systems by making individual parts that we can understand and predict well, and putting those together in a way where we can make some guarantees/predictions about what will happen. CAIS plays to this strength, whereas "figure out how to do very-long-term-planning" doesn't.

I don't see why it wouldn't, unless these services are specifically designed to be corrigible (in which case the "corrigible" part seems much more important than the "service" part).

Yeah, you're right, I definitely said the wrong thing there. I guess the difference is that the convergent instrumental subgoals are now "one level up" -- they aren't subgoals of the AI service itself, they're subgoals of the plan that was created by the AI service. It feels like this is qualitatively different and easier to address, but I can't really say why. More generators:

  • In this setting, convergent instrumental subgoals happen only if the plan-making service is told to maximize outcomes. However, since it's one level up, it should be easier to ask for something that says something more like "do X, interpreted pragmatically and not literally".
  • Things that happen one level up in the CAIS world are easier to point at and more interpretable, so it should be easier to find and fix issues of this sort.

(You could of course say "just because it's easier that doesn't mean people will do it", but I could imagine that if it's easy enough this becomes best practice and people do it by default, and you don't actually gain very much by taking these parts out.)

I was assuming that long term strategic planners (as described in section 27) are available as an AIS, and would be one of the components of the hypothetical AGI.

Yeah, here also what I should have said is that the long term optimization is happening one level up, whereas with the typical AGI agent scenario it feels like the long term optimization needs to happen at the base level, and that's the thing we don't know how to do.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2019-01-11T05:11:39.003Z · LW(p) · GW(p)

Unfortunately, I only vaguely understand the points that you're trying to make in this comment... Would it be fair to just say at this point that this is an important crux that Eric failed to convincingly argue for?

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-11T17:00:12.975Z · LW(p) · GW(p)

I agree that it's an important crux, and that the arguments are not sufficiently strong that everyone should believe Eric's position. I do think that he has provided arguments that support his position, though they are in a different language/ontology than is usually used here.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2019-01-11T17:36:13.168Z · LW(p) · GW(p)

Ah, ok, what sections would you suggest that I (re)read to understand his arguments better? (You mentioned 12, 13, 10, 11 and 16 earlier in this thread but back then we were talking about "AGI won’t be much more capable than CAIS" and here the topic is whether we should expect AGI to come later than CAIS or require harder conceptual breakthroughs.)

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-12T00:27:57.877Z · LW(p) · GW(p)

I quickly skimmed the table of contents to generate this list, so it might have both false positives and false negatives.

Section 1: We typically make progress using R&D processes; this can get us to superintelligence. Implicitly also makes the claim that this is qualitatively different from AGI, though doesn't really argue for that.

Section 8: Optimization pressure points away from generality, not towards it, which suggests that strong optimization pressure doesn't give you AGI.

Section 12.6: AGI and CAIS solve problems in different ways. (Combined with the claim, argued elsewhere: CAIS will happen first.)

Section 13: AGI agents are more complex. (Implicit claim: and so harder to build.)

Section 17: Most complex tasks involve several different subtasks that don't interact much; so you get efficiency and generality gains by splitting the subtasks up into separate services.

Section 38: Division of labor + specialization are useful for good performance.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2019-01-13T06:29:50.135Z · LW(p) · GW(p)

Most of these sections seem to only contain arguments that AGI won't come earlier than CAIS, but not that it would come later than CAIS. In other words, they don't argue against the likelihood that under CAIS someone can easily build an AGI by connecting existing AI services together in a straightforward way. The only section I can find among the ones you listed that tries to argue in this direction is Section 13, but even it mostly just argues that AGI isn't simpler than CAIS, and not that it's more complex, except for this paragraph in the summary, Section 13.5:

To summarize, in each of the areas outlined above, the classic AGI model both obscures and increases complexity: In order for general learning and capabilities to fit a classic AGI model, they must not only exist, but must be integrated into a single, autonomous, self-modifying agent. Further, achieving this kind of integration would increase, not reduce, the challenges of aligning AI behaviors with human goals: These challenges become more difficult when the goals of a single agent must motivate all (and only) useful tasks.

So putting alignment aside (I'm assuming that someone would be willing to build an unaligned AGI if it's easy enough), the only argument Eric gives for greater complexity of AGI vs CAIS is "must be integrated into a single, autonomous, self-modifying agent", but why should this integration add a non-negligible amount of complexity? Why can't someone just take a plan maker, connect it to a plan executer, and connect that to the Internet to access other services as needed? (I think your argument that strategic planning may be one of the last AIS to arrive is plausible, but it doesn't seem to be an argument that Eric himself makes.) Where is the additional complexity coming from?

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-17T18:19:25.133Z · LW(p) · GW(p)
Why can't someone just take a plan maker, connect it to a plan executer, and connect that to the Internet to access other services as needed?

I think Eric would not call that an AGI agent.

Setting aside what Eric thinks and talking about what I think: There is one conception of "AGI risk" where the problem is that you have an integrated system that has optimization pressure applied to the system as a whole (similar to end-to-end training) such that the entire system is "pointed at" a particular goal and uses all of its intelligence towards that. The goal is a long-term goal over universe-histories. The agent can be modeled as literally actually maximizing the goal. These are all properties of the AGI itself.

With the system you described, there is no end-to-end training, and it doesn't seem right to say that the overall system is aimed at a long-term goal, since it depends on what you ask the plan maker to do. I agree this does not clearly solve any major problem, but it does seem markedly different to me.

I think that Eric's conception of "AGI agent" is like the first thing I described. I agree that this is not what everyone means by "AGI", and it is particularly not the thing you mean by "AGI".

You might argue that there seems to be no effective safety difference between an Eric-AGI-agent and the plan maker + plan executor. The main differences seem to be about what safety mechanisms you can add -- such as looking at the generated plan, or using human models of approval to check that you have the right goal. (Whereas an Eric-AGI-agent is so opaque that you can't look at things like "generated plans", and you can't check that you have the right goal because the Eric-AGI-agent will not let you change its goal.)

With an Eric-AGI-agent, if you try to create a human model of approval, that would need to be an Eric-AGI-agent itself in order to effectively supervise the first Eric-AGI-agent, but in that case the model of approval will be literally actually maximizing some goal like "be as accurate as possible", which will lead to perverse behavior like manipulating humans so that what they approve is easier to predict. In CAIS, this doesn't happen, because the approval model is not searching over possibilities that involve manipulating humans.
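
A minimal sketch of the inspection point described above, where the plan exists as plain data between the plan maker and the plan executor. Everything here (`run_with_review`, `toy_plan_maker`, `toy_approve`, `toy_execute`, the step strings) is hypothetical and invented for illustration; a real approval model would presumably be a learned service, not a keyword check.

```python
from typing import Callable, Dict, List

Plan = List[str]


def run_with_review(
    goal: str,
    plan_maker: Callable[[str], Plan],
    approve: Callable[[str], bool],
    execute_step: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Generate a plan, check every step, and execute only if all steps pass."""
    plan = plan_maker(goal)  # the plan is readable data, not hidden internal state
    rejected = [step for step in plan if not approve(step)]
    if rejected:
        # Nothing is executed when any step fails review.
        return {"executed": [], "rejected": rejected}
    return {"executed": [execute_step(step) for step in plan], "rejected": []}


# Hypothetical stand-ins for the plan maker, approval model, and executor.
def toy_plan_maker(goal: str) -> Plan:
    return [f"survey literature on {goal}", f"propose experiments for {goal}"]


def toy_approve(step: str) -> bool:
    return "manipulate" not in step


def toy_execute(step: str) -> str:
    return f"done: {step}"


result = run_with_review("protein folding", toy_plan_maker, toy_approve, toy_execute)
print(result["executed"])
```

The design choice being illustrated is only that review happens on the generated plan before any step runs; it says nothing about how good the approval check is.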

comment by PeterMcCluskey · 2019-01-25T16:49:49.591Z · LW(p) · GW(p)

I was assuming that long term strategic planners (as described in section 27) are available as an AIS, and would be one of the components of the hypothetical AGI.

That's not consistent with my understanding of section 27. My understanding is that Drexler would describe that as too dangerous.

suppose you asked the plan maker to create a plan to cure cancer.

I suspect that a problem here is that "plan maker" is ambiguous as to whether it falls within Drexler's notion of something with a bounded goal.

CAIS isn't just a way to structure software. It also requires some not-yet-common sense about what goals to give the software.

"Cure cancer" seems too broad to qualify as a goal that Drexler would consider safe to give to software. Sections 27 and 28 suggest that Drexler wants humans to break that down into narrower subtasks. E.g. he says:

By contrast, it is difficult to envision a development path in which AI developers would treat all aspects of biomedical research (or even cancer research) as a single task to be learned and implemented by a generic system.

Replies from: PeterMcCluskey, rohinmshah
comment by PeterMcCluskey · 2019-01-28T22:32:35.165Z · LW(p) · GW(p)

After further rereading, I now think that what Drexler imagines is a bit more complex: (section 27.7) "senior human decision makers" would have access to a service with some strategic planning ability (which would have enough power to generate plans with dangerously broad goals), and they would likely restrict access to those high-level services.

I suspect Drexler is deliberately vague about the extent to which the strategic planning services will contain safeguards.

This, of course, depends on the controversial assumption that relatively responsible organizations will develop CAIS well before other entities are able to develop any form of equally powerful AI. I consider that plausible, but it seems to be one of the weakest parts of his analysis.

And presumably the publicly available AI services won't be sufficiently general and powerful to enable random people to assemble them into an agent AGI? Combining a robocar + Google translate + an aircraft designer + a theorem prover doesn't sound dangerous. But I'd prefer to have something more convincing than just "I spent a few minutes looking for risks, and didn't find any".

comment by Rohin Shah (rohinmshah) · 2019-01-25T17:52:15.460Z · LW(p) · GW(p)

Fwiw, by my understanding of CAIS and my definition of a service here as "A service is an AI system that delivers bounded results for some task using bounded resources in bounded time", a plan maker would qualify as a service. So every time I make claims about "services" I intend for those claims to apply to plan makers as well.
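
To make the "bounded results, bounded resources, bounded time" definition concrete, here is a rough sketch of a plan maker wrapped so that it satisfies all three bounds. The `BoundedService` class, its field names, and `endless_plan_maker` are assumptions made up for this example; CAIS itself does not prescribe any such wrapper.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class BoundedService:
    """Hypothetical wrapper enforcing step, time, and output-size budgets."""

    task: Callable[[str], Iterator[str]]  # yields partial results for a request
    max_steps: int      # bounded resources (here: iterations of the task)
    max_seconds: float  # bounded time
    max_results: int    # bounded results

    def run(self, request: str) -> List[str]:
        deadline = time.monotonic() + self.max_seconds
        results: List[str] = []
        # islice stops pulling from the task after max_steps items.
        for item in itertools.islice(self.task(request), self.max_steps):
            if time.monotonic() > deadline or len(results) >= self.max_results:
                break
            results.append(item)
        return results


def endless_plan_maker(goal: str) -> Iterator[str]:
    """Toy plan maker that would never stop on its own."""
    for i in itertools.count(1):
        yield f"step {i} toward {goal}"


service = BoundedService(endless_plan_maker, max_steps=100, max_seconds=1.0, max_results=3)
plan = service.run("clean the car")
print(plan)
```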

I have tried to use words the same way that Drexler does, but obviously I can't know exactly what he meant.

comment by Rohin Shah (rohinmshah) · 2019-02-17T21:17:51.485Z · LW(p) · GW(p)

Eric and I have exchanged a few emails since I posted this summary. I'm posting some of that exchange here (with his permission), edited by me for conciseness and clarity. The paragraphs in the quotes are Eric's, but I have rearranged his paragraphs and omitted some of them for better flow in this comment.

There is a widespread intuition that AGI agents would by nature be more integrated, flexible, or efficient than comparable AI services. I am persuaded that this is wrong, and stems from an illusion of simplicity that results from hiding mechanism in a conceptually opaque box, a point that is argued at some length in Section 13.
Overall, I think that many of us have been in the habit of seeing flexible optimization itself as a problem, when optimization is instead (in the typical case) a strong constraint on a system’s behavior (see Section 8). Flexibility of computation in pursuit of optimization for bounded tasks seems simply useful, regardless of planning horizon, scope of considerations, or scope of required knowledge.

I agree that AGI agents hide mechanism in an opaque box. I also agree that the sort of optimization that current ML does, which is very task-focused, is a strong constraint on behavior. There seems to be a different sort of optimization that humans are capable of, where we can enter a new domain and perform well in it very quickly; I don't have a good understanding of that sort of optimization, and I think that's what the classic AGI agent risks are about.

Relatedly, I've used the words "monolithic AGI agent" a bunch in the summary and the post. Now, I want to instead talk about whether AI systems will be opaque and well-integrated, since that's the main crux of our disagreement. It's plausible to me that even if they are opaque and well-integrated, you don't get the classic AGI agent risks, because you don't get the sort of optimization I was talking about above.

In this connection, you cite the power of end-to-end training, but Section 17.4 (“General capabilities comprise many tasks and end-to-end relationships”) argues that, because diverse tasks encompass many end-to-end relationships, the idea that a broad set of tasks can be trained “end to end” is mistaken, a result of the narrowness of current trained systems in which services form chains rather than networks that are more wide than deep. We should instead expect that broad capabilities will best be implemented by sets of systems (or sets of end-to-end chains of systems) that comprise well-focused competencies: Systems that draw on distinct subtask competencies will typically be easier to train and provide more robust and general performance (Section 17.5). Modularity typically improves flexibility and generality, rather than impeding it.
Note that the ability to employ subtask components in multiple contexts constitutes a form of transfer learning, and [...] this transfer learning can carry with it task-specific aspects of behavioral alignment.

This seems like the main crux of the disagreement. My claim is that for any particular task, given enough compute, data and model size, an opaque, well-integrated, unstructured AI system will outperform a transparent, modular collection of services. This is only on the axis of performance at the task: I agree that the structured system will generalize better out of distribution (which leads to robustness, flexibility, and better transfer learning). I'm basing this primarily off of empirical evidence and intuitions:

  • For many tasks so far (computer vision, NLP, robotics), transitioning from a modular architecture to end-to-end deep learning led to large boosts in performance.
  • My impression is that many interdisciplinary academics are able to transfer ideas and intuitions from one of their fields to the other, allowing them to make big contributions that more experienced researchers could not make. This suggests that patterns of problem-solving from one field can transfer to another in a non-trivial way, which is best achieved with well-integrated systems.
  • Psychology research can be thought of as an attempt to systematize/modularize our knowledge about humans. Despite a huge amount of work in psychology, our internal, implicit, well-integrated models of humans are way better than our explicit theories.

Humans definitely solve large tasks in a very structured way; I hypothesize that this is because for those tasks the limits of human compute/data/brain size prevent us from getting the benefits of an unstructured heuristic approach.

Speaking of integration:

Regarding integration, I’ve argued that classic AGI-agent models neither simplify nor explain general AI capabilities (Section 13.3), including the integration of competencies. Whatever integration of functionality one expects to find inside an opaque AGI agent must be based on mechanisms that presumably apply equally well to integrating relatively transparent systems of services. These mechanisms can be dynamic, rather than static, and can include communication via opaque vector embeddings, jointly fine-tuning systems that perform often-repeated tasks, and matching of tasks to services, (including service-development services) in semantically meaningful “task spaces” (discussed in Section 39 “Tiling task-space with AI services can provide general AI capabilities”).
[...]
Direct lateral links between competencies such as organic synthesis, celestial mechanics, ancient Greek, particle physics, image interpretation, algorithm design, traffic planning (etc.) are likely to be sparse, particularly when services perform object-level tasks. This sparseness is, I think, inherent in natural task-structures, quite independent of human cognitive limitations.

(The paragraphs above were written in a response to me while I was still using the phrase "AGI agents")

I expect that the more you integrate the systems of services, the more opaque they will become. The resulting system will be less interpretable; it will be harder to reason about what information particular services do not have access to (Section 9.4); and it is harder to tell when malicious behavior is happening. The safety affordances identified in CAIS no longer apply because there is not enough modularity between services.

Re: sparseness inherent in task-structures, I think this is a result of human cognitive limitations but don't know how to argue more for that perspective.

Replies from: Wei_Dai, lcmgcd
comment by Wei Dai (Wei_Dai) · 2019-02-17T22:39:28.378Z · LW(p) · GW(p)

Can you summarize this exchange, especially what updates you made as a result of it, if any?

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-02-18T03:48:43.004Z · LW(p) · GW(p)

That was the summary :P The full thing was quite a bit longer. I also didn't want to misquote Eric.

Maybe the shorter summary is: there are two axes which we can talk about. First, will systems be transparent, modular and structured (call this CAIS-like), or will they be opaque and well-integrated? Second, assuming that they are opaque and well-integrated, will they have the classic long-term goal-directed AGI-agent risks or not?

Eric and I disagree on the first one: my position is that for any particular task, while CAIS-like systems will be developed first, they will gradually be replaced by well-integrated ones, once we have enough compute, data, and model capacity.

I'm not sure how much Eric and I disagree on the second one: I think it's reasonable to predict that the resulting systems are specialized for particular bounded tasks and so won't be running broad searches for long-term plans. I would still worry about inner optimizers; I don't know what Eric thinks about that worry.

This summary is more focused on my beliefs than Eric's, and is probably not a good summary of the intent behind the original comment, which was "what does Eric think Rohin got wrong in his summary + opinion of CAIS", along with some commentary from me trying to clarify my beliefs.

Updates were mainly about actually carving up the space in the way above. Probably others, but I often find it hard to introspect on how my beliefs are updating.

Replies from: jvmancuso
comment by jvmncs (jvmancuso) · 2019-08-27T21:23:39.478Z · LW(p) · GW(p)

I don't understand why this crux needs to be dichotomous. Setting aside the opacity question for the moment, why can't services in a CAIS be differentiable w.r.t. each other?

Example: Consider a language modeling service (L) that is consumed by several downstream tasks, including various text classifiers, an auto-correction service for keyboards, and a machine translation service. In the end-to-end view, it would be wise for these downstream services to use a language representation from L and to propagate their own error information back to L so that it can improve its shared representation. Since the downstream services ultimately make up L's raison d'etre, it will be obliged to do so.

For situations that are not so neatly differentiable, we can describe the services network as a stochastic computation graph if there is a benefit for end-to-end learning the entire system. This should lead to a slightly more precise conjecture about the relationship between the CAIS agent and utility-maximizing agent: A CAIS agent that can be described as a stochastic computation graph is equivalent to some utility-maximizing agent when trained end-to-end via approximate backpropagation.

It's likely that CAIS agents aren't usefully described as stochastic computation graphs, or that we may need to extend the usage of "stochastic computation graph" here to deal with services that create other services as offspring and attach them to the graph. But the possibility itself suggests a spectrum between the archetypal modular CAIS and an end-to-end CAIS, in which subgraphs of the services network are trained end-to-end. It's not obvious to me that the CAIS as defined in the text discounts this scenario, despite Eric's comments here.
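
As a toy version of the language-service example above, the sketch below has a single shared parameter (standing in for L's representation) that is updated from the summed error signals of several downstream heads. It is deliberately tiny: the function name, the fixed head weights, the inputs, and the way targets are generated from a known `true_scale` are all assumptions chosen so that there is a checkable right answer; it is not a model of a real language service.

```python
def train_shared_scale(head_weights, true_scale=3.0, steps=100, lr=0.1):
    """Learn one shared parameter from the summed errors of all downstream heads."""
    inputs = (1.0, 2.0)
    shared = 0.0  # the single parameter of the toy "service L"
    for _ in range(steps):
        grad = 0.0
        count = 0
        for a in head_weights:  # each downstream service has a fixed head weight
            for x in inputs:
                prediction = a * (shared * x)
                target = a * (true_scale * x)
                # d(0.5 * err^2)/d(shared): the error signal sent back to L
                grad += (prediction - target) * a * x
                count += 1
        shared -= lr * grad / count  # averaged gradient step on the shared parameter
    return shared


learned = train_shared_scale(head_weights=[1.0, -0.5, 2.0])
print(round(learned, 6))
```

The point is only that the shared parameter never sees a loss of its own; every update comes from its consumers, which is the sense in which the subgraph is trained end-to-end.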

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-08-27T21:42:33.107Z · LW(p) · GW(p)

I broadly agree, especially if you set aside opacity; I very rarely mean to imply a strict dichotomy.

I do think in the scenario you outlined the main issue would be opacity: the learned language representation would become more and more specialized between the various services, becoming less interpretable to humans and more "integrated" across services.

comment by lukehmiles (lcmgcd) · 2019-11-07T18:40:04.508Z · LW(p) · GW(p)

One way to test the "tasks don't overlap" idea is to have two nets do two different tasks, but connect their internal layers. Then see how high the weights on those layers get. Like, is the internal processing done by Mario AI useful for Greek translation at all? If it is then backprop etc should discover that.
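
A drastically simplified sketch of this test, assuming linear models instead of real networks: "net B" predicts its target from its own input plus one cross-connection to a feature of "net A", and we read off the learned cross weight. The function name, the grid of inputs, and the `overlap` knob (how much task B's target really depends on A's feature) are all hypothetical; nothing here involves Mario or Greek.

```python
def train_task_b(overlap, steps=500, lr=0.1):
    """Fit task B with a cross-connection to net A's feature; return both weights."""
    # Fixed grid of inputs so the run is deterministic.
    data = [(xa, xb) for xa in (-1.0, 0.0, 1.0) for xb in (-1.0, 0.0, 1.0)]
    w_own, w_cross = 0.0, 0.0
    for _ in range(steps):
        g_own = g_cross = 0.0
        for xa, xb in data:
            h_a = xa  # stand-in for an internal feature computed by net A
            target = 2.0 * xb + overlap * xa
            err = (w_own * xb + w_cross * h_a) - target
            g_own += err * xb
            g_cross += err * h_a
        # Full-batch gradient descent on mean squared error (constant factor dropped).
        w_own -= lr * g_own / len(data)
        w_cross -= lr * g_cross / len(data)
    return w_own, w_cross


_, cross_unrelated = train_task_b(overlap=0.0)
_, cross_related = train_task_b(overlap=1.5)
print(abs(cross_unrelated), abs(cross_related))
```

In this toy setting the cross weight stays near zero when the tasks are unrelated and grows to match the overlap when they are not; whether weight magnitude is a reliable signal in deep nonlinear networks is a separate question.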

comment by habryka (habryka4) · 2019-02-13T22:48:34.637Z · LW(p) · GW(p)

Promoted to curated: I think the linked document is one of the most interesting things to be written in AI Alignment in the last year, and this is the best summary and commentary of it that currently exists. Quality wise, I think everything that I have to say has already been covered by the other commenters, but I overall found reading the linked document, as well as this summary, to be quite helpful in my thinking about AI Alignment, though I also disagree with large parts of it (However, I am not at the research level, and so have a harder time judging how useful it would be for the people who are spending even more time thinking about AI Alignment).

Thanks a lot for writing this summary, and thanks a lot to Eric for all the work he is doing.

comment by PeterMcCluskey · 2019-01-08T20:26:46.727Z · LW(p) · GW(p)

I want to draw separate attention to chapter 40 of Drexler's paper, which uses what looks like a novel approach to argue that current supercomputers likely have more raw processing power than a human brain. I find that scary.

Replies from: ESRogs
comment by ESRogs · 2019-01-08T21:16:21.659Z · LW(p) · GW(p)

From the conclusion of that section:

Many modern AI tasks, although narrow, are comparable to narrow capacities of neural systems in the human brain. Given an empirical value for the fraction of computational resources required to perform that task with humanlike throughput on a 1 PFLOP/s machine, and an inherently uncertain and ambiguous—yet bounded—estimate of the fraction of brain resources required to perform “the equivalent” of that machine task, we can estimate the ratio of PFLOP/s machine capacity to brain capacity. What are in the author’s judgment plausible estimates for each task are consistent in suggesting that this ratio is ~10 or more. Machine learning and human learning differ in their relationship to costs, but even large machine learning costs can be amortized over an indefinitely large number of task-performing systems and application events.
In light of these considerations, we should expect that substantially superhuman computational capacity will accompany the eventual emergence of a software with broad functional competencies. On present evidence, scenarios that assume otherwise seem unlikely.

I'm not completely sure I'm understanding the first paragraph correctly.

With the bit about "this ratio is ~10 or more" it sounds like he's saying roughly that, "When we use modern AI systems to complete tasks that humans also do, it appears to take 10+ PFLOP/s per human brain."

(Or, since you're not using your whole brain for a given task, maybe a better translation is, "If a task uses 10% of your brain, then a modern AI system will need to use 1+ PFLOP/s to achieve human level performance.")

Does that match other readers' interpretations?

Replies from: Hoagy, rhaps0dy
comment by Hoagy · 2019-02-15T17:08:49.946Z · LW(p) · GW(p)

Late to the party but I'm pretty confident he's saying the opposite - that a 1 PFLOP/s system is likely to have 10 or more times the computational capacity of the human brain, which is rather terrifying.

He gives the example of Baidu's Deep Speech 2, which requires around 1 GFLOP/s to run and produces human-comparable results. That is 10^6 times less than the capacity of the 1 PFLOP/s machine. He estimates that the equivalent process in humans takes around 10^-3 of the human brain, which gives the estimate that a 1 PFLOP/s system has about 10^3 times the computational capacity of the brain. His other examples give similar results.
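
The arithmetic in the Deep Speech 2 example can be written out as below. The three constants are simply the figures quoted in this comment, not independent measurements, and the function name is made up for the example.

```python
# Figures as quoted in the comment above; none of them is measured here.
MACHINE_FLOPS = 1e15    # the 1 PFLOP/s reference machine
TASK_FLOPS = 1e9        # ~1 GFLOP/s to run Deep Speech 2
BRAIN_FRACTION = 1e-3   # estimated share of the brain doing the equivalent task


def machine_to_brain_ratio(machine_flops, task_flops, brain_fraction):
    """Estimate machine capacity in units of whole-brain capacity."""
    # Share of the machine that the task occupies.
    machine_fraction = task_flops / machine_flops
    # The same task occupies brain_fraction of a brain, so the machine is
    # brain_fraction / machine_fraction times as large as the brain.
    return brain_fraction / machine_fraction


ratio = machine_to_brain_ratio(MACHINE_FLOPS, TASK_FLOPS, BRAIN_FRACTION)
print(f"{ratio:.0e}")
```

Under these inputs the ratio comes out at roughly 10^3, consistent with the "~10 or more" lower bound in the quoted passage.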

comment by Adrià Garriga-alonso (rhaps0dy) · 2019-01-09T00:17:32.587Z · LW(p) · GW(p)

Yes, though I'm fairly sure he's talking about using trained neural networks to e.g. classify an image, which is known to be fairly cheap, rather than training them. In other words, he's talking about using an AI service rather than creating one.

He also says that "Machine learning and human learning differ in their relationship to costs" which is also evidence for my interpretation: training is expensive, testing on one example is very cheap.

comment by Rohin Shah (rohinmshah) · 2021-01-08T05:16:51.113Z · LW(p) · GW(p)

I trust past-me to have summarized CAIS much better than current-me; back when this post was written I had just finished reading CAIS for the third or fourth time, and I haven't read it since. (This isn't a compliment -- I read it multiple times because I had a lot of trouble understanding it.)

I've put in two points of my own in the post. First:

(My opinion: I think this isn't engaging with the worry with RL agents -- typically, we're worried about the setting where the RL agent is learning or planning at test time, which can happen in learn-to-learn and online learning settings, or even with vanilla RL if the learned policy has access to external memory and can implement a planning process separately from the training procedure.)

I agree even more with this two years later. There is an important point that CAIS makes, which is that learning is separate from competence. Nonetheless, just because an AI system must first learn about a domain before it can act in it, does not mean that we will notice it doing so. The AI does not learn to take over the world by trying to take over the world and failing, it learns by making a plan to take over the world, learning about the relevant domains (e.g. if it wants to engineer a pandemic, it learns about genetics by reading textbooks), until it is confident that its plan will succeed. This can be true even if the AI was trained using PPO.

(To connect with current discourse, this is basically saying "this doesn't engage with mesa optimization")

Second:

(My opinion: It seems like the lesson of deep learning is that if you can do something end-to-end, that will work better than a structured approach. This has happened with computer vision, natural language processing, and seems to be in the process of happening with robotics. So I don't buy this -- while it seems true that we will get CAIS before AGI since structured approaches tend to be available sooner and to work with less compute, I expect that a monolithic AGI agent would outperform CAIS at most tasks once we can make one.)

I still agree with this, though I'd phrase it differently now. Now I would say that there is some level of data, model capacity, and compute at which an end-to-end / monolithic approach outperforms a structured approach on the training distribution (this is related to but not the same as the bitter lesson). However, at low levels of these three, the structured approach will typically perform better. The levels required for the end-to-end approach to work better depend on the particular task, and increase with task difficulty.

Since we expect all three of these factors to grow over time, I then expect that there will be an expanding Pareto frontier where at any given point the most complex tasks are performed by structured approaches, but as time progresses these are replaced by end-to-end / monolithic systems (but at the same time new, even more complex tasks are found, that can be done in a structured way).

(Really I expect this will be true up till human-level AI and a little past that, and after that who knows what happens.)

----

On CAIS itself:

  • I think the "monolithic AGI" that Eric critiques is a bit of a strawman, but nonetheless it is important to argue against it.
  • I really like the learning vs. competence distinction, and use it frequently.
  • I think it is often hard to tell what exactly is being argued in CAIS, and have found it difficult to understand as a result.
  • If I reread it now, I suspect there are many framings I would disagree with.
  • There are other people with similar perspectives, e.g. Michael Jordan, though they don't engage with AI safety arguments as much.
  • CAIS is in my top 20 things produced in the field of AI alignment.
comment by Donald Hobson (donald-hobson) · 2019-01-09T16:08:22.239Z · LW(p) · GW(p)

I disagree outright with

Any long term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task.

Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!

And the deeper reason for that is that we have no idea how to tell what's a hole.

Suppose you want to set the service generator to make a robot that cleans cars. If you give a blow-by-blow formal description of what you mean by "cleans cars" then your "service generator" is just a compiler. If you do not give a complete specification of what you mean, where does the information that "chopping off a nearby head to wipe windows with is unacceptable" come from? If the service generator notices that cars need cleaning and builds the service by itself, you have an AGI by another name.

Obviously, if you have large amounts of training data made by humans with joysticks, and the robot is sampling from the same distribution, then you should be fine. This system learns that dirtier windshields need more wiping from hundreds of examples of humans doing that; it doesn't chop off any heads because the humans didn't.

However, if you want the robot to display remotely novel behavior, then the distance between the training data and the new good solutions becomes as large as the distance from the training data to bad solutions. If it's smart enough to go to the shops and buy a sponge, without having that strategy hardcoded in when it was built, then it's smart enough to break into your neighbor's house and nick a sponge.

The only thing that distinguishes one from the other is what humans prefer.

Distinguishing low impact from high impact is also hard.

This might be a good approach, but I don't feel it answers the question "I have a humanoid robot, a hypercomputer and a couple of toddlers, how can I build something to look after the kids for a few weeks (without destroying the world)?" So far, CAIS looks confused.

Replies from: ESRogs, rohinmshah
comment by ESRogs · 2019-01-09T20:48:53.422Z · LW(p) · GW(p)

It seems like the important thing is how bounded the task is.

For example, in the case of Go, if you just kept training AlphaZero, would you expect it to eventually decide that it needs to break out into the physical world to get more computing power?

It seems to me that it could get to be ultra-super-human at Go without that happening. (Even if there is some theoretical threshold where, with enough computation, it couldn't help but stumble upon a sequence of moves that causes the program to crash. It seems to me that you're likely to get crashing behavior long before you get hack-out-of-the-vm behavior, and the threshold for either may be too high to matter.)

If that's true for Go, then the questions are:

1. How much less bounded of a task can you train a system to do while maintaining the focused-on-the-task property?

and

2. How general of a system can you make by composing such focused systems together?

comment by Rohin Shah (rohinmshah) · 2019-01-09T23:21:11.540Z · LW(p) · GW(p)
Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!

Note that under the CAIS worldview, in order to be competent in some domain you need to have some experience in that domain (i.e. competence requires learning). Or at least, that's the worldview under which I find CAIS most compelling. In that case, the AI would have had to try breaking out of the box a few times in order to get good at it, and why would it do that? Even if it ever hit upon this plan, whenever it tried it for the first time it would get a gradient pushing that behavior away, since it didn't help with achieving the goal. Only after significant learning would it be able to execute these weird plans in a way that they actually succeed and help achieve the goal, and that significant learning will not happen.
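
The "gradient pushing that behavior away" step can be illustrated with a two-action toy problem. This is only a sketch of the local-gradient picture described above, under the assumption that a failed breakout attempt earns no reward; the reward values, function names, and use of the exact expected policy gradient for a softmax policy are choices made for this example, and it says nothing about systems that plan rather than follow a gradient.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(rewards, steps=200, lr=0.5):
    """Gradient ascent on expected reward; returns action probabilities per step."""
    logits = [0.0] * len(rewards)
    history = []
    for _ in range(steps):
        probs = softmax(logits)
        history.append(probs)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of expected reward w.r.t. each logit: p_a * (r_a - E[r]).
        logits = [l + lr * p * (r - expected) for l, p, r in zip(logits, probs, rewards)]
    return history


# Action 0: do the task (reward 1). Action 1: attempt a breakout that fails (reward 0).
history = train(rewards=[1.0, 0.0])
print(history[0][1], history[-1][1])
```

Because the unrewarded action always sits below the expected reward, its probability falls at every step in this setup.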

The only thing that distinguishes one from the other is what humans prefer.

CAIS would definitely use human preference information; see e.g. Section 22.

This might be a good approach, but I don't feel it answers the question "I have a humanoid robot, a hypercomputer and a couple of toddlers, how can I build something to look after the kids for a few weeks (without destroying the world)?"

It's not really an approach to AI safety, it's mostly meant to be a different prediction about how we achieve superintelligence. (There are definitely some prescriptive aspects of CAIS, and some arguments that it is safer than AGI agents, but mostly it is meant to be descriptive, I believe.)

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2019-01-10T18:12:10.211Z · LW(p) · GW(p)

Any algorithm that gets stuck in local optima so easily will not be very intelligent or very useful. Humans have, at least somewhat, the ability to notice that there should be a good plan in this region, find and execute that plan successfully. We don't get stuck in local optima as much as current RL algorithms.

AIXI would be very good at making complex plans and doing well first time. You could tell it the rules of chess and it would play PERFECT chess first time. It does not need lots of examples to work from. Give it any data that you happen to have available, and it will become very competent, and able to carry out complex novel tasks first time.

Current reinforcement learning algorithms aren't very good at breaking out of boxes because they follow the local incentive gradient. (I say not very good at, because a few algorithms have exploited glitches in a way that's a bit "break out of the box"-ish.) In some simple domains, it's possible to follow the incentive gradient all the way to the bottom. In other environments, human actions already form a good starting point, and following the incentive gradient from there can make the solution a bit better.

I agree that most of the really dangerous break out the boxes probably can't be reached by local gradient decent from a non adversarial starting point. (I do not want to have to rely on this)
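To make the "follows the local incentive gradient" point concrete, here is a minimal sketch of plain gradient descent on a one-dimensional function with two basins. The function `f`, the learning rate, and the starting points are all illustrative assumptions, not anything taken from the CAIS report or from a real RL system. The only claim is that a purely local method ends up in whichever basin it starts in.

```python
# Toy objective (hypothetical): two minima, the left one is deeper.
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent: only ever moves downhill locally."""
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

right = descend(2.0)    # starts in the basin of the shallower minimum
left = descend(-2.0)    # starts in the basin of the deeper minimum

print(f"from x=2.0:  x={right:.3f}, f={f(right):.3f}")
print(f"from x=-2.0: x={left:.3f}, f={f(left):.3f}")
```

Started from `x = 2.0`, the descent settles in the shallower minimum and never discovers the better one on the other side of the hill, which is the sense in which a strange plan far from the starting point is unlikely to be reached by local updates alone.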

I agree that you can attach loads of sensors to, say, postmen, and train a big neural net to control a humanoid robot to deliver letters, given millions of training examples. You can probably automate many of the training and weight-fiddling tasks currently done by grad student descent to make big neural nets work.

I agree that this could be somewhat useful economically, as a significant proportion of economic productivity could be automated.

What I am saying is that this form of AI is sufficiently limited that there are still large incentives to make AGI and the CAIS can't protect us from making an unfriendly AGI.

I'm also not sure how strong the self improvement can be when the service maker service is only making little tweaks to existing algorithms rather than designing strange new algorithms. I suspect you would get to a local optimum of a reinforcement learning algorithm producing very slight variations of reinforcement learning. This might be quite powerful, but not anywhere near the limit of self improving AGI.

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-10T20:14:20.183Z · LW(p) · GW(p)
AIXI would be very good at making complex plans and doing well first time.

Agreed; I claim we have no clue how to make anything remotely like AIXI in the real world.

Humans have, at least somewhat, the ability to notice that there should be a good plan in this region, find and execute that plan successfully.

Agreed, in a CAIS world, the system of interacting services would probably notice the plan but not execute it because of some service that is meant to prevent it from doing crazy things that humans would not want.

What I am saying is that this form of AI is sufficiently limited that there are still large incentives to make AGI and the CAIS can't protect us from making an unfriendly AGI.

This definitely seems like the crux for many people. I'm quite unsure about this point; it seems plausible to me that CAIS could in fact do most things such that there aren't very large incentives, especially if the Factored Cognition [AF · GW] hypothesis is true.

I'm also not sure how strong the self improvement can be when the service maker service is only making little tweaks to existing algorithms rather than designing strange new algorithms.

I don't see why it would have to be little tweaks to existing algorithms, it seems plausible to have the R&D services consider entirely new algorithms as well.

comment by Eli Tyre (elityre) · 2019-01-08T07:24:21.648Z · LW(p) · GW(p)

As a note, I believe that FHI is planning to publish a(n edited?) version of this document as an actual book à la Superintelligence: Paths, Dangers, Strategies.

Replies from: ESRogs
comment by ESRogs · 2019-01-08T20:09:45.960Z · LW(p) · GW(p)

After reading the post and some of these comments (including this one) it was unclear to me whether FHI had actually intended to make this public yet.

It seems that in fact they have: https://www.fhi.ox.ac.uk/reframing/

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-09T02:25:38.740Z · LW(p) · GW(p)

It's linked in the first sentence of the post. Though I guess I link to the pdf instead of the web page.

I tried to make this a link post, but I got an error message saying that it has already been linked before.

Replies from: ESRogs, jimrandomh
comment by ESRogs · 2019-01-09T04:34:01.013Z · LW(p) · GW(p)

Yeah, saw the link, but since it was direct to the pdf, wasn't sure if there'd been an announcement or anything like that.

(Perhaps I should have enough trust in FHI that if a link is accessible then that's intentional. Not something you can count on in general though. :P)

comment by jimrandomh · 2019-01-09T02:39:13.408Z · LW(p) · GW(p)

The restriction on having multiple linkposts to the same URL is something we inherited from our framework (Vulcan), which doesn't particularly make sense for LW. We've taken it out, so you'll be able to make the linkpost after the next time we deploy an update (which will be later this week).

Replies from: habryka4
comment by habryka (habryka4) · 2019-01-10T19:19:45.583Z · LW(p) · GW(p)

I also just went in and appended some random URL parameters to the URL to avoid the duplication filter for now.

comment by Tobias_Baumann · 2019-01-08T17:05:13.115Z · LW(p) · GW(p)

Upvoted. I've long thought that Drexler's work is a valuable contribution to the debate that hasn't received enough attention so far, so it's great to see that this has now been published.

I am very sympathetic to the main thrust of the argument – questioning the implicit assumption that powerful AI will come in the shape of one or more unified agents that optimise the outside world according to their goals. However, given our cluelessness and the vast range of possible scenarios (e.g. ems, strong forms of biological enhancement, merging of biological and artificial intelligence, brain-computer interfaces, etc.), I find it hard to justify a very high degree of confidence in Drexler's model in particular.

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-08T17:23:52.598Z · LW(p) · GW(p)

That seems right. I would argue that CAIS is more likely than any particular one of the other scenarios that you listed, because it is primarily taking trends from the past and projecting them into the future, whereas most other scenarios require something qualitatively new -- eg. an AGI agent (before CAIS) would happen if we find the one true learning algorithm, ems require us to completely map out the brain in a way that we don't have any results for currently, even in simple cases like C. elegans. But CAIS is probably not more likely than a disjunction over all of those possible scenarios.

Replies from: John_Maxwell_IV
comment by John_Maxwell (John_Maxwell_IV) · 2019-01-09T04:07:58.726Z · LW(p) · GW(p)

eg. an AGI agent (before CAIS) would happen if we find the one true learning algorithm

I think generality and goal-directedness are likely orthogonal attributes. A "one true learning algorithm" sounds very general, but a priori I don't expect it to be any more goal-directed than the comprehensive AI services idea outlined in this post. I suspect you can take each of your comprehensive AI services and swap out the specific algorithm you were using for a one true learning algorithm without making the result any more of an agent.

I'm thinking about it something like this:

  • Traditional view of superintelligent AI ("top-down"): A superintelligent AI is something that's really good at achieving arbitrary goals. We abstract away the details of its implementation and view it as a generic hyper-competent goal achievement process, with a wide array of actions & strategies at its disposal. This view potentially lets us do FAI research without having to contribute to AI progress or depend overmuch on any particular direction that AI capabilities development proceeds in.

  • CAIS ("bottom-up"): We have a collection of AI services. We can use these services to accomplish specific tasks, including maybe eventually generating additional services. Each service represents a specific algorithm that achieves superior performance along one or more dimensions in a narrow or broad range of circumstances. If we abstract away the details of how tasks are being accomplished, that may lead to an inaccurate view of the system's behavior. For example, our machine learning algorithms may get better and better at performing classification tasks... but we have to look into the details of how the algorithm works in order to figure out whether it will consider strategies for improving its classification ability such as "pwn all other servers in the cluster and order them to search the space of hyperparameters in parallel". Our classification systems have been getting better and better, and arguably also more general, without them considering strategies like the pwnage strategy, and it's plausible this trend will continue until the algorithms are superhuman in all domains. Indeed, this feels to me like a fundamental defining characteristic of what superintelligence refers to: a specific bit of computer code that is able to learn better and faster, using fewer computational resources, than whatever algorithms the human brain uses.

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-09T10:08:56.731Z · LW(p) · GW(p)
I suspect you can take each of your comprehensive AI services and swap out the specific algorithm you were using for a one true learning algorithm without making the result any more of an agent.

Mostly agreed, but if we find the one true learning algorithm, then CAIS is no longer on the development path towards AGI agents, and I would predict that someone builds an AGI agent in that world because it could have lots of economic benefits that have not already been captured by CAIS services.

Indeed, this feels to me like a fundamental defining characteristic of what superintelligence refers to: a specific bit of computer code that is able to learn better and faster, using fewer computational resources, than whatever algorithms the human brain uses.

I actually see CAIS as an argument against this. I think we could get superintelligent services by having lots of specialization (unlike humans, who are mostly general and a little bit specialized for their jobs), by aggregating learning across many actors (whereas humans can't learn from other humans' experience), by making models much larger and with much more compute (whereas humans are limited by brain size). Humans could still outperform AI services on things like power usage, sample efficiency, compute requirements, etc. while still having lots of AI services that can perform nearly any task at a superhuman level.

comment by Kaj_Sotala · 2020-12-13T21:48:50.619Z · LW(p) · GW(p)

Seconding Neel Nanda's nomination [LW(p) · GW(p)].

comment by ryan_b · 2019-02-15T16:07:54.205Z · LW(p) · GW(p)

There is a discussion at OvercomingBias of this work now.

comment by ESRogs · 2019-01-08T20:47:43.636Z · LW(p) · GW(p)
The CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D.

This conclusion seems similar to the one Paul arrives at here:

In the slow takeoff scenario, pre-AGI systems have a transformative impact that’s only slightly smaller than AGI.

(See also this post from AI Impacts.)

comment by Neel Nanda (neel-nanda-1) · 2020-12-05T09:31:29.138Z · LW(p) · GW(p)

CAIS is a very different take on what transformative AI might look like than the ones I find most intuitive. I think it's really useful to experience a range of different perspectives to break me out of my cached thoughts.

And I'm grateful to Rohin for writing up this summary! I think this kind of thing is a valuable service for spreading these ideas to more people, who don't want to read a 200 page document.

comment by habryka (habryka4) · 2021-01-07T05:15:15.103Z · LW(p) · GW(p)

I think the CAIS framing that Eric Drexler proposed gave concrete shape to a set of intuitions that many people have been relying on for their thinking about AGI. I also tend to think that those intuitions and models aren't actually very good at modeling AGI, but I nevertheless think it productively moved the discourse forward a good bit. 

In particular, I am very grateful for the comment thread between Wei Dai and Rohin, which really helped me engage with the CAIS ideas, and which I think was necessary to get me to my current understanding of CAIS and to pass the basic ITT of CAIS (which I think I have succeeded at in a few conversations I've had since the report came out).

An additional reference that has not been brought up in the comments or the post is Gwern's writing on this, under the heading: "Why Tool AIs Want to Be Agent AIs" 

comment by ryan_b · 2019-02-15T16:07:02.818Z · LW(p) · GW(p)

I see a few criticisms about how this doesn't really solve the problem, it only delays it because we expect a unified agent to outperform the combined services.

It seems to me on the basis of that criticism that this is worth driving as a commercial template anyway. Every R&D dollar that goes into a bounded service is one that doesn't drive specifically for an unbounded agent; every PhD doing development on an individual service is not doing development on a unified agent.

We're currently still in the regime where first mover advantage is overwhelming; if CAIS were in place, then rather than winning all the marbles immediately, a first mover would win all the marbles eventually, and so the incentives are reduced. I expect this approach to extend the runway we have for nailing down the safety questions before a unified agent takes off.

I suppose the delaying action could backfire by reducing funding for safety, and also potentially by simplifying the problem of a unified AGI to bootstrapping from a superintelligent CAIS coordinator. Is there any difference between the superintelligent CAIS coordinator and the AGI in terms of alignment?

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-02-15T22:20:18.607Z · LW(p) · GW(p)
I see a few criticisms about how this doesn't really solve the problem, it only delays it because we expect a unified agent to outperform the combined services.

Not sure if you're talking about me, but I suspect that my criticism could be read that way. Just want to clarify that I do think "we expect a unified agent to outperform the combined services" but I don't think this means we shouldn't pursue CAIS. That strategic question seems hard and I don't have a strong opinion on it.

Replies from: ryan_b
comment by ryan_b · 2019-02-19T15:47:18.440Z · LW(p) · GW(p)

You were one of them, but not the only one. I thought it was worth pointing the strategic question out specifically, because we have only recently had enough plausible alternatives for there to even be such a question.

Granted, the lack of options makes me feel a bit like anime-guy-looks-at-butterfly for alternatives. I agree the strategic question is hard.

comment by Gordon Seidoh Worley (gworley) · 2019-01-09T02:45:32.451Z · LW(p) · GW(p)

What excites me most about Eric's position since I first learned of it is that it provides a framework for safer AI systems than we might otherwise build if we were trying to target AGI. From this perspective it's valuable for setting policy and missions for AI-focused endeavors in such a way that we potentially delay the creation of AGI.

Although it might be argued that this is inevitable (last time I talked to Eric this was the impression that I got; he felt he was laying out some ideas that would happen anyway and was taking the time to explain why he thinks they will happen that way, rather than trying to nudge us towards a path), having it codified and publicized as a best course of action may serve on the margin to further encourage folks to do AI development with an eye towards incorporation in CAIS rather than as a stepping stone towards AGI. This is important because it will apply optimization pressure to skip adding the things AGI would need, since those may take extra time and cost. If most of the short- and medium-term economic and academic benefits can be realized within the CAIS paradigm, then we will see a shift towards optimizing more for CAIS and less for AGI, which seems broadly beneficial from a safety standpoint because CAIS is less integrated and less agentic by design (at least for now; there might be a path from CAIS to AGI). Having this be common knowledge and the accepted paradigm of AI research is thus beneficial for pushing people away from incentive gradients that more directly lead to AGI, buying time for more safety research.

Given this, it's probably worthwhile for folks well positioned to influence other researchers to be made better aware of this work, which might be something folks here can do if they have the ears of those people (or just are those people).

comment by avturchin · 2019-01-08T18:00:50.076Z · LW(p) · GW(p)

My main objection to this idea is that it is a local solution, and doesn't have built-in mechanisms to become a global AI safety solution, that is, to prevent the creation of other AIs, which could be agential superintelligences. One can try to make "AI police" as a service, but it could be less effective than agential police.

Another objection is probably Gwern's idea that any Tool AI "wants" to become agential AI.

This idea also excludes the robotic direction in AI development, which will anyway produce agential AIs.

Replies from: rohinmshah, rhaps0dy, Wei_Dai
comment by Rohin Shah (rohinmshah) · 2019-01-09T02:11:14.240Z · LW(p) · GW(p)

If by agent we mean "system that takes actions in the real world", then services can be agents. As I understand it, Eric is only arguing against monolithic AGI agents that are optimizing a long-term utility function and that can learn/perform any task.

Current factory robots definitely look like a service, and even the soon-to-come robots-trained-with-deep-RL will be services. They execute particular learned behaviors.

If I remember correctly, Gwern's argument is basically that Agent AI will outcompete Tool AI because Agent AI can optimize things that Tool AI cannot, such as its own cognition. In the CAIS world, there are separate services that improve cognition, and so the CAIS services do get the benefit of ever-improving cognition, without being classical AGI agents. But overall I agree with this point (and disagree with Eric) because I expect there to be lots of gains to be had by removing the boundaries between services, at least where possible.

comment by Adrià Garriga-alonso (rhaps0dy) · 2019-01-09T01:00:18.101Z · LW(p) · GW(p)
This idea also excludes the robotic direction in AI development, which will anyway produce agential AIs.

Recursive self-improvement that makes the intelligence "super" quickly is what makes the misaligned utility actually dangerous, as opposed to dangerous like, say, a current-day automated assembly line.

A robot that self-improves would need to have the capacity to control its actuators and also to self-improve. Since neither of these capabilities directly depends on the other, each time one of them improves, the improvement is much more likely to be first demonstrated independently of an improvement in the other one.

Thus we're likely to already have some experience with self-improving AI, or the recursively improved AI to help us, when we get to dealing with people wanting to build self-improving robots. Even though with advanced AI in hand to help we should maybe still start early on that, it seems more important to get the not-necessarily-and-also-probably-not-robotic AI right.

Replies from: avturchin
comment by avturchin · 2019-01-09T09:00:34.821Z · LW(p) · GW(p)

I meant not that the "robot will self-improve", but that research in robotics will create AIs which are agential and adapted to act in the real world. Such AIs may start to self-improve later, and without a robotic body.

comment by Wei Dai (Wei_Dai) · 2019-01-08T18:58:38.902Z · LW(p) · GW(p)

One can try to make “AI police” as a service, but it could be less effective than agential police.

This seems likely to me as well, especially since "service" is by definition bounded and agent is not.

Replies from: rohinmshah, litvand
comment by Rohin Shah (rohinmshah) · 2019-01-09T02:18:03.032Z · LW(p) · GW(p)

Monitoring surveillance in order to see if anyone is breaking rules seems to be quite a bounded task, and in fact is one that we are already in the process of automating (using our current AI systems, which are basically all bounded).

Of course, there are lots of other tasks that are not as clear. But to the extent that you believe the Factored Cognition hypothesis [? · GW], you should believe that we can make bounded services that nevertheless do a very good job.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2019-01-09T07:39:41.419Z · LW(p) · GW(p)

Monitoring surveillance in order to see if anyone is breaking rules seems to be quite a bounded task, and in fact is one that we are already in the process of automating (using our current AI systems, which are basically all bounded).

That seems true, but if this surveillance monitoring isn't 100% effective, won't you still need an agential police to deal with any threats that manage to evade the surveillance? Or do you buy Eric's argument [LW(p) · GW(p)] that we can use a period of "unopposed preparation" to make sure that the defense, even though it's bounded, is still much more capable than any agential threat it might face?

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-09T10:23:38.062Z · LW(p) · GW(p)

Sorry, when I said "there are lots of other tasks that are not as clear", I meant that there are a lot of other tasks relevant to policing and security that are not as clear, such as police to deal with threats that evade surveillance. I think the optimism here comes from our ability to decompose tasks, such that we can take a task that seems to require goal-directed agency (like "be the police") and turn it into a bunch of subtasks that no longer look agential.

comment by litvand · 2022-06-02T17:09:18.074Z · LW(p) · GW(p)

I agree that in the long term, agent AI could probably improve faster than CAIS, but I think CAIS could still be a solution.

Regardless of how it is aligned, aligned AI will tend to improve slower than unaligned AI, because it is trying to achieve a more complicated goal, human oversight takes time, etc. To prevent unaligned AI, aligned AI will need a head start, so it can stop any unaligned AI while it's still much weaker. I don't think CAIS is fundamentally different in that respect.

If the reasoning in the post that CAIS will develop before AGI holds up, then CAIS would actually have an advantage, because it would be easier to get a head start.

comment by Mitchell_Porter · 2019-01-09T02:23:03.667Z · LW(p) · GW(p)

So what is he saying? We never need to solve the problem of designing a human-friendly superintelligent agent?

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-09T09:59:08.672Z · LW(p) · GW(p)

I don't think he'd make a strong claim about that, but I wouldn't be surprised if he assigned that possibility significant credence. I assign that possibility relatively low credence. I assign much more credence to the position that we'll never need to solve the problem of designing a human-friendly superintelligent goal-directed agent.

comment by Charlie Steiner · 2019-01-09T03:46:41.535Z · LW(p) · GW(p)

Thanks for the summary! I agree that this is missing some extra consideration for programs that are planning / searching at test time. We normally think of Google Maps as non-agenty, "tool-like," "task-directed," etc, but it's performing a search for the best route from A to B, and capable of planning to overcome obstacles - as long as those obstacles are within the ontology of its map of ways from A to B.

A thermostat is dumber than Google Maps, but its data is more closely connected to the real world (local temperature rather than general map), and its output is too (directly controlling a heater rather than displaying directions). If we made a "Google Thermostat Maps" website that let you input your thermostat's state, and showed you a heater control value, it would perform the same computations as your thermostat but lose its apparent agency. The condition for us treating the thermostat like an agent isn't just what computation it's doing, it's that its input, search (such as it is), and output ontologies match and extend into the real world well enough that even very simple computation can produce behavior suitable for the intentional stance.
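The thermostat comparison can be sketched in code. This is a toy illustration under assumed dynamics: the function names, the bang-bang rule, and the room model (heater power `2.0`, heat-loss coefficient `0.1`) are all hypothetical. The point it tries to show is that the advisory "website" and the wired-up thermostat call the same function; only the connection of input and output to the world differs.

```python
def heater_command(temperature, setpoint):
    """The entire 'computation': heater on below the setpoint, off otherwise."""
    return 1.0 if temperature < setpoint else 0.0

def advisory_page(temperature, setpoint):
    """'Google Thermostat Maps': same computation, but the output is only displayed."""
    return f"Suggested heater setting: {heater_command(temperature, setpoint)}"

def run_closed_loop(start_temp, setpoint, outside=10.0, steps=200):
    """Real thermostat: the same computation wired to a sensor and a heater.

    The room model is an assumption made for illustration: the heater adds
    2.0 degrees per step, and the room loses 10% of its excess over the
    outside temperature per step.
    """
    temp = start_temp
    for _ in range(steps):
        heat = heater_command(temp, setpoint)      # sensor reading in, command out
        temp += 2.0 * heat - 0.1 * (temp - outside)  # command acts on the world
    return temp

print(advisory_page(15.0, 20.0))
print(f"closed-loop temperature: {run_closed_loop(10.0, 20.0):.2f}")
```

In the closed loop the temperature ends up hovering around the setpoint, which is the behavior that invites the intentional stance; the advisory page produces identical outputs for identical inputs but regulates nothing.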

comment by PeterMcCluskey · 2019-01-08T19:58:27.064Z · LW(p) · GW(p)

I consider it important to further clarify the notion of a bounded utility function.

A deployed neural network has a utility function that can be described as outputting a description of the patterns it sees in its most recent input, according to whatever algorithm it's been trained to apply. It's pretty clear to any expert that the neural network doesn't care about anything beyond a specific set of numbers that it outputs.

A neural network that is in the process of being trained is slightly harder to analyze, but essentially the same. It cares about generating an algorithm that will be used in a deployed neural network. At any one training step, it is focused solely on applying fixed algorithms to produce improvements to the deployable algorithm. It has no concept that would lead it to look beyond its immediate task of incremental improvements to that deployable algorithm.

And in some important sense, those steps are the main ways in which AI gets used to produce cars that have superhuman driving ability, and the designers can prove (at least to themselves) that the cars won't go out and buy more processing power, or forage for more energy.

Many forms of AI will be more complex than neural networks (e.g. they might be a mix of RL and neural networks), and I don't have the expertise to extend this analysis to those systems. I'm confident that it's possible in principle to get general-purpose superhuman AIs using only this kind of bounded utility function, but I'm uncertain how practical that is compared to a more unified agent with a broader utility function.

Replies from: ESRogs
comment by ESRogs · 2019-01-08T20:28:55.488Z · LW(p) · GW(p)

To clarify, when you say "bounded utility function" you mean that it's only defined over a fixed set of inputs, right?

(As opposed to meaning that the output of the function is never infinite, as in this post [LW · GW], which is what I first think of when I hear "bounded utility function". In other words, I expected bounded utility to refer to the range of the function, but you seem to be referring to the domain. Not sure which is more standard, but thought it worth calling out for other readers who may be confused.)

Replies from: rohinmshah, PeterMcCluskey
comment by Rohin Shah (rohinmshah) · 2019-01-09T02:23:48.116Z · LW(p) · GW(p)

It sounds like he's talking about services. From the post:

A service is an AI system that delivers bounded results for some task using bounded resources in bounded time.
comment by PeterMcCluskey · 2019-01-08T22:12:50.603Z · LW(p) · GW(p)

I'm not talking about the range. Domain seems possibly right, but not as informative as I'd like. I'm talking about what parts of spacetime it cares about, and saying that it only cares about specific outputs of a specific process. Drexler refers to this as "bounded scope and duration". Note that this will normally be an implicit utility function, that we infer from our understanding of the system.

"bounded utility function" is definitely not an ideal way of referring to this.
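One way to picture the two readings being distinguished in this exchange is the sketch below. It is only an illustration; the function names, the dictionary keys, and the specific formulas are made up for the example. The first function is bounded in its range (its output never exceeds 1), while the second is bounded in scope (it only reads a fixed set of outputs and is indifferent to everything else in the world state).

```python
import math

def range_bounded_utility(resources):
    """Bounded range: output never exceeds 1, yet more resources never score lower."""
    return 1.0 - math.exp(-resources)

def scope_bounded_utility(world, in_scope=("translation_quality",)):
    """Bounded scope: only the listed outputs matter; the rest of `world` is ignored."""
    return sum(world[key] for key in in_scope)

# Two hypothetical world states that differ only in out-of-scope variables.
world_a = {"translation_quality": 0.9, "compute_acquired": 1.0}
world_b = {"translation_quality": 0.9, "compute_acquired": 1000.0}

print(range_bounded_utility(1.0), range_bounded_utility(10.0))
print(scope_bounded_utility(world_a), scope_bounded_utility(world_b))
```

The range-bounded function still rewards acquiring more, just with diminishing returns, whereas the scope-bounded one assigns the same value to both worlds, which is closer to the "bounded scope and duration" sense described above.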

comment by atlas · 2019-01-08T11:13:35.230Z · LW(p) · GW(p)
You might argue that each individual service must be dangerous, since it is superintelligent at its particular task. However, since the service is optimizing for some bounded task, it is not going to run a long-term planning process [...]

Does this assume that we'll be able to build generally intelligent systems (e.g. the service-creating-service) that optimize for a bounded task?

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2019-01-08T16:33:36.651Z · LW(p) · GW(p)

Depends what you mean by "generally intelligent". Any individual service could certainly have deep and broad knowledge about the world (as with eg. a language translation service), but no service will be able to do all tasks (eg. the service-creating-service is not going to be able to edit genomes, except by creating a new service that learns how to edit genomes).

With that caveat, yes, this assumes that we'll be able to build services that optimize for bounded tasks. But this is meant more as a description of how existing AI systems already work. Current RL agents are best modeled as optimizing for maximizing reward obtained for the current episode. (This isn't exactly right, because the value function is trying to capture the reward that can be obtained in the future, but in practice this doesn't make much of a difference.)
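As a rough sketch of what "optimizing for reward obtained in the current episode" means, the objective can be written as a sum over the rewards inside one episode, averaged over independent episodes. The helper names and the sample reward lists below are hypothetical, and the discount factor `gamma` is the standard RL convention rather than anything specific to the report.

```python
def episode_return(rewards, gamma=1.0):
    """Discounted sum of the rewards collected within a single episode."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total

def objective(episodes, gamma=1.0):
    """Average per-episode return; no term links one episode to the next."""
    return sum(episode_return(ep, gamma) for ep in episodes) / len(episodes)

# Hypothetical reward sequences from three separate episodes.
episodes = [[1.0, 0.0, 2.0], [0.0, 0.0, 1.0], [3.0]]

print(episode_return(episodes[0]))             # undiscounted return of one episode
print(episode_return(episodes[0], gamma=0.5))  # same episode, discounted
print(objective(episodes))
```

Nothing in this objective rewards an action for its effect on later episodes, which is the sense in which the task is bounded in time.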