Reframing Superintelligence: Comprehensive AI Services as General Intelligence

post by rohinmshah · 2019-01-08T07:12:29.534Z · score: 83 (28 votes) · LW · GW · 55 comments

This is a link post for https://www.fhi.ox.ac.uk/wp-content/uploads/Reframing_Superintelligence_FHI-TR-2019-1.1-1.pdf?asd=sa

Since the CAIS technical report is a gargantuan 210-page document, I figured I'd write a post to summarize it. I have focused on the earlier chapters, because I found those to be more important for understanding the core model. Later chapters speculate about more concrete details of how AI might develop, as well as the implications of the CAIS model for strategy.

The Model

The core idea is to look at the pathway by which we will develop general intelligence, rather than assuming that at some point we will get a superintelligent AGI agent. To predict how AI will progress in the future, we can look at how AI progresses currently -- through research and development (R&D) processes. AI researchers consider a problem, define a search space, formulate an objective, and use an optimization technique in order to obtain an AI system, called a service, that performs the task.

A service is an AI system that delivers bounded results for some task using bounded resources in bounded time. Superintelligent language translation would count as a service, even though it requires a very detailed understanding of the world, including engineering, history, science, etc. Episodic RL agents also count as services.
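
As a rough illustration of "bounded results using bounded resources in bounded time", here is a minimal sketch. The `BoundedService` class, its budget fields, and the `translate_drafts` task are all hypothetical names invented for this example; nothing like this appears in the report. The point is only that the wrapper stops when either budget runs out and returns whatever result it has.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class BoundedService:
    """Hypothetical wrapper: runs a task under step and wall-clock budgets."""
    name: str
    max_steps: int        # bound on resources (number of refinement steps)
    max_seconds: float    # bound on wall-clock time

    def run(self, task: Callable[[], Iterator[str]]) -> Optional[str]:
        """Return the last candidate produced within budget, or None."""
        deadline = time.monotonic() + self.max_seconds
        result = None
        for step, candidate in enumerate(task()):
            if step >= self.max_steps or time.monotonic() > deadline:
                break  # budget exhausted: stop rather than keep optimizing
            result = candidate
        return result


def translate_drafts() -> Iterator[str]:
    # Stand-in for an anytime algorithm that keeps refining its answer.
    yield "draft 1"
    yield "draft 2"
    yield "draft 3"


service = BoundedService(name="translator", max_steps=2, max_seconds=1.0)
output = service.run(translate_drafts)
```

With a budget of two steps, the service returns the second draft and never looks at the third, however good it might have been.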

While each of the AI R&D subtasks is currently performed by a human, as AI progresses we should expect that we will automate these tasks as well. At that point, we will have automated R&D, leading to recursive technological improvement. This is not recursive self-improvement, because the improvement comes from R&D services creating improvements in basic AI building blocks, and those improvements feed back into the R&D services. All of this should happen before we get any powerful AGI agents that can do arbitrary general reasoning.

Why Comprehensive?

Since services are focused on particular tasks, you might think that they aren't general intelligence, since there would be some tasks for which there is no service. However, pretty much everything we do can be thought of as a task -- including the task of creating a new service. When we have a new task that we would like automated, our service-creating-service can create a new service for that task, perhaps by training a new AI system, or by taking a bunch of existing services and putting them together, etc. In this way, the collection of services can perform any task, and so as an aggregate is generally intelligent. As a result, we can call this Comprehensive AI Services, or CAIS. The "Comprehensive" in CAIS is the analog of the "General" in AGI. So, we'll have the capabilities of an AGI agent, before we can actually make a monolithic AGI agent.

Isn't this just as dangerous as AGI?

You might argue that each individual service must be dangerous, since it is superintelligent at its particular task. However, since the service is optimizing for some bounded task, it is not going to run a long-term planning process, and so it will not have any of the standard convergent instrumental subgoals (unless the subgoals are helpful for the task before reaching the bound).

In addition, all of the optimization pressure on the service is pushing it towards a particular narrow task. This sort of strong optimization tends to focus behavior. Any long-term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task. Think of how a racecar is optimized for speed, while a bus is optimized for carrying passengers, rather than having a "generally capable vehicle".

It's also worth noting what we mean by superintelligent here. In this case, we mean that the service is extremely competent at its assigned task. It need not be learning at all. We see this distinction with RL agents -- when they are trained using something like PPO, they are learning, but at test time you can simply execute them without any PPO and they will perform the behavior they previously learned and won't change that behavior at all.
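
To make the train/execute distinction concrete, here is a minimal sketch. It is a toy stand-in and not PPO: a two-armed bandit with a simple value update, where the reward rule, the learning rate, and the function names are all assumptions made for illustration. Learning happens only inside `train`; `act` just reads the frozen values.

```python
from types import MappingProxyType


def train(episodes: int) -> MappingProxyType:
    """Toy stand-in for training (NOT PPO): learn action values on a two-armed bandit."""
    values = {"left": 0.0, "right": 0.0}
    arms = list(values)
    for i in range(episodes):
        action = arms[i % 2]                        # try both arms in turn
        reward = 1.0 if action == "right" else 0.0  # assumed reward: only "right" pays
        values[action] += 0.1 * (reward - values[action])  # learning happens only here
    return MappingProxyType(values)  # read-only view: execution cannot update it


def act(policy) -> str:
    """Test-time execution: pick the highest-valued action, with no learning step."""
    return max(policy, key=policy.get)


policy = train(episodes=100)
before = dict(policy)
actions = [act(policy) for _ in range(5)]
after = dict(policy)
```

Executing the policy leaves its values unchanged (`before == after`), which is the sense in which the deployed system is competent without learning.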

(My opinion: I think this isn't engaging with the worry with RL agents -- typically, we're worried about the setting where the RL agent is learning or planning at test time, which can happen in learn-to-learn and online learning settings, or even with vanilla RL if the learned policy has access to external memory and can implement a planning process separately from the training procedure.)

On a different note, you might argue that if we analyze the system of services as a whole, then it certainly looks generally intelligent, and so should be regarded as an AGI agent. However, "AGI agent" usually carries the anthropomorphic connotation of VNM rationality / expected utility maximization / goal-directedness. While it seems possible and even likely that each individual service can be well-modeled as VNM rational (albeit with a bounded utility function), it is not the case that a system of VNM rational agents will itself look VNM rational -- in fact, game theory is all about how systems of rational agents have weird behavior.

In addition, there are several aspects of CAIS that make it safer than a classic monolithic AGI agent. Under CAIS, each service interacts with other services via clearly defined channels of communication, so that the system is interpretable and transparent, even though each service may be opaque. We can reason about what information is present in the inputs to infer what the service could possibly know. We could also provide access to some capability through an external resource during training, so that the service doesn't develop that capability itself.

This interpretability allows us to monitor the service -- for example, we could look at which subservices it accesses in order to make sure it isn't doing anything crazy. But what if having a human in the loop leads to unacceptable delays? Well, this would only happen for deployed applications, where having a human in the loop seems expected, and should also be economically incentivized because it leads to better behavior. Basic AI R&D can continue to be improved autonomously without a human in the loop, so you could still see an intelligence explosion. Note that tactical tasks requiring quick reaction times probably would be delegated to AI services, but the important strategic decisions could still be left in human hands (assisted by AI services, of course).

What happens when we create AGI?

Well, it might not be valuable to create an AGI. We want to perform many different tasks, and it makes sense for these to be done by diverse services. It would not be competitive to include all capabilities in a single monolithic agent. This is analogous to how specialization of labor is a good idea for us humans.

(My opinion: It seems like the lesson of deep learning is that if you can do something end-to-end, that will work better than a structured approach. This has happened with computer vision, natural language processing, and seems to be in the process of happening with robotics. So I don't buy this -- while it seems true that we will get CAIS before AGI since structured approaches tend to be available sooner and to work with less compute, I expect that a monolithic AGI agent would outperform CAIS at most tasks once we can make one.)

That said, if we ever do build AGI, we can leverage the services from our CAIS-world in order to make it safe. We could use superintelligent security services to constrain any AGI agent that we build. For example, we could have services trained to identify long-term planning processes and to perform adversarial testing and red teaming.

Safety in the CAIS world

While CAIS suggests that we will not have AGI agents, this does not mean that we automatically get safety. We will still have AI systems that take high impact actions, and if they take even one wrong action of this sort it could be catastrophic. One way this could happen is if the system of services starts to show agentic behavior -- our standard AI safety work could apply to this scenario.

In order to ensure safety, we should have AI safety researchers figure out and codify the best development practices that need to be followed. For example, we could try to always use predictive models of human (dis)approval as a sanity check on any plan that is being enacted. We could also train AI services that can adversarially check new services to make sure they are safe.

Summary

The CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D. This reframes the problems of AI safety and has implications for what technical safety researchers should be doing.

55 comments

comment by Wei_Dai · 2019-01-08T17:23:05.321Z · score: 28 (9 votes) · LW · GW

This is one of the documents I was responding to when I wrote A general model of safety-oriented AI development, Three AI Safety Related Ideas, and Two Neglected Problems in Human-AI Safety. (I didn't cite it because it was circulating semi-privately in draft form, and Eric apparently didn't want its existence to be publicly known.) I'm disappointed that although Eric wrote to me "I think that your two neglected problems are critically important", the perspectives in those posts didn't get incorporated more into the final document, which spends only 3 short paragraphs out of hundreds of pages to talk about what I think of as "human safety problems". (I think those paragraphs were in the draft even before I wrote my posts.)

I worry about the framing adopted in this document that the main problem in human-AI safety is "questions of what humans might choose to do with their capabilities", as opposed to my preferred framing of "how can we design human-AI systems to minimize total risk". (To be fair to Eric, a lot of other AI safety people also only talk about "misuse risk" and not about how AI is by default likely to exacerbate human safety problems, e.g., by causing rapid distributional shifts for humans.) I worry that this gives AI researchers and developers license to think, "I'm just developing an AI service. AI services will be comprehensive anyway so there's no reason for me to hold back or think more about what I'm doing. It's someone else's job to worry about what humans might choose to do with these capabilities."

comment by rohinmshah · 2019-01-08T17:38:17.991Z · score: 4 (2 votes) · LW · GW

I actually think the CAIS model gives me optimism for these sorts of problems. As long as we acknowledge that the problems exist and can be an issue, we could develop services that help us mitigate them. Safety in the CAIS world already depends on having services that are in charge of good engineering, testing, red teaming, monitoring, etc., as well as services that evaluate objectives and make sure humans would approve of them. It seems fairly easy to expand this to include services that consider how disruptive new technologies will be, how underdetermined human values are, whether a proposed plan reduces option value, what risk aversion implies about a particular plan of action, what blind spots people have, etc.

I'd be interested in a list of services that you think would be helpful for addressing human safety problems. You might think of this as "our best current guess at metaphilosophy and metaphilosophy research".

(I know you were mainly talking about the document's framing; I don't have much to say about that.)

comment by Wei_Dai · 2019-01-10T12:28:42.979Z · score: 10 (3 votes) · LW · GW

It seems fairly easy to expand this to include services that consider how disruptive new technologies will be, how underdetermined human values are, whether a proposed plan reduces option value, what risk aversion implies about a particular plan of action, what blind spots people have, etc.

Can you explain how you'd implement these services? Take "how disruptive new technologies will be" for example. I imagine you can't just apply ML given the paucity of training data and how difficult it would be to generalize from historical data to new technologies and new social situations. And it seems to me that if you base it on any kind of narrow AI technology, it would be easy to miss some of the novel implications/consequences of the new technologies and social situations and end up with a wrong answer. Maybe you could instead base it on a general purpose reasoner or question-answerer, but if something like that exists, AI would already have created a lot of new technologies that are risky for humans to face. Plus, the general purpose AI could replace a lot of discrete/narrow AI services, so I feel like we would already have moved past the CAIS world at that point. BTW, if the service is not just a thin wrapper on top of a general purpose AI which is generally trustworthy, I also don't know how you'd know whether you can trust the answers that it gives.

I’d be interested in a list of services that you think would be helpful for addressing human safety problems. You might think of this as “our best current guess at metaphilosophy and metaphilosophy research”.

I could try to think in that direction after I get a better sense of what kinds of services might be both feasible and trustworthy in the CAIS world. It seems easy to become too optimistic/complacent under the CAIS model if I just try to imagine what safety-enhancing services might be helpful without worrying about whether those services would be feasible or how well they'd work at the time when they're needed.

comment by rohinmshah · 2019-01-10T17:35:59.166Z · score: 10 (3 votes) · LW · GW

Can you explain how you'd implement these services?

Not really. I think of CAIS as suggesting that we take an outside view that says "looking at how AI has been progressing, and how humans generally do things, we'll probably be able to do more and more complex tasks as time goes on". But the emphasis that CAIS places is that the things we'll be able to do will be domain-specific tasks, rather than getting a general-purpose reasoner. I don't have a detailed enough inside view to say how complex tasks might be implemented in practice.

I agree with the rest of what you said, which feels to me like considering a few possible inside-view scenarios and showing that they don't work.

One way to think about this is through the lens of iterated amplification. With iterated amplification, we also get the property that our AI systems will be able to do more and more complex tasks as time goes on. The key piece that enables this is the ability to decompose problems, so that iterated amplification always bottoms out into a tree of questions and subquestions down to leaves which the base agent can answer. You could think of (my conception of) CAIS as a claim that a similar process will happen in a decentralized way for all of ML by default, and at any point the things we can do will look like an explicit iterated amplification deliberation tree of depth one or two, where the leaves are individual services and the top level question will be some task that is accomplished through a combination of individual services.
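
A depth-one deliberation tree of this kind can be sketched as follows. This is only an illustration under assumptions: the `SERVICES` registry, the `decompose` function, and the two leaf services are hypothetical, and a real decomposition step would itself be a hard problem rather than a fixed list.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical leaf services: each answers one narrow kind of subquestion.
SERVICES: Dict[str, Callable[[str], str]] = {
    "translate": lambda text: f"translated({text})",
    "summarize": lambda text: f"summary({text})",
}


def decompose(task: str) -> List[Tuple[str, str]]:
    """Hypothetical depth-one decomposition: top-level task -> (service, input) pairs."""
    return [("translate", task), ("summarize", task)]


def answer(task: str) -> List[str]:
    """Answer the top-level task by calling one leaf service per subquestion."""
    return [SERVICES[name](arg) for name, arg in decompose(task)]


result = answer("report")
```

The top-level question never gets handed to a single general reasoner; it is answered entirely by combining calls to individual services.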

I could try to think in that direction after I get a better sense of what kinds of services might be both feasible and trustworthy in the CAIS world. It seems easy to become too optimistic/complacent under the CAIS model if I just try to imagine what safety-enhancing services might be helpful without worrying about whether those services would be feasible or how well they'd work at the time when they're needed.

Agreed, I'm making a bid for generating ideas without worrying about feasibility and trustworthiness, but not spending too much time on this and not taking the results too seriously.

comment by Wei_Dai · 2019-01-10T21:57:38.378Z · score: 10 (2 votes) · LW · GW

You could think of (my conception of) CAIS as a claim that a similar process will happen in a decentralized way for all of ML by default, and at any point the things we can do will look like an explicit iterated amplification deliberation tree of depth one or two, where the leaves are individual services and the top level question will be some task that is accomplished through a combination of individual services.

This seems like a sensible way of looking at things, and in this framing I'd say that my worry is that crucial safety-enhancing services may only appear fairly high in the overall tree of services, or outside the tree altogether (see also #3 in Three AI Safety Related Ideas which makes a similar point), and in the CAIS world it would be hard to limit access to the lower-level services (as a risk-reduction measure).

comment by rohinmshah · 2019-01-10T22:50:46.047Z · score: 2 (1 votes) · LW · GW

Yeah, that seems right, I don't think anyone is arguing against that claim.

comment by Wei_Dai · 2019-01-09T06:04:02.365Z · score: 22 (6 votes) · LW · GW

I have a problem with section 32, "Unaligned superintelligent agents need not threaten world stability". Here's the summary of that section from the paper:

  • Powerful SI-level capabilities can precede AGI agents.
  • SI-level capabilities could be applied to strengthen defensive stability.
  • Unopposed preparation enables strong defensive capabilities.
  • Strong defensive capabilities can constrain problematic agents.

So the key idea here seems to be that good actors will have a period of time to use superintelligent AI services to prepare some sort of ubiquitous defense that will constrain any subsequent AGI agents. But I don't understand where this period of "unopposed preparation" comes from. Why wouldn't someone create an AGI by cobbling together a bunch of AI services, or hire a bunch of AI services to help them design an AGI, as soon as they could? If they did that, then superintelligent AGI agents would arise nearly simultaneously with SI-level capabilities, and there would be no such period of unopposed preparation. In section 32.2, Eric only argues that SI-level capabilities can precede AGI agents. Since I think they wouldn't, at least not by a significant margin, the whole argument seems to fall apart or has to be interpreted in a way that makes it strategically irrelevant.

Eric seems to think that no one would bother to create AGI because "AGI agents offer no compelling value", by which he means "Because general AI-development capabilities can provide stable, comprehensive AI services, there is no compelling, practical motivation for undertaking the more difficult and potentially risky implementation of self-modifying AGI agents." But if quickly building an AGI can potentially allow someone to take over the world before "unopposed preparation" can take place, isn't that a compelling motivation by itself for many people?

comment by rohinmshah · 2019-01-09T09:55:27.631Z · score: 6 (3 votes) · LW · GW

Why wouldn't someone create an AGI by cobbling together a bunch of AI services, or hire a bunch of AI services to help them design an AGI, as soon as they could?

Because any task that an AGI could do, CAIS could do as well. (Though I don't agree with this -- unified agents seem to work better.)

But if quickly building an AGI can potentially allow someone to take over the world before "unopposed preparation" can take place, isn't that a compelling motivation by itself for many people?

I suspect he would claim that quickly building an AGI would not allow you to take over the world, because the AGI would not be that much more capable than the CAIS service cluster.

It may be the case that people try to take over the world just with CAIS, and maybe that could succeed. I think he's arguing only against AGI accident risk here, not against malicious uses of AI. (I think you already knew that, but it wasn't fully clear on reading your comment.)

comment by Wei_Dai · 2019-01-09T11:47:40.732Z · score: 9 (4 votes) · LW · GW

I suspect he would claim that quickly building an AGI would not allow you to take over the world, because the AGI would not be that much more capable than the CAIS service cluster.

That does not seem to be his position though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32. If that was his position, he could just talk about how ordinary policing and military defense would work in a CAIS world (i.e., against human adversaries wielding CAIS) and say that the same policing/defense would also work against AGI because AGI is not much more capable than CAIS.

Instead it seems clear that he thinks AGI requires special effort to defend against, which is made possible by a delay between SI-level CAIS and AGI, which he proposes that we use to do a very extensive "unopposed preparation". I've been trying to figure out why he thinks there will be such a delay and my current best guess is "Implementation of the AGI model is widely regarded as requiring conceptual breakthroughs." (page 75) which he repeats on page 77, "AGI (but not CAIS) calls for conceptual breakthroughs to enable both implementation and subsequent safe application." I don't understand why he thinks such conceptual breakthroughs will be required though. Why couldn't someone just take some appropriate AI services, connect them together in a straightforward way, and end up with an AGI? Do you get it? Or am I on the wrong track here?

comment by rohinmshah · 2019-01-09T23:09:16.450Z · score: 15 (4 votes) · LW · GW

Do you get it?

I doubt I will ever be able to confidently answer yes to that question.

That does not seem to be his position though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32.

My model is that he does think AGI won't be much more capable than CAIS (see sections 12 and 13 in particular, and 10, 11 and 16 also touch on the topic), but lots of people (including me) kept making the argument that end-to-end training tends to improve performance and so AGI would outperform CAIS, and so he decided to write a response to that.

In general, my impression from talking to him and reading earlier drafts is that the earlier chapters are representative of his core models, while the later chapters are more like responses to particular arguments, or specific implications of those models.

I can give one positive argument for AGI being harder to make than SI-level CAIS. All of our current techniques for building AI systems create things that are bounded in the time horizon they are optimizing over. It's actually quite unclear how we would use current techniques to get something that does very-long-term-planning. (This could be the "conceptual breakthroughs" point.) Seems a lot easier to get a bunch of bounded services and hook them up together in such a way that they can do the sorts of things that AGI agents could do.

The one scenario that is both concrete and somewhat plausible to me is that we run powerful deep RL on a very complex environment, and this finds an agent that does very-long-term-planning, because that's what it takes to do well on the environment. I don't know what Eric thinks about this scenario, but it doesn't seem to influence his thinking very much (and in fact in the OP I argued that CAIS isn't engaging enough with this scenario).

Why couldn't someone just take some appropriate AI services, connect them together in a straightforward way, and end up with an AGI?

If you take a bunch of bounded services and connect them together in some straightforward way, you wouldn't get something that is optimizing over the long term. Where did the long term optimization come from?

For example, you could take any long term task and break it down into the "plan maker" which thinks for an hour and gives a plan for the task, and the "plan executor" which takes an in-progress plan and executes the next step. Both of these are bounded and so could be services, and their combination is generally intelligent, but the combination wouldn't have convergent instrumental subgoals.
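
Here is a minimal sketch of that decomposition, just to show the shape of it. The function names, the step cap, and the string "plans" are all hypothetical; the sketch shows only that each call is bounded, and it makes no claim about what the plans themselves might contain.

```python
from typing import List, Tuple


def plan_maker(task: str, max_steps: int = 3) -> List[str]:
    """Hypothetical bounded planner: returns at most max_steps steps, then stops."""
    candidate_steps = [f"{task}: step {i}" for i in range(1, 100)]
    return candidate_steps[:max_steps]


def plan_executor(plan: List[str], done: int) -> Tuple[str, int]:
    """Hypothetical bounded executor: performs exactly one step per call."""
    return f"executed {plan[done]}", done + 1


# The long-running loop lives outside both services; each call is a bounded task.
plan = plan_maker("write report")
log: List[str] = []
done = 0
while done < len(plan):
    outcome, done = plan_executor(plan, done)
    log.append(outcome)
```

Neither function carries any state between calls: the in-progress plan and the position in it are passed in explicitly by whatever is driving the loop.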

comment by Wei_Dai · 2019-01-10T00:19:33.817Z · score: 6 (3 votes) · LW · GW

Thanks, I think this is helpful for me to understand Eric's model better, but I'm still pretty confused.

It’s actually quite unclear how we would use current techniques to get something that does very-long-term-planning. (This could be the “conceptual breakthroughs” point.)

But it's quite unclear how to use current techniques to do a lot of things. Why should we expect that this conceptual breakthrough would come later than other conceptual breakthroughs needed to achieve CAIS? (Given your disagreement with Eric on this, I guess this is more a question for him than for you.)

Where did the long term optimization come from?

I was assuming that long term strategic planners (as described in section 27) are available as an AIS, and would be one of the components of the hypothetical AGI.

For example, you could take any long term task and break it down into the “plan maker” which thinks for an hour and gives a plan for the task, and the “plan executor” which takes an in-progress plan and executes the next step. Both of these are bounded and so could be services, and their combination is generally intelligent, but the combination wouldn’t have convergent instrumental subgoals.

I don't see why it wouldn't, unless these services are specifically designed to be corrigible (in which case the "corrigible" part seems much more important than the "service" part). For example, suppose you asked the plan maker to create a plan to cure cancer. Why would the mere fact that it's a bounded service prevent it from coming up with a plan that involves causing human extinction (and a bunch of convergent instrumental subgoals like deceiving humans who might stop it)? (If there was a human in the loop, then you could look at the plan and reject it, but I'm imagining that someone, in order to build an AGI as quickly and efficiently as possible, stripped off the "optimize for human consumption" part of the strategic planner and instead optimized it to produce plans for direct machine consumption.)

comment by rohinmshah · 2019-01-10T17:22:37.130Z · score: 7 (4 votes) · LW · GW

Why should we expect that this conceptual breakthrough would come later than other conceptual breakthroughs needed to achieve CAIS?

I think I share Eric's intuition that this problem is hard in a more fundamental way than other things, but I don't really know why I have this intuition. Some potential generators:

  • ML systems seem to be really good at learning tasks, but really bad at learning explicit reasoning. I think of CAIS as being on the side of "we never figure out explicit reasoning at the level that humans do it", and making up for this deficit by having good simulators that allow us to learn from experience, or by collecting much more data across multiple instances of AI systems, or by trying out many different AI designs and choosing the one which performs best.
  • It seems like humans tend to build systems by making individual parts that we can understand and predict well, and putting those together in a way where we can make some guarantees/predictions about what will happen. CAIS plays to this strength, whereas "figure out how to do very-long-term-planning" doesn't.

I don't see why it wouldn't, unless these services are specifically designed to be corrigible (in which case the "corrigible" part seems much more important than the "service" part).

Yeah, you're right, I definitely said the wrong thing there. I guess the difference is that the convergent instrumental subgoals are now "one level up" -- they aren't subgoals of the AI service itself, they're subgoals of the plan that was created by the AI service. It feels like this is qualitatively different and easier to address, but I can't really say why. More generators:

  • In this setting, convergent instrumental subgoals happen only if the plan-making service is told to maximize outcomes. However, since it's one level up, it should be easier to ask for something that says something more like "do X, interpreted pragmatically and not literally".
  • Things that happen one level up in the CAIS world are easier to point at and more interpretable, so it should be easier to find and fix issues of this sort.

(You could of course say "just because it's easier that doesn't mean people will do it", but I could imagine that if it's easy enough this becomes best practice and people do it by default, and you don't actually gain very much by taking these parts out.)

I was assuming that long term strategic planners (as described in section 27) are available as an AIS, and would be one of the components of the hypothetical AGI.

Yeah, here also what I should have said is that the long term optimization is happening one level up, whereas with the typical AGI agent scenario it feels like the long term optimization needs to happen at the base level, and that's the thing we don't know how to do.

comment by Wei_Dai · 2019-01-11T05:11:39.003Z · score: 6 (3 votes) · LW · GW

Unfortunately, I only vaguely understand the points that you're trying to make in this comment... Would it be fair to just say at this point that this is an important crux that Eric failed to convincingly argue for?

comment by rohinmshah · 2019-01-11T17:00:12.975Z · score: 4 (2 votes) · LW · GW

I agree that it's an important crux, and that the arguments are not sufficiently strong that everyone should believe Eric's position. I do think that he has provided arguments that support his position, though they are in a different language/ontology than is usually used here.

comment by Wei_Dai · 2019-01-11T17:36:13.168Z · score: 5 (2 votes) · LW · GW

Ah, ok, what sections would you suggest that I (re)read to understand his arguments better? (You mentioned 12, 13, 10, 11 and 16 earlier in this thread but back then we were talking about "AGI won’t be much more capable than CAIS" and here the topic is whether we should expect AGI to come later than CAIS or require harder conceptual breakthroughs.)

comment by rohinmshah · 2019-01-12T00:27:57.877Z · score: 4 (2 votes) · LW · GW

I quickly skimmed the table of contents to generate this list, so it might have both false positives and false negatives.

Section 1: We typically make progress using R&D processes; this can get us to superintelligence. Implicitly also makes the claim that this is qualitatively different from AGI, though doesn't really argue for that.

Section 8: Optimization pressure points away from generality, not towards it, which suggests that strong optimization pressure doesn't give you AGI.

Section 12.6: AGI and CAIS solve problems in different ways. (Combined with the claim, argued elsewhere: CAIS will happen first.)

Section 13: AGI agents are more complex. (Implicit claim: and so harder to build.)

Section 17: Most complex tasks involve several different subtasks that don't interact much; so you get efficiency and generality gains by splitting the subtasks up into separate services.

Section 38: Division of labor + specialization are useful for good performance.

comment by Wei_Dai · 2019-01-13T06:29:50.135Z · score: 5 (2 votes) · LW · GW

Most of these sections seem to only contain arguments that AGI won't come earlier than CAIS, but not that it would come later than CAIS. In other words, they don't argue against the likelihood that under CAIS someone can easily build an AGI by connecting existing AI services together in a straightforward way. The only section I can find among the ones you listed that tries to argue in this direction is Section 13, but even it mostly just argues that AGI isn't simpler than CAIS, and not that it's more complex, except for this paragraph in the summary, Section 13.5:

To summarize, in each of the areas outlined above, the classic AGI model both obscures and increases complexity: In order for general learning and capabilities to fit a classic AGI model, they must not only exist, but must be integrated into a single, autonomous, self-modifying agent. Further, achieving this kind of integration would increase, not reduce, the challenges of aligning AI behaviors with human goals: These challenges become more difficult when the goals of a single agent must motivate all (and only) useful tasks.

So putting alignment aside (I'm assuming that someone would be willing to build an unaligned AGI if it's easy enough), the only argument Eric gives for greater complexity of AGI vs CAIS is "must be integrated into a single, autonomous, self-modifying agent", but why should this integration add a non-negligible amount of complexity? Why can't someone just take a plan maker, connect it to a plan executer, and connect that to the Internet to access other services as needed? (I think your argument that strategic planning may be one of the last AIS to arrive is plausible, but it doesn't seem to be an argument that Eric himself makes.) Where is the additional complexity coming from?

comment by rohinmshah · 2019-01-17T18:19:25.133Z · score: 2 (1 votes) · LW · GW
Why can't someone just take a plan maker, connect it to a plan executer, and connect that to the Internet to access other services as needed?

I think Eric would not call that an AGI agent.

Setting aside what Eric thinks and talking about what I think: There is one conception of "AGI risk" where the problem is that you have an integrated system that has optimization pressure applied to the system as a whole (similar to end-to-end training) such that the entire system is "pointed at" a particular goal and uses all of its intelligence towards that. The goal is a long-term goal over universe-histories. The agent can be modeled as literally actually maximizing the goal. These are all properties of the AGI itself.

With the system you described, there is no end-to-end training, and it doesn't seem right to say that the overall system is aimed at a long-term goal, since it depends on what you ask the plan maker to do. I agree this does not clearly solve any major problem, but it does seem markedly different to me.

I think that Eric's conception of "AGI agent" is like the first thing I described. I agree that this is not what everyone means by "AGI", and it is particularly not the thing you mean by "AGI".

You might argue that there seems to be no effective safety difference between an Eric-AGI-agent and the plan maker + plan executor. The main differences seem to be about what safety mechanisms you can add -- such as looking at the generated plan, or using human models of approval to check that you have the right goal. (Whereas an Eric-AGI-agent is so opaque that you can't look at things like "generated plans", and you can't check that you have the right goal because the Eric-AGI-agent will not let you change its goal.)
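To make the "look at the generated plan" mechanism concrete, here is a minimal sketch. Everything in it is hypothetical: `plan_maker`, `plan_executor`, the `approve` check, and the two toy services are illustrative stand-ins, not anything specified in Eric's report. The only point is that the plan is an explicit data structure that a check can inspect before any step runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    service: str  # name of the (hypothetical) service to call
    request: str  # what to ask that service to do


def plan_maker(goal: str) -> List[Step]:
    """Stand-in for a planning service: returns an explicit, inspectable plan."""
    return [
        Step("search", f"find suppliers for: {goal}"),
        Step("purchasing", f"order materials for: {goal}"),
    ]


def plan_executor(
    plan: List[Step],
    services: Dict[str, Callable[[str], str]],
    approve: Callable[[Step], bool],
) -> List[str]:
    """Run each step only if the approval check accepts it."""
    results = []
    for step in plan:
        if not approve(step):
            results.append(f"REJECTED {step.service}: {step.request}")
            continue
        results.append(services[step.service](step.request))
    return results


# Toy services; a real system would call out to separately trained services.
services = {
    "search": lambda req: f"search done ({req})",
    "purchasing": lambda req: f"purchase done ({req})",
}


def approve(step: Step) -> bool:
    """Assumed approval model: here, simply refuse anything that spends money."""
    return step.service != "purchasing"


plan = plan_maker("build a shed")
results = plan_executor(plan, services, approve)
```

With an opaque end-to-end agent there is no `plan` object to hand to `approve` in the first place, which is the difference the paragraph above is pointing at.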

With an Eric-AGI-agent, if you try to create a human model of approval, that would need to be an Eric-AGI-agent itself in order to effectively supervise the first Eric-AGI-agent, but in that case the model of approval will be literally actually maximizing some goal like "be as accurate as possible", which will lead to perverse behavior like manipulating humans so that what they approve is easier to predict. In CAIS, this doesn't happen, because the approval model is not searching over possibilities that involve manipulating humans.

comment by PeterMcCluskey · 2019-01-08T20:26:46.727Z · score: 10 (4 votes) · LW · GW

I want to draw separate attention to chapter 40 of Drexler's paper, which uses what looks like a novel approach to argue that current supercomputers likely have more raw processing power than a human brain. I find that scary.

comment by ESRogs · 2019-01-08T21:16:21.659Z · score: 10 (2 votes) · LW · GW

From the conclusion of that section:

Many modern AI tasks, although narrow, are comparable to narrow capacities of neural systems in the human brain. Given an empirical value for the fraction of computational resources required to perform that task with humanlike throughput on a 1 PFLOP/s machine, and an inherently uncertain and ambiguous—yet bounded—estimate of the fraction of brain resources required to perform “the equivalent” of that machine task, we can estimate the ratio of PFLOP/s machine capacity to brain capacity. What are in the author’s judgment plausible estimates for each task are consistent in suggesting that this ratio is ~10 or more. Machine learning and human learning differ in their relationship to costs, but even large machine learning costs can be amortized over an indefinitely large number of task-performing systems and application events.
In light of these considerations, we should expect that substantially superhuman computational capacity will accompany the eventual emergence of a software with broad functional competencies. On present evidence, scenarios that assume otherwise seem unlikely.

I'm not completely sure I'm understanding the first paragraph correctly.

With the bit about "this ratio is ~10 or more" it sounds like he's saying roughly that, "When we use modern AI systems to complete tasks that humans also do, it appears to take 10+ PFLOP/s per human brain."

(Or, since you're not using your whole brain for a given task, maybe a better translation is, "If a task uses 10% of your brain, then a modern AI system will need to use 1+ PFLOP/s to achieve human level performance.")

Does that match other readers' interpretations?
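For reference, here is the arithmetic as I read it from the quoted passage, written out as a sketch. The function name and the example fractions are made up for illustration; they are not numbers from the paper. The quote defines the ratio as machine capacity over brain capacity, estimated from one task at a time.

```python
def machine_to_brain_ratio(machine_fraction: float, brain_fraction: float) -> float:
    """Capacity ratio implied by a single task.

    If a task needs `machine_fraction` of a 1 PFLOP/s machine and
    `brain_fraction` of a brain, the machine could run 1/machine_fraction
    copies of the task and the brain 1/brain_fraction copies, so
    machine capacity / brain capacity = brain_fraction / machine_fraction.
    """
    if machine_fraction <= 0 or brain_fraction <= 0:
        raise ValueError("fractions must be positive")
    return brain_fraction / machine_fraction


# Purely illustrative: a task using 0.1% of the machine and 1% of a brain.
ratio = machine_to_brain_ratio(machine_fraction=0.001, brain_fraction=0.01)
print(ratio)
```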

comment by rhaps0dy · 2019-01-09T00:17:32.587Z · score: 5 (3 votes) · LW · GW

Yes, though I'm fairly sure he's talking about using trained neural networks to e.g. classify an image, which is known to be fairly cheap, rather than training them. In other words, he's talking about using an AI service rather than creating one.

He also says that "Machine learning and human learning differ in their relationship to costs" which is also evidence for my interpretation: training is expensive, testing on one example is very cheap.
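A rough sketch of the amortization point, with made-up cost figures (the function name and all numbers are assumptions for illustration only): the one-off training cost is spread over every use of the trained system, so the per-use cost approaches the inference cost as the number of uses grows.

```python
def amortized_cost_per_use(training_cost: float, inference_cost: float, num_uses: int) -> float:
    """Average cost of one use when training is paid once and shared by all uses."""
    if num_uses <= 0:
        raise ValueError("num_uses must be positive")
    return training_cost / num_uses + inference_cost


# Illustrative units only: expensive training, cheap inference.
for n in (1, 1_000, 1_000_000):
    print(n, amortized_cost_per_use(training_cost=1_000_000.0, inference_cost=0.01, num_uses=n))
```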

comment by elityre · 2019-01-08T07:24:21.648Z · score: 10 (6 votes) · LW · GW

As a note, I believe that FHI is planning to publish a(n edited?) version of this document as an actual book, à la Superintelligence: Paths, Dangers, Strategies.

comment by ESRogs · 2019-01-08T20:09:45.960Z · score: 5 (3 votes) · LW · GW

After reading the post and some of these comments (including this one) it was unclear to me whether FHI had actually intended to make this public yet.

It seems that in fact they have: https://www.fhi.ox.ac.uk/reframing/

comment by rohinmshah · 2019-01-09T02:25:38.740Z · score: 4 (2 votes) · LW · GW

It's linked in the first sentence of the post. Though I guess I link to the pdf instead of the web page.

I tried to make this a link post, but I got an error message saying that it has already been linked before.

comment by ESRogs · 2019-01-09T04:34:01.013Z · score: 4 (2 votes) · LW · GW

Yeah, saw the link, but since it was direct to the pdf, wasn't sure if there'd been an announcement or anything like that.

(Perhaps I should have enough trust in FHI that if a link is accessible then that's intentional. Not something you can count on in general though. :P)

comment by jimrandomh · 2019-01-09T02:39:13.408Z · score: 2 (1 votes) · LW · GW

The restriction on having multiple linkposts to the same URL is something we inherited from our framework (Vulcan), which doesn't particularly make sense for LW. We've taken it out, so you'll be able to make the linkpost after the next time we deploy an update (which will be later this week).

comment by habryka (habryka4) · 2019-01-10T19:19:45.583Z · score: 2 (1 votes) · LW · GW

I also just went in and appended some random URL parameters to the URL to avoid the duplication filter for now.

comment by Tobias_Baumann · 2019-01-08T17:05:13.115Z · score: 9 (5 votes) · LW · GW

Upvoted. I've long thought that Drexler's work is a valuable contribution to the debate that hasn't received enough attention so far, so it's great to see that this has now been published.

I am very sympathetic to the main thrust of the argument – questioning the implicit assumption that powerful AI will come in the shape of one or more unified agents that optimise the outside world according to their goals. However, given our cluelessness and the vast range of possible scenarios (e.g. ems, strong forms of biological enhancement, merging of biological and artificial intelligence, brain-computer interfaces, etc.), I find it hard to justify a very high degree of confidence in Drexler's model in particular.

comment by rohinmshah · 2019-01-08T17:23:52.598Z · score: 7 (4 votes) · LW · GW

That seems right. I would argue that CAIS is more likely than any particular one of the other scenarios that you listed, because it is primarily taking trends from the past and projecting them into the future, whereas most other scenarios require something qualitatively new -- eg. an AGI agent (before CAIS) would happen if we find the one true learning algorithm, ems require us to completely map out the brain in a way that we don't have any results for currently, even in simple cases like C. elegans. But CAIS is probably not more likely than a disjunction over all of those possible scenarios.

comment by John_Maxwell_IV · 2019-01-09T04:07:58.726Z · score: 3 (2 votes) · LW · GW

eg. an AGI agent (before CAIS) would happen if we find the one true learning algorithm

I think generality and goal-directedness are likely orthogonal attributes. A "one true learning algorithm" sounds very general, but a priori I don't expect it to be any more goal-directed than the comprehensive AI services idea outlined in this post. I suspect you can take each of your comprehensive AI services and swap out the specific algorithm you were using for a one true learning algorithm without making the result any more of an agent.

I'm thinking about it something like this:

  • Traditional view of superintelligent AI ("top-down"): A superintelligent AI is something that's really good at achieving arbitrary goals. We abstract away the details of its implementation and view it as a generic hyper-competent goal achievement process, with a wide array of actions & strategies at its disposal. This view potentially lets us do FAI research without having to contribute to AI progress or depend overmuch on any particular direction that AI capabilities development proceeds in.

  • CAIS ("bottom-up"): We have a collection of AI services. We can use these services to accomplish specific tasks, including maybe eventually generating additional services. Each service represents a specific algorithm that achieves superior performance along one or more dimensions in a narrow or broad range of circumstances. If we abstract away the details of how tasks are being accomplished, that may lead to an inaccurate view of the system's behavior. For example, our machine learning algorithms may get better and better at performing classification tasks... but we have to look into the details of how the algorithm works in order to figure out whether it will consider strategies for improving its classification ability such as "pwn all other servers in the cluster and order them to search the space of hyperparameters in parallel". Our classification systems have been getting better and better, and arguably also more general, without them considering strategies like the pwnage strategy, and it's plausible this trend will continue until the algorithms are superhuman in all domains. Indeed, this feels to me like a fundamental defining characteristic of what "superintelligence" refers to: a specific bit of computer code that is able to learn better and faster, using fewer computational resources, than whatever algorithms the human brain uses.

comment by rohinmshah · 2019-01-09T10:08:56.731Z · score: 5 (3 votes) · LW · GW
I suspect you can take each of your comprehensive AI services and swap out the specific algorithm you were using for a one true learning algorithm without making the result any more of an agent.

Mostly agreed, but if we find the one true learning algorithm, then CAIS is no longer on the development path towards AGI agents, and I would predict that someone builds an AGI agent in that world because it could have lots of economic benefits that have not already been captured by CAIS services.

Indeed, this feels to me like a fundamental defining characteristic of what "superintelligence" refers to: a specific bit of computer code that is able to learn better and faster, using fewer computational resources, than whatever algorithms the human brain uses.

I actually see CAIS as an argument against this. I think we could get superintelligent services by having lots of specialization (unlike humans, who are mostly general and a little bit specialized for their jobs), by aggregating learning across many actors (whereas humans can't learn from other humans' experience), and by making models much larger and with much more compute (whereas humans are limited by brain size). Humans could still outperform AI services on things like power usage, sample efficiency, compute requirements, etc., even in a world with lots of AI services that can perform nearly any task at a superhuman level.

comment by Donald Hobson (donald-hobson) · 2019-01-09T16:08:22.239Z · score: 8 (3 votes) · LW · GW

I disagree outright with

Any long term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task.

Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!

And the deeper reason for that is that we have no idea how to tell what's a hole.

Suppose you want to set the service generator to make a robot that cleans cars. If you give a blow-by-blow formal description of what you mean by "cleans cars", then your "service generator" is just a compiler. If you do not give a complete specification of what you mean, where does the information that "chopping off a nearby head to wipe windows with is unacceptable" come from? If the service generator notices that cars need cleaning and builds the service by itself, you have an AGI by another name.

Obviously, if you have large amounts of training data made by humans with joysticks, and the robot is sampling from the same distribution, then you should be fine. This system learns that dirtier windshields need more wiping from hundreds of examples of humans doing that; it doesn't chop off any heads because the humans didn't.

However, if you want the robot to display remotely novel behavior, then the distance between the training data and the new good solutions becomes as large as the distance from the training data to bad solutions. If it's smart enough to go to the shops and buy a sponge, without having that strategy hardcoded in when it was built, then it's smart enough to break into your neighbor's house and nick a sponge.

The only thing that distinguishes one from the other is what humans prefer.

Distinguishing low impact from high impact is also hard.

This might be a good approach, but I don't feel it answers the question "I have a humanoid robot, a hypercomputer and a couple of toddlers; how can I build something to look after the kids for a few weeks (without destroying the world)?" So far, CAIS looks confused.

comment by ESRogs · 2019-01-09T20:48:53.422Z · score: 6 (3 votes) · LW · GW

It seems like the important thing is how bounded the task is.

For example, in the case of Go, if you just kept training AlphaZero, would you expect it to eventually decide that it needs to break out into the physical world to get more computing power?

It seems to me that it could get to be ultra-super-human at Go without that happening. (Even if there is some theoretical threshold where, with enough computation, it couldn't help but stumble upon a sequence of moves that causes the program to crash. It seems to me that you're likely to get crashing behavior long before you get hack-out-of-the-vm behavior, and the threshold for either may be too high to matter.)

If that's true for Go, then the questions are:

1. How much less bounded of a task can you train a system to do while maintaining the focused-on-the-task property?

and

2. How general of a system can you make by composing such focused systems together?

comment by rohinmshah · 2019-01-09T23:21:11.540Z · score: 5 (3 votes) · LW · GW
Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!

Note that under the CAIS worldview, in order to be competent in some domain you need to have some experience in that domain (i.e. competence requires learning). Or at least, that's the worldview under which I find CAIS most compelling. In that case, the AI would have had to try breaking out of the box a few times in order to get good at it, and why would it do that? Even if it ever hit upon this plan, whenever it tried it for the first time it would get a gradient pushing that behavior away, since it didn't help with achieving the goal. Only after significant learning would it be able to execute these weird plans in a way that they actually succeed and help achieve the goal, and that significant learning will not happen.
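Here is a toy version of that gradient argument, under strong assumptions that I am making up for illustration: a two-action softmax policy, an unpracticed "breakout attempt" that always fails and earns zero reward, and the exact expected policy gradient instead of sampled updates. It only shows that, in this setup, the probability of the weird action shrinks at every step; it says nothing about planners that could succeed on the first try.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_policy_gradient_step(logits: List[float], rewards: List[float], lr: float = 0.5) -> List[float]:
    """One exact gradient-ascent step on expected reward J = sum_i p_i * r_i.

    For a softmax policy, dJ/dlogit_k = p_k * (r_k - J).
    """
    probs = softmax(logits)
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    return [
        logit + lr * p * (r - expected_reward)
        for logit, p, r in zip(logits, probs, rewards)
    ]


# Action 0: do the task (reward 1). Action 1: clumsy breakout attempt (reward 0).
rewards = [1.0, 0.0]
logits = [0.0, 0.0]
history = []  # probability of the breakout action before each update
for _ in range(50):
    history.append(softmax(logits)[1])
    logits = expected_policy_gradient_step(logits, rewards)
```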

The only thing that distinguishes one from the other is what humans prefer.

CAIS would definitely use human preference information, see eg. section 22.

This might be a good approach, but I don't feel it answers the question "I have a humanoid robot, a hypercomputer and a couple of toddlers; how can I build something to look after the kids for a few weeks (without destroying the world)?"

It's not really an approach to AI safety, it's mostly meant to be a different prediction about how we achieve superintelligence. (There are definitely some prescriptive aspects of CAIS, and some arguments that it is safer than AGI agents, but mostly it is meant to be descriptive, I believe.)

comment by Donald Hobson (donald-hobson) · 2019-01-10T18:12:10.211Z · score: 3 (2 votes) · LW · GW

Any algorithm that gets stuck in local optima so easily will not be very intelligent or very useful. Humans have, at least somewhat, the ability to notice that there should be a good plan in this region, and to find and execute that plan successfully. We don't get stuck in local optima as much as current RL algorithms.

AIXI would be very good at making complex plans and doing well first time. You could tell it the rules of chess and it would play PERFECT chess first time. It does not need lots of examples to work from. Give it any data that you happen to have available, and it will become very competent, and able to carry out complex novel tasks first time.

Current reinforcement learning algorithms aren't very good at breaking out of boxes because they follow the local incentive gradient. (I say "not very good at" because a few algorithms have exploited glitches in a way that's a bit "break out of the box"-ish.) In some simple domains, it's possible to follow the incentive gradient all the way to the bottom. In other environments, human actions already form a good starting point, and following the incentive gradient from there can make the solution a bit better.

I agree that most of the really dangerous ways of breaking out of the box probably can't be reached by local gradient descent from a non-adversarial starting point. (I do not want to have to rely on this.)

I agree that you can attach loads of sensors to, say, postmen, and train a big neural net to control a humanoid robot to deliver letters, given millions of training examples. You can probably automate many of the weight-fiddling tasks currently done by grad student descent to make big neural nets work.

I agree that this could be somewhat useful economically, as a significant proportion of economic productivity could be automated.

What I am saying is that this form of AI is sufficiently limited that there are still large incentives to make AGI and the CAIS can't protect us from making an unfriendly AGI.

I'm also not sure how strong the self improvement can be when the service maker service is only making little tweaks to existing algorithms rather than designing strange new algorithms. I suspect you would get to a local optimum of a reinforcement learning algorithm producing very slight variations of reinforcement learning. This might be quite powerful, but not anywhere near the limit of self improving AGI.

comment by rohinmshah · 2019-01-10T20:14:20.183Z · score: 4 (2 votes) · LW · GW
AIXI would be very good at making complex plans and doing well first time.

Agreed, but I claim we have no clue how to make anything remotely like AIXI in the real world.

Humans have, at least somewhat, the ability to notice that there should be a good plan in this region, find and execute that plan successfully.

Agreed, in a CAIS world, the system of interacting services would probably notice the plan but not execute it because of some service that is meant to prevent it from doing crazy things that humans would not want.

What I am saying is that this form of AI is sufficiently limited that there are still large incentives to make AGI and the CAIS can't protect us from making an unfriendly AGI.

This definitely seems like the crux for many people. I'm quite unsure about this point; it seems plausible to me that CAIS could in fact do most things such that there aren't very large incentives, especially if the Factored Cognition hypothesis is true.

I'm also not sure how strong the self improvement can be when the service maker service is only making little tweaks to existing algorithms rather than designing strange new algorithms.

I don't see why it would have to be little tweaks to existing algorithms; it seems plausible to have the R&D services consider entirely new algorithms as well.

comment by avturchin · 2019-01-08T18:00:50.076Z · score: 6 (3 votes) · LW · GW

My main objection to this idea is that it is a local solution, and doesn't have built-in mechanisms to become a global AI safety solution, that is, to prevent the creation of other AIs, which could be agential superintelligences. One can try to make "AI police" as a service, but it could be less effective than agential police.

Another objection is probably Gwern's idea that any Tool AI "wants" to become agential AI.

This idea also excludes the robotic direction in AI development, which will anyway produce agential AIs.

comment by rohinmshah · 2019-01-09T02:11:14.240Z · score: 6 (3 votes) · LW · GW

If by agent we mean "system that takes actions in the real world", then services can be agents. As I understand it, Eric is only arguing against monolithic AGI agents that are optimizing a long-term utility function and that can learn/perform any task.

Current factory robots definitely look like a service, and even the soon-to-come robots-trained-with-deep-RL will be services. They execute particular learned behaviors.

If I remember correctly, Gwern's argument is basically that Agent AI will outcompete Tool AI because Agent AI can optimize things that Tool AI cannot, such as its own cognition. In the CAIS world, there are separate services that improve cognition, and so the CAIS services do get the benefit of ever-improving cognition, without being classical AGI agents. But overall I agree with this point (and disagree with Eric) because I expect there to be lots of gains to be had by removing the boundaries between services, at least where possible.

comment by Wei_Dai · 2019-01-08T18:58:38.902Z · score: 4 (2 votes) · LW · GW

One can try to make “AI police” as a service, but it could be less effective than agential police.

This seems likely to me as well, especially since a "service" is by definition bounded and an agent is not.

comment by rohinmshah · 2019-01-09T02:18:03.032Z · score: 2 (1 votes) · LW · GW

Monitoring surveillance in order to see if anyone is breaking rules seems to be quite a bounded task, and in fact is one that we are already in the process of automating (using our current AI systems, which are basically all bounded).

Of course, there are lots of other tasks that are not as clear. But to the extent that you believe the Factored Cognition hypothesis, you should believe that we can make bounded services that nevertheless do a very good job.

comment by Wei_Dai · 2019-01-09T07:39:41.419Z · score: 3 (1 votes) · LW · GW

Monitoring surveillance in order to see if anyone is breaking rules seems to be quite a bounded task, and in fact is one that we are already in the process of automating (using our current AI systems, which are basically all bounded).

That seems true, but if this surveillance monitoring isn't 100% effective, won't you still need an agential police to deal with any threats that manage to evade the surveillance? Or do you buy Eric's argument that we can use a period of "unopposed preparation" to make sure that the defense, even though it's bounded, is still much more capable than any agential threat it might face?

comment by rohinmshah · 2019-01-09T10:23:38.062Z · score: 4 (2 votes) · LW · GW

Sorry, when I said "there are lots of other tasks that are not as clear", I meant that there are a lot of other tasks relevant to policing and security that are not as clear, such as police to deal with threats that evade surveillance. I think the optimism here comes from our ability to decompose tasks, such that we can take a task that seems to require goal-directed agency (like "be the police") and turn it into a bunch of subtasks that no longer look agential.

comment by rhaps0dy · 2019-01-09T01:00:18.101Z · score: 3 (2 votes) · LW · GW
This idea also excludes the robotic direction in AI development, which will anyway produce agential AIs.

Recursive self-improvement that makes the intelligence "super" quickly is what makes the misaligned utility actually dangerous, as opposed to dangerous in the way that, say, a current-day automated assembly line is.

A robot that self-improves would need to have the capacity to control its actuators and also to self-improve. Since neither of these capabilities directly depends on the other, each time one of them improves, the improvement is much more likely to be first demonstrated independently of an improvement in the other one.

Thus we're likely to already have some experience with self-improving AI, or to have recursively improved AI to help us, by the time we get to dealing with people wanting to build self-improving robots. Even though with advanced AI in hand to help we should maybe still start early on that, it seems more important to get the not-necessarily-and-also-probably-not-robotic AI right.

comment by avturchin · 2019-01-09T09:00:34.821Z · score: 1 (1 votes) · LW · GW

I meant not that the "robot will self-improve", but that research in robotics will create AIs which are agential and adapted to act in the real world. Such AIs may start to self-improve later, and without a robotic body.

comment by ESRogs · 2019-01-08T20:47:43.636Z · score: 4 (2 votes) · LW · GW
The CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D.

This conclusion seems similar to the one Paul arrives at here:

In the slow takeoff scenario, pre-AGI systems have a transformative impact that’s only slightly smaller than AGI.

(See also this post from AI Impacts.)

comment by G Gordon Worley III (gworley) · 2019-01-09T02:45:32.451Z · score: 2 (1 votes) · LW · GW

What excites me most about Eric's position since I first learned of it is that it provides a framework for safer AI systems than we might otherwise build if we were trying to target AGI. From this perspective it's valuable for setting policy and missions for AI-focused endeavors in such a way that we potentially delay the creation of AGI.

Although it might be argued that this is inevitable (last time I talked to Eric this was the impression that I got; he felt he was laying out some ideas that would happen anyway and was taking the time to explain why he thinks they will happen that way, rather than trying to nudge us towards a path), having it codified and publicized as a best course of action may serve on the margin to encourage more folks to work in a paradigm of doing AI development with an eye towards incorporation in CAIS rather than as a stepping stone towards AGI. This is important because it will apply optimization pressure to ignore adding the things AGI would need, since those may take extra time and cost, and if most of the short- and medium-term economic and academic benefits can be realized within the CAIS paradigm, then we will see a shift towards optimizing more for CAIS and less for AGI, which seems broadly beneficial from a safety standpoint because CAIS is less integrated and less agentic by design (at least for now; there might be a path from CAIS to AGI). Having this be common knowledge and the accepted paradigm of AI research is thus beneficial for pushing people away from incentive gradients that more directly lead to AGI, buying time for more safety research.

Given this, it's probably worthwhile for folks well positioned to influence other researchers to be made better aware of this work, which might be something folks here can do if they have the ears of those people (or just are those people).

comment by Mitchell_Porter · 2019-01-09T02:23:03.667Z · score: 2 (1 votes) · LW · GW

So what is he saying? We never need to solve the problem of designing a human-friendly superintelligent agent?

comment by rohinmshah · 2019-01-09T09:59:08.672Z · score: 2 (1 votes) · LW · GW

I don't think he'd make a strong claim about that, but I wouldn't be surprised if he assigned that possibility significant credence. I assign that possibility relatively low credence. I assign much more credence to the position that we'll never need to solve the problem of designing a human-friendly superintelligent goal-directed agent.

comment by Charlie Steiner · 2019-01-09T03:46:41.535Z · score: 1 (1 votes) · LW · GW

Thanks for the summary! I agree that this is missing some extra consideration for programs that are planning / searching at test time. We normally think of Google Maps as non-agenty, "tool-like," "task-directed," etc, but it's performing a search for the best route from A to B, and capable of planning to overcome obstacles - as long as those obstacles are within the ontology of its map of ways from A to B.

A thermostat is dumber than Google Maps, but its data is more closely connected to the real world (local temperature rather than general map), and its output is too (directly controlling a heater rather than displaying directions). If we made a "Google Thermostat Maps" website that let you input your thermostat's state, and showed you a heater control value, it would perform the same computations as your thermostat but lose its apparent agency. The condition for us treating the thermostat like an agent isn't just what computation it's doing, it's that its input, search (such as it is), and output ontologies match and extend into the real world well enough that even very simple computation can produce behavior suitable for the intentional stance.
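A small sketch of the "Google Thermostat Maps" thought experiment. All names here are hypothetical, and the room dynamics are invented numbers. The same `control` function is used twice: once wired to a (simulated) sensor and heater, and once as a lookup that only reports what the heater setting would be.

```python
def control(temperature_c: float, setpoint_c: float = 20.0) -> bool:
    """Bang-bang controller: True means 'heater on'."""
    return temperature_c < setpoint_c


class Room:
    """Toy room: warms by 0.5 C per step with the heater on, cools by 0.3 C otherwise."""

    def __init__(self, temperature_c: float) -> None:
        self.temperature_c = temperature_c

    def step(self, heater_on: bool) -> None:
        self.temperature_c += 0.5 if heater_on else -0.3


def run_embodied(room: Room, steps: int) -> float:
    """Thermostat wired to the world: reads the room and drives the heater."""
    for _ in range(steps):
        room.step(control(room.temperature_c))
    return room.temperature_c


def run_website(reported_temperature_c: float) -> str:
    """Same computation, but the output is only displayed to a user."""
    return "heater ON" if control(reported_temperature_c) else "heater OFF"


room = Room(15.0)
final_temperature = run_embodied(room, steps=100)
advice = run_website(15.0)
```

Only `run_embodied` produces behavior that invites the intentional stance ("it wants the room at 20 degrees"), even though `control` is identical in both paths.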

comment by PeterMcCluskey · 2019-01-08T19:58:27.064Z · score: 1 (1 votes) · LW · GW

I consider it important to further clarify the notion of a bounded utility function.

A deployed neural network has a utility function that can be described as outputting a description of the patterns it sees in its most recent input, according to whatever algorithm it's been trained to apply. It's pretty clear to any expert that the neural network doesn't care about anything beyond a specific set of numbers that it outputs.

A neural network that is in the process of being trained is slightly harder to analyze, but essentially the same. It cares about generating an algorithm that will be used in a deployed neural network. At any one training step, it is focused solely on applying fixed algorithms to produce improvements to the deployable algorithm. It has no concept that would lead it to look beyond its immediate task of incremental improvements to that deployable algorithm.

And in some important sense, those steps are the main ways in which AI gets used to produce cars that have superhuman driving ability, and the designers can prove (at least to themselves) that the cars won't go out and buy more processing power, or forage for more energy.

Many forms of AI will be more complex than neural networks (e.g. they might be a mix of RL and neural networks), and I don't have the expertise to extend this analysis to those systems. I'm confident that it's possible in principle to get general-purpose superhuman AIs using only this kind of bounded utility function, but I'm uncertain how practical that is compared to a more unified agent with a broader utility function.

comment by ESRogs · 2019-01-08T20:28:55.488Z · score: 2 (1 votes) · LW · GW

To clarify, when you say "bounded utility function" you mean that it's only defined over a fixed set of inputs, right?

(As opposed to meaning that the output of the function is never infinite, as in this post [LW · GW], which is what I first think of when I hear "bounded utility function". In other words, I expected bounded utility to refer to the range of the function, but you seem to be referring to the domain. Not sure which is more standard, but thought it worth calling out for other readers who may be confused.)
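
In symbols, the two readings could be written roughly as follows. The notation ($U$ for the utility function, $X$ for the set of all outcomes, $X_{\text{task}}$ for the task-relevant subset, $M$ for the bound) is introduced here only for illustration.

```latex
% Reading 1, bounded range: utilities never exceed some finite magnitude.
\exists M < \infty \;\; \forall x \in X : \; |U(x)| \le M

% Reading 2, restricted domain: utility is defined only over task-relevant outcomes.
U : X_{\text{task}} \to \mathbb{R}, \qquad X_{\text{task}} \subsetneq X
```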

comment by rohinmshah · 2019-01-09T02:23:48.116Z · score: 4 (2 votes) · LW · GW

It sounds like he's talking about services. From the post:

A service is an AI system that delivers bounded results for some task using bounded resources in bounded time.

comment by PeterMcCluskey · 2019-01-08T22:12:50.603Z · score: 3 (2 votes) · LW · GW

I'm not talking about the range. Domain seems possibly right, but not as informative as I'd like. I'm talking about what parts of spacetime it cares about, and saying that it only cares about specific outputs of a specific process. Drexler refers to this as "bounded scope and duration". Note that this will normally be an implicit utility function that we infer from our understanding of the system.

"bounded utility function" is definitely not an ideal way of referring to this.

comment by atlas · 2019-01-08T11:13:35.230Z · score: 1 (1 votes) · LW · GW

You might argue that each individual service must be dangerous, since it is superintelligent at its particular task. However, since the service is optimizing for some bounded task, it is not going to run a long-term planning process [...]

Does this assume that we'll be able to build generally intelligent systems (e.g. the service-creating-service) that optimize for a bounded task?

comment by rohinmshah · 2019-01-08T16:33:36.651Z · score: 2 (1 votes) · LW · GW

Depends what you mean by "generally intelligent". Any individual service could certainly have deep and broad knowledge about the world (as with e.g. a language translation service), but no service will be able to do all tasks (e.g. the service-creating-service is not going to be able to edit genomes, except by creating a new service that learns how to edit genomes).

With that caveat, yes, this assumes that we'll be able to build services that optimize for bounded tasks. But this is meant more as a description of how existing AI systems already work. Current RL agents are best modeled as maximizing the reward obtained in the current episode. (This isn't exactly right, because the value function is trying to capture the reward that can be obtained in the future, but in practice this doesn't make much of a difference.)
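
A minimal sketch of the episodic objective described above, assuming a plain list of per-step rewards and a discount factor `gamma` (both are illustrative choices, not taken from the report). The point is only that the sum stops at the episode boundary: nothing after the last step contributes.

```python
from typing import List, Sequence


def episodic_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Discounted sum of rewards within one episode; later episodes count for nothing."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def returns_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Per-step targets a value function would try to predict inside the episode."""
    out: List[float] = []
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
        out.append(total)
    out.reverse()
    return out
```

For example, `returns_to_go([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`: the target at the final step is just the final reward, since the return is treated as zero once the episode ends.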