(notes on) Policy Desiderata for Superintelligent AI: A Vector Field Approach

post by Ben Pace (Benito) · 2019-02-04T22:08:34.337Z · LW · GW · 5 comments

  Context and Goals
    The Vector Field Approach
    Quotes
  Efficiency Desiderata
    Expeditious progress
    AI safety
    Conditional stabilization
    Non-turbulence
  Allocation Desiderata
    Universal benefit
    Epsilon-magnanimity
    Continuity
  Population Desiderata
    Mind crime prevention
    Population policy
  Process Desiderata
    First principles thinking, wisdom, and technical understanding
    Speed and decisiveness
    Adaptability
  Changes since 2016
5 comments

Meta: I thought I'd spend a little time reading the policy papers that Nick Bostrom has written. I made notes as I went along, so I spent a little while cleaning them up into a summary post. These are my notes on Bostrom, Dafoe and Flynn's 2016 policy desiderata paper, which received significant edits in 2018. I spent 6-8 hours on this post, not a great deal of time, so I've not been maximally careful.

Context and Goals

Overall, this is not a policy proposal. Nor does it commit strongly to a particular moral or political worldview. The goal of this paper is merely to observe which policy challenges, of the kind that most moral and political worldviews will need to deal with, are especially important or different in the case of superintelligent AI. The paper also makes no positive argument for the importance or likelihood or timeline of superintelligent AI - it instead assumes that this shall occur in the present century, and then explores the policy challenges that would follow.

The Vector Field Approach

Bostrom, Dafoe and Flynn spend a fair amount of time explaining that they’re not going to be engaging in what (I think) Robin Hanson would call standard value talk. They’re not going to endorse a particular moral or political theory, nor are they going to adopt various moral or political theories and show how they propose different policies. They’re going to look at the details of this particular policy landscape and try to talk about the regularities that will need to be addressed by most standard moral and political frameworks, and in what direction these regularities suggest changing policy.

They call this the ‘vector field’ approach. If you don't feel like you fully grok the concept, here's the quote where they lay out the formalism (with light editing for readability).

The vector field approach might then attempt to derive directional policy change conclusions of a form that we might schematically represent as follows:

“However much emphasis $X$ you think that states ought, under present circumstances, to give to the objective of economic equality, there are certain special circumstances $Y$, which can be expected to hold in the radical AI context we described above, that should make you think that in those circumstances states should instead give emphasis $f_{Y}(X)$ to the objective of economic equality.”

The idea is that $f$ here is some relatively simple function, defined over a space of possible evaluative standards or ideological positions. For instance, $f$ might simply add a term to $X$, which would correspond to the claim that the emphasis given economic equality should be increased by a certain amount in the circumstances $Y$ (according to all the ideological positions under consideration).

Or $f$ might require telling a more complicated story, perhaps along the lines of:

“However much emphasis you give to economic equality as a policy objective under present circumstances, under conditions Y you should want to conceive of economic equality differently—certain dimensions of economic inequality are likely to become irrelevant and other dimensions are likely to become more important or policy-relevant than they are today.”

I particularly like this quote:

This vector field approach is only fruitful to the extent that there are some patterns in how the special circumstances $Y$ impact policy assessments from different evaluative positions. If the prospect of radical AI had entirely different and idiosyncratic implications for every particular ideology or interest platform, then the function $f$ would amount to nothing more than a lookup table.

I read this as saying something like “This paper only makes sense if facts matter, separate to values.” It’s funny to me that this sentence felt necessary to be written.
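To make the contrast between a "relatively simple function" and a lookup table concrete, here is a minimal sketch of my own (not from the paper). The position names, the emphasis numbers, the 0-to-1 scale, and the helper names `additive_shift` and `apply_field` are all invented for illustration.

```python
from typing import Callable, Dict

# Hypothetical emphasis (0 to 1) that each position gives to some objective today.
# The labels and numbers are invented purely for illustration.
baseline: Dict[str, float] = {
    "position_a": 0.2,
    "position_b": 0.5,
    "position_c": 0.8,
}


def additive_shift(delta: float) -> Callable[[float], float]:
    """Return a simple f_Y: the same directional nudge for every position."""
    def f(x: float) -> float:
        # Clamp so the result stays a valid emphasis in [0, 1].
        return min(1.0, max(0.0, x + delta))
    return f


def apply_field(f: Callable[[float], float],
                positions: Dict[str, float]) -> Dict[str, float]:
    """Apply one shared rule f_Y to every evaluative position."""
    return {name: f(x) for name, x in positions.items()}


f_y = additive_shift(0.1)
shifted = apply_field(f_y, baseline)

# The degenerate case the quote warns about: no shared pattern at all,
# just one unrelated entry per position.
lookup_table = {"position_a": 0.9, "position_b": 0.1, "position_c": 0.4}

for name in baseline:
    print(name, baseline[name], "->", round(shifted[name], 2))
```

In the first case a single rule tells every position which direction to move in, whatever its starting point. In the second there is nothing to say beyond listing each entry, which is the situation in which the approach stops being useful.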

Quotes

A few more quotes on what the paper is trying to do.

A strong proposal for the governance of advanced AI would ideally accommodate each of these desiderata to a high degree. There may exist additional desiderata that we have not identified here; we make no claim that our list is complete. Furthermore, a strong policy proposal should presumably also integrate many other normative, prudential, and practical considerations that are either idiosyncratic to particular evaluative positions or are not distinctive to the context of radical AI.

[...]

Using a “vector field” approach to normative analysis, we sought to extract directional policy implications from these special circumstances. We characterized these implications as a set of desiderata—traits of future policies, governance structures, or decision-making contexts that would, by the standards of a wide range of key actors, stakeholders, and ethical views, enhance the prospects of beneficial outcomes in the transition to a machine intelligence era

[...]

By “policy proposals” we refer not only official government documents but also plans and options developed by private actors who take an interest in long-term AI developments. The desiderata, therefore, are also relevant to some corporations, research funders, academic or non-profit research centers, and various other organizations and individuals.

Next are the actual desiderata. They're given under four headings (efficiency, allocation, population, and process), each with 2-4 desiderata. Each subheading below corresponds to a policy desideratum in the paper. For each desideratum I have summarised all the arguments and considerations in the text that felt new or non-trivial to me personally (e.g. I spent only one sentence on the arguments for AI safety).

If you want to just read the paper's summary, jump to page 23 of the paper, which has a table summarising the desiderata in the authors' own words.

Efficiency Desiderata

Expeditious progress

We should make sure to take hold of our cosmic endowment - and the sooner the better.

AI safety

Choose policies that lead us to develop sufficient technical understanding that the AI will do what we expect it to do, and that give these tools to AI builders.

Conditional stabilization

We should have the ability to establish a singleton, or a regime of intensive global surveillance, or to thoroughly suppress the spread of dangerous technology or info, should we need to use this ability in the face of otherwise catastrophic global coordination failures.

Non-turbulence

Technology will change rapidly. We don’t want to have to rush regulations through, or alternatively take too long to adapt such that the environment radically changes again. So try to reduce turbulence.

Allocation Desiderata

Universal benefit

If you force someone to take a risk, it is only fair that they are compensated with a share of any reward gained. Existential risks involve everyone, so everyone should get proportional benefit.

Epsilon-magnanimity

Many people’s values have diminishing returns to further resources e.g. income guarantees for all, ensuring all animals have minimally positive lives, aesthetic projects like preserving some artworks, etc. While today they must fight for a cut of the small pie, as long as they are granted a non-zero weighting in the long-run, they can be satisfied. 0.00001% of GDP may be more than enough to give all humans a $40k income, for example.

This is especially good in light of normative uncertainty - as long as we give some weighting to various values, they will get satiated in a basic way in the long-run.
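As a rough sanity check on the 0.00001% figure, here is a back-of-the-envelope calculation. Only the share of GDP and the $40k income come from the notes above; the population and current world GDP are round numbers I am assuming.

```python
# All inputs except the two figures quoted from the notes are rough assumptions.
SHARE_OF_GDP = 0.00001 / 100      # 0.00001% expressed as a fraction (from the notes)
INCOME_PER_PERSON = 40_000        # dollars per year (from the notes)
POPULATION = 8e9                  # assumed: roughly eight billion people
CURRENT_WORLD_GDP = 1e14          # assumed: on the order of $100 trillion per year

total_cost = INCOME_PER_PERSON * POPULATION        # cost of the income guarantee
required_gdp = total_cost / SHARE_OF_GDP           # GDP at which that cost is 0.00001%
growth_factor = required_gdp / CURRENT_WORLD_GDP   # how much larger than today

print(f"total cost:    ${total_cost:.2e} per year")
print(f"required GDP:  ${required_gdp:.2e} per year")
print(f"growth factor: {growth_factor:.1e}x")
```

Under these assumptions the guarantee costs about $3.2e14 per year, so it only fits inside 0.00001% of GDP once the economy is tens of millions of times larger than today's. The claim is about a post-transition economy, not the present one.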

Continuity

Reasons to expect unusually high concentration and permutation of wealth and power:

In the modern world, salary is more evenly distributed than capital. Superintelligent AI is likely to greatly increase the factor share of income accrued from capital, leading to massive increases in inequality and increased concentration of wealth.
If a small group decides how the AI works and its high-level decisions, they could gain a decisive strategic advantage and take over the world.
If there is radical and unpredictable technological change, then it is likely that wealth distribution will change radically and unpredictably.
Automated security and surveillance systems will help a regime stay alive without support from the public or elites - when behaviour is more legible it’s easier to punish or control it. This is likely to at least sustain the concentration of wealth and power, and may also increase it.

As such we wish to implement policies that better sustain the existing concentration and distribution of wealth and power.

Also of interest (given the high likelihood of redistribution, change in concentration, and general unpredictable turbulence) is how much we seem to face a global, real-life, Rawlsian veil of ignorance. It might be good to set up things like insurance to make sure everyone gets some minimum of power and self-determination in the future (it seems that people have diminishing returns to power - “most people would much rather be certain to have power over one life (their own) than have a 10% chance of having power over the lives of ten people and a 90% chance of having no power”).
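The quoted preference is what you would expect from any concave utility function over power. Here is a small sketch; the square root is my own stand-in for "diminishing returns" and is not a form the paper proposes.

```python
import math


def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, lives_controlled) pairs."""
    return sum(p * utility(x) for p, x in lottery)


def linear(x):
    """No diminishing returns: each extra life controlled is worth the same."""
    return x


def concave(x):
    """Assumed stand-in for diminishing returns to power."""
    return math.sqrt(x)


certain = [(1.0, 1)]             # power over one life (your own) for sure
gamble = [(0.1, 10), (0.9, 0)]   # 10% chance of ten lives, 90% chance of none

for name, u in [("linear", linear), ("concave", concave)]:
    print(name,
          round(expected_utility(certain, u), 3),
          round(expected_utility(gamble, u), 3))
```

Both options have the same expected amount of power, so a linear valuation is indifferent between them. With the concave valuation the certain option comes out well ahead, which is the case for insurance-like arrangements behind the veil.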

Population Desiderata

Mind crime prevention

Four key factors: novelty, invisibility, difference, and magnitude.

Novelty and invisibility: Sentient digital entities may be moral patients. They would be a novel type of mind, and would not exhibit many characteristics that inform our moral intuitions - they lack facial expressions, physicality, human speech, and so on, if they are being run invisibly in some microprocessor. This means we should worry about policy makers taking an unconscionable moral decision.
Difference: It is also the case that these minds may be very different to human or animal minds, again subverting our intuitions about what behaviour is normative toward them, and increasing the complexity of choosing sensible policies here.
Magnitude: It may be incredibly cheap to create as many people as currently exist in a country, magnifying the concerns of the previous three factors. “With high computational speed or parallelization, a large amount of suffering could be generated in a small amount of wall clock time.” This may mean that mind crime is a principal desideratum in AI policy.

Population policy

This is a worry about Malthusian scenarios (where average income falls to subsistence levels). Hanson has written about these scenarios.

This can also undermine democracy (“One person, one vote”). If a political faction can invest in creating more people, they can create the biggest voting bloc. This leaves the following trilemma of options:

(i) deny equal votes to all persons
(ii) impose constraints on creating new persons
(iii) accept that voting power becomes proportional to ability and willingness to pay to create voting surrogates, resulting in both economically inefficient spending on such surrogates and the political marginalization of those who lack resources or are unwilling to spend them on buying voting power

Some interesting forms of (i):

Make voting rights something you inherit, a 1-1 mapping.
Robin Hanson has suggested ‘speed-weighted voting’, because faster ems are more costly, so you'd actually have to pay a lot for marginal voters. This still looks like richer people getting a stronger vote, but in principle puts a much higher cost on it.
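A toy model of option (iii) shows why the cost of a marginal voter matters so much. This is not Hanson's actual proposal; the function `vote_shares`, the faction names, and every number are made up for illustration.

```python
def vote_shares(factions, cost_per_surrogate):
    """Vote share per faction when each spends its whole budget on voting surrogates.

    `factions` maps a name to (original_members, budget).
    """
    votes = {
        name: members + budget // cost_per_surrogate
        for name, (members, budget) in factions.items()
    }
    total = sum(votes.values())
    return {name: v / total for name, v in votes.items()}


factions = {
    "spenders": (1_000, 1_000_000),   # small faction, willing to pay
    "abstainers": (9_000, 0),         # large faction, spends nothing
}

cheap = vote_shares(factions, cost_per_surrogate=10)
costly = vote_shares(factions, cost_per_surrogate=10_000)

print("cheap surrogates: ", {k: round(v, 3) for k, v in cheap.items()})
print("costly surrogates:", {k: round(v, 3) for k, v in costly.items()})
```

When surrogates are cheap, the small faction buys an overwhelming majority. When each extra vote is expensive, the same budget barely moves its share, though spending still buys some influence, as noted above.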

Process Desiderata

First principles thinking, wisdom, and technical understanding

Overall this is a very different environment from usual policy-making, which means that we will need to reconsider fundamental assumptions using first-principles thinking to a greater extent than before, and to be exceptionally wise (able to get the right answer to the most important questions while they are surrounded by confusion and misunderstanding).

Technological innovation is the primary driver of this radical new policy landscape, and so an understanding of the technologies is unusually helpful.

Speed and decisiveness

In many possible futures, historic events will be happening faster than global treaties are typically negotiated, ratified, and implemented. We need a capacity for rapid decision-making and decisive global implementation.

Adaptability

Many fundamental principles will need to be re-examined. Some examples: legitimacy, consent, political participation, accountability.

Voluntary consent. Given AIs that are super-persuaders and can convince anyone of anything, consent becomes a much vaguer and fuzzier concept. Perhaps consent only counts if the consentee has an “AI guardian” or “AI advisor” of some sort.

Political participation. This norm is typically justified on three grounds:

Epistemic benefit of including information from a maximal diversity of sources.
Ensures all interests and preferences are given some weighting in the decision.
Intrinsic good.

However,

The epistemic effect may become negative if the AI making decisions sits at a sufficiently high epistemic vantage point.
AI may be able to construct a process / mechanism that accounts for all values without consistent input from humans.
The intrinsic good is not changed, though it may not be worth the cost if the above two factors become strongly net negative and wasteful.

The above examples, of consent and political participation, are not at all clear, but just go to show that there are many unquestioned assumptions in modern political debate that may need either reformulation, abandonment, or extra vigilance spent on safeguarding their existence into the future.

Changes since 2016

The paper was originally added to Nick Bostrom's website in 2016, and received an update in late 2018 (original, current).

The main updates as I can see them are:

The addition of 'vector field approach' to the title and body. It was lightly alluded to in the initial version. (I wonder if this was due to lots of feedback trying to fit the paper into standard value talk, where it did not want to be.)
Changing the heading from "Mode" to "Process", and fleshing out the three desiderata rather than a single one called "Responsibility and wisdom". If you read the initial paper, this is the main section to re-read to get anything new.

There have definitely been significant re-writings of the opening section, and there may be more, but I did not take the time to compare them section-for-section.

I've added some personal reflection/updates in a comment.

5 comments

Comments sorted by top scores.

comment by Donald Hobson (donald-hobson) · 2019-02-06T00:27:27.407Z · LW(p) · GW(p)

I suspect that the social institutions of Law and Money are likely to become increasingly irrelevant background to the development of ASI.

Deterrence Fails.

If you believe that there is a good chance of immortal utopia, and a large chance of paperclips in the next 5 years, the threat that the cops might throw you in jail (on the off chance that they are still in power) is negligible.

The law is blind to safety.

The law is bureaucratic and ossified. It is probably not employing much top talent, as it's hard to tell top talent from the rest if you aren't as good yourself (and it doesn't have the budget or glamor to attract them). Telling whether an organization is on track to not destroy the world is HARD. The safety protocols are being invented on the fly by each team, the system is very complex and technical and only half built. The teams that would destroy the world aren't idiots, they are still producing long papers full of maths and talking about the importance of safety a lot. There are no examples to work with, or understood laws.

Likely as not (not really, too much conjunction here), you get some random inspector with a checklist full of things that sound like a good idea to people who don't understand the problem. All AI work has to have an emergency stop button that turns the power off. (The idea of an AI circumventing this was not considered by the person who wrote the list).

All the law can really do is tell what public image an AI group wants to present, provide funding to everyone, and get in everyone's way. Telling cops to "smash all GPUs" would have an effect on AI progress. The fund vs smash axis is about the only lever they have. They can't even tell an AI project from a maths convention from a normal programming project if the project leaders are incentivized to obfuscate.

After ASI, governments are likely only relevant if the ASI was programmed to care about them. Neither paperclippers nor FAI will care about the law. The law might be relevant if we had tasky ASI that was not trivial to leverage into a decisive strategic advantage. (An AI that can put a strawberry on a plate without destroying the world, but that's about the limit of its safe operation.)

Such an AI embodies an understanding of intelligence and could easily be accidentally modified to destroy the world. Such scenarios might involve ASI and timescales long enough for the law to act.

I don't know how the law can handle something that can easily destroy the world, has some economic value (if you want to flirt with danger) and, with further research, could grant supreme power. The discovery must be limited to a small group of people (by the law of large numbers, among enough nonexperts one will do something stupid). I don't think the law could notice what it was; after all, the robot in front of the inspector only puts strawberries on plates. They can't tell how powerful it would be with an unbounded utility function.

comment by Ben Pace (Benito) · 2019-02-04T22:00:43.336Z · LW(p) · GW(p)

Reflections

I definitely am not quite sure what the epistemic state of the paper is, or even its goal. Bostrom, Dafoe and Flynn keep mentioning that this paper is not a complete list of desiderata, but I don't know what portion of key desiderata they think they've hit, or why they think it's worthwhile at this stage to pre-emptively list the desiderata that currently seem important.

(Added: My top hypothesis is that Bostrom was starting a policy group with Dafoe as its head, and thought to himself "What are the actual policy implications of the work in my book?" and then wrote them down, without expecting it to be complete, just an obvious starting point.)

As to my thoughts on whether the recommendations in the paper seem good... to be honest, it all felt so reasonable and simple (added: this is a good thing). There were not big leaps of inference. It didn't feel surprising to me. But here's a few updates/reflections.

I have previously run the thought experiment "What would I do if I were at the start, or just before the start, of the industrial revolution?" Thoughts pertaining to massive turbulence, redistribution, concentration, and adaptability seemed natural focal concerns to me, but I had not made them as precise or as clear as the paper had. Then again I'd been thinking more about what I as an individual should do, not how a government or larger organisation should approach the problem. I definitely hadn't thought about population dynamics in that context (which were also a big deal after the industrial revolution - places like England scaled by an order of magnitude, requiring major infrastructural changes in politics, education, industry, and elsewhere).

I think that the technical details of AI are most important in the sections on Efficiency and Population. The sections on Allocation and Process I would expect to apply to any technological revolution (industrial, agricultural, etc).

I'm not sure that this is consistent with his actions, but I think it's likely that Ben from yesterday would've said the words "In order to make sensible progress on AI policy you require a detailed understanding of the new technology". I realise now that, while it is indeed required to get the overall picture right, there is progress to be made that merely takes heed of this being a technological revolution of historic proportions, and for which it does not matter too much which particular technological revolution we're going through.

I've seen another discussion here, along with the Vulnerable World Hypothesis paper (LW discussion here), of the need for the ability to execute a massive coordination increase. I'm definitely going to think more about 'conditional stabilization', how exactly it follows from the conceptual space of thinking about singletons and coordination, and what possible things it might look like (global surveillance seems terrible on the face of it, and I wonder if moving straight to that is premature. I think there are probably a lot more granular ways of thinking about surveillance).

In general this paper is full of very cautious and careful conceptual work, based on simple arguments and technical understandings of AI and coordination. In general I don't trust many people to do this without vetting the ideas in depth myself or without seeing a past history of their success. Bostrom certainly ticks the latter box and weakly ticks the former box for me (I've yet to personally read enough of his writings to say anything stronger there), and given that he's a primary author on this paper, I feel epistemically safe taking on these framings without 30-100 hours of further examination.

I hope to be able to spend a similar effort summarising the many other strategic papers Bostrom and others at the FHI have produced.

Feedback

For future posts of a similar nature, please PM me if you have any easy changes that would've made this post more useful to you / made it easier to get the info you needed (I will delete public comments on that topic). It'd also be great to (publicly) hear that someone else actually read the paper and checked whether my notes missed something important or are inaccurate.

Replies from: rohinmshah

↑ comment by Rohin Shah (rohinmshah) · 2019-02-06T22:47:09.557Z · LW(p) · GW(p)

It'd also be great to (publicly) hear that someone else actually read the paper and checked whether my notes missed something important or are inaccurate.

I read the paper over a year ago (before the update), and reviewing my notes, they look similar to yours (but less detailed).

comment by Vaniver · 2019-02-05T19:54:11.026Z · LW(p) · GW(p)

I read this as saying something like “This paper only makes sense if facts matter, separate to values.” It’s funny to me that this sentence felt necessary to be written.

I mean, it's more something like "there's a shared way in which facts matter," right? If I mostly think in terms of material consumption by individuals, and you mostly think in terms of human dignity and relationships, the way in which facts matter for both of us is only tenuously related.

comment by Bird Concept (jacobjacob) · 2019-02-05T10:07:42.749Z · LW(p) · GW(p)

Hanson's speed-weighted voting reminds me a bit of quadratic voting.
