Why I am not currently working on the AAMLS agenda

post by jessicata (jessica.liu.taylor) · 2017-06-01T17:57:24.000Z · LW · GW · 3 comments

Contents

    The agenda
    History
    Progress since the paper
  Difficulty
  Going for the throat
  Theory vs. empiricism
  Doing other things
    Relevant updates I've made
  Against plausibility arguments
  In favor of lots of philosophical hardness
  Against particular agendas
  Against research being optimized for outside understandability
    The current state of the agenda

(note: this is not an official MIRI statement, this is a personal statement. I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a project at MIRI that is about determining how to use hypothetical highly advanced machine learning systems safely. I was previously working on problems in this agenda and am currently not.


The agenda

See the paper. The agenda lists 8 theoretical problems relevant to aligning AI systems substantially similar to current machine learning systems.

History

Around March 2016, I had thoughts about research prioritization: I thought it made sense for AI safety researchers to spend more time thinking about machine learning systems. In a similar timeframe, some other researchers updated towards shorter timelines. I had some discussions with Eliezer, Paul, Nate, and others, and came up with a list of problems that seemed useful to think about.

Then some of us (mostly me, with significant help from others) wrote up the paper about the problems. The plan was for some subset of the researchers to work on them.

Progress since the paper

Since writing the paper, progress has been slow:

Why was little progress made?

Difficulty

I think the main reason is that the problems were very difficult. In particular, they were mostly selected on the basis of "this seems important and seems plausibly solvable", rather than any strong intuition that it's possible to make progress.

In comparison, problems in the agent foundations agenda have seen more progress:

One thing to note about these problems is that they were formulated on the basis of a strong intuition that they ought to be solvable. Before logical induction, it was possible to have the intuition that some sort of asymptotic approach could solve many logical uncertainty problems in the limit. It was also possible to strongly believe that some sort of self-trust is possible.

With problems in the AAMLS agenda, the plausibility argument was something like:

which, empirically, turned out not to make for tractable research problems.

Going for the throat

In an important sense, the AAMLS agenda is "going for the throat" in a way that other agendas (e.g. the agent foundations agenda) are to a lesser extent: it is attempting to solve the whole alignment problem (including goal specification) given access to resources such as powerful reinforcement learning. Thus, the difficulties of the whole alignment problem (e.g. specification of environmental goals) are more exposed in the problems.

Theory vs. empiricism

Personally, I strongly lean towards preferring theoretical rather than empirical approaches. I don't know how much I endorse this bias overall for the set of people working on AI safety as a whole, but it is definitely a personal bias of mine.

Problems in the AAMLS agenda turned out not to be very amenable to purely theoretical investigation. This is probably because there is no clear mathematical aesthetic for determining what counts as a solution (e.g. for the environmental goals problem, it's not actually clear that there's a recognizable mathematical statement for what the problem is).

With the agent foundations agenda, there's a clearer aesthetic for recognizing good solutions. Most of the problems in the AAMLS agenda have a less-clear aesthetic. (There are probably additional ways of investigating the AI alignment problem in a highly aesthetic fashion other than the agent foundations agenda, but I don't know of them yet).

Doing other things

Perhaps related to the fact that the problems were so hard, I repeatedly found that other things felt better to think about and work on than AAMLS:

That is, though I was officially lead on AAMLS, I mostly did other things in that time period. I think this was mostly correct (though it unfortunately made the official story somewhat misleading): I intuitively expect that the other things I did had a greater payoff than working on AAMLS would have.

Relevant updates I've made

I've made some updates (some due to AAMLS, some not) that make AAMLS look like a worse idea now than before.

Against plausibility arguments

As discussed before, I included problems based on plausibility rather than a strong intuition that the problem is solvable. I've updated against this being a useful research strategy; I think strong intuitions about things being solvable are a better guide as to what to work on. Note that strong intuitions can be miscalibrated; however, even in these cases there is still a strong model behind the intuition that can be tested by pursuing the research implied by the intuition.

In favor of lots of philosophical hardness

I've updated in favor of the proposition that essential AI safety problems (especially those related to benign induction, bounded logical uncertainty, and environmental goals) are philosophically hard rather than only mathematically hard. That is: just taking our current philosophical thinking and attempting to formalize it will fail, because our current philosophical thinking is confused.

The main reason for this intuition is thinking about these problems for a significant time and then noticing that, in near mode, I don't expect to be able to find satisfying solutions (e.g. a particular thing and a mathematical proof related to the thing that yields high confidence it will work; it's hard to imagine what the premises or conclusions of the mathematical proof would be). So it looks like large ontological shifts will be necessary to even get to the stage of picking the right problems to formalize and solve.

Against particular agendas

I've moved towards a research approach that is less "rigid" than working on a particular agenda. Every particular research agenda for AI alignment that I know of (agent foundations, AAMLS, concrete problems in AI safety, Paul's agenda) offers a useful perspective on the problem, but is quite limited in itself. Each agenda does some combination of (a) containing "impossible" problems, or (b) ignoring large parts of the AI safety problem. If the overall alignment problem is solved, it will probably be solved through researchers obtaining new, not-currently-existing perspectives on the problem.

In general I think the purpose of technical agendas is something like:

Against research being optimized for outside understandability

I've updated against the idea that research should be significantly optimized for being understandable to outsiders. (I previously considered understandability a significant point in favor of working on AAMLS but not one of the main considerations). The intuitions in favor of this type of research are fairly obvious:

I now have additional intuitions against:

Overall it still seems like outside understandability is weakly net-positive, but I don't plan to use it as a significant optimization criterion when deciding which research to do (i.e. I'll aim to just do research good according to my aesthetics and then figure out how to make it understandable later).

The current state of the agenda

3 comments


comment by Ben Smith (ben-smith) · 2021-09-18T02:45:30.262Z · LW(p) · GW(p)

Interesting comments, thanks. Currently exploring an agenda of my own and this is food for thought.

comment by IAFF-User-111 (Imported-IAFF-User-111) · 2017-05-26T01:54:29.000Z · LW(p) · GW(p)

The "benign induction problem" link is broken.

comment by jessicata (jessica.liu.taylor) · 2017-06-01T17:57:42.000Z · LW(p) · GW(p)

Thanks, fixed.