## Posts

Reframing the evolutionary benefit of sex 2019-09-14T17:00:01.184Z · score: 58 (19 votes)
Ought: why it matters and ways to help 2019-07-25T18:00:27.918Z · score: 82 (32 votes)
Aligning a toy model of optimization 2019-06-28T20:23:51.337Z · score: 52 (17 votes)
What failure looks like 2019-03-17T20:18:59.800Z · score: 208 (83 votes)
Security amplification 2019-02-06T17:28:19.995Z · score: 20 (4 votes)
Reliability amplification 2019-01-31T21:12:18.591Z · score: 21 (5 votes)
Techniques for optimizing worst-case performance 2019-01-28T21:29:53.164Z · score: 23 (6 votes)
Thoughts on reward engineering 2019-01-24T20:15:05.251Z · score: 30 (8 votes)
Learning with catastrophes 2019-01-23T03:01:26.397Z · score: 26 (8 votes)
Capability amplification 2019-01-20T07:03:27.879Z · score: 24 (7 votes)
The reward engineering problem 2019-01-16T18:47:24.075Z · score: 23 (4 votes)
Towards formalizing universality 2019-01-13T20:39:21.726Z · score: 29 (6 votes)
Directions and desiderata for AI alignment 2019-01-13T07:47:13.581Z · score: 29 (6 votes)
Ambitious vs. narrow value learning 2019-01-12T06:18:21.747Z · score: 19 (5 votes)
AlphaGo Zero and capability amplification 2019-01-09T00:40:13.391Z · score: 26 (10 votes)
Supervising strong learners by amplifying weak experts 2019-01-06T07:00:58.680Z · score: 28 (7 votes)
Benign model-free RL 2018-12-02T04:10:45.205Z · score: 10 (2 votes)
Corrigibility 2018-11-27T21:50:10.517Z · score: 39 (9 votes)
Humans Consulting HCH 2018-11-25T23:18:55.247Z · score: 19 (3 votes)
Approval-directed bootstrapping 2018-11-25T23:18:47.542Z · score: 19 (4 votes)
Approval-directed agents 2018-11-22T21:15:28.956Z · score: 22 (4 votes)
Prosaic AI alignment 2018-11-20T13:56:39.773Z · score: 36 (9 votes)
An unaligned benchmark 2018-11-17T15:51:03.448Z · score: 27 (6 votes)
Clarifying "AI Alignment" 2018-11-15T14:41:57.599Z · score: 54 (16 votes)
The Steering Problem 2018-11-13T17:14:56.557Z · score: 38 (10 votes)
Preface to the sequence on iterated amplification 2018-11-10T13:24:13.200Z · score: 39 (14 votes)
The easy goal inference problem is still hard 2018-11-03T14:41:55.464Z · score: 38 (9 votes)
Could we send a message to the distant future? 2018-06-09T04:27:00.544Z · score: 40 (14 votes)
When is unaligned AI morally valuable? 2018-05-25T01:57:55.579Z · score: 101 (31 votes)
Open question: are minimal circuits daemon-free? 2018-05-05T22:40:20.509Z · score: 121 (38 votes)
Weird question: could we see distant aliens? 2018-04-20T06:40:18.022Z · score: 85 (25 votes)
Implicit extortion 2018-04-13T16:33:21.503Z · score: 74 (22 votes)
Prize for probable problems 2018-03-08T16:58:11.536Z · score: 135 (37 votes)
Argument, intuition, and recursion 2018-03-05T01:37:36.120Z · score: 99 (29 votes)
Funding for AI alignment research 2018-03-03T21:52:50.715Z · score: 108 (29 votes)
Funding for independent AI alignment research 2018-03-03T21:44:44.000Z · score: 0 (0 votes)
The abruptness of nuclear weapons 2018-02-25T17:40:35.656Z · score: 95 (35 votes)
Funding opportunity for AI alignment research 2017-08-27T05:23:46.000Z · score: 1 (1 votes)
Ten small life improvements 2017-08-20T19:09:23.673Z · score: 26 (19 votes)
Crowdsourcing moderation without sacrificing quality 2016-12-02T21:47:57.719Z · score: 15 (11 votes)
Optimizing the news feed 2016-12-01T23:23:55.403Z · score: 9 (10 votes)
The universal prior is malign 2016-11-30T22:31:41.000Z · score: 4 (4 votes)
Recent AI control posts 2016-11-29T18:53:57.656Z · score: 12 (13 votes)
My recent posts 2016-11-29T18:51:09.000Z · score: 5 (5 votes)
If we can't lie to others, we will lie to ourselves 2016-11-26T22:29:54.990Z · score: 25 (18 votes)
Less costly signaling 2016-11-22T21:11:06.028Z · score: 14 (16 votes)
Control and security 2016-10-15T21:11:55.000Z · score: 3 (3 votes)
What is up with carbon dioxide and cognition? An offer 2016-04-23T17:47:43.494Z · score: 40 (32 votes)
Time hierarchy theorems for distributional estimation problems 2016-04-20T17:13:19.000Z · score: 2 (2 votes)

Comment by paulfchristiano on Reframing the evolutionary benefit of sex · 2019-09-15T15:40:54.626Z · score: 2 (1 votes) · LW · GW

On this picture, the claim is just that sex is worthwhile if it adds enough variance to offset its costs, so you could either reduce costs or increase variance.

(This picture could be wrong though, if mixing two genomes just yields offspring that are more fit on average than the parents and it's not about variance at all.)

(This isn't really affected by Wei Dai's concern above.)

Comment by paulfchristiano on Reframing the evolutionary benefit of sex · 2019-09-15T15:34:15.690Z · score: 2 (1 votes) · LW · GW

I didn't set it up, the LW team did.

Comment by paulfchristiano on Reframing the evolutionary benefit of sex · 2019-09-15T15:16:29.578Z · score: 5 (2 votes) · LW · GW

From an evolutionary perspective, males and females presumably get an equally good deal on the margin (or else the sex ratio would shift). That need not look like a productive investment in order to justify this basic picture (e.g. the story would be the same if males fought for territory and protected mates against other males).

If the low-investment sex doesn't add value of any kind then it would change this picture. E.g. if males just compete for mates and then do nothing beyond mate, then females would get an advantage by cloning themselves. Maybe plants actually fit into this picture most straightforwardly of all.

(This would actually also happen if males invested 50%, if they couldn't track paternity and females could secretly fertilize themselves.)

In the case where you get nothing in return, it seems like you are taking a 50% fitness hit from sex by passing on half as many genes. So if differences in fitness between kids were 5%, it would take about 300 generations for sex to break even. If fertilizing yourself is a complicated adaptation, then maybe that's enough to stick with sex (after it evolves in some more equitable species), but it's pretty different from my claim about breaking even in 6 generations. And in the case of plants or hermaphrodites there is presumably an easier physical gradient to self-fertilization, so that's even more puzzling, and maybe I'm back to the usual degree to which biologists are puzzled (where it either requires surprisingly forward-looking selection, or some story about why the advantage is bigger than it looks).
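
The compounding arithmetic here can be checked with a small sketch (the comment doesn't spell out how a 5% spread in offspring fitness maps onto a per-generation growth advantage, so the advantage `g` below is a free parameter, not a claim about the actual model):

```python
import math

def breakeven_generations(g):
    """Number of generations n at which a compounding per-generation
    advantage g recoups the twofold cost of sex: (1 + g)**n = 2."""
    return math.log(2) / math.log(1 + g)
```

A naive fully-compounding 5% advantage would break even in about 14 generations; an advantage of roughly 0.23% per generation takes about 300, consistent with the figure in the comment.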

Comment by paulfchristiano on Conversation with Paul Christiano · 2019-09-12T15:31:58.307Z · score: 5 (4 votes) · LW · GW

Someone from MIRI can chime in. I think that MIRI researchers are much happier to build AI that solves a narrow range of tasks, and isn't necessarily competitive. I think I'm probably the most extreme person on this spectrum.

Comment by paulfchristiano on Conversation with Paul Christiano · 2019-09-12T15:31:09.539Z · score: 4 (3 votes) · LW · GW

Maybe you have a 30% chance of solving the clean theoretical problem, and a 30% chance that you could wing AI alignment with no technical solution. If these were independent, you would have about a 50% probability of being able to do one or the other.

But things are worse than this, because both of them are more likely to work if alignment turns out to be easy. So maybe it's more like a 40% probability of being able to do one or the other.
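
The independence step above, as a quick check (the "about 50%" is exactly 51% under independence):

```python
# Chance of solving the clean theoretical problem, and of winging it.
p_theory = 0.30
p_wing = 0.30

# If the two routes were independent, at least one works with probability:
p_either = 1 - (1 - p_theory) * (1 - p_wing)  # close to 0.51
```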

But in reality, you don't need to either solve the full theoretical problem or wing it without understanding anything more than we do today. You can have a much better theoretical understanding than we currently do, but not good enough to solve the problem on its own. And you can be pretty prepared to wing it: even if that preparation isn't good enough to solve the problem by itself, it might be good enough when combined with a reasonable theoretical picture.

(Similarly for coordination.)

Comment by paulfchristiano on Counterfactual Oracles = online supervised learning with random selection of training episodes · 2019-09-11T19:08:41.404Z · score: 5 (2 votes) · LW · GW
the Oracle could still hack the grading system to give itself a zero loss

Gradient descent won't optimize for this behavior though, it really seems like you want to study this under inner alignment. (It's hard for me to see how you can meaningfully consider the problems separately.)

Yes, if the oracle gives itself zero loss by hacking the grading system then it will stop being updated, but the same is true if the mesa-optimizer tampers with the outer training process in any other way, or just copies itself to a different substrate, or whatever.

Comment by paulfchristiano on Counterfactual Oracles = online supervised learning with random selection of training episodes · 2019-09-11T19:06:37.358Z · score: 5 (2 votes) · LW · GW

Yes, most of the algorithms in use today are known to converge or roughly converge to optimizing per-episode rewards. In most cases it's relatively clear that there is no optimization across episode boundaries (by the outer optimizer).

Comment by paulfchristiano on Counterfactual Oracles = online supervised learning with random selection of training episodes · 2019-09-11T16:01:30.130Z · score: 5 (2 votes) · LW · GW

How do you even tell that an algorithm is optimizing something?

In most cases we have some argument that an algorithm is optimizing the episodic reward, and it just comes down to the details of that argument.

If you are concerned with optimization that isn't necessarily intended and wondering how to more effectively look out for it, it seems like you should ask "would a policy that has property P be more likely to be produced under this algorithm?" For P="takes actions that lead to high rewards in future episodes" the answer is clearly yes, since any policy that persists for a long time necessarily has property P (though of course it's unclear if the algorithm works at all). For normal RL algorithms there's not any obvious mechanism by which this would happen. It's not obvious that it doesn't, until you prove that these algorithms converge to optimizing per-episode rewards. I don't see any mechanical way to test that (just like I don't see any mechanical way to test almost any property that we talk about in almost any argument about anything).

Comment by paulfchristiano on Counterfactual Oracles = online supervised learning with random selection of training episodes · 2019-09-10T15:28:59.385Z · score: 7 (3 votes) · LW · GW

Episodic learning algorithms will still penalize this behavior if it appears on the training distribution, so it seems reasonable to call this an inner alignment problem.

Comment by paulfchristiano on Counterfactual Oracles = online supervised learning with random selection of training episodes · 2019-09-10T15:28:03.970Z · score: 3 (2 votes) · LW · GW
For example, consider the following episodic learning algorithm

When I talk about an episodic learning algorithm, I usually mean one that actually optimizes performance within an episode (like most of the algorithms in common use today, e.g. empirical risk minimization treating episode initial conditions as fixed). The algorithm you described doesn't seem like an "episodic" learning algorithm, given that it optimizes total performance (and essentially ignores episode boundaries).

Comment by paulfchristiano on [AN #62] Are adversarial examples caused by real but imperceptible features? · 2019-08-31T21:02:50.596Z · score: 2 (1 votes) · LW · GW

I think AI will probably be good enough to pose a catastrophic risk before it can exactly imitate a human. (But as Wei Dai says elsewhere, if you do amplification then you will definitely get into the regime where you can't imitate.)

Comment by paulfchristiano on [AN #62] Are adversarial examples caused by real but imperceptible features? · 2019-08-30T15:26:12.772Z · score: 3 (2 votes) · LW · GW

Against mimicry is mostly motivated by the case of imitating an amplified agent. (I try to separate the problems of distillation and amplification, and imitation learning is a candidate for distillation.)

You could try to avoid the RL exploiting a security vulnerability in the overseer by:

• Doing something like quantilizing where you are constrained to be near the original policy (we impose a KL constraint that prevents the policy from drifting too far from an attempted-imitation).
• Doing something like meeting halfway.
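
The KL-constrained option in the first bullet could be sketched roughly as follows (a toy illustration, not any particular implementation; the function name, `beta`, and the interface are all made up):

```python
import numpy as np

def kl_penalized_loss(task_loss, policy_logits, ref_logits, beta=1.0):
    """Add a penalty for drifting away from a reference
    (attempted-imitation) policy, as in a soft KL constraint."""
    p = np.exp(policy_logits - np.max(policy_logits))
    p /= p.sum()
    q = np.exp(ref_logits - np.max(ref_logits))
    q /= q.sum()
    kl = np.sum(p * (np.log(p) - np.log(q)))  # KL(policy || reference)
    return task_loss + beta * kl
```

When the policy matches the reference the penalty vanishes and the loss is just the task loss; as the policy drifts, the penalty grows.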

These solutions seem tricky but maybe helpful. But my bigger concern is that you need to fix security vulnerabilities anyway:

• The algorithm "Search over lots of actions to find the one for which Q(a) is maximized" is a pretty good algorithm that you need to be able to use at test time in order to be competitive, and which seems to require a secure Q.
• Iterated amplification does optimization anyway (by amplifying the optimization done by the individual humans) and without security you are going to have problems there.

I mostly hope to solve this problem with security amplification (see also).

Comment by paulfchristiano on Soft takeoff can still lead to decisive strategic advantage · 2019-08-25T03:12:00.367Z · score: 6 (4 votes) · LW · GW
Wouldn't one project have more compute than the others, and thus pull ahead so long as funds lasted?

To have "more compute than all the others" seems to require already being a large fraction of all the world's spending (since a large fraction of spending is on computers---or whatever bundle of inputs is about to let this project take over the world---unless you are positing a really bad mispricing). At that point we are talking "coalition of states" rather than "project."

I totally agree that it wouldn't be crazy for a major world power to pull ahead of others technologically and eventually be able to win a war handily, and that will tend to happen over shorter and shorter timescales if economic and technological progress accelerate.

(Or you might think the project is a small fraction of world compute but larger than any other project, but if economies of scale are in fact this critical, then you are again suggesting a really gigantic market failure. That's not beyond the pale, but we should be focusing on why this crazy market failure is happening.)

Comment by paulfchristiano on Soft takeoff can still lead to decisive strategic advantage · 2019-08-25T03:06:05.478Z · score: 7 (4 votes) · LW · GW
This gets us into the toy model & its problems. I don't think I understand your alternative model. I maybe don't get what you mean by trading. Does one party giving money to another party in return for access to their technology or products count? If so, then I think my original model still stands: The leading project will be able to hoard technology/innovation and lengthen its lead over the rest of the world so long as it still has funding to buy the necessary stuff.

The reason I let other people use my IP is because they pay me money, with which I can develop even more IP. If the leading project declines to do this, then it will have less IP than any of its normal competitors. If the leading project's IP allows it to be significantly more productive than everyone else, then they could have just taken over the world through the normal mechanism of selling products. (Modulo leaks/spying.) As far as I can tell, until you are a large fraction of the world, the revenue you get from selling lets you grow faster, and I don't think the toy model really undermines that typical argument (which has to go through leaks/spying, market frictions, etc.).

Comment by paulfchristiano on Soft takeoff can still lead to decisive strategic advantage · 2019-08-25T02:51:20.495Z · score: 5 (3 votes) · LW · GW
A coalition strong enough to prevent the world's leading project from maintaining and lengthening its lead would need to have some way of preventing the leading project from accessing the innovations of the coalition. Otherwise the leading project will free-ride off the research done by the coalition. For this reason I think that a coalition would look very different from the world economy; in order to prevent the leading project from accessing innovations deployed in the world economy you would need to have an enforced universal embargo on them pretty much, and if you have that much political power, why stop there? Why not just annex them or shut them down?

Are you saying that the leading project can easily spy on other projects, but other projects can't spy on it? Is this because the rest of the world is trading with each other, and trading opens up opportunities for spying? Some other reason I missed? I don't think it's usually the case that gains from rabbit-holing, in terms of protection from spying, are large enough to outweigh the costs from not trading. It seems weird to expect AI to change that, since you are arguing that the proportional importance of spying will go down, not up, because it won't be accelerated as much.

If the leading project can't spy on everyone else, then how does it differ from all of the other companies who are developing technology, keeping it private, and charging other people to use it? The leading project can use others' technology when it pays them, just like they use each other's technology when they pay each other. The leading project can choose not to sell its technology, but then it just has less money and so falls further and further behind in terms of compute etc. (and at any rate, it needs to be selling something to the other people in order to even be able to afford to use their technology).

Comment by paulfchristiano on Soft takeoff can still lead to decisive strategic advantage · 2019-08-24T15:31:06.762Z · score: 13 (8 votes) · LW · GW

Why would a coalition look very different from the world economy, be controlled by a few people, and be hard to form? My default expectation is that it would look much like the world economy. (With the most obvious changes being a fall in the labor share of income / increasing wage inequality.) A few big underlying disagreements:

• I don't think I agree that most progress in AI is driven by rare smart individuals talking to each other---I think it's not very accurate as a description of current AI progress, that it will be even less true as AI progress becomes a larger share of the world economy, and that most "AI" progress is driven by compute/energy/data/other software rather than stuff that most looks like insight-driven AI progress.
• Your toy model seems wrong: most projects make extensive use of other people's private innovations, by trading with them. So the project that hoards the most innovations can still only be competitive if it trades with others (in order to get access to their hoarded innovations).
• I think my more basic complaint is with the "pressure to make a profit over some timescale" model. I think it's more like: you need inputs from the rest of the economy and so you trade with them. Right now deep learning moonshots don't trade with the rest of the world because they don't make anything of much value, but if they were creating really impactful technology then the projects which traded would be radically faster than the projects which just used their innovations in house. This is true even if all innovations are public, since they need access to physical capital.

I think any of these would be enough to carry my objection. (Though if you reject my first claim, and thought that rare smart individuals drove AI progress even when AI progress was overwhelmingly economically important, then you could imagine a sufficiently well-coordinated cartel of those rare smart individuals having a DSA.)

Comment by paulfchristiano on Clarifying "AI Alignment" · 2019-08-23T19:25:57.611Z · score: 4 (2 votes) · LW · GW
Would you say that the system in my example is both trying to do what H wants it to do, and also trying to do something that H doesn't want? Is it intent aligned period, or intent aligned at some points in time and not at others, or simultaneously intent aligned and not aligned, or something else?

The oracle is not aligned when asked questions that cause it to do malign optimization.

The human+oracle system is not aligned in situations where the human would pose such questions.

For a coherent system (e.g. a multiagent system which has converged to a Pareto efficient compromise), it makes sense to talk about the one thing that it is trying to do.

For an incoherent system this abstraction may not make sense, and a system may be trying to do lots of things. I try to use benign when talking about possibly-incoherent systems, or things that don't even resemble optimizers.

The definition in this post is a bit sloppy here, but I'm usually imagining that we are building roughly-coherent AI systems (and that if they are incoherent, some parts are malign). If you wanted to be a bit more careful with the definition, and to admit vagueness in "what H wants it to do" (such that there can be several different preferences that are "what H wants"), we could say something like:

A is aligned with H if everything it is trying to do is "what H wants."

That's not great either though (and I think the original post is more at an appropriate level of attempted-precision).

Comment by paulfchristiano on Clarifying "AI Alignment" · 2019-08-23T02:28:45.605Z · score: 8 (4 votes) · LW · GW

Yes, I'd say that to the extent that "trying to do X" is a useful concept, it applies to systems with lots of agents just as well as it applies to one agent.

Even a very theoretically simple system like AIXI doesn't seem to be "trying" to do just one thing, in the sense that it can e.g. exert considerable optimization power at things other than reward, even in cases where the system seems to "know" that its actions won't lead to reward.

You could say that AIXI is "optimizing" the right thing and just messing up when it suffers inner alignment failures, but I'm not convinced that this division is actually doing much useful work. I think it's meaningful to say "defining what we want is useful," but beyond that it doesn't seem like a workable way to actually analyze the hard parts of alignment or divide up the problem.

(For example, I think we can likely get OK definitions of what we value, along the lines of A Formalization of Indirect Normativity, but I've mostly stopped working along these lines because it no longer seems directly useful.)

It seems more obvious that multiagent systems just falls outside of the definition-optimization framework, which seems to be a point in its favor as far as conceptual clarity is concerned.

I agree.

Of course, it also seems quite likely that AIs of the kind that will probably be built ("by default") also fall outside of the definition-optimization framework. So adopting this framework as a way to analyze potential aligned AIs seems to amount to narrowing the space considerably.

Comment by paulfchristiano on Problems in AI Alignment that philosophers could potentially contribute to · 2019-08-18T15:44:37.946Z · score: 20 (15 votes) · LW · GW

The area where I'd be most excited to see philosophical work is "when should we be sad if AI takes over, vs. being happy for it?" This seems like a natural ethical question that could have significant impacts on prioritization. Moreover, if the answer is "we should be fine with some kinds of AI taking over" then we can try to create that kind of AI as an alternative to creating aligned AI.

Comment by paulfchristiano on Open question: are minimal circuits daemon-free? · 2019-08-05T15:23:50.026Z · score: 6 (3 votes) · LW · GW

No, I think a simplicity prior clearly leads to daemons in the limit.

Comment by paulfchristiano on Techniques for optimizing worst-case performance · 2019-07-31T15:47:09.088Z · score: 4 (2 votes) · LW · GW
the number of queries to the model / specification required to obtain worst-case guarantees is orders of magnitude more than the number of queries needed to train the model, and this ratio gets worse the more complex your environment is

Not clear to me whether this is true in general---if the property you are specifying is in some sense "easy" to satisfy (e.g. it holds for a random model, or for some model near any given model), and the behavior you are training is "hard" (e.g. requires almost all of the model's capacity), then it seems possible that verification won't add too much.

Comment by paulfchristiano on Techniques for optimizing worst-case performance · 2019-07-30T21:34:25.106Z · score: 4 (2 votes) · LW · GW

Making the specification faster than the model doesn't really help you. In this case the specification is somewhat more expensive than the model itself, but as far as I can tell that should just make verification somewhat more expensive.

Comment by paulfchristiano on Techniques for optimizing worst-case performance · 2019-07-30T21:31:38.798Z · score: 4 (2 votes) · LW · GW
The argument that we can only focus on the training data makes the assumption that the AI system is not going to generalize well outside of the training dataset.

I'm not intending to make this assumption. The claim is: parts of your model that exhibit intelligence need to do something on the training distribution, because "optimize to perform well on the training distribution" is the only mechanism that makes the model intelligent.

Comment by paulfchristiano on Ought: why it matters and ways to help · 2019-07-27T16:24:06.856Z · score: 23 (10 votes) · LW · GW

There is a large economics literature on principal agent problems, optimal contracting, etc.; these usually consider the situation where we can discover the ground truth or see the outcome of a decision (potentially only partially, or at some cost) and the question is how to best structure incentives in light of that. This typically holds for a profit-maximizing firm, at least to some extent, since they ultimately want to make money. I'm not aware of work in economics that addresses the situation where there is no external ground truth, except to prove negative results which justify the use of other assumptions. I don't believe there's much that would be useful to Ought, probably because it's a huge mess and hard to fit into the field's usual frameworks.

(I actually think even the core economics questions relevant to Ought---where you do have a ground truth and expensive monitoring, a pool of risk-averse experts, some of whom are malicious, etc.---aren't fully answered in the economics literature, and that these versions of the questions aren't a major focus in economics despite being theoretically appealing from a certain perspective. But (i) I'm much less sure of that, and someone would need to have some discussion with relevant experts to find out, and (ii) in that setting I do think economists have things to say even if they haven't answered all of the relevant questions.)

In practice, I think institutions are basically always predicated on one of (i) having some trusted experts, or a principal with understanding of the area, (ii) having someone trusted who can at least understand the expert's reasoning when adequately explained, (iii) being able to monitor outcomes to see what ultimately works well. I don't really know of institutions that do well when none of (i)-(iii) apply. Those work OK in practice today but seem to break down quickly as you move to the setting with powerful AI (though even today I don't think they work great and would hope that a better understanding could help, I just wouldn't necessarily expect it to help as much as work that engages directly with existing institutions and their concrete failures).

Comment by paulfchristiano on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-25T18:18:31.633Z · score: 2 (1 votes) · LW · GW

This seems to ignore regularizers that people use to try to prevent overfitting and to make their models generalize better. Isn't that liable to give you bad intuitions versus the actual training methods people use and especially the more advanced methods of generalization that people will presumably use in the future?

"The best model" is usually regularized. I don't think this really changes the picture compared to imagining optimizing over some smaller space (e.g. the space of models with regularizer < x). In particular, I don't think my intuitions are sensitive to the difference.

I don't understand what you mean in this paragraph (especially "since each possible parameter setting is being evaluated on what other parameter settings say anyway")

The normal procedure is: I gather data, and am using the model (and other ML models) while I'm gathering data. I search over parameters to find the ones that would make the best predictions on that data. I'm not finding parameters that result in good predictive accuracy when used in the world; I'm generating some data, and then finding the parameters that make the best predictions about that data. That data was collected in a world where there are plenty of ML systems (including potentially a version of my oracle with different parameters). Yes, the normal procedure converges to a fixed point. But why do we care / why is that bad?

I wonder if you could write a fuller explanation of your views here, and maybe include your response to Stuart's reasons for changing his mind? (Or talk to him again and get him to write the post for you. :)

I take a perspective where I want to use ML techniques (or other AI algorithms) to do useful work, without introducing powerful optimization working at cross-purposes to humans. On that perspective I don't think any of this is a problem (or if you look at it another way, it wouldn't be a problem if you had a solution that had any chance at all of working). I don't think Stuart is thinking about it in this way, so it's hard to engage at the object level, and I don't really know what the alternative perspective is, so I also don't know how to engage at the meta level. Is there a particular claim where you think there is an interesting disagreement?

Couldn't you simulate that with Opt by just running it repeatedly?

If I care about competitiveness, rerunning Opt for every new datapoint is pretty bad. (I don't think this is very important in the current context; nothing depends on competitiveness.)

Comment by paulfchristiano on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-20T16:08:33.824Z · score: 4 (2 votes) · LW · GW
So this is an argument against the setup of the contest, right? Because the OP seems to be asking us to reason from incentives, and presumably will reward entries that do well under such analysis:

This is an objection to reasoning from incentives, but it's stronger in the case of some kinds of reasoning from incentives (e.g. where incentives come apart from "what kind of policy would be selected under a plausible objective"). It's hard for me to see how nested vs. sequential really matters here.

On a more object level, for reasoning from selection, what model class and training method would you suggest that we assume?

(I don't think model class is going to matter much.)

I think training method should get pinned down more. My default would just be the usual thing people do: pick the model that has best predictive accuracy over the data so far, considering only data where there was an erasure.

(Though I don't think you really need to focus on erasures, I think you can just consider all the data, since each possible parameter setting is being evaluated on what other parameter settings say anyway. I think this was discussed in one of Stuart's posts about "forward-looking" vs. "backwards-looking" oracles?)

I think it's also interesting to imagine internal RL (e.g. there are internal randomized cognitive actions, and we use REINFORCE to get gradient estimates---i.e. you try to increase the probability of cognitive actions taken in rounds where you got a lower loss than predicted, and decrease the probability of actions taken in rounds where you got a higher loss), which might make the setting a bit more like the one Stuart is imagining.
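
The internal-RL scheme described in this paragraph (REINFORCE with the predicted loss as a baseline) might look like the following sketch (the function name and learning rate are illustrative):

```python
import numpy as np

def reinforce_update(logits, action, loss, predicted_loss, lr=0.1):
    """One REINFORCE step over a softmax policy on cognitive actions:
    raise the probability of actions taken on rounds where the loss came
    in below the prediction, lower it when the loss came in above."""
    p = np.exp(logits - np.max(logits))
    p /= p.sum()
    grad_logp = -p
    grad_logp[action] += 1.0           # d log p(action) / d logits
    advantage = predicted_loss - loss  # positive if we beat the prediction
    return logits + lr * advantage * grad_logp
```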

ETA: Is an instance of the idea to see if we can implement something like counterfactual oracles using your Opt? I actually did give that some thought and nothing obvious immediately jumped out at me. Do you think that's a useful direction to think?

Seems like the counterfactual issue doesn't come up in the Opt case, since you aren't training the algorithm incrementally---you'd just collect a relevant dataset before you started training. I think the Opt setting throws away too much for analyzing this kind of situation, and I would want to do an online learning version of Opt (e.g. you provide inputs and losses one at a time, and it gives you the answer of the mixture of models that would have done best so far).
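One standard way to fill in "the mixture of models that would have done best so far" is an exponential-weights learner over a fixed model class. The sketch below (class name, `eta`, and the interface are all my own invention, not anything specified in the discussion) is one illustrative version of such an online variant:

```python
import numpy as np

# Illustrative online-learning analogue of Opt (an assumption, not the original
# proposal): keep a distribution over a fixed class of models, answer with the
# weighted mixture, and downweight each model by its loss as data arrives.
class OnlineOpt:
    def __init__(self, models, eta=0.5):
        self.models = models               # list of callables: input -> answer
        self.eta = eta
        self.log_w = np.zeros(len(models)) # log-weights start uniform

    def answer(self, x):
        w = np.exp(self.log_w - self.log_w.max())
        w /= w.sum()
        # Mixture of models, weighted by cumulative performance so far.
        return sum(wi * m(x) for wi, m in zip(w, self.models))

    def update(self, x, loss_fn):
        # Charge each model for its own loss on this datapoint.
        for i, m in enumerate(self.models):
            self.log_w[i] -= self.eta * loss_fn(m(x))
```

With one model predicting 0 and another predicting 1, and squared loss against targets near 1, the mixture's answer concentrates on 1 after a handful of updates.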

Comment by paulfchristiano on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-17T22:54:10.311Z · score: 8 (4 votes) · LW · GW

You may well be right about this, but I'm not sure what reason from selection means. Can you give an example or say what it implies about nested vs sequential queries?

What I want: "There is a model in the class that has property P. Training will find a model with property P."

What I don't want: "The best way to get a high reward is to have property P. Therefore a model that is trying to get a high reward will have property P."

Example of what I don't want: "Manipulative actions don't help get a high reward (at least for the episodic reward function we intended), so the model won't produce manipulative actions."

Comment by paulfchristiano on The AI Timelines Scam · 2019-07-12T04:32:14.737Z · score: 15 (5 votes) · LW · GW

See Marcus's medium article for more details on how he's been criticized

Skimming that post it seems like he mentions two other incidents (beyond the thread you mention).

Gary Marcus: @Ylecun Now that you have joined the symbol-manipulating club, I challenge you to read my arxiv article Deep Learning: Critical Appraisal carefully and tell me what I actually say there that you disagree with. It might be a lot less than you think.

Yann LeCun: Now that you have joined the gradient-based (deep) learning camp, I challenge you to stop making a career of criticizing it without proposing practical alternatives.

Yann LeCun: Obviously, the ability to criticize is not contingent on proposing alternatives. However, the ability to get credit for a solution to a problem is contingent on proposing a solution to the problem.

Gary Marcus: Folks, let’s stop pretending that the problem of object recognition is solved. Deep learning is part of the solution, but we are obviously still missing something important.
Terrific new examples of how much is still to be solved here: #AIisHarderThanYouThink

Critic: Nobody is pretending it is solved. However, some people are claiming that people are pretending it is solved. Name me one researcher who is pretending?

Gary Marcus: Go back to Lecun, Bengio and Hinton’s 9 page Nature paper in 2015 and show me one hint there that this kind of error was possible. Or recall initial dismissive reaction to https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf …

Yann LeCun: Yeah, obviously we "pretend" that image recognition is solved, which is why we have a huge team at Facebook "pretending" to work on image recognition. Also why 6500 people "pretended" to attend CVPR 2018.

The most relevant quote from the Nature paper he is criticizing (he's right that it doesn't discuss methods working poorly off distribution):

Unsupervised learning had a catalytic effect in reviving interest in deep learning, but has since been overshadowed by the successes of purely supervised learning. Although we have not focused on it in this Review, we expect unsupervised learning to become far more important in the longer term. Human and animal learning is largely unsupervised: we discover the structure of the world by observing it, not by being told the name of every object. Human vision is an active process that sequentially samples the optic array in an intelligent, task-specific way using a small, high-resolution fovea with a large, low-resolution surround. We expect much of the future progress in vision to come from systems that are trained end-to-end and combine ConvNets with RNNs that use reinforcement learning to decide where to look. Systems combining deep learning and reinforcement learning are in their infancy, but they already outperform passive vision systems at classification tasks and produce impressive results in learning to play many different video games.
Natural language understanding is another area in which deep learning is poised to make a large impact over the next few years. We expect systems that use RNNs to understand sentences or whole documents will become much better when they learn strategies for selectively attending to one part at a time. Ultimately, major progress in artificial intelligence will come about through systems that combine representation learning with complex reasoning. Although deep learning and simple reasoning have been used for speech and handwriting recognition for a long time, new paradigms are needed to replace rule-based manipulation of symbolic expressions by operations on large vectors.

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-11T20:02:39.369Z · score: 4 (2 votes) · LW · GW

Housing markets move because they depend on the expectation of future rents. If I want to expose myself to future rents, I have to take on volatility in the expectation of future rents; that's how the game goes.

Comment by paulfchristiano on The AI Timelines Scam · 2019-07-11T18:31:35.511Z · score: 55 (15 votes) · LW · GW

in part because I don't have much to say on this issue that Gary Marcus hasn't already said.

It would be interesting to know which particular arguments made by Gary Marcus you agree with, and how you think they relate to arguments about timelines.

In this preliminary doc, it seems like most of the disagreement is driven by saying there is a 99% probability that training a human-level AI would take more than 10,000x as many lifetimes as AlphaZero took games of Go (while I'd be at more like 50%, and have maybe a 5-10% chance that it will take many fewer lifetimes). Section 2.0.2 admits this is mostly guesswork, but ends up very confident the number isn't small.
It's not clear where that particular number comes from; the only evidence gestured at is "the input is a lot bigger, so it will take a lot more lifetimes", which doesn't seem to agree with our experience so far or have much conceptual justification. (I guess the point is that the space of functions is much bigger? But if comparing the size of the space of functions, why not directly count parameters?) And why is this a lower bound?

Overall this seems like a place where you disagree confidently with many people who entertain shorter timelines, and it seems unrelated to anything Gary Marcus says.

Comment by paulfchristiano on The AI Timelines Scam · 2019-07-11T08:42:48.090Z · score: 139 (47 votes) · LW · GW

I agree with:

• Most people trying to figure out what's true should be mostly trying to develop views on the basis of public information and not giving too much weight to supposed secret information.

• It's good to react skeptically to someone claiming "we have secret information implying that what we are doing is super important."

• Understanding the sociopolitical situation seems like a worthwhile step in informing views about AI.

• It would be wild if 73% of tech executives thought AGI would be developed in the next 10 years. (And independent of the truth of that claim, people do have a lot of wild views about automation.)

I disagree with:

• Norms of discourse in the broader community are significantly biased towards short timelines. The actual evidence in this post seems thin and cherry-picked. I think the best evidence is the a priori argument "you'd expect to be biased towards short timelines given that it makes our work seem more important." I think that's good as far as it goes, but the conclusion is overstated here.

• "Whistleblowers" about long timelines are ostracized or discredited. Again, the evidence in your post seems thin and cherry-picked, and your contemporary example seems wrong to me (I commented separately).
It seems like most people complaining about deep learning or short timelines have a good time in the AI community, and people with the "AGI in 20 years" view are regarded much more poorly within academia and most parts of industry. This could be about different fora and communities being in different equilibria, but I'm not really sure how that's compatible with "ostracizing." (It feels like you are probably mistaken about the tenor of discussions in the AI community.)

• That 73% of tech executives thought AGI would be developed in the next 10 years. Willing to bet against the quoted survey: the white paper is thin on details and leaves lots of wiggle room for chicanery, while the project seems thoroughly optimized to make AI seem like a big deal soon. The claim also just doesn't seem to match my experience with anyone who might be called a tech executive (though I don't know how they constructed the group).

Comment by paulfchristiano on The AI Timelines Scam · 2019-07-11T07:50:18.378Z · score: 57 (22 votes) · LW · GW

For reference, the Gary Marcus tweet in question is:

“I’m not saying I want to forget deep learning... But we need to be able to extend it to do things like reasoning, learning causality, and exploring the world.” - Yoshua Bengio, not unlike what I have been saying since 2012 in The New Yorker.

I think Zack Lipton objected to this tweet because it appears to be trying to claim priority. (You might have thought it's ambiguous whether he's claiming priority, but he clarifies in the thread: "But I did say this stuff first, in 2001, 2012 etc?") The tweet and his writings more generally imply that people in the field have recently changed their view to agree with him, but many people in the field object strongly to this characterization. The tweet is mostly just saying "I told you so."
That seems like a fine time for people to criticize him for making a land grab rather than engaging on the object level, since the tweet doesn't have much object-level content. For example:

"Saying it louder ≠ saying it first. You can't claim credit for differentiating between reasoning and pattern recognition." [...] is essentially a claim that everybody knows that deep learning can't do reasoning. But, this is essentially admitting that Marcus is correct, while still criticizing him for saying it.

Hopefully Zack's argument makes more sense if you view it as a response to Gary Marcus claiming priority. Which is what Gary Marcus was doing and clearly what Zack is responding to.

This is not a substitute for engagement on the object level. Saying "someone else, and in fact many people in the relevant scientific field, already understood this point" is an excellent response to someone who's trying to claim credit for the point.

There are reasonable points to make about social epistemology here, but I think you're overclaiming about the treatment of critics, and that this thread in particular is a bad example to point to. It also seems like you may be mistaken about some of the context. (Zack Lipton has no love for short-timelines-pushers and isn't shy about it. He's annoyed at Gary Marcus for making bad arguments and claiming unwarranted credit, which really is independent of whether some related claims are true.)

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-08T22:03:05.446Z · score: 8 (4 votes) · LW · GW

I'm not really making a claim about momentum; I'm just skeptical of your basic analysis. Real 30-year interest rates are ~1%, taxes are ~1%, and I think maintenance averages ~1%. So that's ~3%/year total cost, which seems comparable to rent in areas like SF. On top of that I think historical appreciation is around 1% (we should expect it to be somewhere between "no growth" and "land stays a constant fraction of GDP").
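For concreteness, the back-of-the-envelope ownership numbers above can be written out in a few lines (every figure is the comment's rough guess, not data):

```python
# Rough annual carrying cost of owning, as a fraction of the home price.
# All inputs are ballpark guesses from the comment above, not measured data.
real_interest = 0.01   # real 30-year mortgage rate
property_tax = 0.01
maintenance = 0.01
appreciation = 0.01    # rough historical real appreciation

gross_cost = real_interest + property_tax + maintenance  # ~3%/year of the price
net_cost = gross_cost - appreciation                     # ~2%/year after appreciation

print(f"gross cost: {gross_cost:.1%}/year of the price")
print(f"net of appreciation: {net_cost:.1%}/year")
```

At these inputs the gross cost is ~3%/year of the price (comparable to rent in SF-like markets, per the comment) and appreciation brings the net cost down to ~2%/year.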
So that looks like buying should be ballpark 10-30% cheaper if you ignore all the transaction costs, presumably because rent prices are factoring in a bunch of frictions. That sounds plausible enough to me, but in reality I expect this is a complicated mess that you can't easily sort out in a short blog post and that varies from area to area. If you want to argue for "buying is usually a terrible idea, investors are idiots or speculators" I think you should be getting into the actual numbers.

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-08T21:54:44.365Z · score: 4 (2 votes) · LW · GW

I'm claiming that with covariance data such a thing could be constructed.

I'll bet against.

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-07T16:43:31.243Z · score: 2 (1 votes) · LW · GW

I meant that you can get a better deal. You can get something that is only marginally more correlated but much much cheaper in the market.

What's the alternative?

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-06T20:26:46.000Z · score: 9 (5 votes) · LW · GW

This assumes rational markets. It's like buying a bar because you assume bar owners are making money, or buying an airline because you assume airline owners are making money. [...] that doesn't matter if you're playing against idiots with incoherent time preferences in popular markets. You'll need to outbid someone making a poor financial decision.

If you want to convince the reader "it turns out most investors are getting a bad deal, and are only doing it because they are idiots" then I think the burden of proof is switched: now you are the one claiming that the reader should be convinced to disagree with investors who've thought about it a lot more.

(I also don't think you are right on the object level, but it's a bit hard to say without seeing the analysis spelled out.
There's no way that nominal rather than real interest rates are the important thing for decision-making in this case---if we just increase inflation and mortgage rates and rent increase caps by 5%, that should clearly have no impact on the buy vs. rent decision. You can convert the appreciation into more cashflow if you want to.)

Why do landlords do it if they aren't building meaningful equity at super high rent to buy ratios? Leveraged speculation (aka high stakes gambling).

The basic argument in favor is: if I want to keep living in the area, then I'm going to have to pay if the rent goes up. Not buying is more of a gamble than buying: if you buy you know that you are just going to keep making mortgage payments + taxes + maintenance, while if you don't buy you have no idea what your expenses are going to be in twenty years' time. (But I totally agree that buying a house depends on knowing where you want to live for a reasonably long time. I also agree that e.g. when housing prices go up your wages are likely to go up, so you don't want to totally hedge this out.)

the optimal hedge is whatever grows your money the fastest in a way that isn't correlated with your other cash flows + assets.

The optimal hedge is anticorrelated with your other cashflows, not uncorrelated. In this case, future rent is one of your biggest cashflows.

A portfolio that is better hedged against local markets and less volatile than housing can be constructed and will grow faster than rent reliably.

What's better correlated with my future rent than an asset whose value is exactly equal to my future rent payments?

It seems like your argument should come down to one of:

• You don't know where you'll be living in the future. I think this is really the core question---how confident should you be about where you are living before buying makes sense? My guess is that it's in the ballpark of the rent-to-buy ratio.
I think people could make better decisions by (i) pinning down that number for the markets they care about, and (ii) thinking more clearly about that question, e.g. correcting for overconfidence.

• You may be tied down to an area, but should be willing to move out of your house if it becomes more expensive, so you shouldn't hedge against idiosyncratic risk in this exact neighborhood. This depends on how attached people get to houses and how large the idiosyncratic real estate risk is. (Maybe this is what you mean by "hedged against local markets"?) I don't know the numbers here, but I don't think it's obvious.

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-06T20:02:38.175Z · score: 2 (1 votes) · LW · GW

100% exposure to local real estate is different than 100% exposure to a particular house

Sure. If you are happy living in that house it doesn't seem like a big deal. Maybe you could do better by being willing to move out of that house and into another house if this one became more expensive, but for lots of people that's enough of a pain in the ass (and the idiosyncratic volatility a small enough deal) that they wouldn't want to do it and would prefer to just hedge against the risk.

It might be the best hedge available due to taxes and leverage for some individuals but only once they have enough money that they won't be screwed if their housing investment goes sideways.

If you can make your mortgage payments, how do you end up screwed if the investment goes sideways? In general, how does this change the basic calculus: I need to pay rent in this house, so I want to hedge against changes in rent? I agree we can have a quantitative discussion about what's optimal, but to do that you have to actually engage with the quantitative question.

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-06T16:51:28.314Z · score: 9 (5 votes) · LW · GW

First let's address the idea that renting is 'throwing money away'.
This ignores the opportunity cost of investing extra income and the lump sum of the down payment into a house instead of the stock market. People spend on average 50% more money on mortgages, taxes, and fees than they spend on rent.

People who rent out houses make money from doing it---landlords aren't in it out of the goodness of their hearts. I guess maybe you can get this kind of gap if you use nominal rather than real interest rates? But that doesn't seem financially relevant. To first order you are going to be paying the same amount whether you buy or rent, unless the market is irrational.

The most important corrections are (i) you'll be making a big investment in housing and the sales price will incorporate the value of that investment, so you need to think about whether you want that investment, and (ii) there are transaction costs and frictions from both buying and renting, and the buying transaction costs will be larger unless you are living in one place for a reasonably long time.

(Maybe 10 years is the right ballpark for amortizing out the transaction costs of buying---e.g. if those costs are 10% per sale, then over ten years that's 1% of the value of the house per year. At a rent-to-buy ratio of 30, that's about one third of rent. Some of the buying overhead will be embedded in rent, since landlords need to break even. And I'd bet the total inefficiencies of renting are in the ballpark of 10-20% of rent.)

So if you are living long enough to amortize out the buying transaction costs, now we are comparing the investment in housing to the investment in the stock market. The most important argument here is that if rent goes up your obligations go up, so you want investments that go up in that case as well. If you are staying in one place, owning a house is basically the optimal hedge against future rent obligations (since an increase in expected future rents will increase your wealth by exactly the same amount).
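The amortization arithmetic in the parenthetical above checks out in a couple of lines (again, all figures are the comment's assumed ballparks, not data):

```python
# Amortizing the one-off transaction costs of buying, per the rough numbers
# above (assumptions, not data): ~10% of the price per sale, a ~10-year stay,
# and a price-to-annual-rent ("rent-to-buy") ratio of ~30.
transaction_cost = 0.10   # fraction of the house price, per sale
years_held = 10
rent_to_buy = 30

annual_cost = transaction_cost / years_held   # 1% of the price per year
annual_rent = 1 / rent_to_buy                 # ~3.3% of the price per year
print(f"amortized transaction costs ≈ {annual_cost / annual_rent:.0%} of rent")
```

The ratio comes out to 30% of rent, matching the "about one third of rent" figure.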
I think there are a lot of caveats to that argument, but it's a reasonable first order approximation and it's a mistake to think about the finances of buying a house without considering it. This depends mostly on living in the same real estate market for a long time, not living in exactly the same house (though it's even better if you are living in the exact same house).

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-06T16:30:40.169Z · score: 13 (7 votes) · LW · GW

When thinking about whether to invest your money into a house or something else, typically you want to decouple that decision from your living situation and see if it still makes sense. Regardless of where I live—given my net worth, does it make sense for $XXX,000 of my investment portfolio to be tied into this specific piece of real estate?

The usual financial justification for owning a house is that you are "naturally short housing." If you expect to live in the same place your whole life, then you'll owe more money when rent goes up. So you want investments whose value goes up when rent goes up and down when rent goes down. Being 100% exposed to the local real estate market is roughly optimal (though less so if you aren't certain you are living there, or if you are more likely to move neighborhoods and you don't want exposure to this particular house). I don't think you can decouple these two.

Comment by paulfchristiano on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-04T04:59:05.720Z · score: 3 (2 votes) · LW · GW

in the sequential case manipulation can only change the distribution of inputs that the Oracles receive, but it doesn't improve performance on any particular given input

Why is that? Doesn't my behavior on question #1 affect both question #2 and its answer?

Also, this feels like a doomed game to me---I think we should be trying to reason from selection rather than relying on more speculative claims about incentives.

Comment by paulfchristiano on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T22:37:49.549Z · score: 2 (1 votes) · LW · GW

I mean, if the oracle hasn't yet looked at the question they could use simulation warfare to cause the preceding oracles to take actions that lead to them getting given easier questions. Once you start unbarring all holds, stuff gets wild.

Comment by paulfchristiano on Contest: \$1,000 for good questions to ask to an Oracle AI · 2019-07-03T17:11:59.250Z · score: 7 (4 votes) · LW · GW

I'm not sure I understand the concern. Isn't the oracle answering each question to maximize its payoff on that question in event of an erasure? So it doesn't matter if you ask it other questions during the evaluation period. (If you like, you can say that you are asking them to other oracles---or is there some way that an oracle is a distinguished part of the environment?)

If the oracle cares about its own performance in a broader sense, rather than just performance on the current question, then don't we have a problem anyway? E.g. if you ask it question 1, won't it be incentivized to answer in a way that gets it an easier question 2? For example, if you are concerned about coordination amongst different instances of the oracle, this seems like a problem regardless.

I guess you can construct a model where the oracle does what you want, but only if you don't ask any other oracles questions during the evaluation period, but it's not clear to me how you would end up in that situation and at that point it seems worth trying to flesh out a more precise model.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-02T21:57:53.820Z · score: 2 (1 votes) · LW · GW

Determining whether aligned AI is impossible seems harder than determining whether there is any hope for a knowably-aligned AI.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-02T21:56:16.837Z · score: 3 (2 votes) · LW · GW

The point of working in this setting is mostly to constrain the search space or make it easier to construct an impossibility argument.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-02T21:55:03.459Z · score: 2 (1 votes) · LW · GW
When you say "test" do you mean testing by writing a single program that outputs whether the model performs badly on a given input (for any input)?
If so, I'm concerned that we won't be able to write such a program.

That's the hope. (Though I assume we mostly get it by an application of Opt, or more efficiently by modifying our original invocation of Opt to return a program with some useful auxiliary functions, rather than by writing it by hand.)

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-02T21:49:19.102Z · score: 2 (1 votes) · LW · GW

If dropping competitiveness, what counts as a solution? Is "imitate a human, but run it fast" fair game? We could try to hash out the details in something along those lines, and I think that's worthwhile, but I don't think it's a top priority and I don't think the difficulties will end up being that similar. I think it may be productive to relax the competitiveness requirement (e.g. to allow solutions that definitely have at most a polynomial slowdown), but probably not a good idea to eliminate it altogether.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-01T16:58:57.727Z · score: 3 (2 votes) · LW · GW

That seems fine though. If the model behaves badly on any input we can test that. If the model wants to behave well on every input, then we're happy. If it wants to behave badly on some input, we'll catch it.

Are you concerned that we can't test whether the model is behaving badly on a particular input? I think if you have that problem you are also in trouble for outer alignment.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-06-30T19:10:17.028Z · score: 2 (1 votes) · LW · GW

The idea is that "the task being trained" is something like: 50% what you care about at the object level, 50% the subtasks that occur in the evaluation process. The model may sometimes get worse at the evaluation process, or at the object level task, you are just trying to optimize some weighted combination.

There are a bunch of distinct difficulties here. One is that the distribution of "subtasks that occur in the evaluation process" is nonstationary. Another is that we need to set up the game so that doing both evaluation and the object level task is not-much-harder than just doing the object level task.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-06-30T17:13:16.556Z · score: 2 (1 votes) · LW · GW

I think that when a design problem is impossible, there is often an argument for why it's impossible. Certainly that's not obvious though, and you might just be out of luck. (That said, it's also not obvious that problems in one class are easier to solve than the other; both contain problems you just can't solve, and so you are relying on extra structural facts in either case.)

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-06-30T17:10:08.000Z · score: 2 (1 votes) · LW · GW

If we can test whether the model is behaving badly on a given input, then we can use Opt to search for any input where the model behaves badly. So we can end up with a system that works well on distribution and doesn't work poorly off distribution. If it's possible to handle outer alignment in this setting, I expect inner alignment will also be possible. (Though it's not obvious, since you might take longer to learn given an unaligned learned optimizer.)