Model Mis-specification and Inverse Reinforcement Learning

2018-11-09T15:33:02.630Z · score: 32 (10 votes)

Latent Variables and Model Mis-Specification

2018-11-07T14:48:40.434Z · score: 19 (7 votes)
Comment by jsteinhardt on Ben Hoffman's donor recommendations · 2018-07-30T17:59:04.732Z · score: 2 (1 votes) · LW · GW

I don't understand why this is evidence that "EA Funds (other than the global health and development one) currently funges heavily with GiveWell recommended charities", which was Howie's original question. It seems like evidence that donations to OpenPhil (which afaik cannot be made by individual donors) funge against donations to the long-term future EA fund.

Comment by jsteinhardt on RFC: Philosophical Conservatism in AI Alignment Research · 2018-05-15T04:24:03.648Z · score: 13 (3 votes) · LW · GW

I like the general thrust here, although I have a different version of this idea, which I would call "minimizing philosophical pre-commitments". For instance, there is a great deal of debate about whether Bayesian probability is a reasonable philosophical foundation for statistical reasoning. It seems that it would be better, all else equal, for approaches to AI alignment to not hinge on being on the right side of this debate.

I think there are some places where it is hard to avoid pre-commitments. For instance, while this isn't quite a philosophical pre-commitment, it is probably hard to develop approaches that are simultaneously optimized for short and long timelines. In this case it is probably better to explicitly do case splitting on the two worlds and have some subset of people pursuing approaches that are good in each individual world.

Comment by jsteinhardt on [deleted post] 2018-03-19T19:43:11.984Z

FWIW I understood Zvi's comment, but feel like I might not have understood it if I hadn't played Magic: The Gathering in the past.

EDIT: Although I don't understand the link to the Green Knight of Arthurian legend, unless it was a reference to the fact that M:tG doesn't actually have a green knight card.

Comment by jsteinhardt on Takeoff Speed: Simple Asymptotics in a Toy Model. · 2018-03-06T13:54:41.342Z · score: 28 (8 votes) · LW · GW

Thanks for writing this Aaron! (And for engaging with some of the common arguments for/against AI safety work.)

I personally am very uncertain about whether to expect a singularity/fast take-off (I think it is plausible but far from certain). Some reasons that I am still very interested in AI safety are the following:

  • I think AI safety likely involves solving a number of difficult conceptual problems, such that it would take >5 years (I would guess something like 10-30 years, with very wide error bars) of research to have solutions that we are happy with. Moreover, many of the relevant problems have short-term analogues that can be worked on today. (Indeed, some of these align with your own research interests, e.g. imputing value functions of agents from actions/decisions; although I am particularly interested in the agnostic case where the value function might lie outside of the given model family, which I think makes things much harder.)
  • I suppose the summary point of the above is that even if you think AI is a ways off (my median estimate is ~50 years, again with high error bars) research is not something that can happen instantaneously, and conceptual research in particular can move slowly due to being harder to work on / parallelize.
  • While I have uncertainty about fast take-off, that still leaves some probability that fast take-off will happen, and in that world it is an important enough problem that it is worth thinking about. (It is also very worthwhile to think about the probability of fast take-off, as better estimates would help to better direct resources even within the AI safety space.)
  • Finally, I think there are a number of important safety problems even from sub-human AI systems. Tech-driven unemployment is I guess the standard one here, although I spend more time thinking about cyber-warfare/autonomous weapons, as well as changes in the balance of power between nation-states and corporations. These are not as clearly an existential risk as unfriendly AI, but I think in some forms would qualify as a global catastrophic risk; on the other hand I would guess that most people who care about AI safety (at least on this website) do not care about it for this reason, so this is more idiosyncratic to me.

Happy to expand on/discuss any of the above points if you are interested.

Best,

Jacob

Comment by jsteinhardt on Takeoff Speed: Simple Asymptotics in a Toy Model. · 2018-03-06T13:32:48.176Z · score: 13 (3 votes) · LW · GW

Very minor nitpick, but just to add, FLI is as far as I know not formally affiliated with MIT. (FHI is in fact a formal institute at Oxford.)

Comment by jsteinhardt on Zeroing Out · 2017-11-05T22:19:45.863Z · score: 28 (10 votes) · LW · GW

Hi Zvi,

I enjoy reading your posts because they often consist of clear explanations of concepts I wish more people appreciated. But I think this is the first instance where I feel I got something that I actually hadn't thought about before at all, so I wanted to convey extra appreciation for writing it up.

Best,

Jacob

Comment by jsteinhardt on Seek Fair Expectations of Others’ Models · 2017-10-20T03:53:12.702Z · score: 8 (4 votes) · LW · GW

I think the conflation is between "decades out" and "far away".

Comment by jsteinhardt on [deleted post] 2017-10-17T03:04:59.264Z

Galfour was specifically asked to write his thoughts up in this thread: https://www.lesserwrong.com/posts/BEtzRE2M5m9YEAQpX/there-s-no-fire-alarm-for-artificial-general-intelligence/kAywLDdLrNsCvXztL

It seems either this was posted to the wrong place, or there is some disagreement within the community (e.g. between Ben in that thread and the people downvoting).

Comment by jsteinhardt on Oxford Prioritisation Project Review · 2017-10-14T18:08:10.872Z · score: 3 (2 votes) · LW · GW

Points 1-5 at the beginning of the post are all primarily about community-building and personal development externalities of the project, and not about the donation itself.

Comment by jsteinhardt on Oxford Prioritisation Project Review · 2017-10-14T03:58:56.583Z · score: 13 (3 votes) · LW · GW

?? If you literally mean minimum wage, I think that is less than 10,000 pounds... although I agree with the general thrust of your point about the money being more valuable than the time (but think you are missing the spirit of the exercise as outlined in the post).

Comment by jsteinhardt on Robustness as a Path to AI Alignment · 2017-10-11T05:46:54.275Z · score: 11 (3 votes) · LW · GW

You might be interested in my work on learning from untrusted data (see also earlier work on aggregating unreliable human input). I think it is pretty relevant to what you discussed, although if you do not think it is, then I would also be pretty interested in understanding that.

Unrelated, but for quantilizers, isn't the biggest issue going to be that if you need to make a sequence of decisions, the probabilities are going to accumulate and give exponential decay? I don't see how to make a sequence of 100 decisions in a quantilizing way unless the base distribution of policies is very close to the target policy.
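
For concreteness, here is a minimal back-of-the-envelope sketch of the compounding worry (my own toy calculation, not from the quantilizer literature; the value of q is an arbitrary assumption). As I understand it, a q-quantilizer inflates each action's probability by at most a factor of 1/q relative to the base policy, so over a sequence of n decisions the worst-case inflation of a whole trajectory compounds to (1/q)^n:

```python
# Toy calculation of how the quantilizer guarantee degrades over a sequence
# of decisions. Assumptions: each decision is independently quantilized with
# the same quantile q, and the per-decision bound of 1/q simply multiplies.
q = 0.1  # quantile used at each decision (arbitrary choice for illustration)

for n in [1, 10, 100]:
    worst_case_ratio = (1.0 / q) ** n
    print(f"n = {n:3d} decisions: trajectory probability may exceed the "
          f"base policy's by a factor of up to {worst_case_ratio:.3g}")
```

So unless the base distribution already places substantial mass on policies close to the target one, the per-trajectory guarantee becomes vacuous after a few dozen steps.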

Comment by jsteinhardt on [deleted post] 2017-05-27T20:51:14.016Z

Parts of the house setup pattern-match to a cult; cult members aren't good at realizing when they need to leave, but their friends can probably tell much more easily.

(I don't mean the above as negatively as it sounds connotatively, but it's the most straightforward way to say what I think is the reason to want external people. I also think this reasoning degrades gracefully with the amount of cultishness.)

Comment by jsteinhardt on [deleted post] 2017-05-27T17:29:34.887Z

I think there's a difference between a friend that one could talk to (if they decide to), and a friend tasked with the specific responsibility of checking in and intervening if things seem to be going badly.

Comment by jsteinhardt on Scenario analysis: a parody · 2017-04-28T04:52:09.218Z · score: 1 (1 votes) · LW · GW

I feel like you're straw-manning scenario analysis. Here's an actual example of a document produced via scenario analysis: Global Trends 2035.

Comment by jsteinhardt on Effective altruism is self-recommending · 2017-04-21T23:35:42.407Z · score: 5 (5 votes) · LW · GW

When you downvote something on the EA forum, it becomes hidden. Have you tried viewing it while not logged in to your account? It's still visible to me.

Comment by jsteinhardt on An OpenAI board seat is surprisingly expensive · 2017-04-20T21:03:58.348Z · score: 1 (2 votes) · LW · GW

> then the technical advisors at OPP must have a very specific approach to AI safety they are pushing very hard to get support for, but are unwilling or unable to articulate why they prefer theirs so strongly.

I don't think there is consensus among technical advisors on what directions are most promising. Also, Paul has written substantially about his preferred approach (see here for instance), and I've started to do the same, although so far I've been mostly talking about obstacles rather than positive approaches. But you can see some of my writing here and here. Also my thoughts in slide form here, although those slides are aimed at ML experts.

Comment by jsteinhardt on I Want To Live In A Baugruppe · 2017-03-20T02:31:47.253Z · score: 3 (3 votes) · LW · GW

> Any attempt to enforce rationalists moving in is illegal.

Is this really true? Based on my experience (not any legal experience, just seeing what people generally do that is considered fine) I think in the Bay Area the following are all okay:

  • Only listing a house to your friends / social circle.
  • Interviewing people who want to live with you and deciding based on how much you like them.

The following are not okay:

  • Having a rule against pets that doesn't have an exception for seeing-eye dogs.
  • Explicitly deciding not to take someone as a house-mate only on the basis of some protected trait like race, etc. (but gender seems to be fine?).

Comment by jsteinhardt on [deleted post] 2016-12-13T21:03:05.679Z

Hi Eugene!

Comment by jsteinhardt on CFAR's new mission statement (on our website) · 2016-12-10T10:10:48.630Z · score: 8 (8 votes) · LW · GW

Thanks for posting this, I think it's good to make these things explicit even if it requires effort. One piece of feedback: I think someone who reads this who doesn't already know what "existential risk" and "AI safety" are will be confused (they suddenly show up in the second bullet point without being defined, though it's possible I'm missing some context here).

Comment by jsteinhardt on [deleted post] 2016-12-02T05:55:36.341Z

70% this account is Eugene.

Comment by jsteinhardt on Why GiveWell can't recommend MIRI or anything like it · 2016-12-01T02:59:20.093Z · score: 4 (4 votes) · LW · GW

I don't think you are actually making this argument, but this comes close to an uncharitable view of GiveWell that I strongly disagree with, which goes something like "GiveWell can't recommend MIRI because it would look weird and be bad for their brand, even if they think that MIRI is actually the best place to donate to." I think GiveWell / OpenPhil are fairly insensitive to considerations like this and really just want to recommend the things they actually think are best independent of public opinion. The separate branding decision seems like a clearly good idea to me, but I think that if for some reason OpenPhil were forced to have inseparable branding from GiveWell, they would be making the same recommendations.

Comment by jsteinhardt on Why GiveWell can't recommend MIRI or anything like it · 2016-11-30T04:34:17.989Z · score: 7 (7 votes) · LW · GW

Also like: here is a 4000-word evaluation of MIRI by OpenPhil. ???

Comment by jsteinhardt on Open Thread, Aug. 15. - Aug 21. 2016 · 2016-08-16T08:29:28.145Z · score: 12 (12 votes) · LW · GW

(FYI, this was almost certainly downvoted by Eugene_Nier's sockpuppets rather than actual people. I upvoted; hopefully others will as well to counteract the trolling.)

Comment by jsteinhardt on Open thread, Jul. 18 - Jul. 24, 2016 · 2016-07-18T17:31:28.007Z · score: 1 (1 votes) · LW · GW

Wait what? How are you supposed to meet your co-founder / early employees without connections? College is like the ideal place to meet people to start start-ups with.

Comment by jsteinhardt on Open thread, Jul. 11 - Jul. 17, 2016 · 2016-07-14T01:17:27.506Z · score: 1 (1 votes) · LW · GW

I don't think I need that for my argument to work. My claim is that if people get, say, less than 70% of a meal's worth of food, an appreciable fraction (say at least 30%) will get cranky.

Comment by jsteinhardt on Open thread, Jul. 11 - Jul. 17, 2016 · 2016-07-13T21:01:50.643Z · score: 1 (1 votes) · LW · GW

But like, there's variation in how much food people will end up eating, and at least some of that is not variation that you can predict in advance. So unless you have enough food that you routinely end up with more than can be eaten, you are going to end up with a lot of cranky people a non-trivial fraction of the time. You're not trying to peg production to the mean consumption, but (e.g.) to the 99th percentile of consumption.
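
To make the provisioning point concrete, here is a quick simulation (all numbers are invented for illustration; the consumption distribution is an arbitrary stand-in): stocking the mean total consumption leaves you short on roughly half the nights, while stocking near the 99th percentile almost never does.

```python
import numpy as np

# Simulate per-guest consumption (in "meals") for many hypothetical evenings.
# The gamma distribution and its parameters are arbitrary stand-ins; the point
# is only that total consumption varies around its mean.
rng = np.random.default_rng(0)
guests, nights = 30, 10_000
consumption = rng.gamma(shape=4.0, scale=0.25, size=(nights, guests))  # mean ~1 meal/guest
totals = consumption.sum(axis=1)

mean_stock = totals.mean()             # pegging production to mean consumption
p99_stock = np.percentile(totals, 99)  # pegging production to the 99th percentile

print(f"run out {np.mean(totals > mean_stock):.0%} of nights when stocking the mean")
print(f"run out {np.mean(totals > p99_stock):.1%} of nights when stocking the 99th percentile")
```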

Comment by jsteinhardt on Open thread, Jul. 11 - Jul. 17, 2016 · 2016-07-13T15:45:52.406Z · score: 2 (2 votes) · LW · GW

I don't think this is really a status thing, more a "don't be a dick to your guests" thing. Many people get cranky if they are hungry, and putting 30+ cranky people together in a room is going to be a recipe for unpleasantness.

Comment by jsteinhardt on Open thread, Jul. 04 - Jul. 10, 2016 · 2016-07-06T06:58:50.094Z · score: 2 (2 votes) · LW · GW

I don't really think this is spending idiosyncrasy credits... but maybe we hang out in different social circles.

Comment by jsteinhardt on Open Thread April 25 - May 1, 2016 · 2016-04-29T06:26:53.461Z · score: 1 (1 votes) · LW · GW

I assume at least some of the downvotes are from Eugene sockpuppets (he tends to downvote any suggestions that would make it harder to do his trolling).

Comment by jsteinhardt on Yoshua Bengio on AI progress, hype and risks · 2016-01-30T08:01:43.829Z · score: 13 (15 votes) · LW · GW

+1 To go even further, I would add that it's unproductive to think of these researchers as being on anyone's "side". These are smart, nuanced people and rounding their comments down to a specific agenda is a recipe for misunderstanding.

Comment by jsteinhardt on Voiceofra is banned · 2015-12-25T22:11:05.511Z · score: 4 (6 votes) · LW · GW

I'm well aware. It is therefore even more problematic if this account is abused --- note that there have been multiple confirmations that username2 has been used to downvote the same people that VoiceOfRa was downvoting before; in addition, VoiceOfRa has used the username2 account to start arguments with NancyLebovitz in a way that makes it look like a 3rd party is disagreeing with the decision, rather than VoiceOfRa himself. At the very least, it is better if everyone is aware of this situation, and ideally we would come up with a way to prevent such abuse.

Comment by jsteinhardt on Voiceofra is banned · 2015-12-24T22:09:40.905Z · score: 4 (4 votes) · LW · GW

I was 85% sure at the time that username2's comment was posted. I'm now 98% sure for a variety of reasons.

I'm only 75% sure that the upvotes on "username2"/VoiceOfRa's comments above are from sockpuppets.

Comment by jsteinhardt on Voiceofra is banned · 2015-12-24T07:43:01.802Z · score: 9 (9 votes) · LW · GW

> Requesting a transparency report.

I think it's bad form to make costly (in terms of time) requests to moderators unless you're willing to be part of the solution. In this case, it would be good at minimum to outline exactly what you mean by a "transparency report" --- concretely, what sort of information would you like to see, and why would it be helpful? It would be even better if you were willing to volunteer to help in creating the report to the extent that the help can be utilized.

Comment by jsteinhardt on Voiceofra is banned · 2015-12-24T06:20:20.024Z · score: 7 (15 votes) · LW · GW

I'm 85% sure that you're VoiceOfRa.

Comment by jsteinhardt on Voiceofra is banned · 2015-12-24T06:16:47.920Z · score: 11 (13 votes) · LW · GW

I'm dubious that that constitutes abusing her power; AdvancedAtheist was highly and consistently downvoted for a long period of time before being banned.

Comment by jsteinhardt on Voiceofra is banned · 2015-12-24T02:14:55.722Z · score: 5 (5 votes) · LW · GW

As Romeo noted, Nancy was appointed roughly by popular acclaim (more like, a small number of highly dedicated and respected users appointing her, and no one objecting). I think it's reasonable in general to give mods a lot of discretionary power, and trust other veteran users to step in if things take a turn for the worse.

Comment by jsteinhardt on Marketing Rationality · 2015-11-20T17:15:28.992Z · score: 12 (16 votes) · LW · GW

My main update from this discussion has been a strong positive update about Gleb Tsipursky's character. I've been generally impressed by his ability to stay positive even in the face of criticism, and to continue seeking feedback for improving his approaches.

Comment by jsteinhardt on [Link] Lifehack Article Promoting LessWrong, Rationality Dojo, and Rationality: From AI to Zombies · 2015-11-17T17:14:27.534Z · score: 8 (10 votes) · LW · GW

I can understand your dislike of Gleb's approach and even see many of your concerns as justified; do you really think your actions in this thread are helping you get what you want though? They certainly won't make Gleb himself listen to you, and they also don't make you sympathetic to onlookers. To the extent that you have issues with Gleb's actions, it seems like pointing them out in a non-abusive way for others to judge would be far more effective.

Comment by jsteinhardt on [link] New essay summarizing some of my latest thoughts on AI safety · 2015-11-15T07:48:24.518Z · score: 0 (0 votes) · LW · GW

Yeah; I discussed this with some others and came to the same conclusion. I do still think that one should explain why the preferred basis ends up being as meaningful as it does, but agree that this is a much more minor objection.

Comment by jsteinhardt on [link] New essay summarizing some of my latest thoughts on AI safety · 2015-11-10T08:37:52.806Z · score: 2 (2 votes) · LW · GW

Cool, thanks for the paper, interesting read!

Comment by jsteinhardt on [link] New essay summarizing some of my latest thoughts on AI safety · 2015-11-10T08:29:52.077Z · score: 1 (1 votes) · LW · GW

Yeah I should be a bit more careful on number 4. The point is that many papers which argue that a given NN is learning "natural" representations do so by looking at what an individual hidden unit responds to (as opposed to looking at the space spanned by the hidden layer as a whole). Any such argument seems dubious to me without further support, since it relies on a sort of delicate symmetry-breaking which can only come from either the training procedure or noise in the data, rather than the model itself. But I agree that if such an argument was accompanied by justification of why the training procedure or data noise or some other factor led to the symmetry being broken in a natural way, then I would potentially be happy.

Comment by jsteinhardt on [link] New essay summarizing some of my latest thoughts on AI safety · 2015-11-10T04:05:14.284Z · score: 1 (1 votes) · LW · GW

Without defining a natural representation (since I don't know how to), here are four properties that I think a representation should satisfy before it's called natural (I also give these in my response to Vika):

(1) Good performance on different data sets in the same domain.

(2) Good transference to novel domains.

(3) Robustness to visually imperceptible perturbations to the input image.

(4) "Canonicality": replacing the learned features with a random invertible linear transformation of the learned features should degrade performance.

Comment by jsteinhardt on [link] New essay summarizing some of my latest thoughts on AI safety · 2015-11-10T04:04:05.843Z · score: 2 (2 votes) · LW · GW

I know there are many papers showing that neural nets learn features that can, in some regimes, be given nice interpretations. However, in all cases I am aware of where these representations have been thoroughly analyzed, they seem to fail obvious tests of naturality, which would include things like:

(1) Good performance on different data sets in the same domain.

(2) Good transference to novel domains.

(3) Robustness to visually imperceptible perturbations to the input image.

Moreover, ANNs almost fundamentally cannot learn natural representations because they fail what I would call the "canonicality" test:

(4) Replacing the learned features with a random invertible linear transformation of the learned features should degrade performance.

Note that the reason for (4) is that if you want to interpret an individual hidden unit in an ANN as being meaningful, then it can't be the case that a random linear combination of lots of units is equally meaningful (since a random linear combination of e.g. cats and dogs and 100 other things is not going to have much meaning).

That was a bit long-winded, but my question is whether the linked paper or any other papers provide representations that you think don't fail any of (1)-(4).
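
To illustrate why standard networks fail test (4), here is a toy sketch (my own construction; the features, labels, and mixing matrix are all synthetic stand-ins, not taken from any of the papers under discussion). Once the downstream linear readout is retrained, a random invertible mixing of the features leaves held-out performance essentially unchanged, so downstream performance alone cannot privilege the individual unit axes over an arbitrary mixed basis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 20
features = rng.normal(size=(n, d))                        # stand-in "learned features"
labels = (features @ rng.normal(size=d) > 0).astype(int)  # synthetic binary labels

A = rng.normal(size=(d, d))   # random mixing matrix (invertible with probability 1)
mixed = features @ A.T        # the same features expressed in a randomly mixed basis

for name, X in [("original features", features), ("randomly mixed features", mixed)]:
    clf = LogisticRegression(max_iter=1000).fit(X[:1500], labels[:1500])
    print(f"{name}: held-out accuracy = {clf.score(X[1500:], labels[1500:]):.3f}")
```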

Comment by jsteinhardt on [link] New essay summarizing some of my latest thoughts on AI safety · 2015-11-10T03:53:59.383Z · score: 1 (1 votes) · LW · GW

> Human's fastest recognition capability still takes 100 ms or so, and operating in that mode (rapid visual presentation), human inference accuracy is considerably less capable than modern ANNs.

This doesn't seem right, assuming that "considerably less capable" means "considerably worse accuracy at classifying objects not drawn from ImageNet". Do you have a study in mind that shows this? In either case, I don't think this is strong enough to support the claim that the classifier isn't breaking down --- it's pretty clearly making mistakes where humans would find the answer obvious. I don't think that saying that the ANN answers more quickly is a very strong defense.

Comment by jsteinhardt on Open thread, Nov. 02 - Nov. 08, 2015 · 2015-11-06T17:02:04.068Z · score: 7 (7 votes) · LW · GW

Note that their implicit definition of "replicable" is very narrow --- under their procedure, one can fail to be "replicable" simply by failing to reply to an e-mail from the authors asking for code. This is something of a play on words, since typically "failure to replicate" means that one is unable to get the same results as the authors while following the same procedure. Based on their discussion at the end of section 3, it appears that (at most) 9 of the 30 "failed replications" are due to actually running the code and getting different results.

Comment by jsteinhardt on [link] New essay summarizing some of my latest thoughts on AI safety · 2015-11-03T17:07:33.172Z · score: 3 (3 votes) · LW · GW

Thanks for writing this; a couple quick thoughts:

> For example, it turns out that a learning algorithm tasked with some relatively simple tasks, such as determining whether or not English sentences are valid, will automatically build up an internal representation of the world which captures many of the regularities of the world – as a pure side effect of carrying out its task.

I think I've yet to see a paper that convincingly supports the claim that neural nets are learning natural representations of the world. For some papers that refute this claim, see e.g.

http://arxiv.org/abs/1312.6199
http://arxiv.org/abs/1412.6572

I think the Degrees of Freedom thesis is a good statement of one of the potential problems. Since it's essentially making a claim about whether a certain very complex statistical problem is identifiable, I think it's very hard to know whether it's true or not without either some serious technical analysis or some serious empirical research --- which is a reason to do that research, because if the thesis is true then that has some worrisome implications about AI safety.

Comment by jsteinhardt on The trouble with Bayes (draft) · 2015-10-29T04:08:14.427Z · score: 2 (2 votes) · LW · GW

I wrote up a pretty detailed reply to Luke's question: http://lesswrong.com/lw/kd4/the_power_of_noise/

Comment by jsteinhardt on Vegetarianism Ideological Turing Test Results · 2015-10-15T03:34:53.850Z · score: 0 (0 votes) · LW · GW

Yes, I agree; I meant the (unobserved) probability that each judge gets a given question correct (which will of course differ from the observed fraction of the time each judge is correct). But it appears that at least one judge may have done quite well (as gjm points out). I don't think that the analysis done so far provides much evidence about how many judges are doing better than chance. It's possible that there just isn't enough data to make such an inference, but one possible thing you could do is to plot the p-values in ascending order and see how close they come to a straight line.
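
As a rough sketch of that check (the p-values below are placeholders, not the actual judge data): under the null hypothesis that every judge is at chance, the p-values are approximately uniform on [0, 1], so the sorted values should track the uniform quantiles, and a cluster of unusually small p-values at the low end would suggest some judges are beating chance.

```python
import numpy as np

p_values = np.array([0.02, 0.11, 0.18, 0.35, 0.47,
                     0.52, 0.63, 0.71, 0.84, 0.96])  # placeholder judge p-values
m = len(p_values)
expected = (np.arange(m) + 0.5) / m                  # uniform quantiles under the null

for observed, exp in zip(np.sort(p_values), expected):
    print(f"observed p = {observed:.2f}    expected under uniform = {exp:.2f}")
```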

Comment by jsteinhardt on Vegetarianism Ideological Turing Test Results · 2015-10-14T15:26:22.361Z · score: 2 (2 votes) · LW · GW

I think you should distinguish between "average score across judges is close to 50%" and "every single judge is close to 50%". I suspect the latter is not true, as pointed out in one of the other comments.

Comment by jsteinhardt on Open thread, Oct. 12 - Oct. 18, 2015 · 2015-10-13T13:25:50.124Z · score: 1 (3 votes) · LW · GW

Wait, why would this be true? Utility functions don't have to be linear; it could even be the case that I place no additional utility on happiness beyond a certain level.

[link] Essay on AI Safety

2015-06-26T07:42:11.581Z · score: 12 (13 votes)

The Power of Noise

2014-06-16T17:26:30.329Z · score: 28 (31 votes)

A Fervent Defense of Frequentist Statistics

2014-02-18T20:08:48.833Z · score: 47 (58 votes)

Another Critique of Effective Altruism

2014-01-05T09:51:12.231Z · score: 20 (23 votes)

Macro, not Micro

2013-01-06T05:29:38.689Z · score: 19 (22 votes)

Beyond Bayesians and Frequentists

2012-10-31T07:03:00.818Z · score: 36 (41 votes)

Recommendations for good audio books?

2012-09-16T23:43:31.596Z · score: 4 (9 votes)

What is the evidence in favor of paleo?

2012-08-27T07:07:07.105Z · score: 13 (18 votes)

PM system is not working

2012-08-02T16:09:06.846Z · score: 11 (12 votes)

Looking for a roommate in Mountain View

2012-08-01T19:04:59.872Z · score: 11 (18 votes)

Philosophy and Machine Learning Panel on Ethics

2011-12-17T23:32:20.026Z · score: 8 (11 votes)

Help me fix a cognitive bug

2011-06-25T22:22:31.484Z · score: 4 (7 votes)

Utility is unintuitive

2010-12-09T05:39:34.176Z · score: -5 (16 votes)

Interesting talk on Bayesians and frequentists

2010-10-23T04:10:27.684Z · score: 7 (12 votes)