Posts

ML Systems Will Have Weird Failure Modes 2022-01-26T01:40:13.134Z
Anchor Weights for ML 2022-01-20T16:20:20.390Z
Thought Experiments Provide a Third Anchor 2022-01-18T16:00:20.795Z
Future ML Systems Will Be Qualitatively Different 2022-01-11T19:50:11.377Z
More Is Different for AI 2022-01-04T19:30:20.352Z
From Considerations to Probabilities 2021-12-31T02:10:14.682Z
Prioritizing Information 2021-12-24T00:00:22.448Z
The "Other" Option 2021-12-16T20:20:29.611Z
Combining Forecasts 2021-12-10T02:10:14.402Z
Common Probability Distributions 2021-12-02T01:50:17.115Z
Base Rates and Reference Classes 2021-11-24T22:30:18.741Z
Forecasting: Zeroth and First Order 2021-11-18T01:30:19.127Z
Measuring and Forecasting Risks from AI 2021-11-12T02:30:20.959Z
How should we compare neural network representations? 2021-11-05T22:10:18.677Z
Measuring and forecasting risks 2021-10-29T07:27:32.836Z
Deliberate Play 2021-10-24T02:50:16.947Z
On The Risks of Emergent Behavior in Foundation Models 2021-10-18T20:00:15.896Z
How much slower is remote work? 2021-10-08T02:00:17.857Z
Unsolved ML Safety Problems 2021-09-29T16:00:19.466Z
Let Us Do Our Work As Well 2021-09-17T00:40:18.443Z
Economic AI Safety 2021-09-16T20:50:50.335Z
Film Study for Research 2021-09-14T18:53:25.831Z
Does Diverse News Decrease Polarization? 2021-09-11T02:30:16.583Z
Measurement, Optimization, and Take-off Speed 2021-09-10T19:30:57.189Z
Model Mis-specification and Inverse Reinforcement Learning 2018-11-09T15:33:02.630Z
Latent Variables and Model Mis-Specification 2018-11-07T14:48:40.434Z
[link] Essay on AI Safety 2015-06-26T07:42:11.581Z
The Power of Noise 2014-06-16T17:26:30.329Z
A Fervent Defense of Frequentist Statistics 2014-02-18T20:08:48.833Z
Another Critique of Effective Altruism 2014-01-05T09:51:12.231Z
Macro, not Micro 2013-01-06T05:29:38.689Z
Beyond Bayesians and Frequentists 2012-10-31T07:03:00.818Z
Recommendations for good audio books? 2012-09-16T23:43:31.596Z
What is the evidence in favor of paleo? 2012-08-27T07:07:07.105Z
PM system is not working 2012-08-02T16:09:06.846Z
Looking for a roommate in Mountain View 2012-08-01T19:04:59.872Z
Philosophy and Machine Learning Panel on Ethics 2011-12-17T23:32:20.026Z
Help me fix a cognitive bug 2011-06-25T22:22:31.484Z
Utility is unintuitive 2010-12-09T05:39:34.176Z
Interesting talk on Bayesians and frequentists 2010-10-23T04:10:27.684Z

Comments

Comment by jsteinhardt on ML Systems Will Have Weird Failure Modes · 2022-01-26T02:08:25.476Z · LW · GW

@Mods: Looks like the LaTeX isn't rendering. I'm not sure what the right way to do that is on LessWrong. On my website, I do it with code injection. You can see the result here, where the LaTeX all renders in MathJax: https://bounded-regret.ghost.io/ml-systems-will-have-weird-failure-modes-2/

Comment by jsteinhardt on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-23T20:29:28.849Z · LW · GW

> I feel like you are arguing for a very strong claim here, which is that "as soon as you have an efficient way of determining whether a problem is solved, and any way of generating a correct solution some very small fraction of the time, you can just build an efficient solution that solves it all of the time"

Hm, this isn't the claim I intended to make. Both because it overemphasizes "efficient" and because it adds a lot of "for all" statements.

If I were trying to state my claim more clearly, it would be something like "generically, for the large majority of problems of the sort you would come across in ML, once you can distinguish good answers you can find good answers (modulo some amount of engineering work), because non-convex optimization generally works and there are a large number of techniques for solving the sparse rewards problem, which are also getting better over time".

Comment by jsteinhardt on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-23T17:52:55.221Z · LW · GW

Thanks for the push-back and the clear explanation. I still think my points hold and I'll try to explain why below.

> In order to even get a single expected datapoint of approval, I need to sample 10^8 examples, which in our current sampling method would take 10^8 * 10 hours, e.g. approximately 100,000 years. I don't understand how you could do "Learning from Human Preferences" on something this sparse

This is true if all the other datapoints are entirely indistinguishable, and the only signal is "good" vs. "bad". But in practice you would compare / rank the datapoints, and move towards the ones that are better.

Take the backflip example from the human preferences paper: if your only signal was "is this a successful backflip?", then your argument would apply and it would be pretty hard to learn. But the signal is "is this more like a successful backflip than this other thing?" and this makes learning feasible.

More generally, I feel that the thing I'm arguing against would imply that ML in general is impossible (and esp. the human preferences work), so I think it would help to say explicitly where the disanalogy occurs.

I should note that comparisons are only one reason why the situation isn't as bad as you say. Another is that even with only non-approved data points to label, you could do things like label "which part" of the plan is non-approved. And with very sophisticated AI systems, you could ask them to predict which plans would be approved/non-approved, even if they don't have explicit examples, simply by modeling the human approvers very accurately in general.

> I feel even beyond that, this still assumes that the reason it is proposing a "good" plan is pure noise, and not the result of any underlying bias that is actually costly to replace.

When you say "costly to replace", this is with respect to what cost function? Do you have in mind the system's original training objective, or something else?

If you have an original cost function F(x) and an approval cost A(x), you can minimize F(x) + c * A(x), increasing the weight on c until it pays enough attention to A(x). For an appropriate choice of c, this is (approximately) equivalent to asking "Find the most approved policy such that F(x) is below some threshold"--more generally, varying c will trace out the Pareto boundary between F and A.
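To make this concrete, here is a toy sketch of the weighted objective (the specific F, A, and sweep over c are made up for illustration):

```python
import numpy as np

# Toy one-dimensional example: F is the original training objective,
# A is an "approval cost"; both are made up for illustration.
def F(x):
    return (x - 2.0) ** 2

def A(x):
    return (x + 1.0) ** 2

xs = np.linspace(-5, 5, 10001)

# Sweep the weight c and minimize F(x) + c * A(x) on a grid for each value.
for c in [0.0, 0.5, 1.0, 4.0, 100.0]:
    x_star = xs[np.argmin(F(xs) + c * A(xs))]
    print(f"c={c:6.1f}  x*={x_star:+.2f}  F={F(x_star):.2f}  A={A(x_star):.2f}")
```

As c grows, the minimizer slides from the F-optimum toward the A-optimum, which is the Pareto trade-off described above.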

> so even if we get within 33 bits (which I do think seems unlikely)

Yeah, I agree 33 bits would be way too optimistic. My 50% CI is somewhere between 1,000 and 100,000 bits needed. It just seems unlikely to me that you'd be able to generate, say, 100 bits but then run into a fundamental obstacle after that (as opposed to an engineering / cost obstacle).

> Like, I feel like... this is literally a substantial part of the P vs. NP problem, and I can't just assume my algorithm just like finds efficient solution to arbitrary NP-hard problems.

I don't think the P vs. NP analogy is a good one here, for a few reasons:

 * The problems you're talking about above are statistical issues (you're saying you can't get any statistical signal), while P vs. NP is a computational question.

 * In general, I think P vs. NP is a bad fit for ML. Invoking related intuitions would have led you astray over the past decade--for instance, predicting that neural networks should not perform well because they are solving a problem (non-convex optimization) that is NP-hard in the worst case.

Comment by jsteinhardt on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-23T00:30:13.810Z · LW · GW

This would imply a fixed upper bound on the number of bits you can produce (for instance, a false negative rate of 1 in 128 implies at most 7 bits). But in practice you can produce many more than 7 bits, by double checking your answer, combining multiple sources of information, etc.
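To spell out the arithmetic, here's a minimal sketch (the per-check error rate and the independence of repeated checks are just illustrative assumptions):

```python
import math

# A single check that lets a bad answer slip through with probability eps
# can supply at most about log2(1/eps) bits of selection.
eps = 1 / 128
print(math.log2(1 / eps))  # 7.0 bits from one check

# If repeated (independent) checks each have error rate eps, the combined
# error rate is eps**k, so the bits add instead of staying capped at 7.
for k in [1, 2, 5]:
    print(k, math.log2(1 / eps**k))  # 7, 14, 35 bits
```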

Comment by jsteinhardt on Anchor Weights for ML · 2022-01-22T21:00:01.800Z · LW · GW

Maybe, but I think some people would disagree strongly with this list even in the abstract (putting almost no weight on Current ML, or putting way more weight on humans, or something else). I agree that it's better to drill down into concrete disagreements, but I think right now there are implicit strong disagreements that are not always being made explicit, and this is a quick way to draw them out.

Comment by jsteinhardt on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-22T20:58:04.420Z · LW · GW

Basically the same techniques as in Deep Reinforcement Learning from Human Preferences and the follow-ups--train a neural network model to imitate your judgments, then chain it together with RL.
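To illustrate the two-step structure (reward model from comparisons, then optimization against it), here is a minimal toy sketch in numpy--the linear Bradley-Terry reward model, the synthetic "human" preference, and the random-search stand-in for the RL step are all simplifying assumptions, not the actual setup from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5

# Stand-in "human" preference: secretly prefers trajectories with a larger
# first coordinate (think "more like a successful backflip").
def human_prefers(a, b):
    return a[0] > b[0]

# Step 1: collect pairwise comparisons between random candidate trajectories.
pairs = []
for _ in range(500):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    pairs.append((a, b, 1.0 if human_prefers(a, b) else 0.0))

# Step 2: fit a linear Bradley-Terry reward model r(x) = w . x by gradient
# descent on the logistic preference loss.
w = np.zeros(dim)
lr = 0.1
for _ in range(200):
    grad = np.zeros(dim)
    for a, b, label in pairs:
        p = 1.0 / (1.0 + np.exp(-(w @ a - w @ b)))  # P(a preferred over b)
        grad += (p - label) * (a - b)
    w -= lr * grad / len(pairs)

# Step 3: optimize against the learned reward. Random search stands in for
# the RL step that would normally be chained on top.
candidates = rng.normal(size=(1000, dim))
best = candidates[np.argmax(candidates @ w)]
print("learned reward weights:", np.round(w, 2))
print("best candidate's first coordinate:", round(float(best[0]), 2))
```

The point is just the shape of the pipeline: pairwise judgments train a reward model, and a separate optimizer is then pointed at that model.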

I think current versions of that technique could easily give you 33 bits of information--although as noted elsewhere, the actual number of bits you need might be much larger than that. The techniques are also getting better over time.

Comment by jsteinhardt on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-21T06:38:51.411Z · LW · GW

Yes, I think I understand that more powerful optimizers can find more spurious solutions. But the OP seemed to be hypothesizing that you had some way to pick out the spurious from the good solutions, while saying it won't scale because you have 10^50, not 100, bad solutions for each good one. That's the part that seems wrong to me.

Comment by jsteinhardt on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-21T03:17:41.481Z · LW · GW

I'm not sure I understand why it's important that the fraction of good plans is 1% vs .00000001%. If you have any method for distinguishing good from bad plans, you can chain it with an optimizer to find good plans even if they're rare. The main difficulty is generating enough bits--but in that light, the numbers I gave above are 7 vs 33 bits--not a clear qualitative difference. And in general I'd be kind of surprised if you could get up to say 50 bits but then ran into a fundamental obstacle in scaling up further.
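For concreteness, the bit counts come from the base rate of good plans (a small sketch, reading the 1% and .00000001% figures as 10^-2 and 10^-10):

```python
import math

# Bits of selection needed to single out good plans at a given base rate:
# roughly -log2(fraction of good plans).
for frac in [1e-2, 1e-10]:
    print(f"good fraction {frac:g}: {-math.log2(frac):.1f} bits")
# good fraction 0.01: 6.6 bits; good fraction 1e-10: 33.2 bits -- the 7 vs 33 above.
```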

Comment by jsteinhardt on Thought Experiments Provide a Third Anchor · 2022-01-19T23:26:11.928Z · LW · GW

Thanks! Yes, this makes very similar points :) And from 4 years ago!

Comment by jsteinhardt on Thought Experiments Provide a Third Anchor · 2022-01-19T03:38:57.023Z · LW · GW

> The fear of anthropomorphising AI is one of the more ridiculous traditional mental blindspots in the LW/rationalist sphere.

You're really going to love Thursday's post :).

Jokes aside, I actually am not sure LW is that against anthropomorphising. It seems like a much stronger injunction among ML researchers than it is on this forum.

I personally am not very into using humans as a reference class because it is a reference class with a single data point, whereas e.g. "complex systems" has a much larger number of data points.

In addition, it seems like intuition about how humans behave is already pretty baked into how we think about intelligent agents, so I'd guess by default we overweight it and have to consciously get ourselves to consider other anchors.

I would agree that it's better to do this by explicitly proposing additional anchors, rather than never talking about humans.

Comment by jsteinhardt on Future ML Systems Will Be Qualitatively Different · 2022-01-14T00:00:34.651Z · LW · GW

Okay I think I get what you're saying now--more SGD steps should increase "effective model capacity", so per the double descent intuition we should expect the validation loss to first increase then decrease (as is indeed observed). Is that right?

Comment by jsteinhardt on Future ML Systems Will Be Qualitatively Different · 2022-01-13T23:56:48.902Z · LW · GW

> But if you keep training, GD should eventually find a low complexity high test scoring solution - if one exists - because those solutions have an even higher score (with some appropriate regularization term). Obviously much depends on the overparameterization and relative reg term strength - if it's too strong GD may fail or at least appear to fail as it skips the easier high complexity solution stage. I thought that explanation of grokking was pretty clear.

I think I'm still not understanding. Shouldn't the implicit regularization strength of SGD be higher, not lower, for fewer iterations? So running it longer should give you a higher-complexity, not a lower-complexity solution. (Although it's less clear how this intuition pans out once you already have very low training loss, maybe you're saying that double descent somehow kicks in there?)

Comment by jsteinhardt on Future ML Systems Will Be Qualitatively Different · 2022-01-13T20:24:55.377Z · LW · GW

I'm not sure I get what the relation would be--double descent is usually with respect to the model size (vs. amount of data), although there is some work on double descent vs. number of training iterations, e.g. https://arxiv.org/abs/1912.02292. But I don't immediately see how to connect this to grokking.

(I agree they might be connected, I'm just saying I don't see how to show this. I'm very interested in models that can explain grokking, so if you have ideas let me know!)

Comment by jsteinhardt on Future ML Systems Will Be Qualitatively Different · 2022-01-13T20:07:08.974Z · LW · GW

I don't think it's inferior -- I think both of them have contrasting strengths and limitations. I think the default view in ML would be to use 95% empiricism, 5% philosophy when making predictions, and I'd advocate for more like 50/50, depending on your overall inclinations (I'm 70-30 since I love data, and I think 30-70 is also reasonable, but I think neither 95-5 nor 5-95 would be justifiable).

I'm curious what in the post makes you think I'm claiming philosophy is superior. I wrote this:

> Confronting emergence will require adopting mindsets that are less familiar to most ML researchers and utilizing more of the Philosophy worldview (in tandem with Engineering and other worldviews).

This was intended to avoid making a claim of superiority in either direction.

Comment by jsteinhardt on San Francisco shares COVID data only when it's too late · 2021-12-27T17:10:41.793Z · LW · GW

Also my personal take is that SF, on a pure scientific/data basis, has had one of the best responses in the nation, probably benefiting from having UCSF for in-house expertise. (I'm less enthusiastic about the political response--I think we erred way too far on the "take no risks" side, and like everyone else prioritized restaurants over schools, which seems like a clear mistake. But on the data front I feel like you're attacking one of the most reasonable counties in the U.S.)

Comment by jsteinhardt on San Francisco shares COVID data only when it's too late · 2021-12-27T17:06:54.071Z · LW · GW

It seems like the main alternative would be to have something like Alameda County's reporting, which has a couple of days less lag at the expense of less quality control: https://covid-19.acgov.org/data.page?#cases.

It's really unclear to me that Alameda's data is more informative than SF's. (In fact I'd say it's the opposite--I tend to look at SF over Alameda even though I live in Alameda County.)

I think there is some information lost in SF's presentation, but generally less than most alternatives on the market. SF is also backdating the data to when the tests were actually performed, thus being transparent about the fact that most test data is about what happened several days ago. Websites that claim to give you more up-to-date information are not actually doing so; they're just hiding this fact.

If you looked at the next 4 days in the time series it would probably look something like: 500, 200, 100, 100. Not because Omicron is abating but because most tests taken in the last 4 days haven't had time to be processed and recorded. I think if I was careful I could squeeze a small amount of information out of those numbers (e.g. based on whether the 500 was actually 400 or 600) but it would require a lot of work. I tried this in the past when working with some public health researchers and it's surprisingly hard to not fool yourself into thinking that cases are going down again when it's actually reporting lag.
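To make the difficulty concrete, here is a minimal sketch of the kind of correction involved--every number, including the assumed completeness fractions, is hypothetical:

```python
# Hypothetical reported counts for the most recent 4 days (oldest first), and
# an assumed fraction of each day's tests that have been processed so far.
reported = [500, 200, 100, 100]
completeness = [0.6, 0.3, 0.15, 0.1]  # assumed reporting-delay profile

# Naive "nowcast": divide each day by its completeness to guess the eventual total.
nowcast = [r / c for r, c in zip(reported, completeness)]
print([round(n) for n in nowcast])  # [833, 667, 667, 1000]
```

The estimates for the most recent days are dominated by the assumed completeness fractions, which is exactly where it's easy to fool yourself.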

Comment by jsteinhardt on Worst-case thinking in AI alignment · 2021-12-27T16:09:41.369Z · LW · GW

Finding the min-max solution might be easier, but what we actually care about is an acceptable solution. My point is that the min-max solution, in most cases, will be unacceptably bad.

And in fact, since min_x f(theta,x) <= E_x[f(theta,x)], any solution that is acceptable in the worst case is also acceptable in the average case.
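Spelled out with an explicit acceptability threshold t (treating f as a performance measure to be maximized over theta, with x the environment):

```latex
% Treating f(\theta, x) as a performance measure to be maximized and t as the
% acceptability threshold: the average can never fall below the minimum, so
\[
  \min_x f(\theta, x) \ge t
  \quad\Longrightarrow\quad
  \mathbb{E}_x\!\left[f(\theta, x)\right] \ge \min_x f(\theta, x) \ge t .
\]
```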

Comment by jsteinhardt on Worst-case thinking in AI alignment · 2021-12-24T17:27:31.877Z · LW · GW

Thanks! I appreciated these distinctions. The worst-case argument for modularity came up in a past argument I had with Eliezer, where I argued that this was a reason for randomization (even though Bayesian decision theory implies you should never randomize). See section 2 here: The Power of Noise.

Re: 50% vs. 10% vs. 90%. I liked this illustration, although I don't think your argument actually implies 50% specifically. For instance if it turns out that everyone else is working on the 50% worlds and no one is working on the 90% worlds, you should probably work on the 90% worlds. In addition:

 *  It seems pretty plausible that the problem is overall more tractable in 10% worlds than 50% worlds, so given equal neglectedness you would prefer the 10% world.

 * Many ideas will generalize across worlds, and recruitment / skill-building / organization-building also generalizes across worlds. This is an argument towards working on problems that seem tractable and relevant to any world, as long as they are neglected enough that you are building out distinct ideas and organizational capacity (vs. just picking from the same tree as ML generally). I don't think that this argument dominates considerations, but it likely explains some of our differences in approach.

In the terms laid out in your post, I think my biggest functional disagreement (in terms of how it affects what problems we work on) is that I expect most worst-case assumptions make the problem entirely impossible, and I am more optimistic that many empirically-grounded assumptions will generalize quite far, all the way to AGI. To be clear, I am not against all worst-case assumptions (for instance my entire PhD thesis is about this), but I do think they are usually a source of significant added difficulty, and one has to be fairly careful about where to make them.

For instance, as regards Redwood's project, I expect making language models fully adversarially robust is impossible with currently accessible techniques, and that even a fairly restricted adversary will be impossible to defend against while maintaining good test accuracy. On the other hand I am still pretty excited about Redwood's project because I think you will learn interesting things by trying. (I spent some time trying to solve the unrestricted adversarial example competition, totally failed, but still felt it was a good use of time for similar reasons, and the difficulties for language models seem interestingly distinct in a way that should generate additional insight.) I'm actually not sure if this differs that much from your beliefs, though.

Comment by jsteinhardt on Worst-case thinking in AI alignment · 2021-12-24T17:10:08.339Z · LW · GW

I think this probably depends on the field. In machine learning, solving problems under worst-case assumptions is usually impossible because of the no free lunch theorem. You might assume that a particular facet of the environment is worst-case, which is a totally fine thing to do, but I don't think it's correct to call it the "second-simplest solution", since there are many choices of what facet of the environment is worst-case.

One keyword for this is "partial specification", e.g. here is a paper I wrote that makes a minimal set of statistical assumptions and worst-case assumptions everywhere else: https://arxiv.org/abs/1606.05313. (Unfortunately the statistical assumptions are not really reasonable so the method was way too brittle in practice.) This kind of idea is also common in robust statistics. But my take would not be that it is simpler--in general it is way harder than just working with the empirical distribution in front of you.

Comment by jsteinhardt on Understanding and controlling auto-induced distributional shift · 2021-12-14T04:38:05.232Z · LW · GW

Cool paper! One brief comment: this seems closely related to performative prediction, and it seems worth discussing the relationship.

Edit: just realized this is a review, not a new paper, so my comment is a bit less relevant. Although it does still seem like a useful connection to make.

Comment by jsteinhardt on Base Rates and Reference Classes · 2021-11-25T05:54:42.209Z · LW · GW

Oh okay got it! It looks like the behavior is as intended, but one downside from my perspective is that the blog link is not very visually prominent as is--I would expect most readers to not notice it. I care about this mostly because I would like more people to know about my blog's existence, and I think it could be fixed if there was the option to add a small avatar next to the blog name to make it more visually prominent (I could imagine lots of other fixes too but just throwing a concrete one out there).

On a separate note, it looks like the LaTeX is not rendering in the post: I used $'s to go in and out of math mode, but I'm not sure the LW editor parses that. (My blog embeds a javascript header that loads MathJax, but I assume that is not loaded with the automatic crossposting.)

Comment by jsteinhardt on Base Rates and Reference Classes · 2021-11-24T23:05:49.599Z · LW · GW

@LW mods: Looks like this one also doesn't link back to Bounded Regret? Could it be because of the italicized text that I put at the top?

Comment by jsteinhardt on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-24T22:41:43.419Z · LW · GW

My basic take is that there will be lots of empirical examples where increasing model size by a factor of 100 leads to nonlinear increases in capabilities (and perhaps to qualitative changes in behavior). On median, I'd guess we'll see at least 2 such examples in 2022 and at least 100 by 2030.

At the point where there's a "FOOM", such examples will be commonplace and happening all the time. Foom will look like one particularly large phase transition (maybe 99th percentile among examples so far) that chains into more and more. It seems possible (though not certain--maybe 33%?) that once you have the right phase transition to kick off the rest, everything else happens pretty quickly (within a few days).

Is this take more consistent with Paul's or Eliezer's? I'm not totally sure. I'd guess closer to Paul's, but maybe the "1 day" world is consistent with Eliezer's?

(One candidate for the "big" phase transition would be if the model figures out how to go off and learn on its own, so that number of SGD updates is no longer the primary bottleneck on model capabilities. But I could also imagine us getting that even when models are still fairly "dumb".)

Comment by jsteinhardt on Forecasting: Zeroth and First Order · 2021-11-18T05:09:35.652Z · LW · GW

Awesome, thanks a lot!

Comment by jsteinhardt on Forecasting: Zeroth and First Order · 2021-11-18T01:59:04.930Z · LW · GW

@LW Mods: It looks like the embedded IFrame from the original post didn't copy over. Is there some way either to embed it here, or else just copy it over as an image? (Also, it looks like this post doesn't actually link back to my blog like it normally does, not sure why...)

Comment by jsteinhardt on Discussion with Eliezer Yudkowsky on AGI interventions · 2021-11-15T17:14:40.797Z · LW · GW

Thanks. For time/brevity, I'll just say which things I agree / disagree with:

> sufficiently capable and general AI is likely to have property X as a strong default [...] 

I generally agree with this, although for certain important values of X (such as "fooling humans for instrumental reasons") I'm probably more optimistic than you that there will be a robust effort to get not-X, including by many traditional ML people. I'm also probably more optimistic (but not certain) that those efforts will succeed.

[inside view, modest epistemology]: I don't have a strong take on either of these. My main take on inside views is that they are great for generating interesting and valuable hypotheses, but usually wrong on the particulars.

> less weight on reasoning like 'X was true about AI in 1990, in 2000, in 2010, and in 2020; therefore X is likely to be true about AGI when it's developed

I agree, see my post On the Risks of Emergent Behavior in Foundation Models. In the past I think I put too much weight on this type of reasoning, and also think most people in ML put too much weight on it.

> MIRI thinks AGI is better thought of as 'a weird specific sort of AI', rather than as 'like existing AI but more so'.

Probably disagree but hard to tell. I think there will both be a lot of similarities and a lot of differences.

> AGI is mostly insight-bottlenecked (we don't know how to build it), rather than hardware-bottlenecked

Seems pretty wrong to me. We probably need both insight and hardware, but the insights themselves are hardware-bottlenecked: once you can easily try lots of stuff and see what happens, insights are much easier--see Crick on x-ray crystallography for historical support (ctrl+f for Crick).

> I'd want to look at more conceptual work too, where I'd guess MIRI is also more pessimistic than you

I'm more pessimistic than MIRI about HRAD, though that has selection effects. I've found conceptual work to be pretty helpful for pointing to where problems might exist, but usually relatively confused about how to address them or how specifically they're likely to manifest. (Which is to say, overall highly valuable, but consistent with my take above on inside views.)

[experiments are either predictable or uninformative]: Seems wrong to me. As a concrete example: Do larger models have better or worse OOD generalization? I'm not sure if you'd pick "predictable" or "uninformative", but my take is:
 * The outcome wasn't predictable: within ML there are many people who would have taken each side. (I personally was on the wrong side, i.e. predicting "worse".)
 * It's informative, for two reasons: (1) It shows that NNs "automatically" generalize more than I might have thought, and (2) Asymptotically, we expect the curve to eventually reverse, so when does that happen and how can we study it?

See also my take on Measuring and Forecasting Risks from AI, especially the section on far-off risks.

> Most ML experiments either aren't about interpretability and 'cracking open the hood', or they're not approaching the problem in a way that MIRI's excited by.

Would agree with "most", but I think you probably meant something like "almost all", which seems wrong. There's lots of people working on interpretability, and some of the work seems quite good to me (aside from Chris, I think Noah Goodman, Julius Adebayo, and some others are doing pretty good work).

Comment by jsteinhardt on Discussion with Eliezer Yudkowsky on AGI interventions · 2021-11-15T03:22:51.462Z · LW · GW

Not sure if this helps, and haven't read the thread carefully, but my sense is your framing might be eliding distinctions that are actually there, in a way that makes it harder to get to the bottom of your disagreement with Adam. Some predictions I'd have are that:

 * For almost any experimental result, a typical MIRI person (and you, and Eliezer) would think it was less informative about AI alignment than I would.
 * For almost all experimental results you would think they were so much less informative as to not be worthwhile.
 * There's a small subset of experimental results that we would think are comparably informative, and also some that you would find much more informative than I would.

(I'd be willing to take bets on these or pick candidate experiments to clarify this.)

In addition, a consequence of these beliefs is that you think we should be spending way more time sitting around thinking about stuff, and way less time doing experiments, than I do.

I would agree with you that "MIRI hates all experimental work" / etc. is not a faithful representation of this state of affairs, but I think there is nevertheless an important disagreement MIRI has with typical ML people, and that the disagreement is primarily about what we can learn from experiments.

Comment by jsteinhardt on Discussion with Eliezer Yudkowsky on AGI interventions · 2021-11-13T06:36:10.288Z · LW · GW

Would running the method in this paper on EfficientNet count?

What if we instead used a weaker but still sound method (e.g. based on linear programs instead of semidefinite programs)?

Comment by jsteinhardt on How much slower is remote work? · 2021-10-08T16:17:20.624Z · LW · GW

It is definitely useful in some settings! For instance it's much easier to collaborate with people not at Berkeley, and in some cases those people have valuable specialized skills that easily outweigh the productivity hit.

Comment by jsteinhardt on How much slower is remote work? · 2021-10-08T16:13:51.742Z · LW · GW

I personally have Wednesdays, plus Thursday mornings, as "no meeting days". I think it works pretty well and I know other faculty who do something similar (sometimes just setting mornings as meeting-free). So this does seem like a generally good idea!

Comment by jsteinhardt on Film Study for Research · 2021-09-29T16:18:35.103Z · LW · GW

Thanks, those are really cool!

Comment by jsteinhardt on Where do your eyes go? · 2021-09-20T02:13:15.056Z · LW · GW

I enjoyed this quite a bit. Vision is very important in sports as well, but I hadn't thought to apply it to other areas, despite generally being into applying sports lessons to research (e.g. https://bounded-regret.ghost.io/film-study/).

In sports, you have to choose between watching the person you're guarding and watching the ball / center of play. Or if you're on offense, between watching where you're going and watching the ball. Eye contact is also important for (some) passing.

What's most interesting is the second-level version of this, where good players watch their opponent's gaze (for instance, making a move exactly when the opponent's gaze moves somewhere else). I wonder if there's an analog in video games / research?

Comment by jsteinhardt on Let Us Do Our Work As Well · 2021-09-17T14:34:10.557Z · LW · GW

Thanks, really appreciate the references!

Comment by jsteinhardt on Economic AI Safety · 2021-09-17T00:04:45.552Z · LW · GW

If there was a feasible way to make the algorithm open, I think that would be good (of course FB would probably strongly oppose this). As you say, people wouldn't directly design / early adopt new algorithms, but once early adopters found an alternative algorithm that they really liked, word of mouth would lead many more people to adopt it. So I think you could eventually get widespread change this way.

Comment by jsteinhardt on Film Study for Research · 2021-09-15T02:07:26.748Z · LW · GW

Thanks for the feedback!

> I haven't really digged into Gelman's blog, but the format you mention is a perfect example of the expertise of understanding some research. Very important skill, but not the same as actually conducting the research that goes into a paper.

Research consists of many skills put together. Understanding prior work and developing the taste to judge it is one of the more important individual skills in research (more so than programming, at least in most fields). So I think the blog example is indeed a central one.

> In research, especially in a weird new field like alignment, it's rare to find another researcher who want to conduct precisely the same research. But that's the basis of every sport and game: people want to win the same game. It make the whole "learning from other" slightly more difficult IMO. You can't just look for what works, you constantly have to repurpose ideas that work in slightly different field and/or approaches and check for the loss in translation.

I agree with this, although I think creative new ideas often come from people who have also mastered the "standard" skills. And indeed, most research is precisely about coming up with new ideas, which is a skill that you can cultivate by studying how others generate ideas.

More tangentially, you may be underestimating the amount of innovation in sports. Harden and Jokic both innovate in basketball (among others), but I am pretty sure they also do lots of film study. Jokic's innovation probably comes from having mastered other sports like water polo and the resulting skill transfer. I would guess that mastery of fruitfully adjacent fields is a productive way to generate ideas.

Comment by jsteinhardt on Measurement, Optimization, and Take-off Speed · 2021-09-11T02:39:06.210Z · LW · GW

Thanks, sounds good to me!

Comment by jsteinhardt on Experimentally evaluating whether honesty generalizes · 2021-07-13T01:19:55.142Z · LW · GW

Actually, another issue is that unsupervised translation isn't "that hard" relative to supervised translation--I think that you can get pretty far with simple heuristics, such that I'd guess making the model 10x bigger matters more than making the objective more aligned with getting the answer right (and that this will be true for at least a couple more 10x-ings of model size, although at some point the objective will matter more).

This might not matter as much if you're actually outputting explanations and not just translating from one language to another. Although it is probably true that for tasks that are far away from the ceiling, "naive objective + 10x larger model" will outperform "correct objective".

Comment by jsteinhardt on Experimentally evaluating whether honesty generalizes · 2021-07-13T01:12:50.121Z · LW · GW

Thanks Paul, I generally like this idea.

Aside from the potential concerns you bring up, here is the most likely way I could see this experiment failing to be informative: rather than having checks and question marks in your tables above, really the model's ability to solve each task is a question of degree--each table entry will be a real number between 0 and 1. For, say, tone, GPT-3 probably doesn't have a perfect model of tone, and would get <100% performance on a sentiment classification task, especially if done few-shot.

The issue, then, is that the "fine-tuning for correctness" and "fine-tuning for coherence" processes are not really equivalent--fine-tuning for correctness is in fact giving GPT-3 additional information about tone, which improves its capabilities. In addition, GPT-3 might not "know" exactly what humans mean by the word tone, and so fine-tuning for correctness also helps GPT-3 to better understand the question.

Given these considerations, my modal expectation is that fine-tuning for correctness will provide moderately better results than just doing coherence, but it won't be clear how to interpret the difference--maybe in both cases GPT-3 provides incoherent outputs 10% of the time, and then additionally coherent but wrong outputs 10% of the time when fine-tuned for correctness, but 17% of the time when fine-tuned only for coherence. What would you conclude from a result like that? I would still have found the experiment interesting, but I'm not sure I would be able to draw a firm conclusion.

So perhaps my main feedback would be to think about how likely you think such an outcome is, how much you mind that, and if there are alternative tasks that avoid this issue without being significantly more complicated.

Comment by jsteinhardt on AI x-risk reduction: why I chose academia over industry · 2021-03-15T05:52:28.441Z · LW · GW

This doesn't seem so relevant to capybaralet's case, given that he was choosing whether to accept an academic offer that was already extended to him.

Comment by jsteinhardt on Covid 2/18: Vaccines Still Work · 2021-02-19T16:16:25.082Z · LW · GW

I think if you account for undertesting, then I'd guess 30% or more of the UK was infected during the previous peak, which should reduce R by more than 30% (the people most likely to be infected are also most likely to spread further), and that is already enough to explain the drop.
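As a rough back-of-the-envelope version of this (the 1.5x transmission skew is an assumed number for illustration, not a measured value):

```python
# If 30% of the population has already been infected, and those people account
# for 1.5x their population share of onward transmission (an assumed skew),
# then R falls by roughly 45% -- i.e. by more than 30%.
prev_infected = 0.30
transmission_skew = 1.5  # hypothetical, for illustration only
print(f"R reduced by ~{prev_infected * transmission_skew:.0%}")
```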

Comment by jsteinhardt on Making Vaccine · 2021-02-06T01:18:27.894Z · LW · GW

I wasn't sure what you meant by more dakka, but do you mean just increasing the dose? I don't see why that would necessarily work--e.g. if the peptide just isn't effective.

I'm confused because we seem to be getting pretty different numbers. I asked another bio friend (who is into DIY stuff) and they also seemed pretty skeptical, and Sarah Constantin seems to be as well: https://twitter.com/s_r_constantin/status/1357652836079837189.

Not disbelieving your account, just noting that we seem to be getting pretty different outputs from the expert-checking process and it seems to be more than just small-sample noise. I'm also confused because I generally trust stuff from George Church's group, although I'm still near the 10% probability I gave above.

I am certainly curious to see whether this does develop measurable antibodies :).

Comment by jsteinhardt on Making Vaccine · 2021-02-05T02:52:45.316Z · LW · GW

Ah got it, thanks!

Comment by jsteinhardt on Making Vaccine · 2021-02-05T02:24:30.096Z · LW · GW

Have you run this by a trusted bio expert? When I did this test (picking a bio person who I know personally, who I think of as open-minded and fairly smart), they thought that this vaccine is pretty unlikely to be effective and that the risks in this article may be understated (e.g. food grade is lower-quality than lab grade, and it's not obvious that inhaling food is completely safe). I don't know enough biology to evaluate their argument, beyond my respect for them.

I'd be curious if the author, or others who are considering trying this, have applied this test.

My (fairly uninformed) estimates would be:
 - 10% chance that the vaccine works in the abstract
 - 4% chance that it works for a given LW user
 - 3% chance that a given LW user has an adverse reaction
 - 12% chance that at least 1 LW user has an adverse reaction

Of course, from a selfish perspective, I am happy for others to try this. In the 10% of cases where it works I will be glad to have that information. I'm more worried that some might substantially overestimate the benefit and underestimate the risks, however.

Comment by jsteinhardt on Making Vaccine · 2021-02-05T02:18:13.308Z · LW · GW

I don't think I was debating the norms, but clarifying how they apply in this case. Most of my comment was a reaction to the "pretty important" and "timeless life lessons", which would apply to Raemon's comment whether or not he was a moderator.

Comment by jsteinhardt on Making Vaccine · 2021-02-05T02:16:28.006Z · LW · GW

Often--e.g. Stanford profs claiming that COVID is less deadly than the flu, for a recent and related example.

Comment by jsteinhardt on Making Vaccine · 2021-02-04T19:38:54.276Z · LW · GW

Hmm, important as in "important to discuss", or "important to hear about"?

My best guess based on talking to a smart open-minded biologist is that this vaccine probably doesn't work, and that the author understates the risks involved. I'm interpreting the decision to frontpage as saying that you think I'm wrong with reasonably high confidence, but I'm not sure if I should interpret it that way.

Comment by jsteinhardt on Covid 12/24: We’re F***ed, It’s Over · 2021-01-16T06:13:56.568Z · LW · GW

That seems irrelevant to my claim that Zvi's favored policy is worse than the status quo.

Comment by jsteinhardt on Covid 12/24: We’re F***ed, It’s Over · 2021-01-16T06:11:45.627Z · LW · GW

This isn't based on personal anecdote; studies that try to estimate this come up with 3x. See e.g. the MicroCovid page: https://www.microcovid.org/paper/6-person-risk

Comment by jsteinhardt on Covid 12/31: Meet the New Year · 2021-01-03T07:32:32.957Z · LW · GW

You may well be right. I guess we don't really know what the sampling bias is (it would have to be pretty strongly skewed towards incoming UK cases though to get to a majority, since the UK itself was near 50%).

Comment by jsteinhardt on Covid 12/31: Meet the New Year · 2021-01-01T07:54:58.249Z · LW · GW

See here: https://cov-lineages.org/global_report.html