Robin Hanson on whether governments can squash COVID-19 2020-03-19T18:23:57.574Z · score: 11 (4 votes)
Should we all be more hygienic in normal times? 2020-03-17T06:14:23.093Z · score: 9 (2 votes)
Did any US politician react appropriately to COVID-19 early on? 2020-03-17T06:12:31.523Z · score: 22 (11 votes)
An Analytic Perspective on AI Alignment 2020-03-01T04:10:02.546Z · score: 53 (15 votes)
How has the cost of clothing insulation changed since 1970 in the USA? 2020-01-12T23:31:56.430Z · score: 14 (3 votes)
Do you get value out of contentless comments? 2019-11-21T21:57:36.359Z · score: 28 (12 votes)
What empirical work has been done that bears on the 'freebit picture' of free will? 2019-10-04T23:11:27.328Z · score: 9 (4 votes)
A Personal Rationality Wishlist 2019-08-27T03:40:00.669Z · score: 43 (26 votes)
Verification and Transparency 2019-08-08T01:50:00.935Z · score: 37 (17 votes)
DanielFilan's Shortform Feed 2019-03-25T23:32:38.314Z · score: 19 (5 votes)
Robin Hanson on Lumpiness of AI Services 2019-02-17T23:08:36.165Z · score: 16 (6 votes)
Test Cases for Impact Regularisation Methods 2019-02-06T21:50:00.760Z · score: 65 (19 votes)
Does freeze-dried mussel powder have good stuff that vegan diets don't? 2019-01-12T03:39:19.047Z · score: 18 (5 votes)
In what ways are holidays good? 2018-12-28T00:42:06.849Z · score: 22 (6 votes)
Kelly bettors 2018-11-13T00:40:01.074Z · score: 23 (7 votes)
Bottle Caps Aren't Optimisers 2018-08-31T18:30:01.108Z · score: 66 (26 votes)
Mechanistic Transparency for Machine Learning 2018-07-11T00:34:46.846Z · score: 55 (21 votes)
Research internship position at CHAI 2018-01-16T06:25:49.922Z · score: 25 (8 votes)
Insights from 'The Strategy of Conflict' 2018-01-04T05:05:43.091Z · score: 73 (27 votes)
Meetup : Canberra: Guilt 2015-07-27T09:39:18.923Z · score: 1 (2 votes)
Meetup : Canberra: The Efficient Market Hypothesis 2015-07-13T04:01:59.618Z · score: 1 (2 votes)
Meetup : Canberra: More Zendo! 2015-05-27T13:13:50.539Z · score: 1 (2 votes)
Meetup : Canberra: Deep Learning 2015-05-17T21:34:09.597Z · score: 1 (2 votes)
Meetup : Canberra: Putting Induction Into Practice 2015-04-28T14:40:55.876Z · score: 1 (2 votes)
Meetup : Canberra: Intro to Solomonoff induction 2015-04-19T10:58:17.933Z · score: 1 (2 votes)
Meetup : Canberra: A Sequence Post You Disagreed With + Discussion 2015-04-06T10:38:21.824Z · score: 1 (2 votes)
Meetup : Canberra HPMOR Wrap Party! 2015-03-08T22:56:53.578Z · score: 1 (2 votes)
Meetup : Canberra: Technology to help achieve goals 2015-02-17T09:37:41.334Z · score: 1 (2 votes)
Meetup : Canberra Less Wrong Meet Up - Favourite Sequence Post + Discussion 2015-02-05T05:49:29.620Z · score: 1 (2 votes)
Meetup : Canberra: the Hedonic Treadmill 2015-01-15T04:02:44.807Z · score: 1 (2 votes)
Meetup : Canberra: End of year party 2014-12-03T11:49:07.022Z · score: 1 (2 votes)
Meetup : Canberra: Liar's Dice! 2014-11-13T12:36:06.912Z · score: 1 (2 votes)
Meetup : Canberra: Econ 101 and its Discontents 2014-10-29T12:11:42.638Z · score: 1 (2 votes)
Meetup : Canberra: Would I Lie To You? 2014-10-15T13:44:23.453Z · score: 1 (2 votes)
Meetup : Canberra: Contrarianism 2014-10-02T11:53:37.350Z · score: 1 (2 votes)
Meetup : Canberra: More rationalist fun and games! 2014-09-15T01:47:58.425Z · score: 1 (2 votes)
Meetup : Canberra: Akrasia-busters! 2014-08-27T02:47:14.264Z · score: 1 (2 votes)
Meetup : Canberra: Cooking for LessWrongers 2014-08-13T14:12:54.548Z · score: 1 (2 votes)
Meetup : Canberra: Effective Altruism 2014-08-01T03:39:53.433Z · score: 1 (2 votes)
Meetup : Canberra: Intro to Anthropic Reasoning 2014-07-16T13:10:40.109Z · score: 1 (2 votes)
Meetup : Canberra: Paranoid Debating 2014-07-01T09:52:26.939Z · score: 1 (2 votes)
Meetup : Canberra: Many Worlds + Paranoid Debating 2014-06-17T13:44:22.361Z · score: 1 (2 votes)
Meetup : Canberra: Decision Theory 2014-05-26T14:44:31.621Z · score: 1 (2 votes)
[LINK] Scott Aaronson on Integrated Information Theory 2014-05-22T08:40:40.065Z · score: 22 (23 votes)
Meetup : Canberra: Rationalist Fun and Games! 2014-05-01T12:44:58.481Z · score: 0 (3 votes)
Meetup : Canberra: Life Hacks Part 2 2014-04-14T01:11:27.419Z · score: 0 (1 votes)
Meetup : Canberra Meetup: Life hacks part 1 2014-03-31T07:28:32.358Z · score: 0 (1 votes)
Meetup : Canberra: Meta-meetup + meditation 2014-03-07T01:04:58.151Z · score: 3 (4 votes)
Meetup : Second Canberra Meetup - Paranoid Debating 2014-02-19T04:00:42.751Z · score: 1 (2 votes)


Comment by danielfilan on DanielFilan's Shortform Feed · 2020-05-15T17:33:48.773Z · score: 12 (7 votes) · LW · GW

Hot take: the norm of being muted on video calls is bad. It makes it awkward and difficult to speak, clap, laugh, or make "I'm listening" sounds. A better norm set would be:

  • use Zoom in gallery mode, so that somebody making noise doesn't become more prominent on everyone's screen than they were before
  • call from a quiet room
  • be more tolerant of random background sounds, the way we are IRL
Comment by danielfilan on Against strong bayesianism · 2020-05-02T06:58:24.662Z · score: 2 (1 votes) · LW · GW

Actually, I think the synthesis is that many of the things that Bob is saying are implications of Eliezer's description and ways of getting close to Bayesian reasoning, but seem like they're almost presented as concessions. I could try to get into some responses chosen by you if that would be helpful.

Comment by danielfilan on Against strong bayesianism · 2020-05-02T05:47:15.802Z · score: 2 (1 votes) · LW · GW

A lot of Bob's responses seem like natural consequences of Eliezer's claim, but some of them aren't.

Comment by danielfilan on DanielFilan's Shortform Feed · 2020-05-02T02:56:23.418Z · score: 11 (6 votes) · LW · GW

I think the use of dialogues to illustrate a point of view is overdone on LessWrong. Almost always, the 'Simplicio' character fails to accurately represent the smart version of the viewpoint he stands in for, because the author doesn't try sufficiently hard to pass the ITT of the view they're arguing against. As a result, not only is the dialogue unconvincing, it runs the risk of misleading readers about the actual content of a worldview. I think this is true to a greater extent than posts that just state a point of view and argue against it, because the dialogue format naively appears to actually represent a named representative of a point of view, and structurally discourages disclaimers of the type "as I understand it, defenders of proposition P might state X, but of course I could be wrong".

Comment by danielfilan on Against strong bayesianism · 2020-05-02T02:56:00.461Z · score: 4 (2 votes) · LW · GW

Also (crossposted to shortform):

I think the use of dialogues to illustrate a point of view is overdone on LessWrong. Almost always, the 'Simplicio' character fails to accurately represent the smart version of the viewpoint he stands in for, because the author doesn't try sufficiently hard to pass the ITT of the view they're arguing against. As a result, not only is the dialogue unconvincing, it runs the risk of misleading readers about the actual content of a worldview. I think this is true to a greater extent than posts that just state a point of view and argue against it, because the dialogue format naively appears to actually represent a named representative of a point of view, and structurally discourages disclaimers of the type "as I understand it, defenders of proposition P might state X, but of course I could be wrong".

Comment by danielfilan on Against strong bayesianism · 2020-05-02T02:48:56.908Z · score: 5 (3 votes) · LW · GW

I feel like a lot of Bob's responses are natural consequences of Eliezer's position that you describe as "strong bayesianism", except where he talks about what he actually recommends, and as such this post feels very uncompelling to me. Where they aren't, "strong bayesianism" is correct: it seems useful for someone to actually think about what the likelihood ratio of "a random thought popped into my head" is, and similarly about how likely skeptical hypotheses are.


In other words, an ideal bayesian is not thinking in any reasonable sense of the word - instead, it’s simulating every logically possible universe. By default, we should not expect to learn much about thinking based on analysing a different type of operation that just happens to look the same in the infinite limit.

seems like it just isn't an argument against

Whatever approximation you use, it works to the extent that it approximates the ideal Bayesian calculation - and fails to the extent that it departs.

(and also I dispute the gatekeeping around the term 'thinking': when I simulate future worlds, that sure feels like thinking to me! but this is less important)

In general, I feel like I must be missing some aspect of your world-view that underlies this, because I'm seeing almost no connection between your arguments and the thesis you're putting forwards.

Comment by danielfilan on Subjective implication decision theory in critical agentialism · 2020-04-28T21:35:49.088Z · score: 4 (2 votes) · LW · GW

I'm kind of tired right now, so I might be missing something obvious, but:

It seems that subjective implication decision theory agrees with timeless decision theory on the problems considered, while diverging from causal decision theory, evidential decision theory, and functional decision theory.

Why do you say that it diverges from evidential decision theory (EDT)? AFAICT on all problems listed it does the same thing as EDT, and the style of reasoning seems pretty similar. Would you mind saying what SIDT would do in XOR mugging? (I'd try to work this out myself but for the aforementioned tiredness and the fear that I don't quite understand SIDT well enough).

Comment by danielfilan on Holiday Pitch: Reflecting on Covid and Connection · 2020-04-23T19:24:51.469Z · score: 3 (2 votes) · LW · GW

As the post says:

I wanted to tell all my friends “hey! Are you feeling lonely and disconnected? Try a Seder!”... but, well, I’m not Jewish, and most people aren’t Jewish, and… the story of Seder really deeply assumes “you are a part of Jewish history, or at least the people hosting the event are.”

Personally, as somebody who isn't Jewish and doesn't have Jewish ancestry, I would feel weird hosting a Seder or making one happen (where the feeling is that it would be the bad kind of cultural appropriation). I would also feel weird about it being a Rationalist holiday rather than a holiday for Rationalist Jewish people, just like I'd feel weird about Rationalists adopting Christmas or Obon as a Rationalist holiday (where the feeling is that religion is Actually Bad, and rationalists shouldn't have religious ceremonies be an important part of their communities if they can help it).

Comment by danielfilan on Don't Use Facebook Blocking · 2020-04-22T18:38:12.148Z · score: 2 (1 votes) · LW · GW

Or have discussions that don't include disagreeing pairs.

The type of discussion I'm talking about is open-ended community discussion, so it would be weird to limit it in that way.

Or by having a public block list.

TBC, you need the vast majority of people to have a public block list for this to work.


Thanks, fixed.

Comment by danielfilan on Don't Use Facebook Blocking · 2020-04-22T03:18:53.760Z · score: 7 (3 votes) · LW · GW

When I block people on FB, I do so because I don't consider their contributions to discussions valuable and don't really care about them. If I'm correct about how much they matter, then presumably it's fine if they can't meaningfully participate in conversations. Furthermore, I don't think this is an unusual blocking pattern for people who block people on FB and participate in rationality community discussions.


  • There's a unilateralist curse problem where if just one person underestimates how valuable a person's contributions are, they don't get to fully participate in discussions. Hopefully this can be fixed by holding a high bar for blocking?
  • Even if you don't want people to participate in discussions, you often want them to see the discussion. A common case of this is when a norm is being hashed out that you want everybody to follow. (You could attempt to fix this by only blocking people who are peripheral enough to communities of concern that you don't care about their behaviour, but sadly your friends probably have slightly different communities of concern that you're bad at determining the boundaries of.)
  • If you're reading a big community thread, then even if nobody has blocked you, if you don't know that nobody has blocked you (and you don't) then you have an overhead of not knowing if you can see all the discussion, which probably makes discussions worse. This is a cost that you can only eliminate by having blocking be extremely uncommon.
Comment by danielfilan on April Coronavirus Open Thread · 2020-04-17T05:41:23.456Z · score: 5 (3 votes) · LW · GW

Metaculus is running a competition for accurate, publicly-posted, well-reasoned predictions about how COVID-19 will hit El Paso, Texas, in order to help the city with its disaster response. The top prize is $1,000.

Comment by danielfilan on Where should LessWrong go on COVID? · 2020-04-14T03:43:29.869Z · score: 7 (4 votes) · LW · GW

Note that Metaculus also estimates things that are likely inputs to models e.g. "the" IFR.

Comment by danielfilan on Reason as memetic immune disorder · 2020-04-13T04:31:36.175Z · score: 4 (2 votes) · LW · GW

My guess is that his name is "A. J. Jacobs" and not "A. J. Acobs"

Comment by danielfilan on [Announcement] LessWrong will be down for ~1 hour on the evening of April 10th around 10PM PDT (5:00AM GMT) · 2020-04-10T18:10:17.084Z · score: 7 (4 votes) · LW · GW

PSA: if you just say PT you won't be wrong in summer or winter.

Comment by danielfilan on An Orthodox Case Against Utility Functions · 2020-04-09T22:48:18.655Z · score: 6 (3 votes) · LW · GW

Specifically, discontinuous utility functions have always seemed basically irrational to me, for reasons related to incomputability.

Comment by danielfilan on COVID-19 and the US Elections · 2020-04-08T20:48:37.502Z · score: 2 (1 votes) · LW · GW

Relevant Metaculus question: Will the US hold mass-turnout elections for President on schedule in 2020?

Comment by danielfilan on An Orthodox Case Against Utility Functions · 2020-04-08T04:48:09.775Z · score: 7 (4 votes) · LW · GW

Yeah, a didactic problem with this post is that when I write everything out, the "reductive utility" position does not sound that tempting.

I actually found the position very tempting until I got to the subjective utility section.

Comment by danielfilan on Hanson & Mowshowitz Debate: COVID-19 Variolation · 2020-04-08T03:59:26.560Z · score: 9 (5 votes) · LW · GW

Thoughts re: my question and the responses:

  • The leverage argument for working on COVID-19 rather than existential risk (x-risk) seems weak by itself. My guess is that working on COVID-19 has about 7 orders of magnitude more tractability than x-risk (2 for knowing which risk you're working on, 2 for understanding it, and 3 for less prior effort), but that the scale of x-risk is more than 10 orders of magnitude higher than the direct disease burden of COVID-19.
  • I'm unsure if world-ending pandemics look like this one, but it's a good point.
  • The civilisational threat point seems pretty legitimate, but I'm unsure how to weigh it.
  • The harms done by factory farming of non-human animals seem comparable to the direct disease burden of COVID-19 to me.
  • OpenPhil may be funding some forecasting work with the Good Judgement Project, which is what I was referring to, but as far as I'm concerned, Metaculus (which I'm involved in) is doing better forecasting (and might also be funded by OpenPhil?). See this dashboard of predictions.
  • This situation seems like it reveals a lot of information about governance, but if all one wants to do is learn from the situation, it seems better to document it a bit now and wait later. However, if one wants to contribute to a probable effort to improve governance of pandemics at the tail end of this outbreak, that would require careful analysis and action now.
  • If I mostly wanted to raise the status of the effective altruism movement, I wouldn't push a policy proposal as unpopular as variolation - but it might become more popular as people become better at marketing it?

Overall, I now think that it's worth the full time of a small contingent of effective altruists to focus on policy responses to COVID-19 as well as broadly understanding it, and that we are probably assigning too little focus to this (although within the portfolio of attention, I wish more were directed towards neglected high-leverage interventions). Most of the arguments presented I find convincing, except for the one that I said was convincing during the call.

Comment by danielfilan on A quick and crude comparison of epidemiological expert forecasts versus Metaculus forecasts for COVID-19 · 2020-04-01T17:43:00.408Z · score: 33 (13 votes) · LW · GW

If they were perfectly calibrated on this one-off prediction, about 14 should've had the actual outcome fall in their 80% confidence interval.

Nope. Suppose I roll a 100-sided die, and all LessWrongers write down their centred 80% credible interval for where the result will fall. If the LWers are rational and calibrated, that interval should be [11, 90]. So the actual outcome will fall in everybody's credible interval or in nobody's. The relevant averaging should happen across questions, not across predictors.
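A quick Monte Carlo sketch of this point, using the hypothetical 100-sided die above (the interval [11, 90] covers exactly 80 of the 100 faces): on any single question, calibrated predictors who share the same interval are all covered together or not at all, while coverage only averages out to 80% across many questions.

```python
import random

random.seed(0)

def covered(roll):
    # Every calibrated predictor reports the same central 80% interval
    # for a fair 100-sided die: [11, 90], i.e. 80 of the 100 faces.
    return 11 <= roll <= 90

# One question, many predictors: coverage is all-or-nothing.
roll = random.randint(1, 100)
one_question = [covered(roll) for _ in range(17)]
assert all(one_question) or not any(one_question)

# Many questions, one predictor: coverage averages out to ~80%.
rolls = [random.randint(1, 100) for _ in range(100_000)]
rate = sum(covered(r) for r in rolls) / len(rolls)
print(round(rate, 2))  # close to 0.8
```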

Comment by danielfilan on The case for C19 being widespread · 2020-03-29T21:52:05.022Z · score: 18 (5 votes) · LW · GW

Out of 645 tests done in Colorado on first responders and their families, there were zero positive results.

Comment by danielfilan on The case for C19 being widespread · 2020-03-28T05:35:59.679Z · score: 7 (4 votes) · LW · GW

Note: half of carriers didn't show symptoms at the time they tested positive; it could well be that they show symptoms later.

Comment by danielfilan on The case for C19 being widespread · 2020-03-28T03:05:20.686Z · score: 6 (5 votes) · LW · GW

Yes, [comparing the evidence against the theory to the evidence for it is] what I'm trying to do here.

It looks more like you listed all the evidence you could find for the theory and didn't do anything else.

Although you can have problems of self-selection and bias, when you’ve got big data like this you tend to trust it more.

I don't think this is actually how selection effects work.

You'd expect to see many severe cases amongst people who travelled for business a lot in January and February.

Those people are less famous so you wouldn't necessarily hear about them.

I don't quite understand what you're saying here.

That the asymptomatic rate isn't all that high, and in at least one population where everybody could get a test, you don't see a big fraction of the population testing positive.

Comment by danielfilan on The case for C19 being widespread · 2020-03-28T01:49:10.515Z · score: 3 (3 votes) · LW · GW

I'm not just cherry picking the tail-end of a normal distribution of IFRs etc. The Gupta study in particular and some of the other studies suggest a fundamentally different theory of the pandemic.

The point remains: given that some people have such a different theory, it's unclear how many supporting pieces of evidence you should expect to see, and it's important to compare the evidence against the theory to the evidence for it.

The King's Professor seems to find this number convincing.

With all due respect, it's not that hard to get data that you yourself find convincing, even if you're a professor.

Tom Hanks, Prince Charles and Boris Johnson don't meet more people every day than your typical Uber driver, cashier, etc.

They do meet more different populations of people though. So if a small number of cities have relatively widespread infection, people who visit many cities are unusually likely to get infected.

Crucially depends on the asymptomatic rate, which might very well be very high.

Not likely. About 1% of Icelanders without symptoms test positive, and all the stats on which tested people are asymptomatic that I've seen (Iceland, Diamond Princess) give about 1/2 asymptomatic at time of testing (presumably many later get sick).

Comment by danielfilan on The case for C19 being widespread · 2020-03-28T01:12:28.610Z · score: 8 (4 votes) · LW · GW

I'm particularly unimpressed by the dot points noting things that happened to very few people:

There were a few dengue cases in Australia and Florida, where it is unusual...

Difficulties in False Negative Diagnosis of Coronavirus Disease 2019: A Case Report. Note that this was a highly symptomatic person...

One person had persistent negative swab, but tested positive through fecal samples...

“Chinese journalists have uncovered other cases of people testing negative six times before a seventh test confirmed they had the disease.”

Comment by danielfilan on The case for C19 being widespread · 2020-03-28T01:05:37.097Z · score: 20 (11 votes) · LW · GW

This seems pretty hard to evaluate because with a large number of published pre-prints on the outbreak, it's not very surprising that there would be many suggesting higher-than-expected spread. The question is how that weighs up against the opposing evidence, and to evaluate that I'd have to look at all the opposing evidence, which I don't want to do. That being said, broadly I am unconvinced. Notes on some of the dot points:

10% of 650,000 UK users of their C19 symptom tracker app showed mild symptoms. Thus 6.5m people in UK are infected

Presumably some of these people are hypochondriacs or have the flu? Also, I bet people with symptoms are more likely to use the app.

IFR=0.12% (95%CrI: 0.08-0.17%), several orders of magnitude smaller than the crude CFR estimated at 4.19%.

This isn't very important, but 0.12 is only 1.5 orders of magnitude smaller than 4.19, which I wouldn't call "several".

High proportion of special populations are infected (celebrities, athletes and politicians).

Couldn't this be explained by those populations travelling more, shaking more hands, meeting more people, etc.?

Widespread testing (which isn’t random) in Iceland suggests an even lower IFR [than 0.3%].

Iceland has 2 deaths and 97 recoveries. I would say that isn't good evidence for an IFR of under 0.3%. Admittedly the number of deaths so far is 0.2% of the total number of cases, but given exponential spread most of the cases will be new and won't have had time to die yet, so the deaths to recoveries ratio seems more important (although upward-biased given who gets tested).
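To sketch the arithmetic behind preferring the deaths-to-recoveries ratio (Iceland figures as quoted above; both estimators are biased, in opposite directions):

```python
# Iceland figures quoted in the comment above: 2 deaths, 97 recoveries.
deaths, recoveries = 2, 97

# Deaths as a share of *all* confirmed cases understates fatality during
# exponential growth, because most confirmed cases are recent and haven't
# had time to resolve. Restricting to resolved cases avoids that problem,
# though it is upward-biased by who gets tested.
resolved_fatality = deaths / (deaths + recoveries)
print(f"{resolved_fatality:.1%}")  # 2.0%
```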

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-27T23:50:44.690Z · score: 2 (1 votes) · LW · GW

I am mostly interested in allowing the developers of AI systems to determine whether their system has the cognitive ability to cause human extinction, and whether their system might try to cause human extinction.

The way I read this, if the research community enables the developers to determine these things at prohibitive cost, then we mostly haven't "allowed" them to do it, but if the cost is manageable then we have. So I'd say my desiderata here (and also in my head) include the cost being manageable. If the cost of any such approach were necessarily prohibitive, I wouldn't be very excited about it.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-27T23:43:41.968Z · score: 2 (1 votes) · LW · GW

I guess I still don't understand why you believe mechanistic transparency is hard. The way I want to use the term, as far as I can tell, laptops are acceptably mechanistically transparent to the companies that create them. Do you think I'm wrong?

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-27T23:35:30.805Z · score: 4 (2 votes) · LW · GW

You could imagine a law "we will not build AI systems that use >X amount of compute unless they are mechanistically transparent". Then research on mechanistic transparency reduces the cost of such a law, making it more palatable to implement it.

If mechanistic transparency barely works and/or is super expensive, then presumably this law doesn't look very good compared to other potential laws that prevent the building of powerful AI, so you'd think that marginal changes in how good we are at mechanistic transparency would do basically nothing (unless you've got the hope of 'crossing the threshold' to the point where this law becomes the most viable such law).

The other bullet points make sense though.

Comment by danielfilan on Authorities and Amateurs · 2020-03-26T00:50:53.420Z · score: 2 (1 votes) · LW · GW

Nowhere in that comment does he say that his post was or contained a "math error". The closest thing I can find is this:

Great point about the step function. That convinces me that Bach would not have drawn a different qualitative conclusion if he had used a different functional form, no matter which one. I’ve updated my post with a note about this.

[EDIT: AFAICT Douglas_Knight is saying that Nostalgebrist's initial guess that Bach's post was sensitive to the functional form is a "math error". I wouldn't call it that, but perhaps reasonable people could disagree about this.]

Comment by danielfilan on Authorities and Amateurs · 2020-03-26T00:41:37.261Z · score: 6 (3 votes) · LW · GW

Can you link to Nostalgebrist saying that his post was a math error at SSC? I can't find it. Also, see Nostalgebrist's update at the start of the critique:

To be clear, Bach’s use of a Gaussian is not the core problem here, it’s just a symptom of the core problem.

The core problem is that his curves do not come from a model of how disease is acquired, transmitted, etc. Instead they are a convenient functional form fitted to some parameters, with Bach making the call about which parameters should change – and how much – across different hypothetical scenarios.

Having a model is crucial when comparing one scenario to another, because it “keeps your accounting honest”: if you change one thing, everything causally downstream from that thing should also change.

Without a model, it’s possible to “forget” and not update a value after you change one of the inputs to that value.

That is what Bach does here: He assumes the number of total cases over the course of the epidemic will stay the same, whether or not we do what he calls “mild mitigation measures.” But the estimate he uses for this total – like most if not all such estimates out there – was computed directly from a specific value of the replication rate of the disease. Yet, all of the “mild mitigation measures” on the table right now would lower the replication rate of the disease – that’s what “slowing it down” means – and thus would lower the total.

Comment by danielfilan on Are veterans more self-disciplined than non-veterans? · 2020-03-24T22:24:11.277Z · score: 2 (1 votes) · LW · GW

Oh, I assumed that your post was about the average effect of military training, not whether any subgroup at all benefits, since the average effect seems more relevant.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-24T06:27:02.052Z · score: 4 (2 votes) · LW · GW

This paper on scaling laws for training language models seems like it should help us make a rough guess for how training scales. According to the paper, your loss in nats scales as C^(-0.050) if you're only limited by compute, and as N^(-0.076) if you're only limited by the number of parameters. If we can equate those in the limit, which is not at all obvious to me, that suggests that compute cost goes as the number of parameters to roughly the 1.5 power, and the number of parameters is itself polynomial in the number of neurons. So comprehension can afford to be mildly polynomial in the number of neurons, but it certainly can't be exponential.
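The exponent-matching step can be written out, assuming the power-law exponents reported in the scaling-laws paper (roughly 0.050 for the compute-limited fit and 0.076 for the parameter-limited fit; treat the exact values as an assumption of this sketch):

```python
# Assumed exponents from the scaling-laws paper:
# loss ~ C^(-alpha_C) when compute-limited,
# loss ~ N^(-alpha_N) when parameter-limited.
alpha_C = 0.050
alpha_N = 0.076

# Equating the two fits in the limit: C^(-alpha_C) = N^(-alpha_N)
# implies C = N^(alpha_N / alpha_C).
exponent = alpha_N / alpha_C
print(round(exponent, 2))  # about 1.5
```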

Comment by danielfilan on Are veterans more self-disciplined than non-veterans? · 2020-03-24T05:58:53.574Z · score: 2 (1 votes) · LW · GW

I'm not sure that follows.

It doesn't strictly follow, but I think it's pretty good evidence considering how easy it is to come by.

For many jobs, we know that people in their mid 30s are generally more productive than people who are in early career, for example. But there are still anti-discrimination laws against not hiring old people.

I wouldn't consider mid-30s to be old, and my guess is that those laws are protecting people at least 40 years old - although I guess the general point holds that 'you can't discriminate based on variable X' doesn't tell you which values would be discriminated against, maybe the point of these laws is to protect non-veterans.

Re: charities, the relevant fact is that they're charities specifically for employment opportunities for this group. You don't see charities to help e.g. ex-firefighters to be employed.

Comment by danielfilan on Are veterans more self-disciplined than non-veterans? · 2020-03-24T03:04:02.931Z · score: 2 (1 votes) · LW · GW

My understanding is that there are a bunch of charitable efforts to try to convince employers to hire veterans, and that it's illegal in the US to discriminate against veterans in employment. If military veterans were actually more productive, I don't think these charities or anti-discrimination laws would exist.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-23T22:35:00.369Z · score: 2 (1 votes) · LW · GW

Re: the two claims, that's different from what I thought you meant by the distinction. I would describe both dot points as being normative claims buttressed by empirical claims. To the extent that I see a difference, it's that the first dot point is perhaps addressing low-probability risks, while the second is addressing medium-to-high-probability risks. I think that pushing for mechanistic transparency would address medium-to-high-probability risks, but don't argue for that here, since I think the arguments for medium-to-high-probability risk from AI are better made elsewhere.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-23T22:27:36.554Z · score: 2 (1 votes) · LW · GW

How this perspective could reduce the probability of catastrophes

I want to emphasize that I think the general research direction is good and will be useful and I want more people to work on it (it makes the first, second, fifth and sixth bullet points above more effective); I only disagree with the story you've presented for how it reduces x-risk.

To be clear: the way I imagine this research direction working is that somebody comes up with a theory of how to build aligned AI, roughly does that, and then uses some kind of transparency to check whether or not they succeeded. A big part of the attraction to me is that it doesn't really depend on what exact way aligned AI gets built, as long as it's built using methods roughly similar to modern neural network training. That being said, if it's as hard as you think it will be, I don't understand how it could usefully contribute to the dot points you mention.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-23T22:26:50.512Z · score: 2 (1 votes) · LW · GW

The costs of mechanistic transparency

I think my argument was more like "in the world where your modularity research works out perfectly, you get linear scaling, and then it still costs 100x to have a mechanistically-understood AI system relative to a black-box AI system, which seems prohibitively expensive".

I guess I don't understand why linear scaling would imply this - in fact, I'd guess that training should probably be super-linear, since each backward pass takes linear time, but the more neurons you have, the bigger the parameter space, and so the greater number of gradient steps you need to take to reach the optimum, right?

At any rate, I agree that 100x cost is probably somewhat too expensive. If that estimate comes from OpenAI's efforts to understand image recognition, I think it's too high, since we presumably learned a bunch about what to look for from their efforts. I also think you're underweighing the benefits of having a better theory of how effective cognition is structured. Responding to your various bullet points:

Right now we're working with subhuman AI systems where we already have concepts that we can use to understand AI systems; this will become much more difficult with superhuman AI systems.

I can't think of any way around the fact that this will likely make the work harder. Ideally it would bring incidental benefits, though (once you understand new superhuman concepts you can use them in other systems).

All abstractions are leaky; as you build up hierarchies of abstractions for mechanistically understanding a neural net, the problems with your abstractions can cause you to miss potential problems.

Once you have a model of a module such that, if the module worked according to your model, things would be fine, you can just train the module to better fit your model. Hopefully, by re-training the modules independently, any errors are uncorrelated and result in reduced performance rather than catastrophic failure.
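A toy sketch of "train the module to better fit your model": here the module, the interpretable reference function, and all parameters are hypothetical stand-ins, and the "module" is just a 1-D linear map distilled toward the reference by stochastic gradient descent.

```python
import random

# Hypothetical interpretable model of what the module *should* compute.
def reference(x):
    return 2.0 * x + 1.0

# The module's current, slightly mis-fitting parameters.
w, b = 1.5, 0.3
lr = 0.05
random.seed(0)

# Re-train the module on squared error against the reference.
for _ in range(2000):
    x = random.uniform(-1.0, 1.0)
    err = (w * x + b) - reference(x)
    w -= lr * err * x   # gradient of 0.5 * err^2 w.r.t. w
    b -= lr * err       # gradient of 0.5 * err^2 w.r.t. b

print(round(w, 2), round(b, 2))  # parameters approach (2.0, 1.0)
```

The point of the sketch is that after distillation, the module's behaviour matches the interpretable model by construction, so residual errors come only from imperfect fitting rather than from the module doing something qualitatively different.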

With image classifiers we have the benefit of images being an input mechanism we are used to; it will presumably be a lot harder with input mechanisms we aren't used to.

I think this is a minor benefit. In most domains, specialists will understand the meanings of input data to their systems: I can't think of any counterexamples, but perhaps you can. Then, once you understand the initial modules, you can understand their outputs in terms of their inputs, and by recursion it seems like you should be able to understand the inputs and outputs of all modules.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-23T21:06:37.492Z · score: 2 (1 votes) · LW · GW

Do we mechanistically understand laptops?

I'd be shocked if there was anyone to whom it was mechanistically transparent how a laptop loads a website, down to the gates in the laptop.

So, I don't think I'm saying that you have to mechanistically understand how every single gate works - rather, that you should be able to understand intermediate-level sub-systems and how they combine to produce the functionality of the laptop. The understanding of the intermediate-level sub-systems has to be pretty complete, but probably need not be totally complete - in the laptop case, you can just model a uniform random error rate and you'll be basically right, and I imagine there should be something analogous for neural networks. Of course, you need somebody to be in charge of understanding the neurons in order to build up your understanding of the intermediate-level sub-systems, but it doesn't seem to me that there needs to be any single person who understands all the neurons entirely - or indeed any single person who understands all the intermediate-level sub-systems entirely.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-23T21:01:27.682Z · score: 2 (1 votes) · LW · GW

You could imagine a synthesis of the two stories: train a medium-level smart thing end-to-end, look at what all the modules are doing, and use those modules when training a smarter thing.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-22T01:07:25.054Z · score: 4 (2 votes) · LW · GW


I think that an important part of this is ‘agent foundations’, by which I broadly mean a theory of what agents should look like, and what structural facts about agents could cause them to display undesired behaviour. [emphasis Rohin's]

Huh? Surely if you're trying to understand agents that arise, you should have a theory of arbitrary agents rather than ideal agents.

You're right that you don't just want a theory of ideal agents. But I think it's sufficient to only have a theory of very good agents, and discard the systems that you train that aren't very good agents. This is more true the more optimistic you are about ML producing very good agents.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-22T01:07:05.277Z · score: 4 (2 votes) · LW · GW


I agree that none of the papers are incredibly convincing on their own. I'd say the most convincing empirical work so far is the sequence of posts on 'circuits' on Distill, starting with this one, but even that isn't totally compelling. They're just meant to provide some evidence that this sort of thing is possible, and to stand against the lack of papers proving that it isn't (although of course even if true it would be hard to prove).

Re: the Rashomon paper, you're right, that implication doesn't hold. But it is suggestive that there may well be 'interpretable' models that are near-optimal.

Re: the regularisation paper, I agree that it doesn't work that well. But it's the first paper in this line of work, and I think it's plausibly illustrative of things that might work.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-22T01:06:44.859Z · score: 6 (3 votes) · LW · GW


Mechanistic transparency seems incredibly difficult to achieve to me.

I think it's quite difficult to achieve but not impossible, and worth aiming for. My main take is that (a) it seems plausibly achievable and (b) we don't really know how difficult it is to achieve because most people don't seem very interested in trying to achieve it, so some people should spend a bunch of effort trying and seeing how it pans out. But, as mentioned in the first dotpoint in the criticisms section, I do regard this as an open question.

As an analogy, I don't think I understand how a laptop works at a mechanistic level, despite having a lot of training in Computer Science.

Note that I'm not asking for systems to be mechanistically transparent to people with backgrounds and training in the relevant field, just that they be mechanistically transparent to their developers. This is still difficult, but as far as I know it's possible for laptops (although I could be wrong about this, I'm not a laptop expert).

I also think that mechanistic transparency becomes much more difficult as systems become more complex: in the best case where the networks are nice and modular, it becomes linearly harder, which might keep the cost ratio the same (seems plausible to scale human effort spent understanding the net at the same rate that we scale model capacity), but if it is superlinearly harder (seems more likely to me, because I don't expect it to be easy to identify human-interpretable modularity even when present), then as model capacity increases, human oversight becomes a larger and larger fraction of the cost.

This basically seems right to me, and as such I'm researching how to make networks modular and identify their modularity structure. It feels to me like this research is doing OK and is not obviously doomed.

Otoh, things that aren't image classifiers are probably harder to mechanistically understand, especially things that are better-than-human, as in e.g. AlphaGo's move 37.

I disagree: for instance, it seems way more likely to me that planning involves crisp mathematisable algorithms than that image recognition involves such algorithms.

[Regularising models for transparency during training] will be easier to do if the transparency method is simpler, more ‘mathematical’, and minimally reliant on machine learning.

Controversial, I'm pretty uncertain but weakly lean against.

Whoa, I'm so confused by that. It seems pretty clear to me that it's easier to regularise for properties that have nicer, more 'mathematical' definitions, and if that's false then I might just be fundamentally misunderstanding something.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-22T01:06:01.343Z · score: 4 (2 votes) · LW · GW

Background desiderata

This seems normative rather than empirical.

For what it's worth, I really dislike this terminology. Of course saying "I want X" is normative, and of course it's based on empirical beliefs.

Why is the distinction between training and deployment important? Most methods of training involve the AI system acting. Are you hoping that the training process (e.g. gradient descent) leads to safety?

I'm imagining that during training, your ML system doesn't control actuators that would allow it to pose an existential risk or other catastrophe (e.g. a computer screen watched by a human, or the ability to send things over the internet). Basically, I want the zero-shot analysis to be done before the AI system can cause catastrophe, which during this piece I'm conflating with the training phase, although I guess they're not identical.

I certainly hope that the training process of an advanced AI system leads to safety, but I'm not assuming that in this piece, as per the background beliefs.

Presumably many forms of interpretability techniques involve computing specific outputs of the neural net in order to understand them. Why doesn't this count as "running" the neural net?

It counts if the neural network's outputs are related to actuators that can plausibly cause existential risk or other catastrophe. As such, I think these forms of interpretability techniques are suspect, but could be fine (e.g. if you could somehow construct a sandbox environment to test your neural network where the network's sandboxed behaviour was informative about whether the network would cause catastrophe in the real world). That being said, I'm confused by this question, because I don't think I claimed in the piece that typical interpretability techniques were useful.

What about training schemes in which the agent gradually becomes more and more exposed to the real world? Where is "deployment" then?

I am basically abstracting away from the problem of figuring out when your neural network has access to actuators that can pose existential risk or other catastrophe, and hope somebody else solves this. I'd hope that in the training schemes you describe, you can determine that the agent won't cause catastrophe before its first exposure to the real world, otherwise such a scheme seems irresponsible for systems that could cause catastrophes.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-22T01:05:25.072Z · score: 4 (2 votes) · LW · GW

Background beliefs

If you include these sorts of things as guarantees, then I think the training process "by default" won't provide enough guarantees, but we might be able to get it to provide enough guarantees, e.g. by adversarial training.

I do include those sorts of things as guarantees. I do think it's possible that adversarial training will provide such guarantees, but I think it's difficult for the reasons that I've mentioned, and that a sufficient adversary will itself need to have a good deal of transparency into the system in order to come up with cases where the system will fail.

Comment by danielfilan on Did any US politician react appropriately to COVID-19 early on? · 2020-03-21T22:45:12.010Z · score: 4 (2 votes) · LW · GW

US Senator Josh Hawley (Republican from Missouri) on the 27th of February proposed legislation that would

Require that manufacturers report imminent or forecasted shortages of life-saving or life-sustaining medical devices to the FDA just as they currently do for pharmaceutical drugs. This new information on devices would be added to the FDA’s annual report to Congress on drug shortages.

Allow the FDA to expedite the review of essential medical devices that require pre-market approval in the event of an expected shortage reported by a manufacturer.

Give new authority to the FDA to request information from manufacturers of essential drugs or devices regarding all aspects of their manufacturing capacity, including sourcing of component parts, sourcing of active pharmaceutical ingredients, use of any scarce raw materials, and any other details the FDA deems relevant to assess the security of the U.S. medical product supply chain.

The purpose is allegedly to ensure that the FDA is aware of the extent to which the US medical product supply chain depends on other countries.

He also sent a letter to the heads of the Departments of Health and Human Services, Homeland Security, State, and Transportation on the 24th of January asking for guidance about when the US government would implement travel restrictions from China and other countries, and sent another letter to the FDA commissioner on the 24th of February asking about how the FDA would deal with drug and medical device shortages. It is not obvious to me whether this is meaningless grandstanding or actually useful.

Comment by danielfilan on March Coronavirus Open Thread · 2020-03-21T21:47:49.463Z · score: 5 (3 votes) · LW · GW

The New York Times heavily implies that many sick package delivery workers are feeling pressured to show up to work despite their illness.

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-21T21:40:48.345Z · score: 4 (2 votes) · LW · GW

Thanks for the detailed comment! As is typical for me, I'll respond to the easiest and least important part first.

Compare: There are a large number of NBA players, meaning that ones who are short can be found.

Short NBA players have existed: according to Wikipedia, Muggsy Bogues was 1.60 m (5 feet 3 inches) tall and active until 2001. The shortest current NBA player is Isaiah Thomas, who is 1.75 m (5 feet 9 inches) tall. This is apparently basically the median male height in the USA (surprisingly to me, both among all Americans and among African-Americans).

Comment by danielfilan on An Analytic Perspective on AI Alignment · 2020-03-21T20:04:57.180Z · score: 2 (1 votes) · LW · GW

FYI, it would be useful to know if people liked having the workflowy link.

Comment by danielfilan on March Coronavirus Open Thread · 2020-03-19T18:19:26.377Z · score: 4 (2 votes) · LW · GW

Presumably COVID-19 should update me towards natural pandemics happening more frequently than I would have otherwise thought, though, right?

Comment by danielfilan on How to have a happy quarantine · 2020-03-18T17:29:54.421Z · score: 2 (1 votes) · LW · GW

Care to share your username? Mine is loewenheim_swolem.