What empirical work has been done that bears on the 'freebit picture' of free will? 2019-10-04T23:11:27.328Z · score: 7 (3 votes)
A Personal Rationality Wishlist 2019-08-27T03:40:00.669Z · score: 42 (25 votes)
Verification and Transparency 2019-08-08T01:50:00.935Z · score: 36 (16 votes)
DanielFilan's Shortform Feed 2019-03-25T23:32:38.314Z · score: 19 (5 votes)
Robin Hanson on Lumpiness of AI Services 2019-02-17T23:08:36.165Z · score: 16 (6 votes)
Test Cases for Impact Regularisation Methods 2019-02-06T21:50:00.760Z · score: 65 (19 votes)
Does freeze-dried mussel powder have good stuff that vegan diets don't? 2019-01-12T03:39:19.047Z · score: 17 (4 votes)
In what ways are holidays good? 2018-12-28T00:42:06.849Z · score: 22 (6 votes)
Kelly bettors 2018-11-13T00:40:01.074Z · score: 23 (7 votes)
Bottle Caps Aren't Optimisers 2018-08-31T18:30:01.108Z · score: 55 (23 votes)
Mechanistic Transparency for Machine Learning 2018-07-11T00:34:46.846Z · score: 55 (21 votes)
Research internship position at CHAI 2018-01-16T06:25:49.922Z · score: 25 (8 votes)
Insights from 'The Strategy of Conflict' 2018-01-04T05:05:43.091Z · score: 73 (27 votes)
Meetup : Canberra: Guilt 2015-07-27T09:39:18.923Z · score: 1 (2 votes)
Meetup : Canberra: The Efficient Market Hypothesis 2015-07-13T04:01:59.618Z · score: 1 (2 votes)
Meetup : Canberra: More Zendo! 2015-05-27T13:13:50.539Z · score: 1 (2 votes)
Meetup : Canberra: Deep Learning 2015-05-17T21:34:09.597Z · score: 1 (2 votes)
Meetup : Canberra: Putting Induction Into Practice 2015-04-28T14:40:55.876Z · score: 1 (2 votes)
Meetup : Canberra: Intro to Solomonoff induction 2015-04-19T10:58:17.933Z · score: 1 (2 votes)
Meetup : Canberra: A Sequence Post You Disagreed With + Discussion 2015-04-06T10:38:21.824Z · score: 1 (2 votes)
Meetup : Canberra HPMOR Wrap Party! 2015-03-08T22:56:53.578Z · score: 1 (2 votes)
Meetup : Canberra: Technology to help achieve goals 2015-02-17T09:37:41.334Z · score: 1 (2 votes)
Meetup : Canberra Less Wrong Meet Up - Favourite Sequence Post + Discussion 2015-02-05T05:49:29.620Z · score: 1 (2 votes)
Meetup : Canberra: the Hedonic Treadmill 2015-01-15T04:02:44.807Z · score: 1 (2 votes)
Meetup : Canberra: End of year party 2014-12-03T11:49:07.022Z · score: 1 (2 votes)
Meetup : Canberra: Liar's Dice! 2014-11-13T12:36:06.912Z · score: 1 (2 votes)
Meetup : Canberra: Econ 101 and its Discontents 2014-10-29T12:11:42.638Z · score: 1 (2 votes)
Meetup : Canberra: Would I Lie To You? 2014-10-15T13:44:23.453Z · score: 1 (2 votes)
Meetup : Canberra: Contrarianism 2014-10-02T11:53:37.350Z · score: 1 (2 votes)
Meetup : Canberra: More rationalist fun and games! 2014-09-15T01:47:58.425Z · score: 1 (2 votes)
Meetup : Canberra: Akrasia-busters! 2014-08-27T02:47:14.264Z · score: 1 (2 votes)
Meetup : Canberra: Cooking for LessWrongers 2014-08-13T14:12:54.548Z · score: 1 (2 votes)
Meetup : Canberra: Effective Altruism 2014-08-01T03:39:53.433Z · score: 1 (2 votes)
Meetup : Canberra: Intro to Anthropic Reasoning 2014-07-16T13:10:40.109Z · score: 1 (2 votes)
Meetup : Canberra: Paranoid Debating 2014-07-01T09:52:26.939Z · score: 1 (2 votes)
Meetup : Canberra: Many Worlds + Paranoid Debating 2014-06-17T13:44:22.361Z · score: 1 (2 votes)
Meetup : Canberra: Decision Theory 2014-05-26T14:44:31.621Z · score: 1 (2 votes)
[LINK] Scott Aaronson on Integrated Information Theory 2014-05-22T08:40:40.065Z · score: 22 (23 votes)
Meetup : Canberra: Rationalist Fun and Games! 2014-05-01T12:44:58.481Z · score: 0 (3 votes)
Meetup : Canberra: Life Hacks Part 2 2014-04-14T01:11:27.419Z · score: 0 (1 votes)
Meetup : Canberra Meetup: Life hacks part 1 2014-03-31T07:28:32.358Z · score: 0 (1 votes)
Meetup : Canberra: Meta-meetup + meditation 2014-03-07T01:04:58.151Z · score: 3 (4 votes)
Meetup : Second Canberra Meetup - Paranoid Debating 2014-02-19T04:00:42.751Z · score: 1 (2 votes)


Comment by danielfilan on Rohin Shah on reasons for AI optimism · 2019-11-02T01:19:20.192Z · score: 4 (2 votes) · LW · GW


I don’t know that MIRI actually believes that what we need to do is write a bunch of proofs about our AI system, but it sure sounds like it, and that seems like a too difficult, and basically impossible task to me, if the proofs that we’re trying to write are about alignment or beneficialness or something like that.

FYI: My understanding of what MIRI (or at least Buck) thinks is that you don't need to prove your AI system is beneficial, but you should have a strong argument that stands up to strict scrutiny, and some of the sub-arguments will definitely have to be proofs.

RS Seems plausible, I think I feel similarly about that claim

Comment by danielfilan on Rohin Shah on reasons for AI optimism · 2019-11-02T01:18:00.724Z · score: 2 (1 votes) · LW · GW


I also don’t think there’s a discrete point at which you can say, “I’ve won the race.” I think it’s just like capabilities keep improving and you can have more capabilities than the other guy, but at no point can you say, “Now I have won the race.”

I think that (a) this isn't a disanalogy to nuclear arms races and (b) it's a sign of danger, since at no point do people feel free to slow down and test safety.

RS I’m confused by (a). Surely you “win” the nuclear arms race once you successfully make a nuke that can be dropped on another country?

(b) seems right, idr if I was arguing for safety or just arguing for disanalogies and wanting more research

DF re (a), if you have nukes that can be dropped on me, I can then make enough nukes to destroy all your nukes. So you make more nukes, so I make more nukes (because I'm worried about my nukes being destroyed) etc. This is historically how it played out, see mid-20th C discussion of the 'missile gap'.

re (b) fair enough

(it doesn't actually necessarily play out as clearly as I describe: maybe you get nuclear submarines, I get nuclear submarine detection skills...)

RS (a) Yes, after the first nukes are created, the remainder of the arms race is relatively similar. I was thinking of the race to create the first nuke. (Arguably the US should have used their advantage to prevent all further nukes.)

DF I guess it just seems more natural to me to think of one big long arms race, rather than a bunch of successive races - like, I think if you look at the actual history of nuclear armament, at no point before major powers have tons of nukes are they in a lull, not worrying about making more. But this might be an artefact of me mostly knowing about the US side, which I think was unusual in its nuke production and worrying.

RS Seems reasonable, I think which frame you take will depend on what you’re trying to argue, I don’t remember what I was trying to argue with that. My impression was that when people talk about the “nuclear arms race”, they were talking about the one leading to the creation of the bomb, but I’m not confident in that (and can’t think of any evidence for it right now)


My impression was that when people talk about the “nuclear arms race”, they were talking about the one leading to the creation of the bomb

ah, I did not have that impression. Makes sense.

Comment by danielfilan on Rohin Shah on reasons for AI optimism · 2019-11-02T01:15:14.752Z · score: 2 (1 votes) · LW · GW

(Looking back on this, I'm now confused about why Rohin doesn't think mesa-optimisers would end up being approximately optimal for some objective/utility function)

Comment by danielfilan on Rohin Shah on reasons for AI optimism · 2019-11-02T01:13:43.341Z · score: 6 (3 votes) · LW · GW


I think it would be… AGI would be a mesa optimizer or inner optimizer, whichever term you prefer. And that that inner optimizer will just sort of have a mishmash of all of these heuristics that point in a particular direction but can’t really be decomposed into ‘here are the objectives, and here is the intelligence’, in the same way that you can’t really decompose humans very well into ‘here are the objectives and here is the intelligence’.

... but it leads to not being as confident in the original arguments. It feels like this should be pushing in the direction of ‘it will be easier to correct or modify or change the AI system’. Many of the arguments for risk are ‘if you have a utility maximizer, it has all of these convergent instrumental sub-goals’ and, I don’t know, if I look at humans they kind of sort of pursued convergent instrumental sub-goals, but not really.

Huh, I see your point as cutting the opposite way. If you have a clean architectural separation between intelligence and goals, I can swap out the goals. But if you have a mish-mash, then for the same degree of vNM rationality (which maybe you think is unrealistic), it's harder to do anything like 'swap out the goals' or 'analyse the goals for trouble'.

in general, I think the original arguments are: (a) for a very wide range of objective functions, you can have agents that are very good at optimising them (b) convergent instrumental subgoals are scary

I think 'humans don't have scary convergent instrumental subgoals' is an argument against (b), but I don't think (a) or (b) rely on a clean architectural separation between intelligence and goals.

RS I agree both (a) and (b) don’t depend on an architectural separation. But you also need (c): agents that we build are optimizing some objective function, and I think my point cuts against that

DF somewhat. I think you have a remaining argument of 'if we want to do useful stuff, we will build things that optimise objective functions, since otherwise they randomly waste resources', but that's definitely got things to argue with.

Comment by danielfilan on Rohin Shah on reasons for AI optimism · 2019-11-02T01:11:47.280Z · score: 4 (2 votes) · LW · GW


A straw version of this, which isn’t exactly what I mean but sort of is the right intuition, would be like maybe if you run the same… What’s the input that maximizes the output of this neuron? You’ll see that this particular neuron is a deception classifier. It looks at the input and then based on something, does some computation with the input, maybe the input’s like a dialogue between two people and then this neuron is telling you, “Hey, is person A trying to deceive person B right now?” That’s an example of the sort of thing I am imagining.

Huh - plausible that I'm misunderstanding you, but I imagine this being insufficient for safety monitoring because (a) many non-deceptive AIs are going to have the concept of deception anyway, because it's useful, (b) statically you can't tell whether or not the network is going to aim for deception just from knowing that it has a representation of deception, and (c) you don't have a hope of monitoring it online to check if the deception neuron is lighting up when it's talking to you.

FWIW I believe in the negation of some version of my point (b), where some static analysis reveals some evaluation and planning model, and you find out that in some situations the agent prefers itself being deceptive, where of course this static analysis is significantly more sophisticated than current techniques

RS Yeah, I agree with all of these critiques. I think I’m more pointing at the intuition at why we should expect this to be easier than we might initially think, rather than saying that specific idea is going to work.

E.g. maybe this is a reason that (relaxed) adversarial training actually works great, since the adversary can check whether the deception neuron is lighting up

DF Seems fair, and I think this kind of intuition is why I research what I do.
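
The "what's the input that maximizes the output of this neuron?" idea from the quoted passage can be sketched in code. This is activation maximization: gradient ascent on the input with the neuron's activation as the objective. Everything here is a toy stand-in — the `neuron` function below is a made-up unit (not a real deception classifier), and the finite-difference gradient is just to keep the sketch dependency-free.

```python
# Toy sketch of activation maximization: find an input that maximizes a
# single "neuron's" activation via numerical gradient ascent.

def neuron(x):
    # Hypothetical unit that fires most strongly near x = (2, -1).
    return -((x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2)

def maximize_activation(f, x, steps=500, lr=0.05, eps=1e-5):
    x = list(x)
    for _ in range(steps):
        grad = []
        for i in range(len(x)):
            bumped = list(x)
            bumped[i] += eps
            grad.append((f(bumped) - f(x)) / eps)  # forward-difference gradient
        x = [xi + lr * g for xi, g in zip(x, grad)]  # ascend on the activation
    return x

best = maximize_activation(neuron, [0.0, 0.0])  # converges near (2, -1)
```

In a real network you would use autodiff instead of finite differences, and inspect what the maximizing input looks like to guess what the unit represents.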

Comment by danielfilan on Rohin Shah on reasons for AI optimism · 2019-11-02T01:09:46.596Z · score: 3 (2 votes) · LW · GW


And the concept of 3D space seems like it’s probably going to be useful for an AI system no matter how smart it gets. Currently, they might have a concept of 3D space, but it’s not obvious that they do. And I wouldn’t be surprised if they don’t.

Presumably at some point they start actually using the concept of 4D locally-Minkowski spacetime instead (or quantum loops or whatever)

and in general - if you have things roughly like human notions of agency or cause, but formalised differently and more correctly than we would, that makes them harder to analyse.

RS I suspect they don’t use 4D spacetime, because it’s not particularly useful for most tasks, and takes more computation.

But I agree with the broader point that abstractions can be formalized differently, and that there can be more alien abstractions. But I’d expect that this happens quite a bit later

DF I mean, maybe once you've gotten rid of the pesky humans and need to start building Dyson spheres... Anyway, I think curved 4D spacetime does require more computation than standard 3D modelling, but I don't think that using Minkowski spacetime does.

RS Yeah, I think I’m often thinking of the case where AI is somewhat better than humans, rather than building Dyson spheres. Who knows what’s happening at Dyson sphere level. Probably should have said that in the conversation. (I think about it this way because it seems more important to align the first few AIs, and then have them help with aligning future ones.)

DF Sure. But even when you have AI that's worrying about signal transmission between different cities and the GPS system, SR is not that much more computationally intensive than Newtonian 3D space, and critical for accuracy.

Like I think the additional computational cost is in fact very low, but non-negative.

RS So like in practice if robots end up doing tasks like the ones we do, they develop intuitive physics models like ours, rather than Newtonian mechanics. SR might be only a bit more expensive than Newtonian, but I think most of the computational cost is in switching from heuristics / intuitive physics to a formal theory

(If they do different tasks than what we do, I expect them to develop their own internal physics which is pretty different from ours that they use for most tasks, but still not a formal theory)

DF Ooh, I wasn't accounting for that but it seems right.

I do think that plausibly in some situations 'intuitive physics' takes place in Minkowski spacetime.

Comment by danielfilan on Rohin Shah on reasons for AI optimism · 2019-11-02T01:05:07.262Z · score: 4 (2 votes) · LW · GW

DF From your AI impacts interview:

And then I claim that conditional on that scenario having happened, I am very surprised by the fact that we did not know this deception in any earlier scenario that didn’t lead to extinction. And I don’t really get people’s intuitions for why that would be the case. I haven’t tried to figure that one out though.

I feel like I believe that people notice deception early on but are plausibly wrong about whether or not they've fixed it

RS After a few failures, you’d think we’d at least know to expect it?

DF Sure, but if your AI is also getting smarter, then that probably doesn't help you much in detecting it, and only one person has to be wrong and deploy (if actually fixing it takes significantly longer than sort-of-but-not-really fixing it) [this comment was written with less than usual carefulness]

RS Seems right, but in general human society / humans seem pretty good at being risk-averse (to the point that it seems to me that on anything that isn’t x-risk the utilitarian thing is to be more risk-seeking), and I’m hopeful that the same will be true here. (Also I’m assuming that it would take a bunch of compute, and it’s not that easy for a single person to deploy an AI, though even in that case I’d be optimistic, given that smallpox hasn’t been released yet.)

DF sorry by 'one person' I meant 'one person in charge of a big team'

RS The hope is that they are constrained by all the typical constraints on such people (shareholders, governments, laws, public opinion, the rest of the team, etc.) Also this significantly decreases the number of people who can do the thing, restricts it to people who are “broadly reasonable” (e.g. no terrorists), and allows us to convince each such person individually. Also I rarely think there is just one person — at the very least you need one person with a bunch of money and resources and another with the technical know-how, and it would be very difficult for these to be the same person

DF Sure. I guess even with those caveats my scenario doesn't seem that unlikely to me.

RS Sure, I don’t think this is enough to say “yup, this definitely won’t happen”. I think we do disagree on the relative likelihood of it happening, but maybe not by that much. (I’m hesitant to write a number because the scenario isn’t really fleshed out enough yet for us to agree on what we’re writing a number about.)

Comment by danielfilan on Rohin Shah on reasons for AI optimism · 2019-11-02T01:02:18.812Z · score: 11 (5 votes) · LW · GW

I had a chat with Rohin about portions of this interview in an internal slack channel, which I'll post as replies to this comment (there isn't much shared state between different threads, I think).

Comment by danielfilan on Open & Welcome Thread - October 2019 · 2019-10-22T05:40:09.701Z · score: 3 (2 votes) · LW · GW

Rationality is basically therapy [citation needed]. A common type of therapy is couples therapy. As such, you'd think that 'couples rationality' would exist. I guess it partially does (Double Crux, Againstness, "group rationality" when n=2, polyamory advocacy), but it seems less prevalent than you'd naively think. Maybe because rationalists tend to be young unmarried people? Still, it seems like a shame that it's not more of a thing.

Comment by danielfilan on Thoughts on "Human-Compatible" · 2019-10-12T07:27:47.977Z · score: 4 (2 votes) · LW · GW

As a noun: "reward uncertainty" refers to uncertainty about how valuable various states of the world are, and usually also implies some way of updating beliefs about that based on something like 'human actions', under the assumption that humans to some degree/in some way know which states of the world are more valuable and act accordingly.

Comment by danielfilan on DanielFilan's Shortform Feed · 2019-10-11T23:48:53.653Z · score: 41 (14 votes) · LW · GW

Hot take: if you think that we'll have at least 30 more years of future where geopolitics and nations are relevant, I think you should pay at least 50% as much attention to India as to China. Similarly large population, similarly large number of great thinkers and researchers. Currently seems less 'interesting', but that sort of thing changes over 30-year timescales. As such, I think there should probably be some number of 'India specialists' in EA policy positions that isn't dwarfed by the number of 'China specialists'.

Comment by danielfilan on What empirical work has been done that bears on the 'freebit picture' of free will? · 2019-10-05T20:03:11.594Z · score: 3 (2 votes) · LW · GW

I think you might have misunderstood the question: I'm primarily asking about work done on points 1 and 2 listed in the question text, which don't mention 'free will'.

Comment by danielfilan on What empirical work has been done that bears on the 'freebit picture' of free will? · 2019-10-05T04:17:57.740Z · score: 8 (4 votes) · LW · GW

I guess you weren't interested in talking about whether this makes any sense in relation to free will

Indeed, most of your comment is off-topic in a way that I asked comments not to be. If you want to discuss that point, please write your own post or shortform comment, or write a comment in one of the linked posts.

Comment by danielfilan on World State is the Wrong Level of Abstraction for Impact · 2019-10-04T21:45:03.437Z · score: 6 (3 votes) · LW · GW

I'm not aware of others explicitly trying to deduce our native algorithm for impact. No one was claiming the ontological theories explain our intuitions, and they didn't have the same "is this a big deal?" question in mind. However, we need to actually understand the problem we're solving, and providing that understanding is one responsibility of an impact measure! Understanding our own intuitions is crucial not just for producing nice equations, but also for getting an intuition for what a "low-impact" Frank would do.

I wish you'd expanded on this point a bit more. To me, it seems like to come up with "low-impact" AI, you should be pretty grounded in situations where your AI system might behave in an undesirably "high-impact" way, and generalise the commonalities between those situations into some neat theory (and maybe do some philosophy about which commonalities you think are important to generalise vs accidental), rather than doing analytic philosophy on what the English word "impact" means. Could you say more about why the test-case-driven approach is less compelling to you? Or is this just a matter of the method of exposition you've chosen for this sequence?

Comment by danielfilan on Attainable Utility Theory: Why Things Matter · 2019-10-04T21:32:09.177Z · score: 7 (3 votes) · LW · GW

Can you give other conceptions of "impact" that people have proposed, and compare/contrast them with "How does this change my ability to get what I want?"

This is not quite what you're asking for, but I have a post on ways people have thought AIs that minimise 'impact' should behave in certain situations, and you can go through and see what the notion of 'impact' given in this post would advise. [ETA: although that's somewhat tricky, since this post only defines 'impact' and doesn't say how agent should behave to minimise it]

Comment by danielfilan on Unrolling social metacognition: Three levels of meta are not enough. · 2019-10-04T21:20:44.035Z · score: 6 (3 votes) · LW · GW

See this paper arguing that humans have the ability to solve tasks that require up to seven levels of recursive metacognition, especially when those tasks are ecologically valid (i.e. the prompts are films of interactions, and the question is to pick which sentence one of the participants is more likely to say). Abstract:

Recursive mindreading is the ability to embed mental representations inside other mental representations e.g. to hold beliefs about beliefs about beliefs. An advanced ability to entertain recursively embedded mental states is consistent with evolutionary perspectives that emphasise the importance of sociality and social cognition in human evolution: high levels of recursive mindreading are argued to be involved in several distinctive human behaviours and institutions, such as communication, religion, and story-telling. However, despite a wealth of research on first-level mindreading under the term Theory of Mind, the human ability for recursive mindreading is relatively understudied, and existing research on the topic has significant methodological flaws. Here we show experimentally that human recursive mindreading abilities are far more advanced than has previously been shown. Specifically, we show that humans are able to mindread to at least seven levels of embedding, both explicitly, through linguistic description, and implicitly, through observing social interactions. However, our data suggest that mindreading may be easier when stimuli are presented implicitly rather than explicitly. We argue that advanced mindreading abilities are to be expected in an extremely social species such as our own, where the ability to reason about others' mental states is an essential, ubiquitous and adaptive component of everyday life.

[ETA: the paper was actually brought to my attention by the author of the OP]

Comment by danielfilan on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T20:31:15.046Z · score: 33 (13 votes) · LW · GW

I see this in a different light: as far as I can tell, Yann LeCun believes that the way to advance AI is to tinker around, take opportunities to make advances when it seems feasible, find ways of fixing problems that come up in an ad-hoc, atheoretic manner (see e.g. this link), and then form some theory to explain what happened; while Stuart Russell thinks that it's important to have a theory that you really believe in drive future work. As a result, I read LeCun as saying that when problems come up, we'll see them and fix them by tinkering around, while Russell thinks that it's important to have a theory in place before-hand to ensure that bad enough problems don't come up and/or ensure that we already know how to solve them when they do.

Comment by danielfilan on Deducing Impact · 2019-10-02T19:42:19.339Z · score: 2 (1 votes) · LW · GW

If I'm already somewhat familiar with your work and ideas, do you still recommend these exercises?

Comment by danielfilan on Value Impact · 2019-10-02T19:30:33.408Z · score: 4 (2 votes) · LW · GW

It's interesting to me to consider the case of me getting into a PhD program at UC Berkeley, which felt pretty impactful. It wasn't that I intrinsically valued being a PhD student at Berkeley, and it wasn't just that being a PhD student at Berkeley objectively gave any agent greater ability to achieve their goals (although they pay you, so it's true to some extent), it was that it gave me greater ability to achieve my goals by (a) being able to learn more about AI alignment and (b) getting to hang out with my friends and friends-of-friends in the Bay Area. (a) and (b) weren't automatic consequences of being admitted to the program, I had to do some work to make them happen, and they aren't universally valuable. A simplified example of this kind of thing is somebody giving you a non-transferrable $100 gift voucher for GameStop.

Comment by DanielFilan on [deleted post] 2019-09-29T06:45:26.603Z

good judgment project

should be 'Good Judgment Project'

systematically testing the efficaciousness of those techniques

should be 'efficacy'

Why do we Need to Test things?

either 'Things' should be capitalised or nothing here should be

comes from the pre UFC history history of isolated martial

should be 'pre-UFC', and 'history' should only be there once

Kong Fu

it's typically spelled 'Kung fu', although standard pinyin spelling would actually be gōngfu. Similarly, karate+capoeira shouldn't be capitalised

In the ancient world, people would leave their homes and travel great distances to train with Kong Fu masters. Different schools in the Kong Fu tradition would develop and compete with each other according to their own traditions, and there was no doubt in anyone's mind that Kong Fu masters could teach the art of fighting. But in the end, the Gracies showed that countless of these millennia old isolated martial arts traditions were totally inferior to the relatively new Gracie methods.

there was no doubt in anyone's mind that Kong Fu masters could teach the art of fighting.

citation needed, for this bit but also for the whole story.

But in the end, the Gracies showed that countless of these millennia old isolated martial arts traditions were totally inferior to the relatively new Gracie methods.

Doesn't disprove the claim that 'kung fu masters could teach the art of fighting'. Like, I don't doubt that learning kung fu really did teach you how to fight well, move efficiently, etc.

millennia old isolated martial arts traditions

Kung fu is plausibly 1.5-3 millennia old, karate is ~1.5 centuries old, capoeira is ~5 centuries old.

So you see, we cannot trust our eyes.

None of the problems you mention are from optical illusions. I'd instead say 'we can't trust our intuitions/impressions'.

make it harder for reality to fairly judge between our hypotheses.

I don't think reality judges hypotheses, or that anybody 'judges between' hypotheses, but rather that reality reveals things that distinguish between hypotheses.

Most methods that we come up with will not work, but they will sound good enough and seem reasonable enough to convince us that they work anyway.

citation/evidence needed

In any case, I do not think a hypothesis, such as the efficacy of Double Crux, being hard to test, gives us license to believe whatever we want about it. We still have to test it, and our senses and intuitions can still only give us limited evidence one way or the other.

'we still have to test it' seems wrong to me. Maybe it just is too expensive to test, and we have to rely on our best reasoning about the truth or falsehood of the hypothesis.

If you are going to help me test the method of setting five minute times, let me first say that I seriously appreciate it!

should be 'five minute timers'. Also how are you evaluating the results of this test?

It might be that Double Crux cannot be taught in a 60 minute module and requires more training before the effects are noticeable.

weird that you're referencing your test before describing it.


should be 'Step 1' (similarly for later steps)

Obtain participants on polity.

What is polity?

Filter participants for college level education.



should be 'Group 1', similarly with later groups.

They will be given a module (unsure how long this module will be at this time, but I hope less than 40 minutes) that teaches double crux

(a) I think that you should capitalise 'double crux' here like you do elsewhere, and (b) it's more honest to say 'They will be given a module that attempts to teach double crux' or something.

Rewards will be given according to a discretized version of a proper scoring rule, to incentivize calibrated probability assignments.

Why discretise?

I will hire people who can credibly claim that they know how to teach double crux

Why do they need to know how to teach it if they just have to judge how DC-y the convo is?

If the conversation does not seem like a double crux at all, then they should give it a 1; if it seems like a totally paradigmatic example of a double crux, then they should give it a 10.

I think it would be useful to say what a 5 would be.

I will calculate participants’ brier scores on the problem they discussed before the discussion, and the brier score they got after the discussion. I will calculate the means of the differences between pre conversation brier scores and post conversation brier scores for each group.

(a) Brier should be capitalised, (b) log scores are more natural here than Brier scores imo (log scores are about how many bits away from the truth you are, Brier scores are just made up to incentivise honest reporting if you pay people according to them)
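
To make the Brier-vs-log-score comparison concrete, here is a minimal sketch for a single binary question. Both are proper scoring rules; the log score measures bits of surprise at the outcome, while the Brier score is a squared distance from it. (The 80% example below is mine, purely for illustration.)

```python
# Brier score vs log score for one binary forecast.
# Lower is better for both, under these conventions.
import math

def brier(p, outcome):
    # outcome is 1 if the event happened, 0 otherwise
    return (p - outcome) ** 2

def log_score(p, outcome):
    # bits of surprise: -log2 of the probability assigned to what happened
    q = p if outcome == 1 else 1 - p
    return -math.log2(q)

# A forecaster who said 80% on something that happened:
b = brier(0.8, 1)      # 0.04
s = log_score(0.8, 1)  # ~0.32 bits away from certainty
```

Note the qualitative difference: the log score blows up to infinity as a confident forecast is proven wrong, while the Brier score is bounded by 1.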

I will calculate this by taking an average of both KL divergences for a pair before the conversation, and comparing it to average KL divergences after the conversation.

IMO total variation distance is more natural here (average of KL divergences is a weird object).
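
The two quantities being compared can be spelled out on a toy pair of answer distributions (the distributions below are made up for illustration):

```python
# Average of the two KL divergences (half the Jeffreys divergence)
# vs total variation distance, for two discrete distributions.
import math

def kl(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def avg_kl(p, q):
    return (kl(p, q) + kl(q, p)) / 2  # symmetrised, but still not a metric

def total_variation(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q)) / 2

p = [0.9, 0.1]
q = [0.6, 0.4]
```

One concrete sense in which averaged KL is "a weird object": it is unbounded (it diverges whenever one distribution puts zero mass where the other doesn't), whereas total variation always lies in [0, 1] and directly reads as "the largest difference in probability the two people assign to any event".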

Unfortunately, many Effective Altruists and Rationalists already know about Double Crux,

(a) I don't think you should capitalise 'effective altruists' and 'rationalists', and (b) I dislike how that phrasing implies that they are effectively altruistic or rational.


I feel like the ending could be stronger, it feels like the post just stops. Maybe you could thank people who helped you think about this :)

Comment by danielfilan on Eli's shortform feed · 2019-09-28T21:10:01.837Z · score: 2 (1 votes) · LW · GW

(this appears to be a problem where it displays differently on different browser/OS pairs)

Comment by danielfilan on Eli's shortform feed · 2019-09-28T21:01:42.455Z · score: 2 (1 votes) · LW · GW

To me, it looks like the numbers in the General section go 1, 4, 5, 5, 6, 7, 8, 9, 3, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 2, 3, 3, 4, 2, 3, 4 (ignoring the nested numbers).

Comment by danielfilan on Eli's shortform feed · 2019-09-27T23:18:52.368Z · score: 2 (1 votes) · LW · GW

FYI the numbering in the (General) section is pretty off.

Comment by danielfilan on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-26T22:03:20.959Z · score: 9 (2 votes) · LW · GW

Presumably you could take the majority vote of comments left in a 2 hour span?

Comment by danielfilan on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-26T21:33:24.340Z · score: 6 (3 votes) · LW · GW

I'm surprised that LW being down for a day isn't on your list of cons. [ETA: or rather the LW home page]

Comment by danielfilan on DanielFilan's Shortform Feed · 2019-09-26T18:25:14.942Z · score: 15 (4 votes) · LW · GW

I get to nuke LW today AMA.

Comment by danielfilan on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-26T18:24:32.596Z · score: 4 (2 votes) · LW · GW

(FYI California is currently in the PDT time zone, not PST)

Comment by danielfilan on Modes of Petrov Day · 2019-09-17T22:56:10.937Z · score: 2 (1 votes) · LW · GW

Does this app provide some probability of a false alarm?

Comment by danielfilan on Can this model grade a test without knowing the answers? · 2019-09-15T23:02:25.361Z · score: 5 (3 votes) · LW · GW

That being said, as long as you know which answers will be more common among people who don't know the right answer, and roughly how much more common they will be, you can probably add that knowledge to the algorithm without too much difficulty. It should still work, as long as there are some questions where you don't expect the uninformed to reliably lean one way.

Comment by danielfilan on Can this model grade a test without knowing the answers? · 2019-09-13T03:57:10.322Z · score: 9 (3 votes) · LW · GW

One example of the ability of the model: in the paper, the model is run on 120 responses to a quiz consisting of 60 Raven's Progressive Matrices questions, each question with 8 possible answers. As it happens, no responder got more than 50 questions right. The model correctly inferred the answers to 46 of the questions.

A key assumption in the model is that errors are random: so, in domains where you're only asking a small number of questions, and for most questions a priori you have reason to expect some wrong answers to be more common than the right one (e.g. "What's the capital of Canada/Australia/New Zealand"), I think this model would not work (although if there were enough other questions such that good estimates of responder ability could be made, that could ameliorate the problem). If I wanted to learn more, I would read this 2016 review paper of the general field.
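To give a flavour of how answer keys can be inferred from responses alone, here is a much-simplified sketch (this is my own illustration, not the paper's model): alternate between taking a weighted vote over each question and re-estimating each responder's ability as their agreement with the current consensus. It relies on the random-errors assumption above, and fails in exactly the "wrong answer is systematically popular" cases described.

```python
def infer_answers(responses, iters=10):
    """Infer an answer key from responses alone.

    responses[r][q] is responder r's answer to question q.
    """
    n_resp = len(responses)
    n_q = len(responses[0])
    ability = [1.0] * n_resp  # start by trusting everyone equally
    for _ in range(iters):
        # Ability-weighted vote on each question.
        answers = []
        for q in range(n_q):
            votes = {}
            for r in range(n_resp):
                a = responses[r][q]
                votes[a] = votes.get(a, 0.0) + ability[r]
            answers.append(max(votes, key=votes.get))
        # Re-estimate ability as agreement with the current consensus.
        for r in range(n_resp):
            agree = sum(responses[r][q] == answers[q] for q in range(n_q))
            ability[r] = agree / n_q
    return answers
```

With three accurate responders and two who err randomly, the consensus recovers the key even though the algorithm never sees it; if the erring responders all leaned towards the same wrong answer, it would not.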

Comment by danielfilan on Open & Welcome Thread - September 2019 · 2019-09-10T20:19:19.774Z · score: 4 (2 votes) · LW · GW

I often find myself seeing a cool post, and then thinking that it would take too much time to read it now but that I don't want to forget it. I don't like browser-based solutions for this.

Comment by danielfilan on Open & Welcome Thread - September 2019 · 2019-09-10T20:18:28.008Z · score: 12 (5 votes) · LW · GW

Feature request: let me mark a post as 'to read', which should have it appear in my recommendations until I read it.

Comment by danielfilan on What Programming Language Characteristics Would Allow Provably Safe AI? · 2019-09-05T19:00:05.551Z · score: 6 (3 votes) · LW · GW

Here's a public GitHub repository for coda, the language he's been working on, with a bit written about it.

Comment by danielfilan on One Way to Think About ML Transparency · 2019-09-04T00:15:25.460Z · score: 15 (5 votes) · LW · GW

Update: I reread the post (between commenting that and now, as prep for another post currently in draft form). It is better than I remember, and I'm pretty proud of it.

Comment by danielfilan on One Way to Think About ML Transparency · 2019-09-03T16:05:44.767Z · score: 3 (2 votes) · LW · GW

If the human knows the logic of the random number generator that was used to initialize the parameters of the original network, they would have no problem manually running the same logic themselves.

Presumably the random seed is going to be big and complicated.

Comment by danielfilan on One Way to Think About ML Transparency · 2019-09-03T00:27:01.921Z · score: 5 (3 votes) · LW · GW

Ah, gotcha. I think this is a bit different to compressibility: if you formalise it as Kolmogorov complexity, then you can have a very compressible algorithm that in fact you can't compress given time limits, because it's too hard to figure out how to compress it. This seems more like 'de facto compressibility', which might be formalised using the speed prior or a variant.

Comment by danielfilan on One Way to Think About ML Transparency · 2019-09-03T00:18:46.443Z · score: 5 (3 votes) · LW · GW

One question remains: are these models simulatable? Strictly speaking, no. A human given the decision tree would still be able to get a rough idea of why the neural network was making a particular decision. However, without the model weights, a human would still be forced to make an approximate inference rather than follow the decision procedure exactly. That's because after the training procedure, we can only extract a decision tree that approximates the neural network decisions, not extract a tree that perfectly simulates it.

Presumably if the extraction procedure is good enough, then the decision tree gets about as much accuracy as the neural network, and if inference times are similar, then you could just use the decision tree instead, and think of this as a neat way of training decision trees by using neural networks as an intermediate space where gradient descent works nicely.
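A minimal sketch of what I mean (my own toy example, not from the post): treat some function as the trained "teacher" network, label data with it, and fit the simplest possible "tree" (a one-split decision stump) to those labels. If the stump matches the teacher's accuracy, you can deploy the stump instead.

```python
import random

def teacher(x):
    # Stand-in for a trained neural network: a noiseless threshold rule.
    return 1 if x > 0.6 else 0

def fit_stump(xs, labels):
    # "Extract" a depth-1 decision tree by brute-forcing the split point
    # that best reproduces the teacher's labels.
    best_t, best_acc = xs[0], -1.0
    for t in xs:
        acc = sum((1 if x > t else 0) == y for x, y in zip(xs, labels)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

random.seed(0)
xs = [random.random() for _ in range(200)]
labels = [teacher(x) for x in xs]          # distillation data labelled by the teacher
threshold, accuracy = fit_stump(xs, labels)
```

Here the extracted stump matches the teacher perfectly on the training data, so nothing is lost by using it in the teacher's place; the interesting question is how often that happens with real networks and real trees.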

Comment by danielfilan on One Way to Think About ML Transparency · 2019-09-03T00:08:54.423Z · score: 8 (5 votes) · LW · GW

In a more complex ad-hoc approach, we could instead design a way to extract a theory simulatable algorithm that our model is implementing. In other words, given a neural network, we run some type of meta-algorithm that analyzes the neural network and spits out pseudocode which describes what the neural network uses to make decisions. As I understand, this is roughly what Daniel Filan writes about in Mechanistic Transparency for Machine Learning.

I endorse this as a description of how I currently think about mechanistic transparency, although I haven't reread the post (and imagine finding it somewhat painful to), so can't fully endorse your claim.

Comment by danielfilan on One Way to Think About ML Transparency · 2019-09-03T00:05:59.657Z · score: 3 (2 votes) · LW · GW

In theory simulatability, the human would not necessarily be able to simulate the algorithm perfectly, but they would still say that the algorithm is simulatable in their head, "given enough empty scratch paper and time." Therefore, MCTS is interpretable because a human could in theory sit down and work through an entire example on a piece of paper. It may take ages, but the human would eventually get it done; at least, that's the idea. However, we would not say that some black box ANN is interpretable, because even if the human had several hours to stare at the weight matrices, once they were no longer acquainted with the exact parameters of the model, they would have no clue as to why the ANN was making decisions.

I'm not sure what distinction you're drawing here - in both cases, you can simulate the algorithm in your head given enough scratch paper and time. To me, the natural distinction between the two is compressibility, not simulability, since all algorithms that can be written in standard programming languages can be simulated by a Turing machine, which can be simulated by a human with time and scratch paper.

Comment by danielfilan on [AN #62] Are adversarial examples caused by real but imperceptible features? · 2019-08-30T22:04:20.124Z · score: 2 (1 votes) · LW · GW

I'm sort of confused by the main point of that post. Is the idea that the robot can't stack blocks because of a physical limitation? If so, it seems like this is addressed by the first initial objection. Is it rather that the model space might not have the capacity to correctly imitate the human? I'd be somewhat surprised by this being a big issue, and at any rate it seems like you could use the Wasserstein metric as a cost function and get a desirable outcome. I guess instead we're imagining a problem where there's no great metric (e.g. text answers to questions)?

Comment by danielfilan on Open & Welcome Thread - August 2019 · 2019-08-30T21:41:17.799Z · score: 7 (3 votes) · LW · GW

A colleague notes:

  • an intro deep learning course will be useful even once you've taken the Coursera course
  • this textbook is mathematically oriented and good (although I can't vouch for that personally)
  • depth-first search from research agendas seems infeasible for someone without machine learning experience, with the exception of MIRI's agent foundations agenda

Comment by danielfilan on How to Make Billions of Dollars Reducing Loneliness · 2019-08-30T20:39:36.520Z · score: 8 (5 votes) · LW · GW

It takes less effort to rinse a dish before putting it in a dishwasher than it does to clean it by hand (in fact often you don't need to rinse it), and the machine beeps once your dishes are dry. These factors, plus the batch processing, make dishwashers less effortful per dish for me.

Comment by danielfilan on Test Cases for Impact Regularisation Methods · 2019-08-30T20:33:13.215Z · score: 2 (1 votes) · LW · GW

Another example is described in Stuart Armstrong's post about a bucket of water. Unlike the test cases in this post, it doesn't have an unambiguous answer independent of the task specification.

Comment by danielfilan on Open & Welcome Thread - August 2019 · 2019-08-30T03:17:16.311Z · score: 5 (3 votes) · LW · GW

My guess is that taking an ML Coursera course is the best next step (or perhaps an ML course taught at your university, if that's a viable option).

More speculatively, it might be a good idea to read a research agenda (e.g. Concrete Problems in AI Safety, the Embedded Agency Sequence), dig into sections that seem interesting, and figure out what you need to know to understand the content and the cited papers. But this probably won't work until you understand the basics of ML (for things like CPAIS) or mathematical logic and Bayesian decision theory (for things like the embedded agency sequence).

Comment by danielfilan on A Personal Rationality Wishlist · 2019-08-29T21:51:01.537Z · score: 3 (2 votes) · LW · GW

Sensible advice, although I'm more interested in the metaphorical case where this isn't possible (which is natural to me, since my actual room has curtains but no doors, partially because I'm not actually worried about housemate snooping).

Comment by danielfilan on A Personal Rationality Wishlist · 2019-08-29T21:49:44.402Z · score: 6 (3 votes) · LW · GW

A counterpoint to your first sentence:

The quality of roads is relevant, but not really the answer. Bicycles can be ridden on dirt roads or sidewalks (although the latter led to run-ins with pedestrians and made bicycles unpopular among the public at first). And historically, roads didn’t improve until after bicycles became common—indeed it seems that it was in part the cyclists who called for the improvement of roads.

From this post about why humanity waited so long for the bicycle. I particularly recommend the discussion of how long it took to invent pedals and gears.

Comment by danielfilan on A Personal Rationality Wishlist · 2019-08-29T21:46:11.574Z · score: 3 (2 votes) · LW · GW

At this juncture, it seems important to note that all examples I can think of took place on Facebook, where you can just end interactions like this without it being awkward.

Comment by danielfilan on A Personal Rationality Wishlist · 2019-08-29T21:45:13.758Z · score: 2 (1 votes) · LW · GW

I assume OP is taking the perspective of his friends, who are annoyed by this behavior, rather than the perspective of the anime-fans, who don't necessarily see anything wrong with the situation.

In the literal world, I'm an anime fan, but the situation seems basically futile: the people recommending anime seem like they're accomplishing nothing but generating frustration. More metaphorically, I'm mostly interested in how to prevent the behaviour either as somebody complaining about anime or as a third party, and secondarily interested in how to restrain myself from recommending anime.

Comment by danielfilan on A Personal Rationality Wishlist · 2019-08-27T21:53:20.948Z · score: 8 (7 votes) · LW · GW

I agree that many people do not understand how bicycles work, if that was your point. My claim was that it is possible to look at a bicycle and understand how it works, not that it was inevitable for everybody who interacts with a bicycle to do so. I think the prevalence of misunderstanding of bicycles is not strong evidence against my claim, since my guess is that most people who interact with bicycles don't spend time looking at them and trying to figure out how they work. If people looking at bicycles still couldn't reproduce them, that would be strong evidence against my claim, but as far as I can tell that was relatively uncommon.

[ETA: although I see how this undermines the idea that it only requires 'a little' thought, since that brings to mind thought that only takes a few seconds.]

Comment by danielfilan on A Personal Rationality Wishlist · 2019-08-27T20:02:20.463Z · score: 5 (3 votes) · LW · GW

Sorry, I meant the feelings that are more prevalent among more neurotic people, like "anxiety, worry, fear, anger, frustration, envy, jealousy, guilt, depressed mood, and loneliness" (list taken from Wikipedia).