post by [deleted] · · ? · GW · 0 comments

Comments sorted by top scores.

comment by DanielFilan · 2019-09-29T06:45:26.603Z · LW(p) · GW(p)

good judgment project

should be 'Good Judgment Project'

systematically testing the efficaciousness of those techniques

should be 'efficacy'

Why do we Need to Test things?

either 'Things' should be capitalised or nothing here should be

comes from the pre UFC history history of isolated martial

should be 'pre-UFC', and 'history' should only be there once

Kong Fu

it's typically spelled 'Kung fu', although the standard pinyin spelling would actually be gōngfu. Similarly, 'karate' and 'capoeira' shouldn't be capitalised.

In the ancient world, people would leave their homes and travel great distances to train with Kong Fu masters. Different schools in the Kong Fu tradition would develop and compete with each other according to their own traditions, and there was no doubt in anyone's mind that Kong Fu masters could teach the art of fighting. But in the end, the Gracies showed that countless of these millennia old isolated martial arts traditions were totally inferior to the relatively new Gracie methods.

there was no doubt in anyone's mind that Kong Fu masters could teach the art of fighting.

Citation needed, both for this bit and for the whole story.

But in the end, the Gracies showed that countless of these millennia old isolated martial arts traditions were totally inferior to the relatively new Gracie methods.

Doesn't disprove the claim that 'kung fu masters could teach the art of fighting'. Like, I don't doubt that learning kung fu really did teach you how to fight well, move efficiently, etc.

millennia old isolated martial arts traditions

Kung fu is plausibly 1.5-3 millennia old, karate is ~1.5 centuries old, capoeira is ~5 centuries old.

So you see, we cannot trust our eyes.

None of the problems you mention are from optical illusions. I'd instead say 'we can't trust our intuitions/impressions'.

make it harder for reality to fairly judge between our hypotheses.

I don't think reality judges hypotheses, or that anybody 'judges between' hypotheses, but rather that reality reveals things that distinguish between hypotheses.

Most methods that we come up with will not work, but they will sound good enough and seem reasonable enough to convince us that they work anyway.

citation/evidence needed

In any case, I do not think a hypothesis, such as the efficacy of Double Crux, being hard to test, gives us license to believe whatever we want about it. We still have to test it, and our senses and intuitions can still only give us limited evidence one way or the other.

'we still have to test it' seems wrong to me. Maybe it just is too expensive to test, and we have to rely on our best reasoning about the truth or falsehood of the hypothesis.

If you are going to help me test the method of setting five minute times, let me first say that I seriously appreciate it!

should be 'five minute timers'. Also how are you evaluating the results of this test?

It might be that Double Crux cannot be taught in a 60 minute module and requires more training before the effects are noticeable.

It's weird that you're referencing your test before describing it.

Step1

should be 'Step 1' (similarly for later steps)

Obtain participants on polity.

What is polity?

Filter participants for college level education.

Why?

Group1

should be 'Group 1', similarly with later groups.

They will be given a module (unsure how long this module will be at this time, but I hope less than 40 minutes) that teaches double crux

(a) I think that you should capitalise 'double crux' here like you do elsewhere, and (b) it's more honest to say 'They will be given a module that attempts to teach double crux' or something.

Rewards will be given according to a discretized version of a proper scoring rule, to incentivize calibrated probability assignments.

Why discretise?
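For concreteness, here is a minimal sketch of what a discretised proper scoring rule might look like, assuming a Brier-style reward and hypothetical payout tiers (neither is specified in the post). Note that bucketing the payout removes the incentive to refine a probability within a bucket, which is one reason to ask why discretise at all:

```python
def brier_reward(p, outcome, max_reward=10.0):
    """Continuous Brier-style reward: highest when the forecast p
    (probability of outcome == 1) matches what actually happened."""
    return max_reward * (1 - (p - outcome) ** 2)

def discretised_brier_reward(p, outcome, max_reward=10.0):
    """Round the continuous reward down to fixed payout tiers.
    The tier boundaries here are hypothetical, not from the post."""
    reward = brier_reward(p, outcome, max_reward)
    tiers = [0.0, 2.5, 5.0, 7.5, 10.0]
    return max(t for t in tiers if t <= reward)

# A forecast of 0.8 on a question that resolves true:
print(brier_reward(0.8, 1))              # approx. 9.6
print(discretised_brier_reward(0.8, 1))  # 7.5
```

Under this sketch, forecasts of 0.8 and 0.85 land in the same tier and pay the same, so the discretised rule is no longer strictly proper.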

I will hire people who can credibly claim that they know how to teach double crux

Why do they need to know how to teach it if they just have to judge how DC-y the convo is?

If the conversation does not seem like a double crux at all, then they should give it a 1; if it seems like a totally paradigmatic example of a double crux, then they should give it a 10.

I think it would be useful to say what a 5 would be.

I will calculate participants’ brier scores on the problem they discussed before the discussion, and the brier score they got after the discussion. I will calculate the means of the differences between pre conversation brier scores and post conversation brier scores for each group.

(a) Brier should be capitalised, (b) log scores are more natural here than Brier scores imo (log scores are about how many bits away from the truth you are, Brier scores are just made up to incentivise honest reporting if you pay people according to them)
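To make the comparison concrete, a minimal sketch of the two scores for a binary question, both framed so that lower is better:

```python
import math

def brier_score(p, outcome):
    """Brier score: squared error of the probability forecast."""
    return (p - outcome) ** 2

def log_score(p, outcome):
    """Log score in bits: how far from the truth the forecast was,
    i.e. how surprised you were by the realised outcome."""
    p_realised = p if outcome == 1 else 1 - p
    return -math.log2(p_realised)

# A forecast of 0.9 on a question that resolves true:
print(brier_score(0.9, 1))  # approx. 0.01
print(log_score(0.9, 1))    # approx. 0.152 bits
```

Both are proper scoring rules, but the log score has the information-theoretic reading described above, and it punishes confident wrongness much more harshly: a forecast of 0.99 that resolves false costs ~6.6 bits, while its Brier penalty is merely close to 1.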

I will calculate this by taking an average of both KL divergences for a pair before the conversation, and comparing it to average KL divergences after the conversation.

IMO total variation distance is more natural here (average of KL divergences is a weird object).
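A sketch of the contrast, for a pair of participants whose beliefs over a question are represented as discrete distributions (the distributions are made up for illustration):

```python
import math

def kl_divergence(p, q):
    """KL divergence in bits from q to p (asymmetric, unbounded)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def avg_kl(p, q):
    """The quoted proposal: symmetrise KL by averaging both directions."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def total_variation(p, q):
    """Total variation distance: the largest probability the two
    distributions can assign differently to any single event."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.9, 0.1]  # participant A's beliefs
q = [0.5, 0.5]  # participant B's beliefs
print(avg_kl(p, q))           # unbounded; diverges as beliefs near 0 or 1
print(total_variation(p, q))  # 0.4; always stays within [0, 1]
```

Averaged KL blows up whenever one participant puts probability near 0 on an outcome the other takes seriously, so a single extreme pair could dominate the group means; total variation stays bounded in [0, 1].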

Unfortunately, many Effective Altruists and Rationalists already know about Double Crux,

(a) I don't think you should capitalise 'effective altruists' and 'rationalists', and (b) I dislike how that phrasing implies that they are effectively altruistic or rational.

--

I feel like the ending could be stronger, it feels like the post just stops. Maybe you could thank people who helped you think about this :)