Case Study: the Death Note Script and Bayes

gwern

post by gwern · 2013-01-04T04:33:37.458Z · LW · GW · Legacy · 44 comments

"Who wrote the Death Note script?"

I give a history of the 2009 leaked script, discuss internal & external evidence for its authenticity including stylometrics; and then give a simple step-by-step Bayesian analysis of each point. We finish with high confidence in the script's authenticity, discussion of how this analysis was surprisingly enlightening, and what followup work the analysis suggests would be most valuable.

If you're already familiar with this particular leaked 2009 live-action script, please write down your current best guess as to how likely it is to be authentic.

This is intended to be easy to understand and essentially beginner-level for Bayes's theorem and Fermi estimates, like my other Death Note essay (information theory, crypto) or my console insurance page (efficient markets, positive psychology, expected value).

Be sure to check out the controversial twist ending!

(I'm sorry to post just a link, but I briefly thought about writing it and all the math in the LW edit box and decided that cutting my wrists sounded both quicker and more enjoyable. Unfortunately, there seems to be a math problem in the Google Chrome/Chromium browser where fractions simply don't render, apparently due to not enabling Webkit's MathML code; if fractions don't render for you, well, I know the math works well in my Iceweasel and it seems to work well in other Firefoxes.)

Comments sorted by top scores.

comment by AlexSchell · 2013-01-04T17:17:46.050Z · LW(p) · GW(p)

Nicely done. Since this was presumably partly intended as a Bayes tutorial, it might benefit from an explanation of the role your assumption of conditional independence plays in your calculations, and how much more complicated this would have been without that assumption.

Speaking of this, I personally would have liked a back-of-the-envelope calculation on how much of an effect the independence assumption has on your results, maybe by differentiating between "highly competent fake" and "normal fake" hypotheses and continuing to assume independence.

Replies from: gwern

↑ comment by gwern · 2013-01-04T19:25:50.835Z · LW(p) · GW(p)

how much more complicated this would have been without that assumption.

I'll add a footnote mentioning it.

Speaking of this, I personally would have liked a back-of-the-envelope calculation on how much of an effect the independence assumption has on your results, maybe by differentiating between "highly competent fake" and "normal fake" hypotheses and continuing to assume independence.

I'm not sure what that calculation would look like. I don't think I've ever tried conditionals before.

Replies from: AlexSchell

↑ comment by AlexSchell · 2013-01-13T06:56:57.000Z · LW(p) · GW(p)

I would have thought more than a footnote would have been helpful. To avoid lazy other-optimizing, I've written some content below which you may use/adapt/modify as you see fit.

The odds form of Bayes' theorem is this:

P(a|b)/P(~a|b) = P(a)/P(~a) x P(b|a)/P(b|~a)

In English, the ratio of the posterior probabilities (the posterior odds of a) equals the product of the ratio of the prior probabilities and the likelihood ratio.

What we are interested in is the likelihood ratio p(e|is-real)/p(e|is-not-real), where e is all external and internal evidence we have about the DN script.

e is equivalent to the conjunction of each of the 13 individual pieces of evidence, which I'll refer to as e1 through e13:

e = e1 & e2 & ... & e13

So the likelihood ratio we're after can be written like this:

p(e|is-real)/p(e|is-not-real) = p(e1&e2&...&e13|is-real)/p(e1&e2&...&e13|is-not-real)

Now, it follows from probability theory that the above is equivalent to

LR(e) = LR(e1) * LR(e2|e1) * LR(e3|e1&e2) * LR(e4|e1&e2&e3) * ... * LR(e13|e1&e2&...&e12)

(The ordering is arbitrary.)

Now comes the point where the assumption of conditional independence simplifies things greatly. The assumption is that the "impact" of each piece of evidence (i.e. the likelihood ratio associated with it) does not vary based on what other evidence we already have. That is, for any piece of evidence ei, its likelihood ratio is the same no matter what other evidence you add to the right-hand side:

LR(ei|c) = LR(ei) for any conjunction c of other pieces of evidence

Assuming conditional independence simplifies the expression for LR(e) greatly:

LR(e) = LR(e1) * LR(e2) * LR(e3) * ... * LR(e13)
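A minimal sketch of how the simplified product feeds into the odds form of Bayes' theorem. The prior and the four likelihood ratios are made-up illustration values, not the essay's figures, and `posterior_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from math import prod


def posterior_probability(prior_prob, likelihood_ratios):
    """Update a prior using the odds form of Bayes' theorem.

    Assumes the pieces of evidence are conditionally independent,
    so their likelihood ratios can simply be multiplied together.
    """
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    # Convert odds back into a probability.
    return posterior_odds / (1 + posterior_odds)


# Hypothetical numbers, for illustration only.
prior = Fraction(1, 10)
lrs = [Fraction(2), Fraction(3), Fraction(1, 2), Fraction(5)]

posterior = posterior_probability(prior, lrs)
print(posterior, float(posterior))
```

Exact fractions are used so that rounding does not obscure the arithmetic; swapping in floats works the same way.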

On the other hand, the conditional independence assumption is likely to have a substantial impact on what value LR(e) takes. This is because most pieces of evidence are expected to correlate positively with one another instead of being independent. For example, if you know that the script is a 20,000 word long Hollywood plot and that the stylometric analysis seems to check out, then if you are dealing with a fake script (is-not-real) it is an extremely elaborate fake, and (e.g.) the PDF metadata are almost certain to "check out" and so provide much weaker evidence for is-real than the calculation assuming conditional independence suggests. On the other hand, the evidence of legal takedowns seems unaffected by this concern, as even a competent faker would hardly be expected to create the evidence of takedowns.

[The suggested back-of-the-envelope calculation could go along the lines of the last paragraph, or as I said in the grandparent you might get rid of most of the problematic correlations by considering 2-3 hypotheses about the faker's level of skill and motivation (via a likelihood vector instead of ratio). My own guess is that stylometrics pretty much screens off all other internal evidence as well as dating and (most of) credit, but leaves takedown unaffected.]
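One way the suggested back-of-the-envelope calculation might look, as a sketch only. It splits is-not-real into a "sloppy fake" and a "competent fake", treats two pieces of evidence as independent within each hypothesis, and compares the resulting likelihood ratio with the naive product of the two marginal ratios. Every probability, weight, and name below is a hypothetical illustration value, not an estimate from the essay.

```python
from fractions import Fraction as F
from math import prod

# P(evidence item checks out | hypothesis), for two evidence items,
# e.g. stylometrics and PDF metadata. All values are made up.
p_evidence = {
    "real": [F(9, 10), F(9, 10)],
    "sloppy_fake": [F(1, 10), F(1, 10)],
    "competent_fake": [F(4, 5), F(4, 5)],
}
# Assumed share of each kind of faker among all fakes.
fake_weights = {"sloppy_fake": F(4, 5), "competent_fake": F(1, 5)}


def p_given_real(indices):
    """P(all listed evidence items | real), independent within the hypothesis."""
    return prod(p_evidence["real"][i] for i in indices)


def p_given_fake(indices):
    """P(all listed evidence items | fake), averaging over the kind of faker."""
    return sum(
        weight * prod(p_evidence[kind][i] for i in indices)
        for kind, weight in fake_weights.items()
    )


def lr(indices):
    """Likelihood ratio real : fake for the listed evidence items jointly."""
    return p_given_real(indices) / p_given_fake(indices)


naive_lr = lr([0]) * lr([1])  # pretends the two items are independent given "fake"
true_lr = lr([0, 1])          # accounts for the shared faker-skill variable

print(float(naive_lr), float(true_lr))
```

With these numbers the naive product is more than twice the mixture-based ratio, because seeing one item check out already makes the competent faker more likely, so the second item adds less.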

Note to self: consider testing the obvious conspiracy theory here.

Replies from: gwern

↑ comment by gwern · 2013-01-17T02:44:48.304Z · LW(p) · GW(p)

Thanks for the writeup. I'll add that as a footnote.

comment by Sniffnoy · 2013-01-04T07:28:14.340Z · LW(p) · GW(p)

I'm confused -- isn't the probability that a given pair occurs at random 1/29 rather than 1/15?

Edit: Oops, this was thinking pairings rather than trees. Corrected in reply.

Replies from: Sniffnoy, Kaj_Sotala

↑ comment by Sniffnoy · 2013-01-09T03:59:40.205Z · LW(p) · GW(p)

OK, I think the correct probability here is 1/57. According to OEIS (it cites Stanley as a reference; I haven't taken the time to try to understand why this would be the case), the number of unordered binary trees on a set of n+1 labelled leaves is given by 1*3*...*(2n-1). If we want to count how many of these have two particular leaves directly next to each other, well, we're essentially merging them into one super-leaf; thus we want the same thing on one fewer leaf. Hence the number we want is (1*3*...*55)/(1*3*...*57)=1/57. More generally, if we had n leaves, we'd have 1/(2n-3).

Edit: OK, not going to write out the whole thing here unless someone really wants, but for those skeptical of the above formula, you can prove it with exponential generating functions.
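For readers who would rather check than prove, here is a small sketch that counts the trees with the double-factorial formula and confirms the 1/(2n-3) ratio by brute-force enumeration for a small n. It assumes, as the argument above does, that every unordered binary tree on the labelled leaves is equally likely; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def tree_count(n_leaves):
    """Number of unordered binary trees on n labelled leaves: 1*3*...*(2n-3)."""
    return prod(range(1, 2 * n_leaves - 2, 2))


def pair_probability(n_leaves):
    """P(two given leaves are siblings): merge them into one leaf, then count."""
    return Fraction(tree_count(n_leaves - 1), tree_count(n_leaves))


def trees(labels):
    """Yield every unordered binary tree whose leaves are exactly `labels`."""
    labels = tuple(labels)
    if len(labels) == 1:
        yield labels[0]
        return
    first, rest = labels[0], labels[1:]
    # Keep `first` in the left subtree so each unordered split appears once.
    for k in range(len(rest)):
        for chosen in combinations(rest, k):
            left = (first,) + chosen
            right = tuple(x for x in rest if x not in chosen)
            for left_tree in trees(left):
                for right_tree in trees(right):
                    yield frozenset([left_tree, right_tree])


def are_siblings(tree, a, b):
    """True if leaves a and b share a parent somewhere in the tree."""
    if not isinstance(tree, frozenset):
        return False
    if tree == frozenset([a, b]):
        return True
    return any(are_siblings(child, a, b) for child in tree)


all_trees = list(trees(range(5)))
sibling_trees = [t for t in all_trees if are_siblings(t, 0, 1)]
brute_force = Fraction(len(sibling_trees), len(all_trees))

print(len(all_trees), brute_force, pair_probability(30))
```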

↑ comment by Kaj_Sotala · 2013-01-04T08:45:44.745Z · LW(p) · GW(p)

That's if you fix the position of the first item in the pair: if item 1 is literally the first item in a sequence, then there is indeed a 1/29 chance that the second item of the pair will appear next to it. But if the pair can be found anywhere...

Replies from: Kindly, Unnamed

↑ comment by Kindly · 2013-01-04T14:36:16.779Z · LW(p) · GW(p)

But you don't add the different probabilities for where the first item can be. No matter where the first item in the pair occurs, there is a 1/29 chance the second item will be next to it.

Another way of thinking about it. For any given item, there are 29 other items. Only one of these can be paired with the first, and all these events are equally likely. The probability has to be 1/29 and not 1/15, because 29 copies of 1/15 don't add up to 1.

Actually, the probability is slightly lower, because some items are not leaves at all. If we take the tree in the article as representative, then we expect roughly 10 pairs among the 30 items, which gives a probability of 2/87: with probability 2/3, the first item ends up as half of a pair, and with probability 1/29, the second item ends up as the other half of that same pair.

In the movie subtree, we have 12 items, so the probability of being paired is 2/33 rather than 1/6.

Edit: Laplace-adjusting the "is a random item in a pair" probability, we get 11/32 as an estimate instead, and 1/16 for the final answer. Note that because of the reasonably large sample size, this doesn't make a huge difference.
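The arithmetic in this comment can be reproduced with exact fractions. This is a sketch under the comment's own assumptions (roughly 10 pairs among 30 items, so 2/3 of items sit in a pair). The Laplace step is one reading that reproduces the quoted 11/32 and 1/16: adjust the pairs-per-item rate to (10+1)/(30+2), then double it to get the chance that an item is in a pair. The helper name is hypothetical.

```python
from fractions import Fraction


def pairing_probability(n_items, p_in_pair):
    """P(two specific items are paired with each other).

    p_in_pair: probability that the first item is half of some pair.
    Given that, its partner is equally likely to be any of the other items.
    """
    return p_in_pair * Fraction(1, n_items - 1)


p_in_pair = Fraction(2 * 10, 30)                   # 10 pairs among 30 items
whole_tree = pairing_probability(30, p_in_pair)     # all 30 items
movie_subtree = pairing_probability(12, p_in_pair)  # the 12 movie items

# One reading of the Laplace adjustment (an assumption, see above).
pairs_per_item = Fraction(10 + 1, 30 + 2)
laplace_movie = pairing_probability(12, 2 * pairs_per_item)

print(whole_tree, movie_subtree, pairs_per_item, laplace_movie)
```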

Replies from: gwern

↑ comment by gwern · 2013-01-04T16:36:24.864Z · LW(p) · GW(p)

there is a 1/29 chance the second item will be next to it.

'Next to it', perhaps, but wouldn't that other alternative be putting it on an entirely different branch and so less similar as it's not in the same cluster? movie-fearandloathing may be 'next to' fanfiction-remiscent-afterthought-threecharacters in the clustering, but not nearly as similar to it as movie-1492conquestparadise... so I think that analysis is less right than my own simple one.

Replies from: Kindly

↑ comment by Kindly · 2013-01-04T17:17:32.873Z · LW(p) · GW(p)

By "next to it" I meant paired with it, sorry. Not all items have another item paired with them, which is where the correction factor of 2/3 comes from.

Replies from: gwern

↑ comment by gwern · 2013-01-04T19:18:12.920Z · LW(p) · GW(p)

Not all items have another item paired with them, which is where the correction factor of 2/3 comes from.

Ah, I see. I'm not sure how I should deal with the non-pairing or multiple node groups; I didn't take them into account in advance, and anything based on observing the tree that was generated feels ad hoc. So if the odds of the pairing given random chance are overestimated, that means the strength of the pairing is being underestimated, right, and the likelihood ratio is weaker than it 'should' be? I'm fine with leaving that alone: as I said, when possible I tried to make conclusions as weak as possible.

Replies from: Kindly

↑ comment by Kindly · 2013-01-04T19:37:38.021Z · LW(p) · GW(p)

What do the pairings even mean, exactly? I would expect two nodes to be paired iff they are closer to each other than to any other node. If this is the case, then under a random-distance model with n nodes the probability that two specific nodes are paired is 1/(2n-3).
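A small exact check of the 1/(2n-3) figure, as a sketch. It reads "random-distance model" as: every rank ordering of the pairwise distances is equally likely (an assumption; real stylometric distances need not behave this way), and it reads "paired" as mutual nearest neighbours. The function name is hypothetical, and the enumeration is only feasible for very small n.

```python
from fractions import Fraction
from itertools import combinations, permutations


def exact_pair_probability(n):
    """Exact P(nodes 0 and 1 are mutual nearest neighbours) among n nodes,
    with every rank ordering of the pairwise distances equally likely."""
    pairs = list(combinations(range(n), 2))
    hits = total = 0
    for ranks in permutations(range(len(pairs))):
        dist = dict(zip(pairs, ranks))

        def d(a, b):
            # Distances are symmetric; look them up by sorted pair.
            return dist[(min(a, b), max(a, b))]

        nearest_to_0 = min((x for x in range(n) if x != 0), key=lambda x: d(0, x))
        nearest_to_1 = min((x for x in range(n) if x != 1), key=lambda x: d(1, x))
        hits += nearest_to_0 == 1 and nearest_to_1 == 0
        total += 1
    return Fraction(hits, total)


for n in (3, 4):
    print(n, exact_pair_probability(n), Fraction(1, 2 * n - 3))
```

The reason it works: nodes 0 and 1 are mutual nearest neighbours exactly when their distance is the smallest of the 2n-3 distances that touch either node.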

Replies from: gwern

↑ comment by gwern · 2013-01-04T20:32:50.797Z · LW(p) · GW(p)

As far as I know, it means that they are closer, yes.

↑ comment by Unnamed · 2013-01-04T19:32:04.421Z · LW(p) · GW(p)

If you took 30 people, and randomly put them into 15 pairs, then the probability that Person A would be paired with Person Z is 1/29. Person A is equally likely to be paired with any of the 29 other people.

If you took 15 women & 15 men, and randomly put them into 15 woman-man pairs, then the probability that Woman A would be paired with Man Z is 1/15. Woman A is equally likely to be paired with any of the 15 men.

The stylometrics analysis resembles the former situation, with p=1/29. The script could've been paired with any of the 29 other items.
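The two pairing situations can be checked by exhaustive enumeration on a smaller group. This sketch uses 6 people instead of 30 so that every pairing can be listed; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def matchings(people):
    """Yield every way to split a list of people into unordered pairs."""
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for others in matchings(remaining):
            yield [(first, partner)] + others


def unrestricted_probability(n):
    """P(person 0 is paired with person n-1) when anyone can pair with anyone."""
    all_matchings = list(matchings(list(range(n))))
    hits = sum((0, n - 1) in m for m in all_matchings)
    return Fraction(hits, len(all_matchings))


def bipartite_probability(k):
    """P(woman 0 is paired with man 0) with k women and k men."""
    # Each permutation assigns woman i to man perm[i].
    perms = list(permutations(range(k)))
    hits = sum(perm[0] == 0 for perm in perms)
    return Fraction(hits, len(perms))


print(unrestricted_probability(6))  # 6 people: 1/(6-1)
print(bipartite_probability(3))     # 3 women, 3 men: 1/3
```

The same pattern, 1/(n-1) versus 1/(n/2), gives 1/29 versus 1/15 for 30 people.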

comment by benelliott · 2013-01-09T00:57:42.129Z · LW(p) · GW(p)

One thing that struck me: using Bayes separately on all those pieces of evidence assumes independence, but it seems that, conditioning on it being a fake, lots of the observations used as evidence all correlate with the faker being generally competent and fastidious, e.g. the sort of person who would get the address right is more likely to also get the authorship, formatting, PDF software and timezone right.

Replies from: gwern

↑ comment by gwern · 2013-01-09T02:16:51.799Z · LW(p) · GW(p)

That was pointed out in the essay two or three times, and has already been mentioned in the comments here as well.

Replies from: benelliott

↑ comment by benelliott · 2013-01-09T02:48:26.143Z · LW(p) · GW(p)

Ah, sorry about that. Should have read the footnotes.

Replies from: gwern

↑ comment by gwern · 2013-01-09T02:57:53.466Z · LW(p) · GW(p)

Well, it was also towards the end as part of a list of reasons to not believe the final estimate.

Replies from: benelliott

↑ comment by benelliott · 2013-01-09T12:46:20.228Z · LW(p) · GW(p)

That mentions there are 'reasons' to believe they might be correlated; it still might have been worth my while to mention one such reason, had that been all there was.

comment by pleeppleep · 2013-01-04T05:16:01.921Z · LW(p) · GW(p)

You posted this here just for an excuse to ask the poll, didn't you?

Replies from: gwern

↑ comment by gwern · 2013-01-04T05:20:44.484Z · LW(p) · GW(p)

I'm sure I don't know what you mean.

comment by Nisan · 2013-01-04T21:29:50.933Z · LW(p) · GW(p)

If you have html with $-delimited latex in it, this tool will replace all the $-delimited latex with nice img tags at once.

Replies from: None, gwern, army1987

↑ comment by [deleted] · 2013-01-17T20:24:52.767Z · LW(p) · GW(p)

There are many reasons why this approach is not useful, particularly if one is interested in archival purposes (which gwern certainly is). Eventually MathML rendering in modern browsers will catch up.

Replies from: army1987

↑ comment by A1987dM (army1987) · 2013-01-18T16:25:38.450Z · LW(p) · GW(p)

Eventually MathML rendering in modern browsers will catch up.

I wouldn't hold my breath waiting for that to happen.

↑ comment by gwern · 2013-01-20T21:36:46.693Z · LW(p) · GW(p)

Trying this out, it seems to badly mangle inline Latex - the images seem to force each expression into its own block.

↑ comment by A1987dM (army1987) · 2013-01-04T21:48:21.258Z · LW(p) · GW(p)

Cool! Bookmarked.

comment by beoShaffer · 2013-01-04T20:52:20.096Z · LW(p) · GW(p)

Was " a likelihood factor equal to 0 " supposed to be " a likelihood factor equal to 1"?

Replies from: gwern

↑ comment by gwern · 2013-01-04T21:06:27.080Z · LW(p) · GW(p)

Yes, thanks. (And while I'm at it, why was I using 'likelihood factor' all over the place when it's 'likelihood ratio'...)

Replies from: Kindly

↑ comment by Kindly · 2013-01-04T23:42:11.012Z · LW(p) · GW(p)

You may have made the same mistake in the Plot section when adding up (rather than multiplying) a bunch of likelihood ratios.

Replies from: gwern

↑ comment by gwern · 2013-01-05T00:15:06.688Z · LW(p) · GW(p)

Yes, that was an error; I actually made a counterbalancing error there, where I flipped two arguments in the last two... My own ineptitude never ceases to impress me sometimes. (It's a good thing that was a hypothetical section that wasn't used in the full chain of posterior/prior calculations, because I'd've hated to have to redo them all. Again.)

comment by MinibearRex · 2013-01-07T04:14:37.039Z · LW(p) · GW(p)

We finish with high confidence in the script's authenticity

If you're already familiar with this particular leaked 2009 live-action script, please write down your current best guess as to how likely it is to be authentic.

Unless someone already tried to come up with an explicit probability, this ordering will bias the results. Ask people for their guesses before you tell them what you have already written on the subject.

Replies from: gwern

↑ comment by gwern · 2013-01-07T04:26:35.513Z · LW(p) · GW(p)

Well, no one familiar with the script before reading this essay seems to have reported anything. That was a bit sloppy on my part, though.

comment by gwern · 2013-01-05T04:28:00.376Z · LW(p) · GW(p)

HN submission: http://news.ycombinator.com/item?id=5010846 >30 comments; hit #1 on the front page.

comment by gwern · 2013-01-04T00:48:31.432Z · LW(p) · GW(p)

Do you prefer polls on an article to be broken up over multiple comments to make some optional, or all in a single comment?

[pollid:387]

comment by gwern · 2013-01-04T00:48:21.414Z · LW(p) · GW(p)

Was the essay:

[pollid:384]

Did the chosen topic (anime & movies) make the essay more or less interesting for you?

[pollid:385]

Which topic was least well explained or employed:

[pollid:386]

comment by gwern · 2013-01-04T00:47:49.113Z · LW(p) · GW(p)

Having read or skimmed the essay's arguments & conclusion, what probability do you assign that this specific leaked script is genuine?

[pollid:381]

Having read or skimmed the essay, which of the 12 distinct arguments did you find weakest?

[pollid:382]

And strongest?

[pollid:383]

comment by gwern · 2013-01-04T00:47:26.059Z · LW(p) · GW(p)

What prior probability would you give that reports of a leaked full-length script for a Hollywood movie would be true and the script genuine? In deciles:

[pollid:380]

(Deciles, since I doubt anyone really has such a prior accurate down to single percentage points...)

comment by A1987dM (army1987) · 2013-01-04T08:13:20.115Z · LW(p) · GW(p)

I don't think this belongs in Main.

Replies from: Kaj_Sotala, dhoe, ygert

↑ comment by Kaj_Sotala · 2013-01-04T08:38:32.483Z · LW(p) · GW(p)

I disagree: Bayes is a big part of Less Wrong, and this is an excellent worked out example of how one could try to apply it in practice. If my pretty-poorly-written, qualitative-claims-only Applied Bayes' Theorem: Reading People got promoted, so should this.

Replies from: army1987, ygert

↑ comment by A1987dM (army1987) · 2013-01-04T10:53:15.438Z · LW(p) · GW(p)

Reading people is a task far more common than figuring out whether a leaked script for a movie is authentic, and many more people will be interested in the former.

↑ comment by ygert · 2013-01-04T10:14:24.635Z · LW(p) · GW(p)

Look, this is certainly an interesting post, and I enjoyed reading it. But that is not a sufficient criterion for a post being in Main. Compare this to the other recent posts in Main, and you will see a big stylistic difference. A worked out example of using Bayes is very interesting and insightful, but it is not anything "new". To use an analogy, if the other posts in Main are the content of a textbook, this is one of the worked-out sample exercises to show you how the exercises in the book are actually done. That is no less valuable, but it is simply not the same class, and a distinction is necessary.

Replies from: gwern

↑ comment by gwern · 2013-01-04T16:22:04.363Z · LW(p) · GW(p)

I've never seen this distinction before, and I don't think my essay is remotely like the usual fare of Discussion.

EDIT: especially if something like http://lesswrong.com/lw/g7y/morality_is_awesome/ gets 3x the net upvotes...

↑ comment by dhoe · 2013-01-04T10:04:17.064Z · LW(p) · GW(p)

I think it does. Bayes gets mentioned a lot around here, but there are not that many clear and accessible examples on how to go and analyze a real question; I recently read Proving History, despite no particular interest in the topic (Jesus' historicity), just to get a better idea of how people do it in practice.

↑ comment by ygert · 2013-01-04T08:24:19.937Z · LW(p) · GW(p)

Agreed.
