Experiment: Knox case debate with Rolf Nelson

komponisto

post by komponisto · 2011-07-08T08:22:17.723Z · LW · GW · Legacy · 68 comments

Recently, on the main section of the site, Raw_Power posted an article suggesting that we find "worthy opponents" to help us avoid mistakes.

As you may recall, Rolf Nelson disagrees with me about Amanda Knox -- rather sharply. Of course, the same can be said of lots of other people (if not so much here on Less Wrong). But Rolf isn't your average "guilter". Indeed, considering that he speaks fluent Bayesian, is one of the Singularity Institute's largest donors, and is also (as I understand it) signed up for cryonics, it's hard to imagine an "opponent" more "worthy". The Amanda Knox case may not be in the same category of importance as many other issues where Rolf and I probably agree; but my opinion on it is very confident, and it's the opposite of his. If we're both aspiring rationalists, at least one of us is doing something wrong.

As it turns out, Rolf is interested in having a debate with me on the subject, to see if one of us can help to change the other's mind. I'm setting this post up as an experiment, to see if LW can serve as a suitable venue for such an exercise. I hope it can: Less Wrong is almost unique in the extent to which the social norms governing discussion reflect and coincide with the requirements of personal epistemic rationality. (For example: "Do not believe you do others a favor if you accept their arguments; the favor is to you.") But I don't think we've yet tried an organized one-on-one debate -- so we'll see how it goes. If it proves too unwieldy or inappropriate for some other reason, we can always move to another venue.

Although the primary purpose of this post is a one-on-one debate between Rolf Nelson and myself, this is a LW Discussion post like any other, and it goes without saying that others are welcome and encouraged to comment. Just be aware that we, the main protagonists, will try to keep our discussion focused on each other's arguments. (Also, since our subject is an issue where there is already a strong LW consensus, one would prefer to avoid a sort of "gangup effect" where lots of people "pounce" on the person taking the contrarian position.)

With that, here we go...

68 comments

Comments sorted by top scores.

comment by Wei Dai (Wei_Dai) · 2011-07-09T02:50:08.118Z · LW(p) · GW(p)

Could each side of the debate perhaps give a back of the envelope Bayesian calculation for P(guilty), just to summarize their current positions to bystanders who are joining in without having read the whole discussion up to this point?

Replies from: rolf_nelson, komponisto

↑ comment by rolf_nelson · 2011-07-12T04:56:32.609Z · LW(p) · GW(p)

I don't currently have a formal calculation. If you're curious, my current P(guilty) is .95; I'm reluctant to go higher due to the structural uncertainty inherent in any "trial by media". One way to summarize my current position is that I believe most of the court's findings (as in the Massei report) seem basically correct, and that correcting for demographics doesn't look to be nearly enough to swing the needle from guilt to innocence.

↑ comment by komponisto · 2011-07-09T04:39:33.910Z · LW(p) · GW(p)

I actually inserted mine into my comment above:

Roughly speaking, I would say that together, all the evidence against Knox and Sollecito, of which the bra clasp and knife are for me the overwhelming majority (everything else being near-negligible), shifted P(guilt) upward by about an order of magnitude -- a factor of 10, from a prior somewhere between 0.0001 and 0.001 to a posterior somewhere between 0.001 and 0.01. The Conti-Vecchiotti report cuts that down a bit, though not necessarily hugely, since their findings were pretty much what I already expected them to be.
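For bystanders, the arithmetic behind this estimate can be sketched in odds form. This is only an illustration, not komponisto's own calculation: the function name `update` is hypothetical, and the likelihood ratio of 10 is simply the "order of magnitude" stated above.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Endpoints of the prior range quoted above, combined with an assumed
# overall likelihood ratio of 10 for all the evidence taken together.
for prior in (0.0001, 0.001):
    print(f"prior {prior:g} -> posterior {update(prior, 10):.5f}")
```

At probabilities this small, odds and probabilities nearly coincide, which is why a factor of 10 in odds moves the range to roughly 0.001 through 0.01.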

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2011-07-09T05:16:48.117Z · LW(p) · GW(p)

Where does the prior "between 0.0001 and 0.001" come from?

ETA: For example, the fact that the suspect and victim are acquaintances is surely not negligible evidence, given that 52% of murders in the US are committed by a family member or acquaintance and presumably it's similar in Italy. Perhaps that's already taken into account in your prior, but without more information the reader has no way to tell. (If it's too much trouble to write a more self-contained summary at this point, let me know and I'll go read the old discussions when I get a chance.)

Replies from: komponisto

↑ comment by komponisto · 2011-07-09T06:00:57.336Z · LW(p) · GW(p)

For example, the fact that the suspect and victim are acquaintances is surely not negligible evidence

Screened off by the evidence against Rudy Guede.

Replies from: Wei_Dai, Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2011-07-09T07:38:34.403Z · LW(p) · GW(p)

Separately, I'm having trouble understanding how "suspect and victim are acquaintances" is screened off by "Guede killed Kercher". According to the wiki:

If A is a hypothesis and B and C are two pieces of evidence relating to A, then B is said to screen off C from A if P(A|B&C) = P(A|B). That is, if knowing C provides no additional information about A once B is known.

In the wiki, the example given is A="Knox killed Kercher", B="Guede killed Kercher", and C="Kercher was killed", so it's trivially true that P(A|B&C) = P(A|B) since B logically implies C. But if we replace C with D="suspect and victim are acquaintances" it's no longer trivially true that B screens off D. Actually I think it's false.

Consider the question, does P(A|B&~D) equal P(A|B&D)? Surely the probability that Knox is one of multiple attackers who killed Kercher, given that Guede killed Kercher (and no other evidence), would be smaller if Knox were just a random person with no relationship to Kercher? But B screens off D means that P(A|B&D) = P(A|B), which implies P(A|B) = P(A|B&~D).

What if we replace B with E="Guede killed Kercher and there is no strong evidence of another attacker"? Same thing, we still have P(A|E&D) > P(A|E&~D).

(Note that this is not an argument that Knox killed Kercher, but just that you seem to be using the concept of "screened off" incorrectly, and also wrongly claiming "everything else [besides bra clasp and knife] being near-negligible".)
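The wiki definition quoted above can be checked mechanically on a toy joint distribution. The sketch below is a made-up example, not a model of the case: the helper names (`conditional`, `screens_off`, `make_joint`) and all the probabilities are assumptions chosen only to show one table where B screens off C from A and one where it does not.

```python
from itertools import product

def conditional(joint, event, given):
    """P(event | given) from a joint table keyed by (a, b, c) booleans."""
    num = sum(p for k, p in joint.items() if event(k) and given(k))
    den = sum(p for k, p in joint.items() if given(k))
    return num / den

def screens_off(joint, tol=1e-9):
    """True if B screens off C from A, i.e. P(A|B&C) equals P(A|B)."""
    p_a_bc = conditional(joint, lambda k: k[0], lambda k: k[1] and k[2])
    p_a_b = conditional(joint, lambda k: k[0], lambda k: k[1])
    return abs(p_a_bc - p_a_b) < tol

def make_joint(p_a_given):
    """Toy joint: B and C are independent fair coins; p_a_given[(b, c)] = P(A|b,c)."""
    joint = {}
    for a, b, c in product([True, False], repeat=3):
        p_a = p_a_given[(b, c)]
        joint[(a, b, c)] = 0.25 * (p_a if a else 1.0 - p_a)
    return joint

# A depends on B only: once B is known, C adds nothing.
screened = make_joint({(True, True): 0.01, (True, False): 0.01,
                       (False, True): 0.2, (False, False): 0.2})
# A still depends on C when B holds: B does not screen off C.
unscreened = make_joint({(True, True): 0.01, (True, False): 0.001,
                         (False, True): 0.2, (False, False): 0.2})

print(screens_off(screened), screens_off(unscreened))
```

The second table is the shape of the objection here: if P(A|B&D) differs from P(A|B&~D), then strict screening off fails, however small the gap.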

Replies from: komponisto

↑ comment by komponisto · 2011-07-09T16:08:35.054Z · LW(p) · GW(p)

You seem to be calling it "incorrect" if I say that "X = Y" when X is only approximately equal to Y. Obviously you're right in a literal sense, but it's an inappropriate criticism in this context.

"Suspect and victim are acquaintances" here is essentially the same event as "Knox's roommate was killed" -- something which significantly raises the prior probability that Knox committed murder. However, once we learn the details of the case, we find that the killing is entirely explained by the actions of Guede. ("Entirely" here is to be understood in an approximative sense.)

While it is perhaps true, using your labels above, that P(A|E&D) > P(A|E&~D), the difference between these quantities is surely very small compared to the difference between P(A|D) and P(A|E&D).

Replies from: None, Wei_Dai

↑ comment by [deleted] · 2011-07-09T17:12:22.191Z · LW(p) · GW(p)

You seem to be calling it "incorrect" if I say that "X = Y" when X is only approximately equal to Y. Obviously you're right in a literal sense, but it's an inappropriate criticism in this context.

I think all of the disputes that show up between you and raw power are going to be of the form "K thinks that X and Y are close together, while R thinks that X and Y are far apart" or vice versa.

Replies from: komponisto

↑ comment by komponisto · 2011-07-09T17:51:06.620Z · LW(p) · GW(p)

I think all of the disputes that show up between you and raw power

You meant Rolf Nelson, I assume.

Replies from: None

↑ comment by [deleted] · 2011-07-09T21:29:08.051Z · LW(p) · GW(p)

natch, sorry

↑ comment by Wei Dai (Wei_Dai) · 2011-07-09T17:05:16.606Z · LW(p) · GW(p)

It seems like using your logic, we can similarly say that the evidence against Guede screens off the evidence of the bra clasp and knife. Is that correct? If not, what is the difference between "suspect and victim are acquaintances" and "there is (perhaps not particularly reliable) DNA evidence linking suspect to this murder" that makes Guede screen off one of them but not the other?

Replies from: komponisto

↑ comment by komponisto · 2011-07-09T18:01:20.296Z · LW(p) · GW(p)

It seems like using your logic, we can similarly say that the evidence against Guede screens off the evidence of the bra clasp and knife. Is that correct?

Not really. Unlike the fact of the murder of Knox's roommate, the bra clasp and knife are essentially independent of the evidence against Guede.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2011-07-09T18:59:31.198Z · LW(p) · GW(p)

What you said above was:

While it is perhaps true, using your labels above, that P(A|E&D) > P(A|E&~D), the difference between these quantities is surely very small compared to the difference between P(A|D) and P(A|E&D).

Is this the criterion you would use for "screened off" in general? If so, suppose we replace D with F="some DNA evidence exists linking Knox to murder". (E still being "evidence against Guede".) Don't we still have P(A|E&F) - P(A|E&~F) << P(A|F) - P(A|E&F)? To illustrate, P(A|F) = 0.1, P(A|E&F) = 0.01, P(A|E&~F) < 0.001. (These are semi-plausible numbers for illustrating this point, not my actual probabilities.)

In this later comment you say

Unlike the fact of the murder of Knox's roommate, the bra clasp and knife are more or less independent of the evidence against Guede.

This seems to make more sense, but I'm still having trouble translating it into a technical definition of "screened off". Can you suggest one?

Replies from: komponisto

↑ comment by komponisto · 2011-07-09T23:19:24.610Z · LW(p) · GW(p)

It's easy to break an approximative definition by applying it to a situation where distinctions between orders of error are important. So any such definition, strictly speaking, has to be considered a sort of analogy or metaphor that may not always be applicable to every context.

Strictly speaking, as you know, "E screens F off from A" means P(A|E&F) = P(A|E&~F). So it seems reasonable to say "E approximately screens F off from A" if |P(A|E&F) - P(A|E&~F)| is small. However, what "small" means is context-dependent. When, above, I declined to apply this terminology to E and F, it was because I was mentally comparing |P(A|E&F) - P(A|E&~F)| to |P(A|E) - P(A|E&F)|, rather than to |P(A|F) - P(A|E&F)|. The latter, of course, is much larger. So I don't suppose I can really stop you from applying the approximative definition of "screening off" in this situation if what you're interested in is P(A|F) vs P(A|E&F) (a large downward jump) rather than P(A|E) vs P(A|E&F) (a small upward jump).

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2011-07-10T14:00:26.248Z · LW(p) · GW(p)

What do you say we table this discussion about "approximately screens off"? (I'm thinking of writing a discussion post asking LW what a good, i.e., generally useful, definition of it would be. Maybe it doesn't have to be context-dependent, or could be less context-dependent, if we talk about P(A|E&F) / P(A|E&~F) instead of P(A|E&F) - P(A|E&~F).)
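To see why the choice between a difference and a ratio matters, here is a small sketch using the illustrative numbers from earlier in this exchange (P(A|F) = 0.1, P(A|E&F) = 0.01, P(A|E&~F) < 0.001). Those numbers were explicitly not anyone's real estimates; the function names are hypothetical, and 0.001 is used as a stand-in for the "< 0.001" bound.

```python
# Illustrative numbers from earlier in the thread, not actual estimates.
p_a_given_f = 0.1
p_a_given_e_f = 0.01
p_a_given_e_not_f = 0.001  # stated only as "< 0.001"; the upper bound is assumed here

def difference_gap(p_with: float, p_without: float) -> float:
    """Additive measure: |P(A|E&F) - P(A|E&~F)|."""
    return abs(p_with - p_without)

def ratio_gap(p_with: float, p_without: float) -> float:
    """Multiplicative measure: P(A|E&F) / P(A|E&~F)."""
    return p_with / p_without

diff = difference_gap(p_a_given_e_f, p_a_given_e_not_f)
ratio = ratio_gap(p_a_given_e_f, p_a_given_e_not_f)
drop_from_e = p_a_given_f - p_a_given_e_f

print(f"difference: {diff:.4f} (compare drop from learning E: {drop_from_e:.4f})")
print(f"ratio:      {ratio:.1f}")
```

On these numbers the difference looks small next to the drop caused by learning E, while the ratio says F still moves the probability by about a factor of 10. Which reading counts as "approximately screened off" depends on the comparison chosen.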

For now, perhaps you can just tell me what mathematical statement you actually had in mind, when you said "Screened off by the evidence against Rudy Guede"?

Replies from: komponisto

↑ comment by komponisto · 2011-07-10T17:41:18.338Z · LW(p) · GW(p)

For now, perhaps you can just tell me what mathematical statement you actually had in mind, when you said "Screened off by the evidence against Rudy Guede"?

P(A|E&D) is much closer to P(A) than to P(A|D).

↑ comment by Wei Dai (Wei_Dai) · 2011-07-09T06:04:04.846Z · LW(p) · GW(p)

The point is that the calculation you gave is missing too many steps to be useful for someone just coming to the discussion.

To take another example, does your prior take into account the gender of the suspect? (Females commit far fewer murders than males.) Or is that also screened off by some other evidence?

Replies from: komponisto, komponisto

↑ comment by komponisto · 2011-07-09T06:41:06.245Z · LW(p) · GW(p)

I suspect that a lot of the details you're wondering about will quickly emerge in the discussion with Rolf, since (1) our opinions are widely separated, which seems to imply very different-looking calculations, with substantial inferential gaps to be bridged, and (2) that discussion is just getting started.

That said, I'm not sure which missing steps you consider the most important. A very short case summary from my point of view would be something like "student killed by burglar; housemate and boyfriend blamed before burglar discovered; after catching burglar, police filter evidence to fit three-person theory instead of dropping initial idea." Is that helpful at all?

↑ comment by komponisto · 2011-07-09T06:54:50.658Z · LW(p) · GW(p)

To take another example, does your prior take into account the gender of the suspect? (Females commit far fewer murders than males.) Or is that also screened off by some other evidence?

A reference class that gives an upper bound for my prior would be "intelligent 20-year-old female college student with no criminal history commits murder".

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2011-07-09T08:04:22.352Z · LW(p) · GW(p)

Wikipedia says

At 0.013 per 1,000 people, Italy has the 47th highest murder rate in the world.

Which gives no more than 0.000013 probability that Knox is a murderer if all we know is that she lives in Italy. I guess "intelligent 20-year-old female college student with no criminal history" is less likely to commit murder than average, so I'm still confused how you got "between 0.0001 and 0.001".
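The unit conversion here is easy to verify. The sketch below follows the comment in treating the per-capita rate as a per-person probability, which is itself a simplifying assumption; the variable names are hypothetical.

```python
rate_per_1000 = 0.013             # figure quoted from Wikipedia above
base_rate = rate_per_1000 / 1000  # per-person rate

# How far above that base rate komponisto's stated prior range sits.
prior_range = (0.0001, 0.001)
ratios = [bound / base_rate for bound in prior_range]

print(f"base rate: {base_rate:.6f}")
for bound, r in zip(prior_range, ratios):
    print(f"prior {bound:g} is about {r:.1f}x the base rate")
```

This makes the size of the gap concrete: the stated prior range sits roughly one to two orders of magnitude above the country-wide figure.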

Replies from: komponisto

↑ comment by komponisto · 2011-07-09T16:22:32.458Z · LW(p) · GW(p)

Well, the answer I suppose is that I wasn't taking the country into account.

However, if you agree that "between 0.0001 and 0.001" is an upper bound, that surely suffices! The important kind of confusion would be where you think my prior is too low, rather than too high.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2011-07-09T17:49:04.394Z · LW(p) · GW(p)

The important kind of confusion would be where you think my prior is too low, rather than too high.

I was trying to understand what evidence has been taken into account in your prior (i.e., is there some other information that might be considered Bayesian evidence against Knox, but which is already in your prior), so that I can understand what other evidence you consider "negligible". I think at this point that confusion has been resolved.

I still wonder why the two sides don't each post a more detailed Bayesian calculation. Let's say A="Knox killed Kercher", B="Kercher has been killed and Knox lived in Italy and is an intelligent 20-year-old female college student with no criminal history", C="evidence against Guede", D="Knox and Kercher were roommates", E="evidence of a staged burglary", F="bra and clasp", G="all other information about the case". What are

P(A|B)
P(A|B&C)
P(A|B&C&D)
P(A|B&C&D&E)
P(A|B&C&D&E&F)
P(A|B&C&D&E&F&G)

(Or some other set of evidence and order of evaluation that might be more appropriate.) Wouldn't that help to quickly pinpoint where your disagreements are?

Replies from: komponisto

↑ comment by komponisto · 2011-07-09T18:42:12.776Z · LW(p) · GW(p)

Let's say A="Knox killed Kercher", B="Kercher has been killed and Knox lived in Italy and is an intelligent 20-year-old female college student with no criminal history", C="evidence against Guede", D="Knox and Kercher were roommates", E="evidence of a staged burglary", F="bra and clasp", G="all other information about the case".

I'll redefine slightly:

A := "Knox killed Kercher, given background info about both, but not the fact of their acquaintance". P(A) = tiny.
B := "Kercher killed". P(A|B) = approximately P(A). (We are not yet given that they were roommates.)
C := "evidence against Guede". P(A|B&C) = approximately P(A). (No significant connection between Guede and Knox.)
D := "Knox and Kercher were roommates". P(A|B&C&D) = slightly higher than P(A), but still well below the threshold of consideration.
E := "Facts cited as evidence of staged burglary". P(A|B&C&D&E) = approximately P(A|B&C&D). (Likelihood ratios involved are close to unity; certainly small relative to P(~A)/P(A).)
F := "bra clasp and knife". P(A|B&C&D&E&F) = possibly as much as an order of magnitude higher than P(A|B&C&D). (Explaining results is a minor puzzle.)
G := "all other information". P(A|B&C&D&E&F&G) = approximately P(A|B&C&D&E&F). (Other evidence weak; slightly inculpatory facts canceled out by slightly exculpatory facts.)

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2011-07-09T18:55:45.733Z · LW(p) · GW(p)

Thanks, that's very helpful. Perhaps you could copy this to the main debate branch, so Rolf would see it and possibly respond in a similar fashion? Also, to seek a bit more clarification, what is your estimate of P(A|B&C&D) / P(A|B&C)?

comment by [deleted] · 2011-07-09T17:29:34.633Z · LW(p) · GW(p)

I'm excited about this but a little skeptical. Bayes' theorem is a single equation between four quantities. Every time you discuss a new bit of evidence, you have two or three degrees of freedom to argue about. Maybe K consistently judges P(E|guilty)/P(E) to be smaller than R does -- then what?

Replies from: Douglas_Knight

↑ comment by Douglas_Knight · 2011-07-12T04:49:26.707Z · LW(p) · GW(p)

If 10 bits difference of opinion is the result of 10 seemingly individual bits being added up, then one or probably both are making bottom line arguments.

But I don't expect we'll find that. I expect one large point of disagreement or maybe a lot of highly analogous disagreements, like whether A-Z are independent or correlated; or whether X screens off A-Z.
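For a sense of scale, the gap between the two stated positions can be expressed in bits of log-odds. This sketch uses only the probabilities given elsewhere in the thread (Rolf's .95 and komponisto's range of 0.001 to 0.01); the function name is hypothetical.

```python
import math

def log_odds_bits(p: float) -> float:
    """Log-odds of probability p, in bits (base-2 logarithm of p / (1 - p))."""
    return math.log2(p / (1.0 - p))

rolf = 0.95
for komponisto in (0.001, 0.01):
    gap = log_odds_bits(rolf) - log_odds_bits(komponisto)
    print(f"P={rolf} vs P={komponisto}: gap of about {gap:.1f} bits")
```

Each bit is a factor of 2 in odds, so a gap of this size corresponds to the two sides differing by a factor in the thousands.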

comment by jsalvatier · 2011-07-08T14:24:25.557Z · LW(p) · GW(p)

If the topic is too complex to do a debate video, at some point I would recommend doing a summary of the debate: how did your opinions progress, what mistakes/misunderstandings/misconceptions happened when, etc.

comment by komponisto · 2011-07-08T08:23:06.530Z · LW(p) · GW(p)

So, I'll start things by picking up our conversation from the other thread. Rolf says he believes that the knife and bra clasp are among the strongest pieces of evidence against Knox and Sollecito; I responded by pointing to the recent independent review that strongly critiqued that evidence.

If [the independent experts'] main point is that the evidence doesn't meet the standard of scientific rigor, then I might not disagree with them on anything factual. Very little evidence, either way, does meet the standard of scientific rigor. Fingerprints never reach the standard of scientific rigor. DNA testing, as practiced, probably rarely if ever meets the standard of scientific rigor. Eyewitness testimony obviously can never come close to meeting the standard of scientific rigor. Heck, most science doesn't meet the standard of scientific rigor. We still need to evaluate evidence on its full merits.

Absolutely true -- but of course the reason scientific rigor exists is because errors happen when it isn't applied, and so we need to take this into account and scrupulously avoid overconfidence when evaluating such evidence. And, even more to the point here, Conti and Vecchiotti (the independent experts) don't merely say that it doesn't meet the standard of scientific rigor; they actually say specifically that it is "not reliable". This is important, because they were asked by the court to assess "the degree of reliability"; hence they could presumably have answered anywhere along a continuum -- and the answer they came back with was "pretty much zero".

So, to confirm that our initial analysis here diverges, around how much are you currently shifting based on the test results for the knife and for the bra clasp? For you, did either one shift P(guilt) by a factor of 100? 10? Not at all?

Roughly speaking, I would say that together, all the evidence against Knox and Sollecito, of which the bra clasp and knife are for me the overwhelming majority (everything else being near-negligible), shifted P(guilt) upward by about an order of magnitude -- a factor of 10, from a prior somewhere between 0.0001 and 0.001 to a posterior somewhere between 0.001 and 0.01. The Conti-Vecchiotti report cuts that down a bit, though not necessarily hugely, since their findings were pretty much what I already expected them to be.

You can guess my hypothesis for why the DNA tests came out the way they did. Do you have a specific alternative hypothesis or hypotheses you want me to consider? Is your main claim here that you believe the knife was accidentally contaminated in the laboratory, and the bra strap was accidentally contaminated in Kercher's room? Or is there a different alternative hypothesis I should consider first?

Yes, I think those would be my best explanations for those results, though there is significant uncertainty even about the attribution of the DNA in the first place. It's also possible that the knife was accidentally contaminated by investigators at Sollecito's apartment, that the clasp was contaminated in the lab, or that there was deliberate malfeasance applied to either of the items at some point; but these are less likely secondary possibilities, in my view.

I assume we're not going to quash any evidence, since we're a court of Bayes and not a court of law? That is, I'm proposing that whenever we want to exclude or diminish evidence, we should have a Bayesian reason for why the evidence doesn't really alter P(guilt). The proposal is partly because it's the correct Bayesian thing to do, and partly because trying to divine and mimic Italian criminal procedure and admissibility rules would add (IMHO unnecessary) additional complexity.

Agree entirely. I would only add that a court of Bayes should always be more accurate than a court of law and never less.

Replies from: rolf_nelson, rolf_nelson

↑ comment by rolf_nelson · 2011-07-09T20:23:04.406Z · LW(p) · GW(p)

Thanks k, today I'll give my thoughts on the knife. I'm sure there are some mistakes in my analysis below, but let's see if we can start to pinpoint areas of disagreement. Let "ddk.lc" be the specific hypothesis that the double-dna knife was accidentally contaminated in the laboratory, and "a.g" be the hypothesis that Amanda is guilty.

I want to estimate the base rate of lab cross-contamination in the late 2000s. Two observations:

Looks like Washington State only admitted to one case of laboratory cross-contamination in homicide cases from around 2001-2003, based on SPI. Maybe there were 2 to 5 that weren't noticed or otherwise were unreported. Washington State has about 200 homicides/year, I'd guess about 1/3 go to state labs?
Cases of mistaken "cold matches" that are investigated by the police but turn out to be cross-contamination seem to be extremely rare.

Contamination rates have probably declined slightly since 2001-2004 (widespread DNA forensics is relatively new), so I'm guessing a base-rate of about one accidental cross-contamination per 50 homicides.

(Can that base rate be applied to this case? I could lower it a little based on the first independent report [EDIT: Sorry, I mean the trial judges' sentencing report] affirming the results, on it being a high-profile case, on the defense being allowed to participate in the testing but declining to fully participate, on no "smoking gun" piece of sloppiness found, and on the defense not stressing any history of past contamination. I could raise it a little based on the second independent report criticizing the lab for not following "international protocols" (though I wouldn't particularly expect them to), and on the low-count DNA. Not having much data here, I'll stick with the base rate as my "wild-ass guess".)

Here are some more WAGs. If there's a single lab contamination, the odds that it contaminates a likely murder weapon at the cottage are about .002. The odds that the DNA spread is Meredith's, rather than one of the many people associated with this case or with other cases the lab is processing in parallel, are about .05.

So I assess P(lc) as .02, and P(ddk.lc | lc) as .0001, which gives P(ddk.lc) as .00002, assuming complete innocence.

In contrast, I estimate P(ddk | a.g) is about .05. So if we exclude the large "systemic uncertainty" of my analysis, the DNA evidence on the double-dna knife alone would make me shift by a factor of 2500 to 1 in favor of a.g rather than ddk.lc.
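The arithmetic in this estimate can be checked with a short sketch. The variable names are hypothetical labels for the figures given above; nothing here is new data. One thing the sketch makes visible: multiplying the stated inputs (.02 times .0001) gives .000002, whereas the comment states .00002, so the 2500-to-1 shift corresponds to the stated figure and the literal product would imply 25000 to 1.

```python
p_lc = 0.02               # base rate: one cross-contamination per 50 homicides
p_hits_weapon = 0.002     # a contamination lands on a likely murder weapon
p_dna_is_victims = 0.05   # the contaminating DNA is Meredith's
p_ddk_given_guilt = 0.05  # P(ddk | a.g) as estimated above

# P(ddk.lc | lc): both conditions must hold given a contamination.
p_ddk_lc_given_lc = p_hits_weapon * p_dna_is_victims
# P(ddk.lc): product of the inputs as stated.
p_ddk_lc_product = p_lc * p_ddk_lc_given_lc
# P(ddk.lc) as written in the comment.
p_ddk_lc_stated = 0.00002

ratio_stated = p_ddk_given_guilt / p_ddk_lc_stated
ratio_product = p_ddk_given_guilt / p_ddk_lc_product

print(f"P(ddk.lc | lc)            = {p_ddk_lc_given_lc:.6f}")
print(f"P(ddk.lc), from inputs    = {p_ddk_lc_product:.7f}")
print(f"shift using stated figure = {ratio_stated:.0f} to 1")
print(f"shift using the product   = {ratio_product:.0f} to 1")
```

Either way, the conclusion is driven almost entirely by the three guessed inputs, so the base-rate dispute that follows matters more than the factor of ten between the two readings.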

Let me touch on the circumstantial evidence around the knife, with the caveat that there's even more systemic uncertainty than in my DNA analysis.

1. The knife was on top of the other knives, and matched one of the murder weapons: meh, shift by a factor of 2 to 1 in favor of a.g over ddk.lc.
2. The same knife was bleached, and the other knives weren't: Shift by a factor of 5 to 1.
3. Raffaele, at one point, claimed that Meredith visited his cottage and pricked herself on the knife: Shift by a factor of 100.
4. Amanda's reaction to the knife: Shift by 10.
5. Where Amanda's DNA was found on the handle suggests someone stabbing rather than cooking with it: meh, shift by 2.
6. Amanda wrote in her diary speculating whether Raffaele may have framed her by pressing the knife in her hand while she slept: Shift by 10, this is not something you'd write if you knew the knife isn't a murder weapon.

So for the non-DNA evidence around the knife I give a slightly larger shift (200000 to 1), but paradoxically I assign it less importance because there's more systemic uncertainty and more guess-work on my part, compared with the DNA evidence.
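The combined figure is just the product of the six individual shifts, which assumes they can be treated as independent. A minimal sketch, with hypothetical dictionary labels standing in for the six items listed above:

```python
import math

# Likelihood-ratio shifts as listed above; the labels are shorthand only.
shifts = {
    "on top of other knives / matched a weapon": 2,
    "bleached": 5,
    "Raffaele's claim about Meredith": 100,
    "Amanda's reaction": 10,
    "DNA position on handle": 2,
    "diary speculation": 10,
}

# Multiplying ratios is only valid if the items are independent given guilt
# and given innocence; that independence is an assumption of this sketch.
total = math.prod(shifts.values())
decibels = 10 * math.log10(total)

print(f"combined shift: {total} to 1 (about {decibels:.0f} decibels)")
```

Because the product is dominated by the single factor of 100 and the two factors of 10, disagreement about those three items accounts for most of the total.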

Replies from: komponisto, Wei_Dai

↑ comment by komponisto · 2011-07-10T08:50:00.563Z · LW(p) · GW(p)

Okay, so, obviously a lot of disagreement here.

Regarding the base rate of contamination, the first thing to remember is that in the case of the knife we're dealing with a Low Copy Number (LCN) sample, for which the risk of contamination is greatly increased unless extremely stringent protocols are followed -- protocols which were not followed here. So we're probably talking about an order of magnitude increase in the base rate: more like 1/5 instead of 1/50.

(Incidentally, there was never any "first independent report" prior to this one: the defense's request for an independent review during the first trial was rejected. What you are most likely referring to is the fact that Stefanoni's own bosses at the Polizia Scientifica signed off on her work, well before the trial. Based on what's in the Conti-Vecchiotti report I consider it likely that they didn't scrutinize it very carefully.)

Secondly, thanks to Conti and Vecchiotti, we have considerable Inside View information about the reliability of this sample in particular. And there's every reason to believe this result is completely bogus. Indeed, Conti and Vecchiotti come pretty close to accusing Stefanoni of outright scientific fraud. Here are some of their observations:

The sample in question (Trace B) tested negative for blood, as did every other sample taken from the blade. (Samples from the handle were not tested for blood.) No attempt was made to scientifically determine the actual nature of the alleged biological material.
When "quantification" (test to determine whether there was enough DNA to be analyzed) was performed, Traces B and C both yielded a result of "too low". Stefanoni reported Trace B as a positive result, and Trace C as a negative result, without any justification. There is no documentation in the lab data to support her statement in court that the Trace B sample was in the range of several hundred picograms. Stefanoni also claimed to have executed steps in the quantification procedure that are not documented.
The "amplification" (chemical copying of the sample in order to produce a large enough amount for analysis) was performed only once, despite the fact (admitted by Stefanoni) that it should be repeated in order to be considered reliable.
Stefanoni did not perform negative controls, which could have indicated the presence of contamination.
The sample was analyzed in the same laboratory at the same time as numerous samples containing Meredith Kercher's DNA.

In short, everything smacks of a deliberate effort to obtain a certain result -- which was obtained easily enough given the lax procedural standards and absence of safeguards. I thus have very little hesitation in dismissing the knife DNA.

Now, regarding the "circumstantial evidence surrounding the knife" that you listed, I see two major general issues. One is that you are misinformed about a number of details, apparently as a result of uncritically accepting claims from pro-guilt sources (cf. the "first independent report" misconception above). You should be extremely skeptical of pro-guilt advocacy sites such as PMF or True Justice -- they are caught up in a death spiral of hate and constitute an absolute breeding ground of anti-epistemology the likes of which are usually only seen in political or religious conflicts. No, that doesn't mean everything they say is false, and yes, the pro-innocence community is vulnerable to the same disease, but in general I would advise against assuming any claim of theirs is true unless it is explicitly conceded by the other side. For example, this is wrong:

The knife was on top of the other knives, and matched one of the murder weapons

In actuality, the hypothesis of more than one murder weapon was invented specifically to accommodate the fact that this particular knife -- which investigators had already decided was the murder weapon based on "police intuition" followed by the bogus DNA result discussed above -- turned out not to match certain wounds on the victim, nor an imprint found in a bedsheet; a clear instance of motivated cognition and -- given that a smaller knife would have been compatible with all of the wounds as well as the imprint -- a patent violation of Occam's Razor. Another claim without any foundation is:

The same knife was bleached, and the other knives weren't

The closest thing to evidence for this is that a police officer claimed to have smelled bleach upon opening the drawer that the knife was in. (That doesn't distinguish the knife in question from the others, needless to say.)

The other major issue is that in my opinion you wildly overestimate the evidentiary strength of the observations that are true. This applies to all of items 3 through 6. For example, I see no reason why Raffaele's claim about Meredith visiting his house should even have a likelihood ratio as high as 2, let alone 100. The innocent explanation is that he had been told about the knife DNA result, and believed it. Given that he didn't have the Conti-Vecchiotti report available, why do you consider the likelihood of this to be only 1/100 that of a corresponding guilty scenario? Similarly, why is Amanda's reaction to the knife drawer even evidence of guilt at all, let alone 10 decibels' worth?

Item 5 seems to me to be another example of overconfidence:

Where Amanda's DNA was found on the handle suggests someone stabbing rather than cooking with it: meh, shift by 2.

Even if this is so (which I don't have any particular reason to accept), presumably all one has to do to get one's DNA in this position is shift one's grip on the knife in some way -- as opposed to actually stabbing someone! Isn't the denominator of this ratio quite substantial?

Amanda wrote in her diary speculating whether Raffaele may have framed her by pressing the knife in her hand while she slept: Shift by 10, this is not something you'd write if you knew the knife isn't a murder weapon.

But in this situation "knowing the knife isn't a murder weapon" doesn't come close to being necessary for innocence. She knew that Raffaele was a suspect along with her, knew that Meredith had been killed with a knife, and (depending on the chronology) may have known that Raffaele's knife was considered the murder weapon by the police. Under those assumptions, it's an entirely natural speculation, it seems to me.

Also, here is Amanda's quote in context:

So unless Raffaele decided to get up after I fell asleep, grabbed said knife, went over to my house, used it to kill Meredith, came home, cleaned the blood off, rubbed my fingerprints all over it, put it away, then tucked himself back into bed, and then pretended really well the next couple of days, well, I just highly doubt all of that.

Replies from: rolf_nelson, rolf_nelson

↑ comment by rolf_nelson · 2011-07-12T04:38:14.353Z · LW(p) · GW(p)

a smaller knife would have been compatible with all of the wounds as well as the imprint

(Massei, 170) seems to disagree (although I could be misreading it), but I welcome any counter-arguments. If you want to claim extreme coroner bias or extreme trial-court bias (or both), then eventually you'll want to separate out the hypothesis of bias and make your case for it.

The closest thing to evidence for (the knife being bleached) is that a police officer claimed to have smelled bleach upon opening the drawer that the knife was in. (That doesn't distinguish the knife in question from the others, needless to say.)

I disagree: "Let me state beforehand that it was extremely clean" (Massei, 99)

The innocent explanation is that (Raffaele) had been told about the knife DNA result, and believed it.

I give the odds that he would tell such a lie if guilty as .1, and if innocent as .001; what odds do you give? Why would Raffaele believe the DNA result? He's more likely to believe it's not the murder weapon if he's innocent, and also less likely to lie if innocent.
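
The two figures above can be turned into a likelihood ratio and a decibel shift as follows. This is only a minimal sketch: the function names are hypothetical, and the 10 * log10 convention for decibels is assumed to be the one used in this thread (a "shift by 10" meaning a likelihood ratio of 10).

```python
import math

def likelihood_ratio(p_if_guilty: float, p_if_innocent: float) -> float:
    """Ratio of P(observation | guilty) to P(observation | innocent)."""
    return p_if_guilty / p_if_innocent

def decibels(ratio: float) -> float:
    """Evidence in decibels, assuming the 10 * log10(ratio) convention."""
    return 10.0 * math.log10(ratio)

# The estimates quoted above: 0.1 if guilty, 0.001 if innocent.
lr = likelihood_ratio(0.1, 0.001)
print(f"likelihood ratio: {lr:.0f}, shift: {decibels(lr):.1f} dB")
```

Under these assumptions the claimed shift is a factor of 100, or 20 decibels. The disagreement in the thread is over the two input probabilities, not over the arithmetic.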

Similarly, why is Amanda's reaction to the knife drawer even evidence of guilt at all, let alone 10 decibels' worth?

I think if she's innocent, reacting to the knife like that is about .05 likely, and if she's guilty, about .005 likely. What odds do you give? You wouldn't be more likely to react strongly to an otherwise-irrelevant event if you knew it was going to practically end your life?

Where Amanda's DNA was found on the handle suggests someone stabbing rather than cooking with it: meh, shift by 2.

Even if this is so (which I don't have any particular reason to accept), presumably all one has to do to get one's DNA in this position is shift one's grip on the knife in some way -- as opposed to actually stabbing someone! Isn't the denominator of this ratio quite substantial?

My reasoning: suppose a chef grabs a knife in cooking position ten times and nose-picking position once. Suppose further DNA is only found in one spot. It's probably ten times more likely that the DNA is found in cooking position, and that his unhygienic habit will remain undetected (aside from the giant nasal scars).

But in this situation "knowing the knife isn't a murder weapon" doesn't come close to being necessary for innocence.

I disagree: not necessary, but probable, by a factor of >10 IMHO.

"So unless Raffaele decided to get up after I fell asleep, grabbed said knife, went over to my house, used it to kill Meredith, came home, cleaned the blood off, rubbed my fingerprints all over it, put it away, then tucked himself back into bed, and then pretended really well the next couple of days, well, I just highly doubt all of that."

I'm more concerned by: “This could have happened: Raffaele [killed Meredith] and then, having come back home, pressed my fingerprints — I was asleep — onto the knife", from London Times

Replies from: komponisto

↑ comment by komponisto · 2011-07-12T20:29:47.356Z · LW(p) · GW(p)

a smaller knife would have been compatible with all of the wounds as well as the imprint

(Massei, 170) seems to disagree (although I could be misreading it), but I welcome any counter-arguments.

The argument there is pretty weak: Massei and Cristiani simply find it hard to believe that a single knife could have caused such different-looking wounds. Not much more than an argument from personal incredulity; they don't support it with any detailed arguments or expert opinions indicating incompatibility of a smaller knife. In fact, they proceed to argue at length that Item 36 itself is compatible with all the wounds, by disputing the defense arguments that it is incompatible with some of them. (For counterarguments on this, see pp. 36-51 of Knox's appeal document, and pp. 20-23 of Sollecito's. Since you cited the original page number of Massei, I'm guessing you can read Italian. If not, I can provide translations. Briefly, the important things to note, besides the object-level defense arguments themselves, are that prosecution expert Bacci "disavowed" his earlier judgement of compatibility, and that civil-party expert Liviero wasn't even aware of the defense experts' counterarguments and hadn't considered them.)

The closest thing to evidence for (the knife being bleached) is that a police officer claimed to have smelled bleach upon opening the drawer that the knife was in. (That doesn't distinguish the knife in question from the others, needless to say.)

I disagree: "Let me state beforehand that it was extremely clean" (Massei, 99)

Such a statement strikes me as extremely weak evidence, likely tainted by hindsight among other things. Bleaching is the kind of thing you establish with chemical tests, not someone's judgement of "looking clean".

The innocent explanation is that (Raffaele) had been told about the knife DNA result, and believed it.

I give the odds that he would tell such a lie if guilty as .1, and if innocent as .001; what odds do you give? Why would Raffaele believe the DNA result? He's more likely to believe it's not the murder weapon if he's innocent, and also less likely to lie if innocent.

Firstly, since we don't have the transcript of the interrogation, we don't know that it was a "lie". It could, for example, have been an exchange like this:

-- We found Meredith's DNA on a kitchen knife in your house!

-- Well, I guess she must have come to my house and pricked herself, then!

But in any case my probability that he would make up some story similar to this (or otherwise say something that would likely be reported as him making up such a story), given (1) innocence, (2) that he had been told about the DNA result, and (3) had demonstrated confusion about his memories elsewhere (e.g. suggesting at one point that Knox had gone out on the evening of the crime), is in the region of 50%. People tend to trust authorities, and defying the data is an advanced rationalist skill that cannot be counted on, especially in the face of shouting policemen. From what I understand, it is relatively easy to get people to make things up in stressful interrogation situations.

I think if she's innocent, reacting to the knife like that is about .05 likely, and if she's guilty, about .005 likely. What odds do you give? You wouldn't be more likely to react strongly to an otherwise-irrelevant event if you knew it was going to practically end your life?

(I assume you meant to reverse those numbers.) Let's make sure we agree on what event we're referring to. My assumption was that you were talking about Amanda's distressed reaction when an officer opened up a drawer containing knives in her cottage (i.e. not Raffaele's apartment -- nothing to do with Item 36). I don't understand why this event would have any life-ending import. I'm not aware of any similar story regarding Item 36, the closest thing being an expression of concern about it to her parents at one point. Very plausible innocent explanations for both include, in the first case, distress at Meredith's demise primed by the sight of knives, and in the second, the knowledge that she had handled the knife while cooking.

(I predict that if you took 100 young females and put them into a situation where they knew a friend or roommate of theirs had been killed with a knife by an intruder, and then showed them a drawer full of knives, the number of those who would react with visible emotion would be well into the double digits.)

My reasoning: suppose a chef grabs a knife in cooking position ten times and nose-picking position once. Suppose further DNA is only found in one spot. It's probably ten times more likely that the DNA is found in cooking position, and that his unhygienic habit will remain undetected (aside from the giant nasal scars).

This seems to be an argument against the proposition that Amanda used the knife for cooking more than for stabbing. But that seems unlikely to be sound, since even if it was used for stabbing at one point, it had presumably been used for cooking a fair amount before. It was a kitchen knife after all, and Amanda had spent a lot of time at Raffaele's apartment.

In any case, this kind of inference seems like it is subject to tremendous uncertainty. How much difference is there between cooking positions and stabbing positions anyway? Surely there are any number of ways to do both. Why 10:1 or 2:1, instead of 3:2 or 10:9? As a matter of fact, I'm curious in general: what sort of evidence of guilt would have a likelihood ratio below 2 for you?

But in this situation "knowing the knife isn't a murder weapon" doesn't come close to being necessary for innocence.

I disagree: not necessary, but probable, by a factor of >10 IMHO.

Can you explain this more? Why should an innocent person have knowledge of what the murder weapon was or wasn't? If she were innocent, why should she, under the circumstances she was in at the moment, with the information she had, doubt Raffaele's guilt any more strongly than she indicated in the diary excerpt (already fairly strong, IMO)?

I'm more concerned by: “This could have happened: Raffaele [killed Meredith] and then, having come back home, pressed my fingerprints — I was asleep — onto the knife", from London Times

That is almost certainly just a mistranslation of the passage I quoted above (in its original form). See here and here (#8). (Admittedly, I haven't found a "neutral" source explicitly recognizing this mistake, but it is extremely plausible a priori, not surprising at all that the media wouldn't bother to correct it, and moreover no one can produce photographic evidence of the second version, unlike the first. Even the guilters use the version I quoted.)

Replies from: rolf_nelson

↑ comment by rolf_nelson · 2011-07-14T07:11:43.216Z · LW(p) · GW(p)

While it remains interesting, we don't seem to be getting any traction toward significantly changing each other's mind through this one-on-one debate, should we just cut our losses and end it? I'm open to other suggestions on how to proceed. You haven't yet presented arguments toward innocence, but if we just follow the same pattern we have been so far we're probably not going to get anywhere with those either. Probably the only thing we can agree on is that at least one of us is biased on this case. ;-)

If you want to proceed, I'll ask you whether you agree with my "one contamination in fifty homicides" estimate, and what your probability is that the independent report is basically correct, and whether you would agree with me that this puts an upper bound on the contamination probability. I'll also continue to try to nail down, line-by-line, where we disagree on the facts of the case, and say that I feel you're giving too little weight to the statements of lab technicians, police officers, and trial judges, especially when those people were there and we weren't.

If you don't want to proceed, then I thank you for humoring me as far as you did with this experiment, and I will express my condolences to all innocent people in jail, whether they include Amanda or not, and hope that justice prevails in this case.

Replies from: komponisto

↑ comment by komponisto · 2011-07-17T07:00:01.198Z · LW(p) · GW(p)

While it remains interesting, we don't seem to be getting any traction toward significantly changing each other's mind through this one-on-one debate, should we just cut our losses and end it? I'm open to other suggestions on how to proceed. You haven't yet presented arguments toward innocence, but if we just follow the same pattern we have been so far we're probably not going to get anywhere with those either.

I wouldn't have expected us to have made much progress on mind-changing by this point, since we've only had a couple of iterations of back-and-forth; and moreover we've been starting to succumb to the danger of going "shallow and wide" (i.e. listing disagreements on many different aspects of the case) rather than "narrow and deep" (i.e. trying to iron out a particular important point of disagreement). For instance, I think the contrasting opinions on the wounds, or inferences about how the knife was held, are areas of lesser importance, which we probably don't even need to get into at this point, since there are much more important issues to focus on (such as how likely contamination is, or the incompatibility of the stomach evidence with the prosecution's hypothesized time of death).

So making progress would probably require us to pick a small number of narrowly-defined issues to hash out, one at a time.

Here's an important question to assess whether we've said anything important yet: has anything I've said surprised you?

You haven't yet presented arguments toward innocence, but if we just follow the same pattern we have been so far we're probably not going to get anywhere with those either.

That may be right, so if we do continue further it will be best to try to avoid that pattern. In any case, I think it's probably worthwhile for me to at least state for the record what I believe to be the most important arguments for innocence, so I'll go ahead and do that. Here are what I would consider to be the three strongest arguments for innocence, from a starting prior of "Knox and Sollecito have been convicted":

(1) The hypothesized crime would be so singularly unusual as to make a miscarriage of justice a priori as likely as guilt, even without knowledge of defense arguments. In other words, the base rate of crimes in this reference class is less than or equal to the base rate of wrongful convictions.

(2) Guede is unquestionably guilty. Hence, Meredith Kercher's death itself does not require explanation. (There is furthermore little connection between Guede and either Knox or Sollecito.)

(3) The stomach evidence is incompatible with a time of death much beyond 9:00 - 9:30 pm, given the testimony from Meredith's friends about when her last meal took place. Yet there was activity on Sollecito's computer at 9:10 pm according to Massei-Cristiani, and 9:26 pm according to Sollecito's appeal.

(Note that although (1) is in principle screened off by detailed knowledge of prosecution and defense arguments, it is effectively preserved in the form of arguments about lack of motive, lack of criminal history, demographics, etc.)

Now here's what I predict about where this would lead us. Pursuing (1) would lead us to a discussion very similar to the one we've been having about the base rate of contamination. Before we started the discussion I would have been most curious about your response to (2), but given your numbers on the knife evidence it seems pretty clear that you think the evidence against Knox and Sollecito is strong enough to overcome my prior against Knox and Sollecito's guilt given Guede's (whatever yours is). That leaves (3). I'm not exactly sure what your reaction will be, because I don't know how aware you are of this line of argument already. I suspect I could probably get you to agree that it would be extremely unusual for no food to have passed into the duodenum 5 hours after a meal (as required by the prosecution theory), even conditioned on the already unusual fact of none having passed after 150 minutes. However, I can't predict how far you will lower your probability of guilt as a result.

If you want to proceed, I'll ask you whether you agree with my "one contamination in fifty homicides" estimate, and what your probability is that the independent report is basically correct, and whether you would agree with me that this puts an upper bound on the contamination probability.

I don't think I would have a problem positing that the expert report constitutes 50:1 evidence in favor of contamination, possibly much more.

I'll also...say that I feel you're giving too little weight to the statements of lab technicians, police officers, and trial judges, especially when those people were there and we weren't.

This is interesting, because it seems to me that you're giving too little weight to Conti and Vecchiotti. I would be curious to know specifically whose opinion I should be weighting more than I am, and what information you think they are likely to have that I don't. For example, I don't see why I should give particular weight to Massei and Cristiani's opinion, given that they made their reasoning plain in their 427-page report, and I can read it and decide what I think. And why should I trust Stefanoni, in light of what Conti and Vecchiotti say? Should I take seriously Edgardo Giobbi's assertion that his psychological profiling techniques are effective, when he has admitted to attaching significance to the fact that Knox and Sollecito were seen eating pizza? And Mignini, of course, has the Monster of Florence issue.

The only person who was actually "there" is Rudy Guede; is there any particular reason I should believe him?

Anyway, basically, my feeling on whether to proceed is the following: I don't consider it particularly costly to proceed, especially if done in a relaxed, low-key way. There's no particular urgency, beyond that automatically entailed by the quest for epistemic accuracy. It might be more effective to "drag it out" over a more extended period, and exchange fewer messages per unit time, than to try to accomplish mind-changing in a short intense session (which has more of a danger of feeling like a competition anyway).

If you don't want to proceed, then I thank you for humoring me as far as you did with this experiment, and I will express my condolences to all innocent people in jail, whether they include Amanda or not, and hope that justice prevails in this case.

I appreciate your saying this, however far we end up carrying the discussion, and I hope you still feel able to say the same at the end (if that turns out not to be now).

Replies from: rolf_nelson

↑ comment by rolf_nelson · 2011-07-20T06:26:35.336Z · LW(p) · GW(p)

So making progress would probably require us to pick a small number of narrowly-defined issues to hash out, one at a time.

Sounds good. As you suggested, let's cover the time of death, and also continue to go deep on the question of lab contamination.

Here's an important question to assess whether we've said anything important yet: has anything I've said surprised you?

It hasn't been predictable, but it hasn't caused me to shift significantly in favor of innocence or guilt so far. I did learn I was wrong about which knife Amanda reacted strongly to, but that's within the bounds of how many errors I expected to be making here.

I suspect I could probably get you to agree that it would be extremely unusual for no food to have passed into the duodenum 5 hours after a meal (as required by the prosecution theory), even conditioned on the already unusual fact of none having passed after 150 minutes. However, I can't predict how far you will lower your probability of guilt as a result.

I haven't looked into this much. According to Massei, Umani Ronchi, a court-appointed expert, testified that a farinaceous meal takes 6-7 hours for gastric emptying, and additionally that it's possible some of the food passed into the duodenum but then, after death, slid into the small intestine. Massei also claims that even Vinci agreed with the range of 18:50 - 4:50 for time of death. Did the defence experts take into account the composition of the meal, or testify that sliding of the food after death is unlikely?

I don't think I would have a problem positing that the expert report constitutes 50:1 evidence in favor of contamination, possibly much more.

The sample in question (Trace B) tested negative for blood, as did every other sample taken from the blade. (Samples from the handle were not tested for blood.) No attempt was made to scientifically determine the actual nature of the alleged biological material.

OK, what are the odds that a small dna trace left by "stabbing + cleaning" would test positive for blood, and what are the odds a small dna contamination to the knife would test positive for blood? (By the way, do you have a specific contamination hypothesis in mind?) In both cases, keep in mind only one "small zone" of the striation was tested for blood, and the rest of the striation was consumed in DNA analysis.

When "quantification" (test to determine whether there was enough DNA to be analyzed) was performed, Traces B and C both yielded a result of "too low". Stefanoni reported Trace B as a positive result, and Trace C as a negative result, without any justification. There is no documentation in the lab data to support her statement in court that the Trace B sample was in the range of several hundred picograms. Stefanoni also claimed to have executed steps in the quantification procedure that are not documented.

Sounds like she didn't document everything; how much are you shifting based on this? Part of the problem is I don't know how much the average technician documents, so I don't know how usual or unusual this is. If nobody documents everything, but we still see a .02/homicide contamination rate, then Stefanoni's not documenting doesn't change anything.

The "amplification" (chemical copying of the sample in order to produce a large enough amount for analysis) was performed only once, despite the fact (admitted by Stefanoni) that it should be repeated in order to be considered reliable.

Can I get a source for Stefanoni's admission? Is this from the report?

Stefanoni did not perform negative controls, which could have indicated the presence of contamination.

Is this also from the report? Have you translated this part yet?

The sample was analyzed in the same laboratory at the same time as numerous samples containing Meredith Kercher's DNA.

I gave a .05 chance that, if there was a cross-contamination, it would have been of Meredith's DNA. Are you giving a different probability?

Replies from: komponisto

↑ comment by komponisto · 2011-07-20T14:34:24.734Z · LW(p) · GW(p)

I haven't looked into this much. According to Massei, Umani Ronchi, a court-appointed expert, testified that a farinaceous meal takes 6-7 hours for gastric emptying, and additionally that it's possible some of the food passed into the duodenum but then, after death, slid into the small intestine. Massei also claims that even Vinci agreed with the range of 18:50 - 4:50 for time of death. Did the defence experts take into account the composition of the meal, or testify that sliding of the food after death is unlikely?

We're talking here not about the time it takes for the stomach to empty completely, but rather the time it takes for ingesta to begin passing into the duodenum ("T_lag"). (At death, there was 500mL of ingesta in the stomach -- consistent with the meal size -- and nothing in the duodenum.) According to this paper, the median value of T_lag is 81.5 minutes, and the 75th percentile is 102 minutes. (Furthermore, the median time for half the contents to empty ["T_1/2"] is 127 minutes, and the 75th percentile is 168.3 minutes.) The prosecution scenario of death during the 11:00 pm hour would require a T_lag of more than 240 minutes, possibly more than 300. My understanding is that this is basically unheard of, whatever the composition of the meal.
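
To get a rough feel for how far out in the tail 240 minutes sits, one can fit a simple distribution to the two quoted quantiles. The sketch below is illustrative only: the choice of a normal or a lognormal shape is an assumption (the cited paper's actual distribution is not given here), the function names are hypothetical, and the study conditions may differ from the real case.

```python
from math import log
from statistics import NormalDist

MEDIAN_MIN = 81.5  # median T_lag quoted above
P75_MIN = 102.0    # 75th percentile of T_lag quoted above
Z75 = NormalDist().inv_cdf(0.75)  # standard-normal 75th percentile

def tail_normal(threshold_min: float) -> float:
    """P(T_lag > threshold), ASSUMING T_lag is normally distributed."""
    sigma = (P75_MIN - MEDIAN_MIN) / Z75
    return 1.0 - NormalDist(MEDIAN_MIN, sigma).cdf(threshold_min)

def tail_lognormal(threshold_min: float) -> float:
    """P(T_lag > threshold), ASSUMING log(T_lag) is normally distributed."""
    mu = log(MEDIAN_MIN)
    sigma = (log(P75_MIN) - mu) / Z75
    return 1.0 - NormalDist(mu, sigma).cdf(log(threshold_min))

for t in (240, 300):
    print(f"T_lag > {t} min: normal {tail_normal(t):.2e}, "
          f"lognormal {tail_lognormal(t):.2e}")
```

The two shapes give tail probabilities that differ by orders of magnitude, so the output is better read as a sensitivity check on the modeling assumption than as an estimate.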

Ronchi claimed that the coroner, Lalli, had failed to seal the duodenum via ligature, as is apparently the standard procedure; this was the basis for his claim that food could have slipped into the small intestine. However, video of the autopsy revealed that Ronchi was wrong, and that Lalli had indeed properly sealed the duodenum. (Sollecito appeal, p. 165)

OK, what are the odds that a small dna trace left by "stabbing + cleaning" would test positive for blood, and what are the odds a small dna contamination to the knife would test positive for blood? (By the way, do you have a specific contamination hypothesis in mind?) In both cases, keep in mind only one "small zone" of the striation was tested for blood, and the rest of the striation was consumed in DNA analysis.

There were four samples on the blade, B, C, E, and G, that were tested for blood; the results were all negative. These traces were all presumed to be blood by Stefanoni; it seems reasonable to suppose that if there had been blood on the knife, these would have been the most likely spots in which to have found it.

A positive blood test result would require more than DNA from (white) blood cells -- it would require hemoglobin. So it seems to me that the only way to get a positive blood test result from contamination would be to spill a blood sample on the knife. I estimate the probability of this having happened as being in the range of 0.001.

On the other hand, in the event that the knife had been used for stabbing, and that the victim's DNA remained on the knife, I would estimate a probability northward of 0.9 that at least one of the "presumed blood" traces would have tested positive for blood. (Cf. ChrisHalkides' comment below: "Two experts have publicly stated that the chances of cleaning a bloody knife to the point at which blood is no longer detected but DNA is detected, are small." This agrees with my intuition, and "no more than 0.1" seems a reasonable interpretation of "small".)
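
The two estimates above imply a likelihood ratio for the observed negative blood tests. A minimal sketch of that arithmetic follows; the variable names are hypothetical, the 0.001 and 0.9 figures are the ones stated above, and 0.9 is treated as a lower bound, so the resulting ratio is a lower bound too.

```python
import math

# Estimates stated above (both are one commenter's subjective probabilities).
P_POSITIVE_IF_CONTAMINATION = 0.001  # blood test positive given contamination
P_POSITIVE_IF_STABBING = 0.9         # positive given stabbing with DNA remaining

# The observation was a NEGATIVE blood test, so use the complements.
p_negative_if_contamination = 1.0 - P_POSITIVE_IF_CONTAMINATION
p_negative_if_stabbing = 1.0 - P_POSITIVE_IF_STABBING

# Likelihood ratio in favor of contamination over stabbing.
ratio = p_negative_if_contamination / p_negative_if_stabbing
shift_db = 10.0 * math.log10(ratio)  # assumes the 10 * log10 decibel convention
print(f"ratio: {ratio:.2f}, shift: {shift_db:.1f} dB toward contamination")
```

On these inputs the negative tests favor contamination by a factor of about 10, and by more if the 0.9 estimate is too low.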

Sounds like she didn't document everything;

It's worse than that; what Conti and Vecchiotti describe is suggestive of deliberate misrepresentation if not outright fabrication of results. The "several hundred picograms" quantification result for Trace B appears to have been completely made up. She claimed in court that she had obtained this result using Real Time PCR, but this was not the case: records obtained by Conti and Vecchiotti show that another method ("Qubit Fluorometer") was used, and that the result obtained was "too low" -- the exact same as for Trace C. There was no justification for treating Trace B as a positive result and Trace C as a negative, and in particular no reason for subjecting Trace B to amplification.

The "amplification" (chemical copying of the sample in order to produce a large enough amount for analysis) was performed only once, despite the fact (admitted by Stefanoni) that it should be repeated in order to be considered reliable.

Can I get a source for Stefanoni's admission? Is this from the report?

Yes (p.61).

Stefanoni did not perform negative controls, which could have indicated the presence of contamination.

Is this also from the report? Have you translated this part yet?

This is on p. 79. This section of the report, on Stefanoni's knife results, is next in line to be translated (by my collaborator katy_did, while I'll be simultaneously doing the clasp section).

The sample was analyzed in the same laboratory at the same time as numerous samples containing Meredith Kercher's DNA.

I gave a .05 chance that, if there was a cross-contamination, it would have been of Meredith's DNA. Are you giving a different probability?

Substantially different. Stefanoni is quoted as follows on p. 102 of the report:

"...the knife was analyzed...in the course of these 50 samples attributed to the victim, some were prior to the analysis of the knife, of course, and others subsequent, so in these 50 I don't know if the knife was, I don't know now, a fourth, a third of the way through this flux of analyses..."

This puts me at 0.5 or more. Also note that the number 50 itself gives a probability of 0.1 if we use your estimate of 500 total samples in the lab.

Replies from: rolf_nelson

↑ comment by rolf_nelson · 2011-07-22T07:18:08.756Z · LW(p) · GW(p)

Let's talk about the dna some more once you guys have finished translating the relevant parts of the independent report, then, if your argument hinges on details of the independent report rather than just the conclusions.

We're talking here not about the time it takes for the stomach to empty completely, but rather the time it takes for ingesta to begin passing into the duodenum ("T_lag").

Sounds good. In your case, for one particular meal where the subjects had probably fasted beforehand, the lag is just under 2/3 of the half-time. If you accept Umani Ronchi's half-time of 360-420 minutes, then the lag could be 2/3 of that, or 240+ minutes. Of course, for all I know Umani Ronchi could have been referring to the lag time, or the final gastric emptying time, rather than the half-time. Lags could easily be much smaller, or larger, than 2/3 of the half-time in this case.

It sounds like you might disagree not just with Umani Ronchi (a court-appointed expert) and Raffaele's consultant Vinci, but also with another of Raffaele's consultants, Introna, who placed the start of the attack between 21:30 and 22:30.

Note that stress (such as being attacked) can increase lag time, so we might be talking about the time the attack started rather than the time of death.

In addition to the starchiness of the meal, I would claim that:

Alcohol (or drug use) may increase lag time, though studies differ as to how significant this is.
Subjects in studies usually fast before the study, which means in the real world I expect lag times to be longer. Meredith also returned home after the meal, which may be more physical activity than the subjects did, though I could be wrong about that.
Subjects in studies don't usually go and eat a snack after the meal, as I believe Meredith did. I would expect this to also increase Meredith's lag time.

Anyway, what's your model here: What do you personally estimate the lag to be based solely on digestion (assuming no slippage)? Maybe you can give a mean and a standard deviation, and we can start by modeling it as a normal distribution?

Ronchi claimed that the coroner, Lalli, had failed to seal the duodenum via ligature, as is apparently the standard procedure; this was the basis for his claim that food could have slipped into the small intestine. However, video of the autopsy revealed that Ronchi was wrong, and that Lalli had indeed properly sealed the duodenum. (Sollecito appeal, p. 165)

How much does application of the ligatures reduce the probability of slippage? If ligatures were not applied, how likely do you think complete slippage would be? If they are applied, what are the odds that (1) the slippage occurs before the ligatures are applied, or (2) the slippage occurs anyway after the ligatures are applied, perhaps due to improper application?

Replies from: komponisto, komponisto

↑ comment by komponisto · 2011-07-31T14:53:24.088Z · LW(p) · GW(p)

Let's talk about the dna some more once you guys have finished translating the relevant parts of the independent report, then, if your argument hinges on details of the independent report rather than just the conclusions.

The translation is now nearing completion (the clasp section is finished, and the knife section will be soon). Here, furthermore, are some relevant links:

Expert commentary on C-V report: http://forensicdnaconsulting.wordpress.com/2011/07/30/understanding-the-independent-dna-experts%E2%80%99-report-in-the-amanda-knox-case-part-i/
Article detailing contamination issues and other problems with DNA testing: http://www.nacdl.org/public.nsf/0/6285f6867724e1e685257124006f9177
Examples of laboratory fraud and how they were detected: http://www.bioforensics.com/conference05/FBS_Dayton_2005_Fraud.pdf
9 DNA experts sign open letter critiquing evidence near the end of the first trial: http://www.newscientist.com/article/dn18215-knox-murder-trial-evidence-flawed-say-dna-experts.html
Chris Halkides' blog: http://viewfromwilmington.blogspot.com

Replies from: rolf_nelson

↑ comment by rolf_nelson · 2011-08-02T07:55:42.441Z · LW(p) · GW(p)

Great, give me a top-level post when the knife translation is finished, or when you think it's in a good enough state to back up your claims in the DNA discussion.

↑ comment by komponisto · 2011-07-22T15:07:01.672Z · LW(p) · GW(p)

In your case, for one particular meal where the subjects had probably fasted beforehand, the lag is just under 2/3 of the half-time. If you accept Umani Ronchi's half-time of 360-420 minutes, then the lag could be 2/3 of that, or 240+ minutes. Of course, for all I know Umani Ronchi could have been referring to the lag time, or the final gastric emptying time, rather than the half-time. Lags could easily be much smaller, or larger, than 2/3 of the half-time in this case.

As best I can determine, Ronchi was talking about total emptying time, not half-time (let alone lag time). This is unquestionably what would make the most sense, given not only the term used ("gastric emptying"), but also the averages presented, for example, here:

50% of stomach contents emptied: 2.5 to 3 hours
Total emptying of the stomach: 4 to 5 hours

Given this, a total emptying time of 6-7 hours under some circumstances doesn't seem outside the bounds of possibility. Extrapolating in such a way as to preserve ratios, we could then imagine a half-time of up to 4.5 hours, say. But 2/3 of that would give us 3 hours, or 180 minutes -- not the 4-5 hours we need for the prosecution theory.
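One way to reproduce that extrapolation, as a minimal sketch. The specific pairing used here (scaling the upper average half-time of 3 hours by the ratio 6/4 between the slow and average total emptying times) is an assumption about how the 4.5-hour figure could arise, not something stated in the sources; the 2/3 lag-to-half-time ratio is the one quoted above.

```python
# Average figures quoted above (hours).
AVG_HALF_TIME_H = (2.5, 3.0)      # 50% of stomach contents emptied
AVG_TOTAL_TIME_H = (4.0, 5.0)     # total emptying
SLOW_TOTAL_TIME_H = (6.0, 7.0)    # total emptying "under some circumstances"

# Assumed reading: scale by the ratio of the low ends (6/4 = 1.5) and apply
# it to the upper average half-time, which gives a generous upper bound.
scale = SLOW_TOTAL_TIME_H[0] / AVG_TOTAL_TIME_H[0]
slow_half_time_h = AVG_HALF_TIME_H[1] * scale

# Lag taken as 2/3 of the half-time, per the ratio quoted above.
slow_lag_h = slow_half_time_h * 2 / 3
slow_lag_min = slow_lag_h * 60

print(f"half-time up to {slow_half_time_h} h, lag up to {slow_lag_min:.0f} min")
```

Even on this generous reading the lag tops out at 180 minutes, short of the 240-300 minutes the prosecution theory needs.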

It sounds like you might disagree not just with Umani Ronchi (a court-appointed expert), and Raffaele's consultant Vinci, but also with another of Raffaele's consultants, Introna, who placed the start of attack between 21:30 and 22:30.

Not according to p. 180 of Massei-Cristiani, where Introna is described as placing it between 21:00 and 21:30. Raffaele's appeal document argues for 21:30 - 22:00; this is apparently obtained by averaging the 2-3 hour (from last meal) figure of Lalli and Introna, and the 3-4 hour figure of Ronchi (whose testimony was incorrectly interpreted by Massei and Cristiani, according to the appeal: Raffaele's lawyers cite passages where he appeared to agree that 4 hours is the normal limit). My "disagreement" with Raffaele's lawyers in this context is of little import, for several reasons: (1) I am in perfect agreement with Introna, as reported by Massei; (2) We're talking about confidence intervals anyway; I think 21:00-21:30 is most likely, but 21:30-22:00 is not ruled out nearly as strongly in my model as anything after 22:00 is; (3) Ronchi, whose estimate figured into their calculation, was talking about total emptying time anyway, not lag time (or even half time).

As for Vinci, he was looking at different criteria for the time of death (not specifically gastric contents), and simply gave a wider range, not an incompatible one. No disagreement here that I am aware of.

Note that stress (such as being attacked) can increase lag time, so we might be talking about the time the attack started rather than the time of death.

Sure; just remember that the computer evidence provides an alibi up to nearly 21:30.

In addition to the starchiness of the meal, I would claim [various ways lag time could be increased beyond study results]

No doubt you've identified some of the ways Meredith's digestive process could have been slowed (although there is no evidence of significant alcohol or drug consumption), in the event that there actually was a lag time of 4-5 hours. The question, however, is how likely such an extraordinary retardation is. According to standard data (see below), it should be a highly unusual event. So how does this information (the fact that a time of death -- or, if you like, attack time -- after 23:00 requires a lag time of over 4 hours) affect your probability of guilt? It seems to me that it should go down noticeably, unless your model was already incorporating both the studies on digestion and the fact that Meredith's duodenum was empty (i.e. you weren't surprised by either datum). (For what it's worth, I think the Massei court erred in this regard by ignoring lag time, and also by using uncertainty as an excuse to smuggle in probability for their preferred conclusion.)

Anyway, what's your model here: What do you personally estimate the lag to be based solely on digestion (assuming no slippage)? Maybe you can give a mean and a standard deviation, and we can start by modeling it as a normal distribution?

Although the paper I cited explicitly stated that the results did not fit to a normal distribution, the percentiles given are fairly well approximated by assuming a mean of 81.5 and a standard deviation of 30. Under these assumptions, the lag time required for the Massei guilt scenario would be at least a five- or six-sigma event.
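A minimal sketch of that sigma count, under the same normal approximation (mean 81.5 minutes, standard deviation 30). The 240- and 300-minute thresholds are assumptions standing in for the 4-5 hour lag the Massei timeline requires; the function names are illustrative only.

```python
import math

LAG_MEAN_MIN = 81.5   # normal approximation to the study's percentiles
LAG_SD_MIN = 30.0

def z_score(lag_min, mean=LAG_MEAN_MIN, sd=LAG_SD_MIN):
    """Number of standard deviations by which lag_min exceeds the mean."""
    return (lag_min - mean) / sd

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# 240 and 300 minutes: assumed bounds of the 4-5 hour lag in question.
for lag in (240, 300):
    z = z_score(lag)
    print(f"lag {lag} min: z = {z:.2f}, tail probability ~ {upper_tail(z):.1e}")
```

The 4-hour case already sits a little over five standard deviations out; since the paper says the data are not actually normal, the tail probabilities should be read as order-of-magnitude indications at best.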

Now I know you doubt that the conditions of the study hold here, but don't you find this at least a little bit confusing? To make the Massei narrative reasonable, you would basically have to assume that (1) the 6-7 hours allowed by Ronchi for total emptying of farinaceous meals is typical rather than exceptional; (2) this extrapolates to a typical lag time of 3 hours or more, as in my calculation above; and (3) the variance is large enough to make a 4-5-hour lag time a reasonable exception (in which case we would probably be talking about a total emptying time of 8-9 hours or more). Each of these seems highly doubtful, to say nothing of their conjunction. Regarding (2) in particular, note that the study data suggests that the ratio of lag time to total emptying time is closer to 1/3 than to 1/2 (implying a more concave relationship between elapsed time and percentage of contents emptied; suggesting perhaps that lag time may be short even when total emptying time is long).

How much does application of the ligatures reduce the probability of slippage? If ligatures were not applied, how likely do you think complete slippage would be? If they are applied, what are the odds that (1) the slippage occurs before the ligatures are applied, or (2) the slippage occurs anyway after the ligatures are applied, perhaps due to improper application?

I imagine that preventing slippage is at least part of the purpose of the ligatures, and so I assume they reduce the probability significantly. But even in the worst-case scenario here, the amount of slippage can't have been very large, because the stomach contents could easily have constituted the entire meal on their own. In the unlikely event that the ligatures were improperly applied, we can infer that Meredith may have just passed her lag time, and that a few pieces of food had just started to pass into her duodenum. This is of minimal help to the prosecution, because on their timeline, we should have been long into that stage, and the stomach should not have been nearly as full as it was -- indeed, we should have expected with significant probability that the stomach would be completely empty.

To illustrate further, if as much as half of Meredith's meal had passed into the duodenum, and we assume a normally-distributed half-time with median 127 minutes and standard deviation 40 (the median taken from the study), the finding would still have put her well within the slowest 1% under the prosecution theory (while only in the slowest 10% to 50% under the defense theory).
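A minimal sketch of this comparison under the normal half-time model just described (median 127, standard deviation 40). The elapsed times are illustrative assumptions rather than figures from the record: 300 minutes corresponds to an 18:00 meal and a 23:00 attack, 180 minutes to an 18:00 meal and a 21:00 attack.

```python
import math

HALF_TIME_MEDIAN_MIN = 127.0   # median from the study
HALF_TIME_SD_MIN = 40.0        # assumed standard deviation

def fraction_slower(elapsed_min):
    """Share of people whose half-emptying time exceeds elapsed_min."""
    z = (elapsed_min - HALF_TIME_MEDIAN_MIN) / HALF_TIME_SD_MIN
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical elapsed times between meal and attack (minutes).
scenarios = {
    "18:00 meal, 23:00 attack": 300,
    "18:00 meal, 21:00 attack": 180,
    "elapsed time equal to the median": 127,
}

for label, elapsed in scenarios.items():
    print(f"{label}: slowest {fraction_slower(elapsed):.4%}")
```

If at most half the meal had left the stomach, her half-time was at least the elapsed time, so `fraction_slower` gives the percentile she would have to fall in under each scenario.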

Replies from: rolf_nelson

↑ comment by rolf_nelson · 2011-08-02T07:56:27.780Z · LW(p) · GW(p)

Just replied here: http://lesswrong.com/r/discussion/lw/6k7/experiment_knox_case_debate_with_rolf_nelson/4liq

↑ comment by rolf_nelson · 2011-07-12T04:18:16.784Z · LW(p) · GW(p)

So we're probably talking about an order of magnitude increase in the base rate: more like 1/5 instead of 1/50.

I'll posit a factor of three.

(Incidentally, there was never any "first independent report" prior to this one...)

My bad, that was quite confusing of me, I mean the Massei(+Cristiani) report, which is independent of the police lab. Will fix. Thanks!

The sample in question (Trace B) tested negative for blood, as did every other sample taken from the blade.

IMHO irrelevant: Meredith's DNA is incriminating whether it's from blood or tissue, and whether any blood chemicals remained after cleaning in sufficient quantity to test positive.

(Various lab procedural criticisms)

Do you want to formally introduce a hypothesis that, LCN aside, this test was sloppier than the average test at the average lab? (Call this "ddk.slop") If so, one way to numerically assess would be to establish:

How sloppy do you think the test was, in terms of percentile? Are 10% of lab tests as or more sloppy than this one? 1%? Less?
How certain are you that the ddk.slop and its associated percentile is true? (FWIW I currently think this lab test is less sloppy than average.)

Then we can quantify your estimated shift to the base rate.

Replies from: komponisto

↑ comment by komponisto · 2011-07-12T18:08:50.553Z · LW(p) · GW(p)

The sample in question (Trace B) tested negative for blood, as did every other sample taken from the blade.

IMHO irrelevant: Meredith's DNA is incriminating whether it's from blood or tissue, and whether any blood chemicals remained after cleaning in sufficient quantity to test positive.

Nonetheless, conditioned on Meredith's DNA being present, a negative blood test is surely significant evidence in favor of contamination over guilt, isn't it? If it was contamination, you would expect this with near certainty; whereas if the knife had been used to kill Meredith, what are the chances that any cleaning job was good enough to leave no trace of blood, yet bad enough to leave not only Meredith's DNA, but also granules of starch?

(Various lab procedural criticisms)

Do you want to formally introduce a hypothesis that, LCN aside, this test was sloppier than the average test at the average lab?

I'm not sure what you mean by "LCN aside"; LCN would seem integral to such a claim, since ordinary standards would already constitute sloppiness in the LCN context.

But anyway I'm not sure I need to introduce such a hypothesis, since my beliefs about the unreliability of the knife result do not particularly depend on any beliefs I have about reliability of results in the average lab. It suffices for my purposes that the facts mentioned by Conti and Vecchiotti are Bayesian evidence of contamination given general scientific "common sense". While I currently think it unlikely that Stefanoni's level of sloppiness is typical, if it turned out that it was, that would sooner undermine my confidence in forensic lab results in general than my beliefs about this case.

Formally, in other words, I view the base rate of contamination as being screened off by the detailed information about lab procedures, rather than an independent piece of evidence to be weighed.

↑ comment by Wei Dai (Wei_Dai) · 2011-07-11T03:17:55.183Z · LW(p) · GW(p)

If there's a single lab contamination, the odds that it contaminates a likely murder weapon at the cottage is about .002.

I'm having trouble understanding this. Can you explain a bit more? How did you get .002?

Replies from: komponisto

↑ comment by komponisto · 2011-07-11T19:18:49.556Z · LW(p) · GW(p)

My guess is it's probably figured by assuming there are ~500 objects being examined by the lab. However, I would point out that contaminations are unlikely to be independent of each other, and are instead probably correlated via the general level of standards in the laboratory. In other words, if there is a single contamination, there are likely to be others as well (reported or not).

Replies from: rolf_nelson

↑ comment by rolf_nelson · 2011-07-12T04:03:19.151Z · LW(p) · GW(p)

My guess is it's probably figured by assuming there are ~500 objects being examined by the lab.

Correct, I'm guessing a mean of roughly 500 examinations per homicide investigation.

...if there is a single contamination, there are likely to be others as well (reported or not).

Agree.
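The arithmetic behind the .002 figure, as a minimal sketch. It assumes a single contamination event is equally likely to land on any of the roughly 500 examined items and that exactly one of them is the item of interest; both are simplifying assumptions, and the uniform one ignores the correlation between contaminations noted above.

```python
def p_contamination_hits_target(n_items_examined, n_target_items=1):
    """Chance that one contamination event, uniform over examined items,
    lands on one of the target items."""
    if n_items_examined <= 0:
        raise ValueError("need at least one examined item")
    return n_target_items / n_items_examined

# ~500 examinations per homicide investigation is the guess stated above.
print(p_contamination_hits_target(500))  # 0.002
```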

↑ comment by rolf_nelson · 2011-07-09T19:26:38.464Z · LW(p) · GW(p)

I tentatively disagree on the interpretation of the experts' report; 'reliability' means a different thing for court DNA procedure vs. Bayesian analysis. I doubt they would say, "this kind of thing is 99% likely to point to the correct suspect, so it is mostly reliable by our standards".

Replies from: komponisto

↑ comment by komponisto · 2011-07-09T20:28:14.767Z · LW(p) · GW(p)

I doubt they would say, "this kind of thing is 99% likely to point to the correct suspect, so it is mostly reliable by our standards".

If they didn't have the option of saying that, then it's hard for me to see what the point of the review was. Is it your position that Conti and Vecchiotti would have been likely to use the same strong language as they did, regardless of the position of this evidence along the continuum of reliability? Could they have said anything harsher if it were truly unreliable? Do you think a similar review of the DNA evidence against Guede would have produced a similar level of criticism?

comment by komponisto · 2011-08-09T04:10:54.965Z · LW(p) · GW(p)

The translation of the Conti-Vecchiotti report is now complete.

comment by magfrump · 2011-08-22T06:11:01.833Z · LW(p) · GW(p)

I just came back to read through this discussion after seeing it linked. Is it currently over? The points seem pretty heavily unresolved, but it has been 8 days since the last update and the previous longest gap was 9 days due to a technical problem, with nothing else exceeding 7 days. So I am guessing the thread is still going with time lapses to let the participants cool down, but I'd like to both confirm that this is the case (because I believe this type of experiment is extremely valuable) and note that there are people (at least one person) following along and at the very least enjoying the read.

Replies from: komponisto

↑ comment by komponisto · 2011-09-07T09:16:36.748Z · LW(p) · GW(p)

It's still going on as far as I'm concerned; but there isn't any particular limit to delays between updates.

comment by ChrisHalkides · 2011-07-21T02:03:07.035Z · LW(p) · GW(p)

Rolf,

I specifically asked Professor Dan Krane, who heads a DNA forensics company, which was more important, to be present at the testing or to have access to the electronic data files. He said that to perform a case review, the electronic data files were very important, and the observation of the tests themselves was not that important. The prosecution never turned those critical EDFs over to the defense, and even the independent experts had to work hard to obtain them. The defense could have used the files to look for evidence of contamination or secondary transfer.

comment by ChrisHalkides · 2011-07-15T15:29:19.951Z · LW(p) · GW(p)

I would also point to the apparent lack of negative controls and substrate controls as something that increases the odds of contamination. My point in bringing this to everyone's attention is that I am not certain that I am seeing bias here, so much as I am seeing two people with different beliefs about the facts of the case. That would have to be addressed first, IMO. Finally, I have given a brief case for innocence on my blog: http://viewfromwilmington.blogspot.com/2011/01/why-i-believe-that-amanda-knox-and.html

comment by ChrisHalkides · 2011-07-14T14:28:27.768Z · LW(p) · GW(p)

Either the knife had bleach on it, or it had DNA on it, not both. A 2-3% dilution of bleach is recommended by Promega for cleaning pipets in PCR work. They also note that if one does not thoroughly rinse away the bleach, it will affect subsequent experiments. There is also a 1998 article in Biotechniques which shows that a 10% dilution of bleach damages DNA within one minute. If the prosecution's claim is that the knife was bleached (and I am not certain that it is), then it is in conflict with their claim that the knife had Meredith's DNA when it was taken into custody.

comment by ChrisHalkides · 2011-07-14T10:30:50.702Z · LW(p) · GW(p)

The exact value for the frequency of contamination is a very difficult number to pin down, for a variety of reasons. However, two facts make it more likely that it has occurred with respect to the knife than with a typical object. One is that the sample is in the low template range of analysis but no special handling precautions (a separate building with positive flow air hoods, etc.) were taken. The mere existence of so many special handling precautions in true Low Copy Number (LCN) analysis suggests that contamination is more likely when working in this very low concentration range. Two is that there was no blood on the knife. Two experts have publicly stated that the chances of cleaning a bloody knife to the point at which blood is no longer detected but DNA is detected are small.

comment by rolf_nelson · 2013-02-23T19:27:39.541Z · LW(p) · GW(p)

I finally got around to reading through the appeal motivation and the relevant parts of the Conti-Vecchiotti report, and I find nothing to lend credence to the innocence hypothesis. If anything, I would judge the timing of the double-DNA knife testing seems to move the 'laboratory contamination' hypothesis from very very very unlikely to very very very very unlikely.

So in the end, I have to apologize to komponisto as I have inadvertently wasted both of our time in suggesting this debate; our failure to reach a consensus on an accurate truth in this issue is a mild lose-lose.

comment by rolf_nelson · 2011-08-02T07:46:20.942Z · LW(p) · GW(p)

Reply to: http://lesswrong.com/r/discussion/lw/6k7/experiment_knox_case_debate_with_rolf_nelson/4jmb

Didn't realize you updated, looks like we can't go more than 8 or 9 deep before the RSS feed stops notifying about thread changes.

In terms of narrowing down what Umani Ronchi was actually saying, saying that the prosecution claims something in its appeals document isn't useful evidence. If there's a specific quote of Umani Ronchi that the prosecution makes, that might be useful, as long as the quote is clear enough that we can deduce it isn't being quoted out-of-context.

It sounds like you might disagree with... another of Raffaele's consultants, Introna, who placed the start of attack between 21:30 and 22:30.

Not according to p. 180 of Massei-Cristiani, where Introna is described as placing it between 21:00 and 21:30.

That's a good find, and you may be right. I was going by this, but maybe it's a mistranslation or a misunderstanding:

"[Introna] also observed that the beginning of the attack must have been a moment of tremendous stress for Kercher and may have arrested the digestive process. One could and should obtain a precise indication from this, in the sense that the stress to which the victim was subjected must have started between 21:30 pm and 22:30 pm." (p. 130)

Raffaele's appeal document argues for 21:30 - 22:00... I think 21:00-21:30 is most likely, but 21:30-22:00 is not ruled out nearly as strongly in my model as anything after 22:00 is... the computer evidence provides an alibi up to nearly 21:30.

Again, post-trial prosecution claims that haven't even gone under cross-examination aren't useful evidence. If you want an alibi to 21:30, you'll have to provide better support, which will unfortunately be difficult even if Raffaele is innocent, since computers open files in the background all the time, and so not just the timing but the nature of the file opened will have to be examined.

(although there is no evidence of significant alcohol or drug consumption)

There's evidence of about one glass (p. 152), so around 10 ml. A dose of 60 ml appears to almost double emptying time in one study (http://alcalc.oxfordjournals.org/content/40/3/187.full.pdf+html), so I'd expect a change of about 10-20% in Meredith's case. So probably not terribly significant on its own. I know there was no trace of drugs found in her body, and marijuana appears to have a long half-life, so I agree there's no drug consumption even though Meredith had easy access to marijuana.
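A minimal sketch of that estimate. It assumes the effect scales linearly with dose, which is an assumption for illustration and not something the linked study is quoted as establishing, and it treats "almost double" as an increase of up to 100% at 60 ml.

```python
def scaled_increase(dose_ml, ref_dose_ml=60.0, ref_increase=1.0):
    """Fractional increase in emptying time at dose_ml, assuming the
    increase is proportional to dose (hypothetical linear model)."""
    return ref_increase * dose_ml / ref_dose_ml

# 10 ml is the estimate given above for roughly one glass.
increase = scaled_increase(10)
print(f"estimated increase in emptying time: {increase:.0%}")
```

Under these assumptions the increase comes out at about one sixth, inside the 10-20% range.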

Although the paper I cited explicitly stated that the results did not fit to a normal distribution, the percentiles given are fairly well approximated by assuming a mean of 81.5 and a standard deviation of 30. Under these assumptions, the lag time required for the Massei guilt scenario would be at least a five- or six-sigma event.

So to get to even 21:00 from 18:00, you need to go out by more than 90 minutes. Three standard deviations is >.99 probability, so this model doesn't seem to be accurate, at least not with a normal distribution. So do you want to propose a new model with a greater standard deviation, or propose that it's not a normal distribution? If the latter, I would expect the deviation from normality to be equally likely to work against Raffaele, as it is to work in his favor.

In the other direction, a 30-minute variance is already too large to provide much evidence in favor of Raffaele's innocence, especially without further evidence of a 21:30 alibi.

Now I know you doubt that the conditions of the study hold here, but don't you find this at least a little bit confusing?

I agree that the stomach findings are a mild surprise if we're talking about 23:00+ like in the Massei narrative, but the first problem is that my surprise is only mild since there are so many factors that affect it, and the second problem is that once I'm slightly surprised by going out to 21:00, I don't get much more surprised by going out to 21:30 or even 22:00, and so don't see Raffaele as having an alibi.

As an example, does it surprise you that the abstract of one (unfortunately gated) study (http://www.ncbi.nlm.nih.gov/pubmed/7956593) of fried food gives 317 minutes for total gastric emptying, even though it probably, like other experiments, is unlike Meredith's case in that it probably involves pre-experiment fasting and no post-meal snack?

But even in the worst-case scenario here, the amount of slippage can't have been very large, because the stomach contents could easily have constituted the entire meal on their own.

I don't follow the logic here; isn't the more important question whether the stomach contents could have equally well constituted just half or 2/3 the meal? Or do you just mean that it's unlikely more than half of the meal passed through?

To illustrate further, if as much as half of Meredith's meal had passed into the duodenum, and we assume a normally-distributed half-time with median 127 minutes and standard deviation 40 (the median taken from the study), the finding would still have put her well within the slowest 1% under the prosecution theory (while only in the slowest 10% to 50% under the defense theory).

Yes, your model (if correct) of a non-ligature situation harms the court's theory about the attack taking place somewhere in the 23:00-23:30 interval, though it fails again to save Raffaele's computer alibi. Plus, I think you're underestimating the quantitative level of uncertainty if we don't know how much she ate, exactly what all she ate, exactly when she started eating, what effect having a post-meal snack has, what effect not fasting has, amount of alcohol consumed, and what effect walking home after eating had, all of which should contribute to a large standard deviation.

Replies from: komponisto

↑ comment by komponisto · 2011-08-03T16:22:10.884Z · LW(p) · GW(p)

In terms of narrowing down what Umani Ronchi was actually saying, saying that the prosecution claims something in its appeals document isn't useful evidence.

Surely you meant the defense appeal document here? (I haven't referenced the prosecution appeal, and there wouldn't be much reason to, since it's just a 20-page rant arguing that Amanda and Raffaele are really nasty people and deserve a harsher sentence than the Massei court gave them.)

My interpretation of Ronchi doesn't depend on the defense appeal; it's simply the common-sense default meaning of what he said, as reported in Massei-Cristiani, and confirmed by general information about average gastric emptying times.

But even if it did, the appeal documents constitute the defense's reply to the Massei-Cristiani report, and so I don't see why they are any less useful than the latter. They rely on the same records that Massei and Cristiani do.

(although there is no evidence of significant alcohol or drug consumption)

There's evidence of about one glass (p. 152), so around 10 ml.

Interestingly, p. 390 says the opposite: that Meredith had not consumed alcohol, according to Lalli. (And indeed it has been suggested by others elsewhere that the alleged small gastric alcohol level could have been due to a fermentation reaction). However, this is unlikely to be an important issue, as you point out.

Of course, this is not the only internal contradiction in the document:

Not according to p. 180 of Massei-Cristiani, where Introna is described as placing it between 21:00 and 21:30.

[...] "[Introna] also observed that the beginning of the attack must have been a moment of tremendous stress for Kercher and may have arrested the digestive process. One could and should obtain a precise indication from this, in the sense that the stress to which the victim was subjected must have started between 21:30 pm and 22:30 pm." (p. 130)

Indeed, it seems the only way to know for sure which of these passages (if either) is accurate would be to have a transcript of Introna's testimony, which we unfortunately don't have. However, it's pretty clear in any case that Introna would exclude the Massei timeline of post-23:00.

So to get to even 21:00 from 18:00, you need to go out by more than 90 minutes. Three standard deviations is >.99 probability, so this model doesn't seem to be accurate, at least not with a normal distribution. So do you want to propose a new model with a greater standard deviation, or propose that it's not a normal distribution? If the latter, I would expect the deviation from normality to be equally likely to work against Raffaele, as it is to work in his favor.

I would sooner hypothesize that Meredith's last meal actually took place closer to 19:00 than 18:00, given the vagueness of the testimony on the matter. This puts her within 2 standard deviations, perhaps even 1.5.
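A minimal sketch of how the meal time shifts the sigma count, using the normal lag model discussed earlier (mean 81.5 minutes, standard deviation 30). It assumes the lag clock starts at the stated meal time and stops when the attack begins; the clock times are the ones under discussion in this thread.

```python
LAG_MEAN_MIN = 81.5
LAG_SD_MIN = 30.0

def to_minutes(hhmm):
    """Convert an 'HH:MM' clock time to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def sigmas_out(meal_hhmm, attack_hhmm):
    """Standard deviations above the mean lag implied by the two times."""
    elapsed = to_minutes(attack_hhmm) - to_minutes(meal_hhmm)
    return (elapsed - LAG_MEAN_MIN) / LAG_SD_MIN

for meal in ("18:00", "19:00"):
    for attack in ("21:00", "21:30", "23:00"):
        print(f"meal {meal}, attack {attack}: {sigmas_out(meal, attack):+.2f} sigma")
```

With a 19:00 meal, a 21:00 attack is about 1.3 standard deviations out, against about 3.3 with an 18:00 meal.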

But, granting a non-normal distribution, it's really difficult for me to see how it could significantly work against Raffaele, given where the 25th and 75th percentiles are. Probability mass would have to be transferred to the extreme right tail from somewhere else; how do you propose to do this in a way that isn't specifically tailored to yield the desired bottom line?

I agree that the stomach findings are a mild surprise if we're talking about 23:00+ like in the Massei narrative, but the first problem is that my surprise is only mild since there are so many factors that affect it, and the second problem is that once I'm slightly surprised by going out to 21:00, I don't get much more surprised by going out to 21:30 or even 22:00, and so don't see Raffaele as having an alibi.

My questions, in that case, are:

(1a) What does your gastric lag-time model look like, such that you don't get significantly more surprised by going out to 22:00 than 21:00?

(1b) Why do you believe that model rather than one more similar to mine?

(2) What is your probability of guilt, conditioned on death having occurred (a) before 21:30? (b) before 22:00?

But even in the worst-case scenario here, the amount of slippage can't have been very large, because the stomach contents could easily have constituted the entire meal on their own.

I don't follow the logic here; isn't the more important question whether the stomach contents could have equally well constituted just half or 2/3 the meal? Or do you just mean that it's unlikely more than half of the meal passed through?

Slippage is a priori unlikely, especially with the ligatures applied (professional opinion), and hence given a level of gastric contents consistent with the meal in question, there's no reason to believe any significant slippage occurred.

As an example, does it surprise you that the abstract of one (unfortunately gated) study (http://www.ncbi.nlm.nih.gov/pubmed/7956593) of fried food gives 317 minutes for total gastric emptying, even though it probably, like other experiments, is unlike Meredith's case in that it probably involves pre-experiment fasting and no post-meal snack[?]

Only with regard to fried food being the cause; as you'll recall I've already allowed for a total emptying time of 6-7 hours "in some circumstances". Note that this timeline is characterized as "markedly delayed" by the authors. And, once again, the relevant variable for us is lag time, not total emptying time. (If we try to extrapolate, using the fact that 1/2 seems to be an upper bound on the ratio of lag time to total emptying time, with 1/3 being in practice a better estimate, this would yield no more than 158.5 minutes, and probably something more like 105 minutes, in this "markedly delayed" scenario.)
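The extrapolation in the parenthetical, as a minimal sketch. The ratios 1/2 and 1/3 are the ones stated above; applying them to this particular study's total emptying time is itself an extrapolation, not a reported measurement.

```python
TOTAL_EMPTYING_MIN = 317  # total gastric emptying time from the fried-food abstract

def lag_from_total(total_min, lag_to_total_ratio):
    """Estimate lag time as a fixed fraction of total emptying time."""
    return total_min * lag_to_total_ratio

upper_bound_lag = lag_from_total(TOTAL_EMPTYING_MIN, 1 / 2)  # ratio treated as an upper bound
typical_lag = lag_from_total(TOTAL_EMPTYING_MIN, 1 / 3)      # ratio treated as the better estimate

print(f"upper bound: {upper_bound_lag:.1f} min, typical: {typical_lag:.1f} min")
```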

The lag time given in the alcohol study you linked to is 48.1 ± 6.5 minutes (!). (And note this: "The lag phases after 4 and 10% (v/v) ethanol, beer, and red wine were not significantly different from that of water... the inhibitory effect of ethanol and alcoholic beverages is mainly induced by a prolongation of the gastric emptying phase (without affecting the lag phase)...")

Here is another source characterizing any lag time over 150 minutes as "extremely delayed". By comparison, "normal" is 50-100 min and "delayed" is 100-150. For half-emptying time, over 200 minutes is "extremely delayed".

I think you're underestimating the quantitative level of uncertainty if we don't know how much she ate, exactly what all she ate, exactly when she started eating, what effect having a post-meal snack has, what effect not fasting has, amount of alcohol consumed, and what effect walking home after eating had, all of which should contribute to a large standard deviation.

Just how large do you think the standard deviation is? If you believe in the Massei theory, you have to come up with a lag time of four hours at minimum. I can't find any evidence that that is anywhere close to being within normal human parameters. Can you?

In my view, essentially all of the uncertainty arising from the factors you mention is used up simply by postulating a lag time of two hours or more, in contrast to the more typical 50-100 minutes. This view is supported by the sizes of the standard deviations relative to the means in all of the various studies.

On the other hand, if you want to believe the time of death was earlier, you run into other problems (in addition to the improbably extreme lag time for anything after 22:00). From 22:30 onward there was a broken-down car outside the cottage, with a tow truck arriving at around 23:20-23:30. No one associated with this incident (occupants of the car, tow-truck operator, a street witness) reported seeing anyone enter or exit the cottage, or hearing anything coming from inside. (This is of course also a problem for the Massei timeline.) There was activity on Meredith's cell phone at 21:58, 22:00, and 22:13, making it unlikely that death occurred between these times. (Incidentally, it's worth noting the interrupted call home at 20:56, not attempted again afterward, which is extremely consistent with the defense theory of when the attack occurred.) And then, of course, there is the computer activity at 21:10 and (according to the defense) 21:26.

So what is your probability distribution for time of death?

Replies from: rolf_nelson, rolf_nelson

↑ comment by rolf_nelson · 2011-08-04T08:24:10.234Z · LW(p) · GW(p)

I would sooner hypothesize that Meredith's last meal actually took place closer to 19:00 than 18:00, given the vagueness of the testimony on the matter. This puts her within 2 standard deviations, perhaps even 1.5.

If we model the meal start-time as a normal distribution, then it'll be simple to add it to the model and combine it with the other sources of uncertainty, since two normal distributions sum to a new normal distribution with a variance equal to the sum of the variances. Though now that I mention it, a lot of the other bits of uncertainty might be somewhat log-normal because they might multiply the time rather than add to it.
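The variance-addition rule mentioned above can be sketched in a few lines. This is only an illustration under the stated assumption that the sources of uncertainty are independent and normal; the component figures below are hypothetical placeholders, not estimates taken from this discussion.

```python
import math


def sum_independent_normals(components):
    """Return (mean, sd) of a sum of independent normal variables.

    components: iterable of (mean, sd) pairs, all in minutes.
    The means add, and the variances (not the sds) add.
    """
    total_mean = sum(mean for mean, _ in components)
    total_var = sum(sd ** 2 for _, sd in components)
    return total_mean, math.sqrt(total_var)


# Hypothetical placeholder values, for illustration only.
components = [
    (0.0, 30.0),   # uncertainty in when the meal started
    (80.0, 40.0),  # gastric lag time
]
combined_mean, combined_sd = sum_independent_normals(components)
print(f"combined: mean={combined_mean:.0f} min, sd={combined_sd:.0f} min")
```

Note that the combined sd grows more slowly than the sum of the individual sds, which is why adding several modest sources of uncertainty does not by itself produce a very wide distribution. If some factors multiply the time instead, the same rule would apply to the logarithms.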

But, granting a non-normal distribution, it's really difficult for me to see how it could significantly work against Raffaele, given where the 25th and 75th percentiles are. Probability mass would have to be transferred to the extreme right tail from somewhere else; how do you propose to do this in a way that isn't specifically tailored to yield the desired bottom line?

To give two contrasting examples, something like female heights (http://www.johndcook.com/blog/2008/07/20/why-heights-are-not-normally-distributed/) would work against Raffaele because outliers are few and extreme, while a gently bimodal distribution like human heights (http://www.johndcook.com/mixture_distribution.html) might work in Raffaele's favor because of a concentration in the center.

My questions, in that case, are:

(1a) What does your gastric lag-time model look like, such that you don't get significantly more surprised by going out to 22:00 than 21:00?

Good question. Let me look here at some more papers. One source of uncertainty is that I don't know if we care in this case about 2% or 10% or something else.

The first completely-ungated study I found in Google shows 10 minutes for a 2% decrease (http://jnm.snmjournals.org/content/32/7/1349.full.pdf).

Second study shows 25 minutes for a 10% decrease (http://jnm.snmjournals.org/content/32/7/1349.full.pdf).

Third study shows 23 minutes using multiple methods (http://jnm.snmjournals.org/content/37/10/1639.full.pdf).

The gated study you cited shows 81.5 minutes using unknown-to-me methods, perhaps the meal was larger or different from the other studies.

So I guess I would reluctantly discard the concept of attempting solely normal distributions, since this already is looking too right-tailed. So this is too complex for me to easily model, I can only say that intuitively even if we use 19:00, then if a genie tells me it's at least 120 minutes, then I wouldn't be much more surprised by 150 minutes or 180 minutes. The first three studies above looked like they were behaving at 10, 25, and 23, and then your example jumped to more than 3x the highest figure so far. So jumping again to even 3x of your number wouldn't be more than a one-in-ten surprise, especially given the numerous factors I've itemized.

(2) What is your probability of guilt, conditioned on death having occurred (a) before 21:30? (b) before 22:00?

If we're not taking systemic uncertainty into account, then it's still going to be quite a large probability of guilt. However, I would say that, compared with 23:00, (a) would shift me by about 15:1 on the grounds that the computer evidence would have to be mis-analyzed, or (more likely) Raffaele would have had to manufacture the computer alibi (recall Raffaele is a computer engineer), and (b) by 5:1 on the grounds that the timetable gets a bit tighter than in the 23:00 case. Keep in mind that I'm currently not yet bothering to weigh the eyewitness testimony at all in my assessment of guilt.

Slippage is a priori unlikely, especially with the ligatures applied (professional opinion), and hence given a level of gastric contents consistent with the meal in question, there's no reason to believe any significant slippage occurred.

I believe the independent court expert more than hearsay that an unknown FRCPath claimed that, even without ligatures, complete slippage is "well-nigh impossible".

And note this: "The lag phases after 4 and 10% (v/v) ethanol, beer, and red wine were not significantly different from that of water... the inhibitory effect of ethanol and alcoholic beverages is mainly induced by a prolongation of the gastric emptying phase (without affecting the lag phase)..."

That's a good point, so I hereby drop the alcohol point altogether for the non-slippage case.

Here is another source characterizing any lag time over 150 minutes as "extremely delayed". By comparison, "normal" is 50-100 min and "delayed" is 100-150. For half-emptying time, over 200 minutes is "extremely delayed".

This seems to be for small easily-digested test meals, as far as I can tell. No hospital is going to serve a patient a pizza to determine how well their diabetes is under control. ;-)

Just how large do you think the standard deviation is? If you believe in the Massei theory, you have to come up with a lag time of four hours at minimum. I can't find any evidence that that is anywhere close to being within normal human parameters. Can you?

I see that large, fried, and/or starchy meals have much larger T(50) times than other meals, and I don't have any lag times for those. Since T(50) times are frequently unexpectedly large, and since lag times correlate in some large but unknown way with T(50) times, I infer a significant probability that lag times are frequently unexpectedly large as well.

Let me float one scenario. I'd presume that starch increases the T(50) time so much because it can take a long time for large amounts of starch to convert to sugar in the stomach. Does almost the entire portion of starch need to get converted to sugar before any starch can go to the duodenum? If so, then the lag time for a large starch meal would be close to the T(50) time.

On the other hand, if you want to believe the time of death was earlier, you run into other problems...

Sounds like a whole other discussion.

So what is your probability distribution for time of death?

Based on just stomach evidence, and ignoring expert testimony, I'd have to say it most likely happened around 19:00. So that's not very useful.

If we take a leap of faith and use the 317 minutes T(50) for 700 kcal fried pasta but don't believe the starch needs to convert first, then I'd revert to a 1/4 guess for lag time on the basis the ratio decreases as T(50) grows, resulting in 80 +/- 6 minutes, so that model fails for me as well, dang it.

Factoring in that it wasn't before 21:00, but still ignoring expert testimony, I'll have to take an "inside view" and try to generate hypotheses as to why it took so long. I'll currently guess that to get us out to 21:00, either the starch needs to convert to sugar first (40%), or else there was slippage after the body was discovered (5%), or that there was slippage when the body was moved by one or more perps before being discovered by the police and "ligatured" (55%). I'm open to other suggestions. Unfortunately the gated 81-minute median study isn't currently helpful in this regard, because I have to ask myself, why was this study 81 minutes, instead of the others that were 25 or 10 or 40 minutes? But if we can find out whatever X factor increased it to 81 minutes, then we might be able to judge how much of that X factor we had in our case, and whether we had more or less X factor than in the study. Anyway, overall I'll guess 30% for 21:00-21:30, 20% for 21:30-22:00, 25% for 22:00-23:00, 7% for 23:00-23:30.

Now let's factor in expert testimony. Since none of our models are working very well, and since the literature that I've seen doesn't converge on a single simple model anyway, I think in the end I'll go with the independent expert testimony. The experts have access to gated medical journals and even some kind of summary chart of different times under different situations in the literature, as well as forensic experience, which I don't have. They also get to factor in the body temperature, which I've been ignoring.

Replies from: ciphergoth, komponisto

↑ comment by Paul Crowley (ciphergoth) · 2011-08-04T10:28:58.277Z · LW(p) · GW(p)

I'm not following this discussion in detail, but I'm glad you guys are having it - I think it's a worthwhile case study, and the flavour of the way it's discussed is informative without getting into the detail of the subject matter.

↑ comment by komponisto · 2011-08-05T14:28:58.531Z · LW(p) · GW(p)

(2) What is your probability of guilt, conditioned on death having occurred (a) before 21:30? (b) before 22:00?

If we're not taking systemic uncertainty into account, then it's still going to be quite a large probability of guilt. However, I would say that, compared with 23:00, (a) would shift me by about 15:1 on the grounds that the computer evidence would have to be mis-analyzed, or (more likely) Raffaele would have had to manufacture the computer alibi (recall Raffaele is a computer engineer), and (b) by 5:1 on the grounds that the timetable gets a bit tighter than in the 23:00 case.

If we don't take systemic uncertainty into account, then 15:1 isn't much of a shift in the face of numbers like 2500:1 or 200000:1 that you were giving for the knife. On the other hand, if we do take systemic uncertainty into account (as we ultimately must), a shift of 15:1 or even 5:1 would be significant, given your estimate of .95 probability of guilt, or 19:1 odds. Crudely approximating this as 20:1, it would take you down to 4:3 (p = 4/7 = 0.57) or 4:1 (p= 4/5 = 0.8) respectively. I imagine that if you were to believe the 21:26 computer interaction, the pre-22:00 odds could potentially go down to something like 4:3 as well. This indicates that it may be worthwhile to keep pursuing the time-of-death issue.
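The odds arithmetic in the paragraph above can be checked with a short sketch. It assumes, as that paragraph does, prior odds of guilt crudely rounded to 20:1 and shifts of 15:1 and 5:1 toward innocence; the function names are just for illustration.

```python
from fractions import Fraction


def odds_to_prob(odds):
    """Convert odds in favor (as a single ratio) to a probability."""
    return odds / (1 + odds)


def shift_toward_innocence(prior_odds, likelihood_ratio):
    """Divide the odds of guilt by a likelihood ratio favoring innocence."""
    return prior_odds / likelihood_ratio


prior = Fraction(20, 1)  # 0.95 is 19:1, crudely rounded to 20:1

for ratio in (15, 5):
    posterior = shift_toward_innocence(prior, Fraction(ratio))
    p = odds_to_prob(posterior)
    print(f"shift {ratio}:1 -> odds {posterior} -> p = {p} ~ {float(p):.2f}")
```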

Keep in mind that I'm currently not yet bothering to weigh the eyewitness testimony at all in my assessment of guilt.

Well, I was already assuming that you didn't believe Curatolo (for one thing, he gives Amanda and Raffaele an alibi for 21:30-23:00!). If you do, that will have to be dealt with separately. But in general, not only is eyewitness testimony among the weakest forms of evidence, but it cuts both ways in this case, as the broken-down car illustrates.

Slippage is a priori unlikely, especially with the ligatures applied (professional opinion), and hence given a level of gastric contents consistent with the meal in question, there's no reason to believe any significant slippage occurred.

I believe the independent court expert more than hearsay that an unknown FRCPath claimed that, even without ligatures, complete slippage is "well-nigh impossible".

Keep in mind, however, that Ronchi's speculation about slippage was based on the mistaken assumption that ligatures had not been applied. So there may not be as much contradiction here as it appears (especially since it isn't clear to me that the FRCPath's opinion necessarily pertained to the non-ligature case, depending on how standard ligatures are).

How much slippage do you think may have occurred?

Here is another source characterizing any lag time over 150 minutes as "extremely delayed". By comparison, "normal" is 50-100 min and "delayed" is 100-150. For half-emptying time, over 200 minutes is "extremely delayed".

This seems to be for small easily-digested test meals, as far as I can tell. No hospital is going to serve a patient a pizza to determine how well their diabetes is under control. ;-)

If you can find a reference to support the idea that a lag time in excess of four or even three hours would not be highly unusual for a small-to-moderate pizza meal eaten by a healthy adult human, I will update appropriately. Conversely, if you can't (and I haven't been able to so far), I don't see how you can derive the level of uncertainty you need to make the Massei theory plausible in the face of all the other data. Even acknowledging the wide variation in lag times depending on the type of meal used in the studies, they are all on the short end; there is no indication, anywhere (that I have come across), of the kind of extremes that we would need at the long end. Not so much as a passing remark. Isn't this a bit suspicious?

I see that large, fried, and/or starchy meals have much larger T(50) times than other meals, and I don't have any lag times for those. Since T(50) times are frequently unexpectedly large, and since lag times correlate in some large but unknown way with T(50) times, I infer a significant probability that lag times are frequently unexpectedly large as well.

I agree, of course; I just don't agree that "frequently unexpectedly large" translates to anything like "regularly in excess of four hours".

Let me float one scenario. I'd presume that starch increases the T(50) time so much because it can take a long time for large amounts of starch to convert to sugar in the stomach. Does almost the entire portion of starch need to get converted to sugar before any starch can go to the duodenum? If so, then the lag time for a large starch meal would be close to the T(50) time...

If we take a leap of faith and use the 317 minutes T(50) for 700 kcal fried pasta but don't believe the starch needs to convert first, then I'd revert to a 1/4 guess for lag time on the basis the ratio decreases as T(50) grows, resulting in 80 +/- 6 minutes, so that model fails for me as well, dang it.

I'll ask Yvain (LW's medical student extraordinaire) if he knows anything about the mechanisms involved and the plausibility of your proposed scenario. At the very least I expect he'll know someone who knows something. (Update: Yvain says he doesn't know any more than we do.)

I'm confused about the notation T(50): does this refer to half-time, or total emptying time? Because the 317 minutes for fried pasta was total emptying time.

I'm open to other suggestions. Unfortunately the gated 81-minute median study isn't currently helpful in this regard, because I have to ask myself, why was this study 81 minutes, instead of the others that were 25 or 10 or 40 minutes? But if we can find out whatever X factor increased it to 81 minutes, then we might be able to judge how much of that X factor we had in our case, and whether we had more or less X factor than in the study.

I'll try to find this out from people who have read the study.

Anyway, overall I'll guess 30% for 21:00-21:30, 20% for 21:30-22:00, 25% for 22:00-23:00, 7% for 23:00-23:30.

Not that it's surprising, given the roughness of all these estimates, but there seems to be an inconsistency with your other probabilities: if you believe p = 95% for guilt overall, p = 80% conditioned on before 22:00, and p = 57% conditioned on before 21:30, then you have 40 percentage points' worth of guilt to distribute among the 50% before 22:00, and only 17 of those are taken from the 30% before 21:30; leaving you with 23% to be distributed among the 20% between 21:30 and 22:00, which is impossible.

(EDIT: And also, 55% worth of guilt-probability to be distributed among the 50% post-22:00 probability, likewise impossible.)
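Here is a rough sketch of that consistency check, using the law of total probability. It takes the figures exactly as used above (0.95 overall, 0.80 given before 22:00, 0.57 given before 21:30, with 30% of the time-of-death mass before 21:30 and 50% before 22:00) and treats the post-22:00 mass as the remaining 50%; the variable names are mine.

```python
# Stated probabilities of guilt
p_guilt = 0.95
p_guilt_given_before_2200 = 0.80
p_guilt_given_before_2130 = 0.57

# Stated time-of-death mass
p_before_2130 = 0.30
p_before_2200 = 0.50           # 30% + 20%
p_after_2200 = 1 - p_before_2200

# Joint mass P(guilt and interval), from P(G | T) * P(T)
joint_before_2200 = p_guilt_given_before_2200 * p_before_2200
joint_before_2130 = p_guilt_given_before_2130 * p_before_2130
joint_2130_to_2200 = joint_before_2200 - joint_before_2130
joint_after_2200 = p_guilt - joint_before_2200

# Implied conditional probabilities; anything above 1 is incoherent.
implied_2130_to_2200 = joint_2130_to_2200 / (p_before_2200 - p_before_2130)
implied_after_2200 = joint_after_2200 / p_after_2200

print(f"P(guilt | 21:30-22:00) implied: {implied_2130_to_2200:.3f}")
print(f"P(guilt | after 22:00) implied: {implied_after_2200:.3f}")
```

Both implied conditionals come out above 1, which is the impossibility described above.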

Now let's factor in expert testimony. Since none of our models are working very well, and since the literature that I've seen doesn't converge on a single simple model anyway, I think in the end I'll go with the independent expert testimony.

Which testimony? Ronchi didn't give any testimony about lag time, as far as I know.

They also get to factor in the body temperature, which I've been ignoring.

Unfortunately, one of the many scandals of this case is that the body temperature measurement was delayed until 12 hours after the discovery of the body, limiting its usefulness.

Replies from: rolf_nelson

↑ comment by rolf_nelson · 2011-08-12T03:42:14.346Z · LW(p) · GW(p)

Thanks for another well-researched reply, let's have a couple more posts on this, and then turn to the DNA for a bit.

On the other hand, if we do take systemic uncertainty into account (as we ultimately must), a shift of 15:1 or even 5:1 would be significant, given your estimate of .95 probability of guilt, or 19:1 odds.

The problem is that systemic uncertainty works both ways. If I see there being, say, 10 times as much evidence for guilt than there is for innocence, I'll still cap the probability of guilt at .95 anyway, due to systemic uncertainty. If I change my mind and decide there was 5, or 20, times as much evidence for guilt, the basic conclusion won't change.

To look at it another way, I expect that if we examine ten pieces of evidence as to whether the Earth is flat, on average one of the pieces can easily point to the Earth being flat at a 10:1 ratio by chance. You would need to either have a much stronger piece of evidence among the first ten pieces, or else have more than one of the pieces point to the Earth being flat, to show that something is true given the first ten pieces of evidence.

How much slippage do you think may have occurred?

There's a ton of factors here, I'll guess that if there's slippage, it's about 50% that the entire contents would slip; probably our digestion process is evolutionarily designed to make the food pass through easily by that stage. Probably another 50% that a suspiciously large amount of food would be found in the small intestine. I could narrow it down more if I knew how large the volume of the first part of the small intestine to the first bend is, whether the first part of the small intestine was searched, whether the rest of the small intestine was searched, how fast food is evacuated from the duodenum, whether food keeps getting evacuated from the duodenum after the stress of being threatened with a knife occurs, whether peristalsis continues to push food through the small intestine after stress, whether diffusion of food through the small intestine walls continues after stress or even death, and how fast peristalsis and diffusion work.

I'll add that a search for '"empty duodenum" forensics' suggests to me that, as far as I can tell, almost nobody except for Amanda Knox's defense has ever cared whether a duodenum was empty or not. That probably puts an upper-bound on how useful this evidence is; if it were reliable, I would expect it to come up more often in online appeals-courts decisions, and in trial reporting. I also can't find any literature on this, which is odd if it's a useful way to narrow down time of death. So based on the "evidence of absence", let me propose the following hypothesis:

Vacant Duodenum Hypothesis: "An empty duodenum is not, by itself, definitive proof for or against any time-of-death. The main reason to search the duodenum is in hopes of actually finding food there; no matter what the time-of-death scenario, there is always at least a 1/10 chance that the duodenum will be empty when examined."

If you can find a reference to support the idea that a lag time in excess of four or even three hours would not be highly unusual for a small-to-moderate pizza meal eaten by a healthy adult human, I will update appropriately.

So far, we don't have data either way about lag times (not median) for a pizza, nor how a follow-up snack affects it. BTW do you know something I don't about the size of the pizza?

Conversely, if you can't (and I haven't been able to so far), I don't see how you can derive the level of uncertainty you need to make the Massei theory plausible in the face of all the other data. Even acknowledging the wide variation in lag times depending on the type of meal used in the studies, they are all on the short end; there is no indication, anywhere (that I have come across), of the kind of extremes that we would need at the long end.

So far there's no indication of >180 or even >120 either, right? Is the main point of disagreement that if you see the numbers:

10, 25, 23, 82, 48

and if a genie tells you the next number is above 150, then you're saying "it's almost certainly between 150 and 180!" and I'm saying "these numbers are all over the place, it's more likely to be near 150 than near 300, but there's a significant chance it's a lot bigger than 150."

I'm confused about the notation T(50): does this refer to half-time, or total emptying time? Because the 317 minutes for fried pasta was total emptying time.

My bad, I misread the abstract. Doesn't significantly change the scenarios though.

Unfortunately, one of the many scandals of this case is that the body temperature measurement was delayed until 12 hours after the discovery of the body, limiting its usefulness.

So are you claiming that Meredith's weight before losing blood was 57kg, or just pointing out that a weight of 50-55 kg only shifts us by about 10:1?

Replies from: komponisto

↑ comment by komponisto · 2011-08-13T22:53:32.005Z · LW(p) · GW(p)

On the other hand, if we do take systemic uncertainty into account (as we ultimately must), a shift of 15:1 or even 5:1 would be significant, given your estimate of .95 probability of guilt, or 19:1 odds.

The problem is that systemic uncertainty works both ways.

So to be absolutely clear, then: taking into account all the information you are aware of, and adjusting for systemic uncertainty, what are your current probabilities of guilt conditioned on death having occurred during the following intervals?:

(1) 21:00 - 21:30 (2) 21:30 - 22:00 (3) 22:00 - 23:00 (4) 23:00 - 23:30 (5) after 23:30

(Be sure to check for consistency with your probability distribution for time-of-death and your overall probability of guilt.)

To look at it another way, I expect that if we examine ten pieces of evidence as to whether the Earth is flat, on average one of the pieces can easily point to the Earth being flat at a 10:1 ratio by chance. You would need to either have a much stronger piece of evidence among the first ten pieces, or else have more than one of the pieces point to the Earth being flat, to show that something is true given the first ten pieces of evidence.

That sounds like a point about priors, rather than systemic uncertainty. What I want to know is the following: if I could show that the time of death was before 21:30, or before 22:00 (etc.), how far would that reduce your current guilt-probability of 95%? (Obviously, if the answer is "negligibly", then there isn't any point in discussing gastric lag time.)

I'll add that a search for '"empty duodenum" forensics' suggests to me that, as far as I can tell, almost nobody except for Amanda Knox's defense has ever cared whether a duodenum was empty or not.

On the contrary, see here for example. (By the way, it's actually Sollecito's defense; the matter is not discussed in Knox's appeal document.)

The literature often emphasizes that gastric contents are of limited reliability in determining time of death. However, there is a specific circumstance in this case that makes it atypically informative: the fact that the duodenum was completely empty, which by default implies that the entire meal was still in the stomach (modulo the slippage issue discussed below). This puts a tighter bound on the time of death than in a more typical situation with some smaller fraction of the meal in the stomach.

Vacant Duodenum Hypothesis: "An empty duodenum is not, by itself, definitive proof for or against any time-of-death. The main reason to search the duodenum is in hopes of actually finding food there; no matter what the time-of-death scenario, there is always at least a 1/10 chance that the duodenum will be empty when examined."

I'm not sure how to make sense of this. What matters here is not the emptiness of the duodenum by itself, but rather the conjunction of the empty duodenum with the non-empty stomach. In other words, the phase of digestion -- which is clearly time-dependent, with some phases carrying more information about time than others. See for instance the above-cited textbook, which observes as follows:

At autopsy, if 50% of the volume of the last meal is found in the stomach, the last food intake was about 3-4 hours prior to death, with 98% confidence limits not shorter than 1 and not greater than 10 hours.

When 90% of the last meal is found in the stomach, the last food intake was probably within the hour prior to death, with 98% confidence limits not more than 3-4 hours.

If only 30% of the last meal is found, the last food intake was around 4-5 hours previous to death, with 98% confidence limits not shorter than 1-2 and not longer than 10-11 hours prior to death

In the situation at hand, we have 100% of the last meal in the stomach, as revealed by the empty duodenum. This places us in the second bullet, except with even stronger bounds and higher confidence. (And note, by the way, that 3-4 hours is an upper bound on the 98% confidence interval, not the confidence interval itself. I claim that the 98% confidence interval in this case should actually be more like 2.5 hours.)

[comment split due to length]

Replies from: rolf_nelson, komponisto

↑ comment by rolf_nelson · 2011-09-10T00:55:06.158Z · LW(p) · GW(p)

So to be absolutely clear, then: taking into account all the information you are aware of, and adjusting for systemic uncertainty, what are your current probabilities of guilt conditioned on death having occurred during the following intervals?:

.95 for all the scenarios mentioned, maybe a little less for the 21:00-21:30.

On the contrary, see here for example.

Good find, and it slightly bolsters the case against Knox: contents don't pass into the duodenum after death (which I expected), and other unspecified parts of digestion continue after death (which I would have bet against). This information slightly increases the probability that the duodenum can empty after death through digestion processes, in which case the duodenum would remain empty no matter what state the stomach is in.

The literature often emphasizes that gastric contents are of limited reliability in determining time of death. However, there is a specific circumstance in this case that makes it atypically informative: the fact that the duodenum was completely empty, which by default implies that the entire meal was still in the stomach (modulo the slippage issue discussed below)

(snip)

In the situation at hand, we have 100% of the last meal in the stomach, as revealed by the empty duodenum.

But that's exactly one of the points I'm not confident of. Also, even if there is 100% of the meal in the stomach, I still don't agree that analysis can exclude 21:00-21:30 but include 21:30-22:00 to any large degree of confidence. A model should be robust in its conclusions for us to have confidence in the conclusions; if small, reasonable changes to the model change the conclusions, then we have to limit our level of confidence and weight it with or against corroborating information from elsewhere.

↑ comment by komponisto · 2011-08-13T22:54:14.013Z · LW(p) · GW(p)

[comment split due to length]

Now, to the slippage issue:

How much slippage do you think may have occurred?

There's a ton of factors here, I'll guess that if there's slippage, it's about 50% that the entire contents would slip; probably our digestion process is evolutionarily designed to make the food pass through easily by that stage. Probably another 50% that a suspiciously large amount of food would be found in the small intestine.

As it happens, in the present case, the only material found in the small intestine was at the very end, near the ileocecal valve. At least, that is the implication of the wording of Ronchi's speculation (combined with the absence of any mention by Massei and Cristiani of material nearer to the duodenum):

Prof. Umani Ronchi, at the hearings of 04-19-2008 and 9-19-2009, never discussed "an imperfect application of the ligatures" at the duodenal level, but rather the [supposed] failure to ligature the duodenum on the part of Dr. Lalli during the autopsy (p. 23, hearing of 9-19-2009: "given that the ligatures were not applied, given that without the ligatures this sliding toward the bottom can happen, and that an amount of food that had maybe already passed into the duodenum, could even have, due to gravity, could have gotten all the way to the ileocecal valve.")

The missing ligature, in fact, allowed Prof. Ronchi to conclude that the gastric contents, at least in part, had slipped in the duodenum or that the contents, having already passed into the duodenum, could have slid due to gravity all the way to the ileocecal valve after traveling 5 meters of small intestine. From this, the Court deduced that the autopsy finding regarding the objective fact that the duodenum was empty was unreliable.

(Sollecito appeal, p. 165)

Now, if your mistrust of the defense is sufficiently high, perhaps you're not willing to draw the same inference I have from this passage. However, I'm still interested in the impact it would have on your probability estimates if it were true. Suppose for the sake of argument that there wasn't anything in the small intestine, save a small amount at the ileocecal valve. How would that affect your beliefs? Are you willing to acknowledge a significant dependence of your opinion on the presence of material in "earlier" parts of the small intestine?

Apart from this, another thing this passage implies is that Ronchi's speculation about slippage was confined to the possibility of it having occurred at the autopsy, with the intestines uncoiled, in a situation where ligatures had not been applied (which we know to be contrary to the actual situation). He wasn't suggesting, in other words, that there may have been slippage due to the body having been moved by the killer(s). And if in fact the only material in the small intestine was at the ileocecal valve, then it is very unlikely indeed that material could have slipped through 5 meters of coiled intestine inside the victim's body, as the slippage hypothesis would in that case require.

So far, we don't have data either way about lag times (not median) for a pizza, nor how a follow-up snack affects it.

But we do have data for other situations, and those data are what my prior is based on. What's your prior, and why is it better?

Incidentally, I was able to obtain a copy of the Hellmig et al. paper. Here is the study protocol:

For preparation of the solid test meal, 100 mg of 13C-octanoate was dissolved in an egg. After addition of 50 mL of low-fat milk, the egg was scrambled and fried in a pan. The solid test meal was completed by a piece of brown bread (50 g) and butter (20 g). After an overnight fast a breath sample was collected to define the basic value before the test meal was administered within 10 min. Breath samples were collected every 15 min for the first 120 min, then at 150, 180, 210 and 240 min after ingestion. Patients were again instructed not to drink, eat, smoke or exercise during the test.

So far there's no indication of >180 or even >120 either, right?

The range in the Hellmig et al. study was 29-203 minutes.

Is the main point of disagreement that if you see the numbers:

10, 25, 23, 82, 48

and if a genie tells you the next number is above 150, then you're saying "it's almost certainly between 150 and 180!" and I'm saying "these numbers are all over the place, it's more likely to be near 150 than near 300, but there's a significant chance it's a lot bigger than 150."

Obviously, it depends crucially on what we know about the process that generated the numbers. Here we're talking about the duration of a physiological process, which is likely to be distributed approximately normally modulo specific pathological conditions. Of those numbers, the most relevant is the 82 (due to the use of a larger test meal with mixed food groups, and its taking place after the phenomenon described below was discovered).
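To make concrete how much the answer depends on the assumed generating process, here is a toy sketch that fits the five quoted numbers with a normal and with a log-normal distribution, then asks how likely a value above 300 is given that it is above 150. This is only an illustration: the five numbers are figures from different studies rather than individual observations, the choice of the two families is an assumption, and the fit by sample mean and standard deviation is the crudest one available.

```python
import math
from statistics import mean, stdev


def upper_tail(x, mu, sigma):
    """P(X > x) for X ~ Normal(mu, sigma), via erfc for accuracy far in the tail."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))


def conditional_tail(lo, hi, mu, sigma, log_scale=False):
    """P(X > hi | X > lo). With log_scale=True, mu and sigma describe log(X)."""
    if log_scale:
        lo, hi = math.log(lo), math.log(hi)
    return upper_tail(hi, mu, sigma) / upper_tail(lo, mu, sigma)


times = [10, 25, 23, 82, 48]          # minutes, as quoted above
logs = [math.log(t) for t in times]

p_normal = conditional_tail(150, 300, mean(times), stdev(times))
p_lognormal = conditional_tail(150, 300, mean(logs), stdev(logs), log_scale=True)

print(f"normal fit:     P(>300 | >150) = {p_normal:.3g}")
print(f"log-normal fit: P(>300 | >150) = {p_lognormal:.3g}")
```

Under the normal fit the conditional probability is vanishingly small, while under the log-normal fit it is far from negligible, so the disagreement really does come down to which kind of process one thinks generated the numbers.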

Beyond differences in the test meal, the shorter times may be accounted for by a phenomenon known as "interdigestive duodenogastric reflux", which is a "sieving" process involving the shuffling of food back and forth between the stomach and duodenum, that takes place during the lag phase. This phenomenon was not known when some of the earlier studies were published, and so there is a significant possibility that those studies detected duodenal activity that the investigators mistook for the end of the lag phase. (HT to LondonJohn at JREF for this observation.)

But furthermore, we also have to reason about the hypothetical sequences of numbers that we didn't see. If the numbers had been

110, 125, 123, 182, 148

to say nothing of

100, 250, 230, 820, 480

-- or even if the studies consistently had extreme data points in the range of 300, regardless of their averages -- then the Massei-Cristiani theory would be in significantly better shape.

So are you claiming that Meredith's weight before losing blood was 57kg, or just pointing out that a weight of 50-55 kg only shifts us by about 10:1?

I was actually pointing out that an earlier temperature measurement would probably have permitted a narrower confidence interval.

But, since you mention it, 50-55 kg was just Lalli's eyeballed guess; the body was not actually weighed. Standard formulas predict 57-60 kg from Meredith's age, sex and height.

Replies from: rolf_nelson

↑ comment by rolf_nelson · 2011-09-10T01:06:54.707Z · LW(p) · GW(p)

As it happens, in the present case, the only material found in the small intestine was at the very end, near the ileocecal valve.

I agree, but I don't know whether other material would have been found if present. Is searching the entire small intestine feasible, and if so, was such a search performed? Would food in the middle of the small intestine after death have continued to digest?

Prof. Umani Ronchi, at the hearings of 04-19-2008 and 9-19-2009, never discussed "an imperfect application of the ligatures" at the duodenal level, but rather the [supposed] failure to ligature the duodenum on the part of Dr. Lalli during the autopsy (p. 23, hearing of 9-19-2009: "given that the ligatures were not applied, given that without the ligatures this sliding toward the bottom can happen, and that an amount of food that had maybe already passed into the duodenum, could even have, due to gravity, could have gotten all the way to the ileocecal valve.")

Right, like the court, I understand that Umani Ronchi was incorrect about the ligatures.

The missing ligature, in fact, allowed Prof. Ronchi to conclude that the gastric contents, at least in part, had slipped in the duodenum or that the contents, having already passed into the duodenum, could have slid due to gravity all the way to the ileocecal valve after traveling 5 meters of small intestine. From this, the Court deduced that the autopsy finding regarding the objective fact that the duodenum was empty was unreliable.

From Massei, it appears that Umani Ronchi didn't "conclude that the gastric contents had slipped"; he concluded instead that either the gastric contents might have slipped, or the stomach had not emptied: "He [Umani Ronchi] could not, however, say whether it [the stomach] had partially emptied" (147), and he still gave an overall TOD of between 20:50 and 4:50. Thus, logically, Umani Ronchi didn't treat a TOD of 20:50+ as proving that the stomach had partially emptied. Of course, you can speculate that Umani Ronchi might simply have been illogical, but to the degree we trust him as the court-appointed expert, we should weigh his conclusion appropriately.

That said, he obviously did make a mistake for some unexplored reason in concluding the ligatures were absent; and I agree we should lower the estimation of his overall reliability, remembering of course to similarly lower the reliability of experts who you do like every time they make a mistake.

Now, if your mistrust of the defense is sufficiently high, perhaps you're not willing to draw the same inference I have from this passage.

I do trust the defense; I trust them to not unethically stab their client in the back by drawing attention to any inconvenient pro-prosecution facts in their defense appeal document. Pointing out pro-prosecution facts is the prosecution's and court's job, not the defense's, even in inquisitorial systems, and anyway the defense's checks are signed by the defendant. That said, where the defense appeal document contains a direct quote, then I'd agree that's pretty reliable.

Are you willing to acknowledge a significant dependence of your opinion on the presence of material in "earlier" parts of the small intestine?

I don't think it's a question of significance, it's more that we're dealing with a conjunction: that stomach emptying had to have started, that the full small intestine was searched, and that post-emptying digestion processes would not have emptied the earlier parts of the small intestine. If we can establish that material isn't present, and that digestion wouldn't have destroyed the evidence, and that stomach emptying had to have started, then that would establish the TOD you want (even if slippage can't be ruled out), but so far I can't agree by more than an order of magnitude that any of those three are true. Given that I didn't even know until today that digestion processes continue after death, the odds that I'm going to become more confident than that without reliable sources are pretty small.
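A rough sketch of how such a conjunction compounds, under two loudly stated assumptions that are not in the comment itself: that "an order of magnitude" means odds of at most 10:1 in favor of each claim, and that the three claims are independent. The claim labels and function name are hypothetical.

```python
from math import prod

def odds_to_prob(odds_for, odds_against=1.0):
    """Convert odds of odds_for:odds_against into a probability."""
    return odds_for / (odds_for + odds_against)

# Assumed reading: each claim is granted at most 10:1 odds of being true.
claims = {
    "stomach emptying had to have started": 10.0,
    "the full small intestine was searched": 10.0,
    "digestion would not have cleared the earlier sections": 10.0,
}

# Assuming independence, the probability of the conjunction is the product.
p_each = [odds_to_prob(odds) for odds in claims.values()]
p_all = prod(p_each)
odds_all = p_all / (1.0 - p_all)

print(f"P(all three) = {p_all:.3f}, i.e. odds of about {odds_all:.1f}:1")
```

On these assumptions, three claims each held at 10:1 yield a conjunction held at only about 3:1, which is why the conjunction carries much less weight than any single link in it.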

Apart from this, another thing this passage implies is that Ronchi's speculation about slippage was confined to the possibility of it having occurred at the autopsy, with the intestines uncoiled, in a situation where ligatures had not been applied (which we know to be contrary to the actual situation).

Yeah, I'll have to again pass on giving weight to a defense appeal document's spin about what must have been going through the mind of a court expert for them to be able to say such incriminating-sounding things against their client. It's the defense's job to interpret all testimony in as positive a light for the client as possible.

But we do have data for other situations, and those data are what my prior is based on. What's your prior, and why is it better?

My prior is much vaguer, and reflects my believing I don't have enough knowledge to have a more narrow prior. I don't have an answer to "why it's better", it's the one my brain came up with, and I don't have anyone else's brain handy to think with.

Incidentally, I was able to obtain a copy of the Hellmig et al. paper. Here is the study protocol:

For preparation of the solid test meal, 100 mg of 13C-octanoate was dissolved in an egg. After addition of 50 mL of low-fat milk, the egg was scrambled and fried in a pan. The solid test meal was completed by a piece of brown bread (50 g) and butter (20 g).

That's a bit unexpected to me; that looks like less than 300 Calories! I would have expected large lag times to be associated with a large meal.

After an overnight fast a breath sample was collected to define the basic value before the test meal was administered within 10 min. Breath samples were collected every 15 min for the first 120 min, then at 150, 180, 210 and 240 min after ingestion. Patients were again instructed not to drink, eat, smoke or exercise during the test.

Right, so that tends to confirm that there's no exercise or drinking, and they fast before the test. We already agreed drinking probably doesn't have a huge effect, but Meredith didn't fast and she might have gotten some physical activity in.

The range in the Hellmig et al. study was 29-203 minutes.

I assume you mean lag time?

Beyond differences in the test meal, the shorter times may be accounted for by a phenomenon known as "interdigestive duodenogastric reflux", which is a "sieving" process involving the shuffling of food back and forth between the stomach and duodenum, that takes place during the lag phase. This phenomenon was not known when some of the earlier studies were published, and so there is a significant possibility that those studies detected duodenal activity that the investigators mistook for the end of the lag phase. (HT to LondonJohn at JREF for this observation.)

Doesn't that paradoxically decrease your confidence that you know everything that's going on with digestive processes and can accurately predict TOD?

-- or even if the studies consistently had extreme data points in the range of 300, regardless of their averages -- then the Massei-Cristiani theory would be in significantly better shape.

I agree, the TOD theory would be in even better shape if that were the case.

Standard formulas predict 57-60 kg from Meredith's age, sex and height.

Different "standard" formulas give different results. Also, the standard deviation of weight based on a/s/h has to be considered. I'll go with the estimate of the expert who actually saw the body and what her build was, and consider it unlikely that Lalli's first estimate of weight would be off by more than 15 pounds.

↑ comment by rolf_nelson · 2011-08-04T08:23:48.423Z · LW(p) · GW(p)

Surely you meant the defense appeal document here?

Yes, typo.

My interpretation of Ronchi doesn't depend on the defense appeal; it's simply the common-sense default meaning of what he said, as reported in Massei-Cristiani...

I don't agree with your common-sense default meaning in the English translation, then, although of course the original Italian may be more enlightening.

...and confirmed by general information about average gastric emptying times.

That reasoning seems circular to me: the question of what the times are in this case, is exactly what I'm trying to determine here.

But even if it did, the appeal documents constitute the defense's reply to the Massei-Cristiani report, and so I don't see why they are any less useful than the latter. They rely on the same records that Massei and Cristiani do.

I judge court findings to be much more reliable than claims of the defense attorneys because:

The defense attorneys are chosen and paid for by the defense
Defense attorneys are ethically obligated to assist the defense, while the court is ethically obligated to neutrally examine the case
Court bias can result in a mistrial being declared; defense attorney bias (toward the defense), in contrast, is considered acceptable or even mandatory
If the defense is found to wield misinformation to successfully free a guilty client, they'll gain prestige and be more likely to be hired for more money in the future. If a court wields misinformation, on the other hand, it will be more likely to have negative rather than positive consequences for the court
Empirically, defense attorneys always side with the defense; I can't think of a case where the defense attorney summed up to the jury with "You know what? I'm convinced, my client is guilty after all."
Though I shouldn't weigh it too highly, a subjective sense that even if the defendants are innocent, this particular defense team has lost credibility, for example with Pasquali's testimony.

Replies from: komponisto

↑ comment by komponisto · 2011-08-05T14:25:55.652Z · LW(p) · GW(p)

My interpretation of Ronchi doesn't depend on the defense appeal; it's simply the common-sense default meaning of what he said, as reported in Massei-Cristiani...

I don't agree with your common-sense default meaning in the English translation, then, although of course the original Italian may be more enlightening.

The term used (in Massei-Cristiani) is svuotamento gastrico, which is a pretty literal counterpart of "gastric emptying". If someone says "X takes Y hours to empty", I view it as the default assumption that they are talking about the time it takes to empty completely (not to start emptying or empty halfway). Do you have a different view?

...and confirmed by general information about average gastric emptying times.

That reasoning seems circular to me: the question of what the times are in this case, is exactly what I'm trying to determine here.

Ronchi is reported by Massei and Cristiani as having said that "gastric emptying" can sometimes take 6-7 hours; we want to know whether he meant total emptying, or just half-emptying (or something else). So I searched and found a reference stating that total emptying (with the meaning unambiguous, explicitly contrasted to other parameters) typically takes 4-5 hours, in contrast to half-emptying, which typically takes 2.5-3 hours. For me this increases the (already high) probability that total emptying, and not half-emptying, was what was meant.

I judge court findings to be much more reliable than claims of the defense attorneys

Here are some factors that you may not be adequately considering:

1. Argument screens off authority. Whatever the appropriate default assumptions about reliability may be, both the court and the defense have explained their arguments in detail (in lengthy documents that are publicly available), while being in a position to know what each other's arguments are. There is unlikely to be much important information not contained in these documents. In particular, if the prosecution case is correct and the defense case isn't, we should expect to be able to determine this from the court's opinion and the appeal documents (without requiring further "rebuttal" from the court), since the court will have heard the defense case already, and should be anticipating the strongest possible defense reply; we should in other words expect to perceive the defense appeals as substantively weak, while perhaps demonstrating legal cleverness. If instead we perceive them as strong, we should regard that as significant evidence in favor of the defense.

2. The argument about "neutrality" could be applied to the prosecution as much as to the court, since the prosecutors (who in Italy supervise the police investigation) are ethically obligated to conduct the investigation in a neutral manner, and to charge only suspects whose culpability is rationally indicated by the evidence. Hence there shouldn't be much difference in reliability between the prosecution and a court which has decided in favor of guilt; yet I presume you wouldn't regard the prosecution as sufficiently reliable to not bother listening to defense arguments. (See also 4. below: in continental European "inquisitorial" systems, judges and prosecutors are traditionally regarded as belonging to the same job category.)

3. People change their minds less often than they think; and the judges are particularly unlikely to revise their opinion during the 90-day interval between the time the verdict is announced and the motivation document is submitted. (They presumably aren't even legally allowed to change the verdict, since the rest of the jury is no longer participating.) Hence the latter is guaranteed to be the work of people trying to defend a decision they've already publicly committed to.

4. Cultural assumptions about how the legal process works (and what is considered acceptable behavior for attorneys and judges) do not necessarily transfer to a foreign country. For example, it's not clear that the Italian system has the concept of a "mistrial" in the sense that you refer to. What it does have is second-level ("appeal") courts which regularly modify or overturn first-level rulings, for various reasons (at a much higher rate than in the American system). My suspicion is that the closest analogue of "a mistrial being declared" is simply the appeal court reversing the first-level verdict -- which is precisely what Knox and Sollecito are currently seeking to have done. So the inference you're wanting to make about the first court's level of even-handedness may not be valid, due to a possible difference in the error-correction mechanism. (Relatedly, a first-level finding of guilt in Italy does not have the same significance as "conviction" in the U.S., but is rather somewhere between indictment and conviction.)

5. Ignoring my own advice above, I could invoke the American assumption that defense attorneys are ethically (and/or legally) obligated not to mislead the court, particularly in official written filings.

Empirically, defense attorneys always side with the defense; I can't think of a case where the defense attorney summed up to the jury with "You know what? I'm convinced, my client is guilty after all."

I expect defense attorneys to make different kinds of arguments when they think their client is guilty than when they think their client is innocent. Don't you?

Though I shouldn't weigh it too highly, a subjective sense that even if the defendants are innocent, this particular defense team has lost credibility, for example with Pasquali's testimony.

We may want to discuss that at some point, then, because I find Pasquali's testimony very compelling (particularly the experiments he conducted).

Of course, I have an analogous sense with regard to Massei and Cristiani, who lose credibility in my view by assigning credence to people like Curatolo and Quintavalle (and indeed by not paying attention to Pasquali and his results).
