Linkpost: Francesca v Harvard

post by Linch · 2023-12-17T06:18:05.883Z · LW · GW · 5 comments

This is a link post for https://www.francesca-v-harvard.org/


Just when I thought the Gino case had come to a close, I started reading Francesca Gino's defense. It paints her as a beleaguered academic unfairly penalized by misleading and incompetent investigators, as well as a biased New Yorker reporter.

I'm reposting this here as I think it's useful to see both sides of every case, in an interesting parallel to recent, more local, events.

_____

Breaking My Silence

I have been sitting in near silence for over three months, since learning that HBS was putting me on unpaid administrative leave, and banning me from campus, teaching and research. It has been shattering to watch my career being decimated and my reputation completely destroyed. It has been hard to see how this situation impacted those around me - my family, my mentors, my collaborators and my students.

The information that has been available to the public, and the analysis posted by critics, may sound compelling. But the information is incomplete and misleading. The record needs to be corrected. This website is my attempt to do so.

Let that correction begin with this simple and unambiguous statement: I absolutely did not commit academic fraud.

____

 

In an earlier post, I refuted Data Colada’s critique of the PNAS paper by demonstrating that:

Data Colada cherry-picked the data it chose to include in its analysis,

Data Colada excluded a third condition in the study,

Data Colada excluded 2 of the 3 dependent variables in its analysis, and

Data Colada completely misrepresented the Excel calcChain function to buttress their claims.

Together, this allowed Data Colada to create an enormously misleading impression about my work. Importantly, I also demonstrated that if one were to exclude all of the data observations that Data Colada claimed should have been considered “suspicious,” the findings of the study still hold, suggesting there was no motivation for me to manipulate data.

In a more recent post, Data Colada doubled down on their claims, arguing that HBS’ investigation found additional evidence of data manipulation in this particular study.

What is this additional evidence?

In a nutshell, HBS claimed the following:

HBS claimed that it was able to obtain the original (unpublished) dataset for the study in question.

HBS claimed that, with the help of a forensics firm, it was able to compare this original dataset to the dataset published on OSF.

HBS claimed that this comparison yielded significant discrepancies.

HBS claimed that these discrepancies were evidence of fraud on my part.

Now that I’ve been able to enlist the help of my own forensics team, I have been able to conduct my own investigation into this matter. In the analysis below, I offer my findings and refute each of the claims put forth by HBS.

[...]

Studies like this one often involve multiple dataset files.


It took a while for my team and me to unpack the evidence in this case, but our unequivocal conclusion is that HBS used the wrong dataset in its investigation. They used the July 16 HBS version, when they should have used the July 16 OG version.

Before I explain how we came to this conclusion, let me note that I do not believe Maidstone is to blame here. On the contrary—and this is a significant detail—Maidstone repeatedly expressed their lack of certainty that the July 16 HBS version was the dataset they should be relying on for their analysis. In their report, they repeatedly included caveats about this, noting that there was reason to believe there might be additional datasets they were not privy to, and it was therefore possible they were looking at the wrong file. Maidstone also explicitly noted that it was relying on “the Client’s description of provenance” (the client being HBS) in its analysis, in a clear attempt to distance itself from the possibility that the July 16 HBS version was not the correct dataset to be using.

As I said, I believe the July 16 OG file is the version that HBS should have used in its investigation. This file has the following characteristics: 

It contains all of the data from the July 13th version, except for some data that appears to have been discarded or corrected.

It contains some additional data that is not included in the July 13th version.

It is completely consistent with the data posted on OSF, with few exceptions (analysis entries with some summary statistics). 

The following sections explore each of these characteristics.
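
As an illustration only (this is not taken from the post, from HBS, or from either forensic report), the three characteristics above amount to a row-level comparison between two versions of a dataset. A minimal sketch, assuming each version is a list of rows keyed by a participant ID; the function name `compare_versions`, the column names, and the toy values are all hypothetical:

```python
def compare_versions(earlier, later, key="id"):
    """Compare two dataset versions given as lists of dict rows.

    Returns the keys that were dropped, added, or changed between
    the earlier and the later version. Purely illustrative.
    """
    earlier_by_key = {row[key]: row for row in earlier}
    later_by_key = {row[key]: row for row in later}

    # Rows present earlier but missing later (discarded data).
    dropped = sorted(earlier_by_key.keys() - later_by_key.keys())
    # Rows present only in the later version (additional data).
    added = sorted(later_by_key.keys() - earlier_by_key.keys())
    # Rows present in both versions whose values differ (corrected data).
    changed = sorted(
        k for k in earlier_by_key.keys() & later_by_key.keys()
        if earlier_by_key[k] != later_by_key[k]
    )
    return {"dropped": dropped, "added": added, "changed": changed}


# Toy data, invented for this example; not the study's data.
july13 = [
    {"id": 1, "score": 5},
    {"id": 2, "score": 7},
    {"id": 3, "score": 4},
]
july16 = [
    {"id": 1, "score": 5},
    {"id": 2, "score": 6},
    {"id": 4, "score": 9},
]

result = compare_versions(july13, july16)
print(result)
```

Running the same comparison between a candidate file and the published OSF data would show whether the two are consistent: empty `dropped`, `added`, and `changed` lists would mean the rows match exactly under these assumptions.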

____

5 comments

Comments sorted by top scores.

comment by df fd (df-fd) · 2023-12-17T13:36:50.318Z · LW(p) · GW(p)

I am confused.

I have not read much of this rebuttal and I am not academically inclined but just reading the first part of this

https://www.francesca-v-harvard.org/data-colada-post-1

 

Correct me if I am wrong, but is Francesca complaining that, of all the duplicate and out-of-order IDs, Data Colada is not listing all of them?

Francesca is also saying that Data Colada is only picking on the one variable that is suspicious and not talking about the other [?non-suspicious] variables? Correct me if I am wrong, but isn't this just bananas? Obviously Data Colada would not talk about normal data.

 

Can someone with more familiarity with these things and time to spare read it and tell me if Francesca's rebuttal makes sense?

Replies from: korin43
comment by Brendan Long (korin43) · 2023-12-17T21:21:33.079Z · LW(p) · GW(p)

ACX shared these rebuttals that explain why Gino's defence doesn't make sense:

https://fashionalexpectations.substack.com/p/ginormous-coincidences

https://twitter.com/JohnHBillings/status/1708187948208857363

I don't think these cover the calcChain part, which I'm now convinced is less damning than everything else, but it is still additional evidence (either the rows were swapped manually, or Excel just happened to recalculate the out-of-order rows in the most suspicious possible way).

comment by Rebecca (bec-hawk) · 2023-12-18T18:49:41.497Z · LW(p) · GW(p)

Does anyone know why it is Francesca vs Harvard and not Gino v Harvard?

Replies from: Linch
comment by Linch · 2023-12-19T01:58:43.714Z · LW(p) · GW(p)

My guess is that it's because "Francesca" sounds more sympathetic as a name.

Replies from: gwern
comment by gwern · 2023-12-19T15:35:47.150Z · LW(p) · GW(p)

Yes. 'Gino' (rather than 'Gina') is a guy name, while 'Francesca' is a woman's name. This incorrect framing (the correct legal use of 'X v Y' would of course be to use her inconveniently-male-sounding surname) is a cheap but useful PR trick for her, so she's not going to miss it, and this framing is part of her overall defense: that she's being persecuted out of misogyny and she is the real victim here.