Transformative trustbuilding via advancements in decentralized lie detection

post by trevor (TrevorWiesinger) · 2024-03-16T05:56:21.926Z · LW · GW · 7 comments

This is a link post for https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3680134/

Contents

  Abstract
  Introduction
  The Development of Lie Detection Technology
  The Basic Science
    a. BOLD fMRI and Principle of Cognitive Subtraction
    b. Experiment Design
    c. Early Conclusions and Within-Subject Accuracy
  Reactions to the Early Scientific Discoveries
  fMRI Lie Detection in Court
  The Legal Implications of Semrau and Wilson
  The Current State of Scientific Concerns: What Needs to Be Done
  The Current State of Legal Concerns: What Issues Remain Unresolved?
  The Policy Analysis
  Conclusion

Although the emergence of functional lie detection would be an obvious total paradigm shift for the entire court system, the author didn’t seem to realize that this is also an obvious total paradigm shift for much bigger things, e.g. hiring, high-trust friend groups, an immune system for deceptively aligned humans, discouraging harmful behavior, nihilistic profiteering, and/or excessively extreme self interest. 

Although centralization of power is a classic concern, broader open decentralized access could easily facilitate a new era of high-trust social dynamics within and between smaller groups.

So bear in mind, whenever the author says this is about courts, you’re allowed to think about whatever other use cases might come to mind, not just courts. This is about trust, cooperation, taken to an extreme level that we have never seen before on Earth, and plausibly can’t imagine until it happens. What kind of questions could you ask someone (or have them ask you) if you wanted to honestly build trust and collaborate? What problems in the world would vanish, perhaps preemptively? Probably a large majority.

We might even already be deep into a technological overhang, where transformation will materialize after a few technical breakthroughs, or perhaps just any effort at all towards applied research by serious, thoughtful, and pragmatic individuals. The current reputation of lie detectors not working basically revolves around the fact that they were first invented ~a century ago (hence the name "polygraph"; it was the coolest thing you could possibly do at the time with multiple graphs). So it's not particularly surprising that our civilization remained mediocre, in the context of such incredible capabilities existing but being so amazing that they were pursued way too early.

This paper (Using Brain Imaging for Lie Detection: Where Science, Law and Research Policy Combine) was published in 2012 and the world has changed a lot since then (sadly, with no room-temperature superconductors, which would facilitate large-scale deployment of fMRI hats). Since this is a snapshot of the state of brain research at a specific point in time (albeit a truly, truly excellent snapshot), I’ve sprinkled reminders throughout that this is from 2012.

Abstract

Progress in the use of functional magnetic resonance imaging (fMRI) of the brain to evaluate deception and differentiate lying from truth-telling has created anticipation of a breakthrough in the search for technology-based methods of lie detection. In the last few years, litigants have attempted to introduce fMRI lie detection evidence in courts. This article weighs in on the interdisciplinary debate about the admissibility of such evidence, identifying the missing pieces of the scientific puzzle that need to be completed if fMRI-based lie detection is to meet the standards of either legal reliability or general acceptance.

We believe that the Daubert’s “known error rate” is the key concept linking the legal and scientific standards. We posit that properly-controlled clinical trials are the most convincing means to determine the error rates of fMRI-based lie detection and confirm or disprove the relevance of the promising laboratory research on this topic.

Reminder: This paper was published in 2012

This article explains the current state of the science and provides an analysis of the case law in which litigants have sought to introduce fMRI lie detection. Analyzing the myriad issues related to fMRI lie detection, the article identifies the key limitations of the current neuroimaging of deception science as expert evidence and explores the problems that arise from using scientific evidence before it is proven scientifically valid and reliable. We suggest that courts continue excluding fMRI lie detection evidence until this potentially useful form of forensic science meets the scientific standards currently required for adoption of a medical test or device.

Given the multitude of stakeholders, the charged and controversial nature of this technology, and its potential societal impact, goodwill and collaboration among several government agencies may be required to sponsor impartial and comprehensive clinical trials that will guide the development of forensic fMRI technology.

Reminder: This paper was published in 2012

Introduction

Recent progress in the use of functional magnetic resonance imaging (fMRI) of the brain to evaluate deception and differentiate lying and truth-telling has created anticipation of a breakthrough in the search for technology-based methods of lie detection. Attempts by commercial entities to introduce fMRI lie detection evidence in courts have prompted commentary and criticism on both ethical and scientific grounds without a corresponding generation of new research data to address such concerns.

Major unanswered questions include the sensitivity of the new technology to countermeasures, its external validity and accuracy, and the specificity of the observed fMRI patterns to deception. Our review suggests that while these are important, the critical knot of law and science that must be untangled to permit further translational progress is the determination of the “error rates” of the technology, as defined by the Daubert criteria of admissibility. This determination includes not only the accuracy of tests within each subject, but also their predictive power in the relevant population.

Reminder: This paper was published in 2012

The article seeks to explain for the interdisciplinary audience the pivotal difference between small-scale experimental research studies and properly controlled clinical trials that are dedicated to confirmation of the proofs of concept in the ecologically valid setting. We emphasize that such trials are critical to evidentiary reliability. Prior to such trials, expert testimony that “a given witness is deceptive in response to a given question” remains a risky and speculative leap from existing data. Given the multidisciplinary nature of the research and the diversity of special interests involved, funding clinical trials of fMRI-based lie detection technology is not a trivial endeavor. In light of its potential importance to society and the fields of law and medicine, we propose a public funding initiative leading to a peer-reviewed translational research program on the brain mechanisms of deception with a special emphasis on multicenter clinical trials of fMRI-based lie detection.

The perils of admitting unproven scientific evidence are well known, a point mentioned in the National Academy of Sciences’ Report, Strengthening Forensic Science in the United States: A Path Forward (2009)(“NRC Report”). The NRC Report criticizes forensic science, questioning “whether—and to what extent—there is any science in any given ‘forensic science’ discipline.” It also finds the judiciary to be “utterly ineffective” in requiring forensic scientists to prove the validity of their methods and the accuracy of their conclusions (National Research Council, 2009). Despite the Report’s scathing critique of many forensic science disciplines, many courts continue to admit it without reservation, despite proof that such evidence has contributed to numerous wrongful convictions. Once admitted, scientific evidence tends to become rooted and difficult to eradicate later and we believe this report should influence the legal community to require the emerging field of forensic neuroimaging, including fMRI-based lie detection, have a proper scientific foundation before being admitted in courts.

Reminder: This paper was published in 2012

The Development of Lie Detection Technology

The United States judicial system places great weight in the belief that juries are effective and reliable in determining the credibility of the witness. Yet, behavioral and social research shows that humans are good at lying and quite poor at lie detection (Vrij, 2008). For example, an average person’s ability to detect deception in a face-to-face interaction with another individual is only modestly better than chance (Ekman & O’Sullivan, 1991). Thus, the critical importance of truthful testimony and the inadequacy of human lie detectors have prompted the perennial search for a technology-based objective method of lie detection or truth verification; this search continues today (Grubin, 2010; NRC, 2009; Stern, 2003).

Reminder: This paper was published in 2012

The polygraph, which measures activity of the peripheral nervous system to gauge truthfulness, has been the primary technical method for lie detection during the last century. Beginning with the Frye v. United States (1923) decision, most United States courts have expressed disapproval of polygraph-based evidence. The United States Supreme Court has noted the lack of consensus on the reliability and admissibility of the polygraph (Scheffer v. United States, 1998), and courts remain largely hostile to its admission into evidence (Faigman, Kaye, Saks, & Sanders, 2010; Gallini, 2010). A meta-analysis commissioned by the Department of Defense found the sensitivity and specificity of the polygraph to be 59% and 92%, respectively (Crewson, 2001). The National Academy of Sciences report (Stern, 2003) laments the lack of definitive research on the accuracy of the polygraph under various conditions and estimates it to be in the vicinity of 75%, ranging from as low as 55% to as high as 99% depending on the setting (i.e. experimental vs. forensic), questioning format, the operator, and response classification rules.

The polygraph is still widely used outside the courtroom in the United States, in particular as a pre-employment and in-employment screening technique for government agencies such as the Federal Bureau of Investigation. Anecdotal evidence (Senate, 1994) and some retrospective studies led many scholars to believe that the polygraph would perform poorly in this capacity. Due to the relatively low prevalence of the types of misconduct targeted by polygraph examinations among United States government workers, most of the individuals flagged by the polygraph are likely to be false positives and a substantial proportion of the liars are likely to be missed (Baldessarini, Finklestein, & Arana, 1983; Raichle, 2009; Wolpe, Foster, & Langleben, 2005).
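The base-rate argument in this paragraph can be checked with a few lines of arithmetic. The sketch below applies Bayes' rule to the sensitivity (59%) and specificity (92%) figures quoted above from Crewson (2001). The prevalence values are purely hypothetical assumptions chosen for illustration; the paper does not give a prevalence figure, and the function name is made up for this example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a binary test via Bayes' rule.

    PPV: probability that a flagged person is actually lying.
    NPV: probability that a cleared person is actually truthful.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures quoted in the text above (Crewson, 2001).
SENSITIVITY = 0.59
SPECIFICITY = 0.92

# Hypothetical prevalences of the targeted misconduct (assumptions, not data).
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under the assumed 1% prevalence, fewer than one in ten flagged individuals would be a true positive, and a sensitivity of 59% means roughly four in ten liars go undetected whatever the prevalence. That is the pattern the paragraph describes.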

Reminder: This paper was published in 2012

The more recently developed physiological measures considered to have potential for lie detection are Electroencephalography (EEG) and Functional Magnetic Resonance Imaging (fMRI). Both are established medical technologies developed and widely used for the assessment of brain activity. The EEG dates back to the 1920s (Berger, 1929), while fMRI was first reported in humans in 1992 (Kwong et al., 1992). The two techniques critically differ from the polygraph in that they measure the central (brain) rather than the peripheral (galvanic skin response, heart rate, blood pressure and respiration) correlates of nervous system activity. EEG-based lie detection was pioneered by J.P. Rosenfeld (Rosenfeld, Cantwell, Nasman, Wojdac, Ivanov, & Mazzeri, 1988) and has been a topic of sustained research since. fMRI is greatly superior to EEG in its ability to localize the source of the signal in the brain. EEG, on the other hand, is significantly less expensive, more mobile and has a better time resolution than fMRI. The recent progress in the ability of fMRI to reliably measure and localize the activity of the central nervous system (CNS) has created the expectation that an fMRI-based system would be superior to both the polygraph and the EEG for lie detection.

The Basic Science

The scientific and forensic concerns of fMRI-based lie detection are reviewed in greater detail elsewhere (Langleben, 2008; Langleben, Willard, & Moriarty, 2012; Spence, 2008), so we provide only a basic overview here.

a. BOLD fMRI and Principle of Cognitive Subtraction

Magnetic resonance imaging (MRI) is a medical imaging technique using high magnetic fields and non-ionizing electromagnetic radiation to produce high-resolution, three-dimensional (3D) tomographic images of the body (Lauterbur, 1973). Functional MRI (fMRI) is distinguished from regular (structural) MRI by the speed of acquisition of each 3D image. In fMRI, serial images of the entire brain are acquired every few seconds, which is fast enough to observe changes in the regional blood volume and flow that are associated with cognitive activity.

Reminder: This paper was published in 2012

Blood-oxygenation-level dependent (BOLD) imaging is presently the fMRI technique most commonly used in cognitive neuroscience (Kwong, et al., 1992). BOLD relies on the difference in the magnetic properties of the contents of the blood vessels and the surrounding brain tissue as well as the magnetic difference between oxygenated and deoxygenated hemoglobin (Gjedde, 2001). BOLD fMRI does not depict absolute regional brain activity; rather, it indicates relative changes in regional activity over time. To make inferences about the nature of the regional brain activity, BOLD fMRI task designs rely on a principle of “cognitive subtraction” (Aguirre & D’Esposito, 1999). This principle assumes that the fMRI signal difference between two behavioral conditions that are identical in all but a single variable is due to this variable. Therefore, a proper comparison (i.e. control) condition is critical for meaningful BOLD fMRI data (Gjedde, 2001). The fMRI activation maps reported in the literature usually represent a statistical subtraction between the fMRI activity maps related to the target and control variables (Owen, Epstein, & Johnsrude, 2001). It follows that the selection of comparison conditions is essential to a meaningful experimental fMRI paradigm. Ideally, the comparison and target conditions would be identical except for a single factor of interest. For example, statistically comparing the fMRI signals acquired while looking at white squares and at black squares of the same size, presented in random sequence, would yield the difference between brain processing of the colors white and black (Owen, et al., 2001). In an fMRI deception experiment, questions that could invoke a lie or truth could be substituted for the two types of squares, but the same principle applies.
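As a rough illustration of cognitive subtraction, the sketch below simulates a signal from one hypothetical brain region under a control condition and a target condition that differ by a single factor, then takes the statistical difference between them. Everything here is an assumption made for illustration: the signal units, the effect size, the noise level, and the use of a Welch t statistic as the "statistical subtraction". Real fMRI analysis pipelines are considerably more involved.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulated numbers are repeatable

# Illustrative assumptions, in arbitrary signal units.
BASELINE = 100.0     # signal shared by both conditions
TRUE_EFFECT = 1.5    # extra signal caused by the single factor of interest
NOISE_SD = 2.0       # trial-to-trial noise
N_TRIALS = 200       # trials per condition


def simulate_signal(baseline, effect, noise_sd, n_trials):
    """Simulate per-trial signal for one condition in one region."""
    return [baseline + effect + random.gauss(0.0, noise_sd)
            for _ in range(n_trials)]


def subtraction_contrast(target_trials, control_trials):
    """Return (mean difference, Welch t statistic) for target minus control."""
    diff = statistics.mean(target_trials) - statistics.mean(control_trials)
    std_err = (statistics.variance(target_trials) / len(target_trials)
               + statistics.variance(control_trials) / len(control_trials)) ** 0.5
    return diff, diff / std_err


control = simulate_signal(BASELINE, 0.0, NOISE_SD, N_TRIALS)
target = simulate_signal(BASELINE, TRUE_EFFECT, NOISE_SD, N_TRIALS)

diff, t_stat = subtraction_contrast(target, control)
print(f"estimated effect = {diff:.2f} (simulated true effect = {TRUE_EFFECT})")
print(f"Welch t statistic = {t_stat:.1f}")
```

The shared baseline cancels in the subtraction, which is why the result is only interpretable when the two conditions really do differ in a single variable. Any other difference between them would be folded into the same number.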

b. Experiment Design

fMRI deception experiments depend on several critical parameters, some of which are unique to fMRI and others that have been developed in basic psychological and polygraph research (Miller, 1993). The scenario of a deception task refers to the hypothetical setting in which experimental deception takes place. For example, some experiments involve participants in a mock crime situation and then question them about it (Kozel et al., 2005). Others probe participants about autobiographical information of different levels of intimacy (Abe, et al., 2009; Spence et al., 2001). Finally, experiments that treated emotion, embarrassment and autobiographical memory as confounds rather than variables of interest used relatively “neutral” scenarios that required concealing possession of a playing card for a monetary reward (Langleben et al., 2002). The task scenario also determines the risk/benefit ratio of the deception experiment. For example, critics of the practical relevance of fMRI deception research argue that the substantially lower risk/benefit ratio of deception using the concealed playing card scenario, compared to lying about an actual crime, should lead to significantly different fMRI patterns associated with deception under these two scenarios. This debate can only be resolved by direct experimental manipulation of the risk/benefit ratios of the deception experiments.

Reminder: This paper was published in 2012

The fMRI paradigm refers to the order of the stimuli presentation during an fMRI task (Donaldson & Buckner, 2001). In “event-related” paradigms, fMRI 3D images are acquired for discrete “events,” typically on the time course of one-half to four seconds. Event-related designs have an advantage in their ability to isolate activity in near-immediate response to stimuli and also allow for better stimulus variety and control types (Donaldson & Buckner, 2001). However, because of their low statistical power, they require random repetition of each class (i.e. lie or truth) of stimuli up to a dozen times during an experiment. Moreover, event-related designs require maximizing the magnetic field strength and the signal-to-noise ratio of the MRI scanner. Event-related or hybrid fMRI paradigms are more relevant for deception than other designs, and most of the recent deception experiments have used this approach.

The experimental deception model refers to the method used to generate deceptive responses and the appropriate controls. The two basic deception-generating models are the Comparison Question Test (CQT) and the Guilty Knowledge Task (GKT), also referred to as the Concealed Information Test (CIT). These models are not unique to fMRI research and have been developed for forensic investigative use (Ben-Shakhar, Bar-Hillel, & Kremnitzer, 2002; Lykken, 1991; Stern, 2003) with the polygraph and later with EEG (Rosenfeld et al., 1988). In the CQT, test-takers answer a series of questions. One subset consists of questions unrelated to the topic of questioning, with the correct response known or presumed known. These questions are selected to be similar to the relevant questions in their attentional quality (e.g. salience) (Raskin & Honts, 2001). The inherent subjectivity of what constitutes comparable salience creates difficulty in adequately controlling these questions, a main criticism for the CQT’s detractors (Ben-Shakhar, 1991).

Reminder: This paper was published in 2012

The GKT or CIT involves a series of questions designed to elicit a fixed uniform response (typically “No”) to multiple items, including a piece of knowledge that a “guilty” subject would seek to conceal. A negative response to such an item would constitute a forced deception that is hypothesized to have higher salience than other items (Lykken, 1991). While not having the control problems of the CQT, the CIT’s reliance on the salience of deception (rather than the deceptive response itself) limits its specificity. CIT is unpopular among polygraph examiners in the United States who believe that obtaining pieces of information known only to a perpetrator is often impractical. However, it is the primary model used by law enforcement in Japan, where polygraph evidence is admissible in court (Ben Shakhar, 2001; Nakayama, 2001).

Another parameter of importance to the experimental deception-generating models is whether responding deceptively is being endorsed by the experimenter (Miller, 1993). While in the real world, an individual’s deception would generally be undesirable to its target (a feature known to the deceiver, by definition), in most deception experiments, subjects are given explicit instructions (i.e. endorsement) to lie to some of the questions (Spence, et al., 2001). Such endorsement severely limits the ecological validity of the experiment. Some deception experiments have attempted to enhance ecological validity by introducing intent, allowing the subjects to choose when to lie during the task (Lee et al., 2002). Others have removed the appearance of endorsement of deception by separating the research team member who instructs participants to lie from the rest of the team, thus creating a “co-conspirator” (Langleben et al., 2005).

Reminder: This paper was published in 2012

c. Early Conclusions and Within-Subject Accuracy

Since 2000, academic researchers in several countries have used Blood Oxygenation Level Dependent (BOLD) functional Magnetic Resonance Imaging (fMRI) to study brain activity during experimental deception and malingering (Langleben et al., 2002; Lee et al., 2002; Spence et al., 2001). These early studies had to pool data from multiple subjects to make their findings. Subsequent improvement in fMRI technology permitted discrimination between an investigator-endorsed lie and truth in healthy individual subjects with an accuracy of over 75% (Davatzikos et al., 2005; Kozel, et al., 2005; Langleben, et al., 2005). While there remained inconsistencies across the multiple studies, “there has nevertheless emerged a recurrent pattern of findings suggesting that at some point in the future functional neuroimaging may be used to detect deception in situations that have significant legal consequence” (Spence, 2010).

Simultaneously with the experimental progress, researchers recognized and explored the limitations and the existing and potential pitfalls related to the possible translation of this technology to clinical use (Kozel, 2005). Wolpe, Foster and Langleben (2005) and Happel (2005) were the first to elaborate that a comprehensive understanding of the new technologies’ error rates requires not only the recently reported within-subject accuracy, but also the positive and negative predictive power of the test, neither of which was known at the time (Hyman, 2010). The latter two parameters combine the inherent accuracy of a test and the expected prevalence of liars in the tested population and are a recognized milestone in the evaluation of clinical tests (Baldessarini, 1983). Their measurement requires large samples. Since 2005, this critical knowledge gap has been underscored by several authoritative critics of the technology; however, no progress has been made in filling it.
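One way to see why measuring these parameters "requires large samples" is to look at how wide the uncertainty around an estimated proportion is at different sample sizes. The sketch below uses the Wilson score interval, a standard confidence interval for a binomial proportion. The 80% observed rate and the sample sizes are hypothetical assumptions, not figures from any study cited here, and the paper does not say which interval method a trial would use.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (z * math.sqrt(p_hat * (1.0 - p_hat) / n
                                + z * z / (4.0 * n * n))) / denom
    return center - half_width, center + half_width


# Hypothetical: the same 80% observed rate at three sample sizes.
for n in (50, 500, 5000):
    correct = int(0.8 * n)
    low, high = wilson_interval(correct, n)
    print(f"n={n:5d}  observed=0.80  95% interval=({low:.3f}, {high:.3f})")
```

With only a few dozen cases the interval spans roughly twenty percentage points, which is too coarse to pin down an error rate. It narrows to a few points only once the sample reaches the thousands.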

Reminder: This paper was published in 2012

Reactions to the Early Scientific Discoveries

After the initial fMRI studies were completed and published, the Trustees of the University of Pennsylvania and of the University of South Carolina filed separate patent applications for the technology and licensed it to start-up firms, Cephos and No Lie MRI. Articles in the New York Times and other publications quickly piqued the public’s interest in the forensic use of fMRI technology to detect deception (Marantz Henig, 2006; Talbot, 2007).

Legal and ethical scholars also began to weigh in on fMRI lie detection (Greely & Illes, 2007; Moriarty, 2008). Criticism included the obvious technical knowledge gaps that needed to be addressed and the potential societal risks and benefits of improving lie detectors and deception research (Wolpe, Foster, & Langleben, 2005), constitutional implications (Fox, 2009; Halliburton, 2009; Pardo, 2006) and privacy concerns (Greely, 2006; Happel, 2005; Thompson, 2005). Others suggested that while validation studies were necessary for translation of fMRI lie detection into forensic practice, such studies were ethically and methodologically challenging (Halber, 2007; Kanwisher, 2009). Halber argued that the accuracy rates of 80 to 90%, as reported in laboratory experiments, proved the method was inadequate for field applications (Halber, 2007).

Reminder: This paper was published in 2012

Some suggested outright regulation (Canli et al., 2007; Greely & Illes, 2007). Tovino suggested banning fMRI veracity testing outside of clinical and research use until it was determined to be highly effective (Tovino, 2007), and another urged courts to self-impose a moratorium period to sort through the myriad scientific and jurisprudential issues at stake (Moriarty, 2009).

France, however, has taken the controversial step of banning commercial use of brain imaging but permitting its use in court. A new law, passed in 2011, provides that “[b]rain-imaging methods can be used only for medical or scientific research purposes or in the context of court expertise” (Oullier, 2012). According to a recent article published in Nature, none of the neuroscientists consulted during the drafting process encouraged the courtroom use of neuroimages (Oullier, 2012).

Reminder: This paper was published in 2012

fMRI Lie Detection in Court

Despite early, sustained criticism by both scientific and legal scholars, the for-profit companies continued to push aggressively toward the courtroom. In spring, 2010, a New York State trial judge excluded fMRI expert testimony about a witness’s truthfulness in Wilson v. Corestaff Services, L.P., (2010). A few weeks later, a federal court in Tennessee granted the government’s motion to exclude fMRI expert testimony about defendant’s veracity in United States v. Semrau (2010). In both cases, parties sought to introduce the testimony of Dr. Steven Laken, CEO of Cephos, Inc., a company conducting commercial “credibility assessments” with fMRI.

Wilson v. Corestaff Services, L.P. was an employment discrimination suit in which the plaintiff offered fMRI testimony to shore up the credibility of a main witness. The defense filed a motion in limine to exclude such testimony, which the trial court granted without an evidentiary hearing. The court disallowed Dr. Laken’s testimony because the proposed testimony concerned a collateral matter—credibility of a witness—remarking that “anything that impinges on the province of the jury on issues of credibility should be treated with a great deal of skepticism.” The court also held that the testimony did not meet the Frye standard of admissibility, which requires novel scientific evidence to be generally accepted in the field to which it belongs:

Even a cursory review of the scientific literature demonstrates that the plaintiff is unable to establish that the use of the fMRI test to determine truthfulness or deceit is accepted as reliable in the relevant scientific community. The scientific literature raises serious issues about the lack of acceptance of the fMRI test in the scientific community to show a person’s past mental state or to gauge credibility.

Reminder: This paper was published in 2012

Wilson v. Corestaff Services L.P., (2010). There was no evidentiary hearing in Wilson and it settled without an appeal, so it is of marginal utility in terms of precedent. Nonetheless, for other states following the Frye standard, Wilson might be cited for its holding that the science lacks “general acceptance” in the field.

In United States v. Semrau, the trial court held an extensive evidentiary hearing to determine whether the proposed fMRI lie detection evidence was sufficiently reliable to be admitted at trial. Dr. Laken testified that the defendant was truthful when he denied committing Medicare fraud. Dr. Laken repeated the testing session on three consecutive occasions, due to problems in the first two. The first session was negative for deception but deemed suboptimal. The second session was positive for deception but Semrau complained of fatigue during the scan. Apparently, the second session had excessively long test questions with double negatives, such as “Except for X, have you ever done Y?”. The third session used reformulated test questions and was again negative for deception. Remarkably, Laken testified that he could not state whether Semrau was truthful with respect to any “specific incident question”; he could only testify to an overall picture of truthfulness.

Reminder: This paper was published in 2012

Magistrate Judge Tu M. Pham, appointed by the federal district court judge to hear the evidentiary motion, admitted testimony from opposing experts and reviewed affidavits submitted by experts. He analyzed the matter under substantially overlapping legal reliability standards - Federal Rule of Evidence (FRE) 702 governing expert testimony and the Supreme Court Daubert factors of (1) testability; (2) publication and peer review; (3) known error rate; (4) maintenance of standards and controls; and (5) general acceptance. (Daubert v. Merrell Dow Pharms., Inc., 1993).

The court found that the subject matter was tested and published in peer review journals, citing both legal and science journals discussing fMRI lie detection studies. Judge Pham was more troubled by Cephos’ claims about its tests’ error rates and testing standards. The court focused on the lack of ecological validity, remarking “[t]here are no known error rates for fMRI-based lie detection outside the laboratory setting, i.e, in the ‘real-world’ or ‘real-life’ setting;” a concern it voiced about both polygraph and fMRI lie detection.

Reminder: This paper was published in 2012

The Judge also reviewed other limitations and shortcomings of the fMRI studies that diminished the claim of a meaningful error rate: though peer-reviewed, all studies had small (N < 60) samples, included young and healthy participants who were not representative of the general population, and used different types of deception-generating paradigms. Further, the court opined that the critical flaw was the difference between the motivation of the research participants and real world suspects to lie. Finally, the court noted that all reviewed studies involved the investigators directing the participants to lie to various extents, possibly detecting brain activity related to task compliance rather than deception. In sum, the court held that based on the current state of the science, the “real life” error rate of fMRI-based lie detection was still unknown: a point with which we concur.

With respect to standards and controls, the court was troubled by the repeated tests used in the case at issue. The “decision to conduct a third test begs the question whether a fourth scan would have revealed Dr. Semrau to be deceptive again.” The court determined that the use of fMRI for deception in the real-world was not generally accepted by the scientific community and concluded there was insufficient proof of legal reliability of the proposed evidence.

Reminder: This paper was published in 2012

The court also held, pursuant to FRE 403, that any probative value was substantially outweighed by the danger of unfair prejudice. By analogy to polygraph cases, the court noted that lie detection evidence to bolster credibility was highly prejudicial, particularly when credibility was a key issue and the scans were conducted without the prosecution’s knowledge. In addition, the court was troubled by Dr. Laken’s inability to state that Semrau was truthful as to any specific question, but could offer only a general impression of the subject’s truthfulness.

Semrau was convicted and has appealed (United States v. Semrau, 2011), providing an opportunity for the Court of Appeals for the Sixth Circuit to write an opinion with potentially precedential value for that Circuit and persuasive value to other federal courts.

Reminder: This paper was published in 2012

The Legal Implications of Semrau and Wilson

Where do these cases leave the admissibility of fMRI evidence of deception? One must be careful about inferring too much from two trial court cases, particularly as one (Wilson) settled without appeal and the other (Semrau) is still evolving. Nonetheless, we can draw some limited general conclusions that might have predictive value about the legal future of fMRI veracity evidence and believe that Semrau (and to a lesser extent, Wilson) will be influential. We also address some of the competing arguments that might favor the admission of such evidence at this time.

There are four primary concerns these cases address and will likely be the focus of other courts’ decisions as well: credibility, reliability, general acceptance, and unfair prejudice. First, both opinions focused on the subject matter of the evidence—credibility. Wilson held that jurors did not need expert testimony on credibility; Semrau echoed U.S. Supreme Court concerns that collateral litigation over lie detection “threatens to distract the jury from its central function of determining guilt or innocence.” We believe courts will continue to be troubled by testimony that comments directly on credibility.

Reminder: This paper was published in 2012

The jury’s role as arbiter of credibility has long-standing, carefully-cultivated jurisprudential roots (Fisher, 1997; Seaman, 2008), and a majority of courts disallow testimony that comments directly on the veracity of a particular witness, finding it not helpful to the jury or having little probative value (Kaye, Bernstein, & Mnookin, 2012; Faigman, Kaye, Saks, & Sanders, 2010–2011). “[E]xpert testimony which does nothing but vouch for the testimony of another witness encroaches upon the jury’s vital and exclusive function to make credibility determinations, and therefore does not ‘assist the trier of fact’ as required by FRE 702” (United States v. Charley, 1999). Not all courts, however, disfavor such testimony, and a minority of jurisdictions hold that the trial court has discretion to decide if expert testimony on veracity should be admitted (Kaye et al., 2012, citing cases).

There are exceptions to the prohibition against experts providing testimony that comments on credibility. For example, experts routinely testify about witnesses suffering from serious mental illnesses that may cause delusions (Melton, Petrila, Poythress, & Slobogin, 2007). Additionally, many courts have admitted expert evidence that indirectly comments on credibility, particularly behavioral science testimony about child sexual abuse, behaviors of battered spouses, suggestibility of children in interrogations, problems of eyewitness identification, and reasons for false confession (Faigman et al., 2010–2011; Monahan, Walker, & Mitchell, 2008; Myers, 2010; Poulin, 2007). This testimony, often termed “social framework evidence,” permits experts to testify about general social science research results that are used to “construct a frame of reference or background context for deciding factual issues crucial to the resolution of a specific case” (Monahan et al., 2008; Walker & Monahan, 1987). Much of this testimony helps the jury decide if a given witness is credible without specifically commenting on the truthfulness of any particular witness. Not all courts approve of social framework testimony (particularly about eyewitness identification and false confession), holding it is not helpful to the jury in making decisions about witness credibility (United States v. Lumpkin, 1999). Other courts find social framework evidence too general to be helpful, since it is not about a particular witness, as noted by Monahan et al. (2008), citing cases.

Reminder: This paper was published in 2012

Except when parties stipulate to its admissibility, most courts hold that polygraph evidence is generally inadmissible. “Throughout the twentieth century, courts have been, at best, skeptical of polygraph tests and, at worst and more usual, hostile to them” (Faigman et al., 2010–2011, at §40.1; Gallini, 2010). While such hostility may be due mostly to the polygraph’s limited reliability (Gallini, 2010; Scheffer, 1998; Stern, 2003), courts are also concerned about invading the jury’s role (United States v. Swayze, 2004) and may be uncomfortable with technology that purports to know when people are lying.

The Court of Appeals for the Sixth Circuit (the court deciding Semrau’s appeal) has held that polygraph evidence is presumptively inadmissible in the absence of a stipulation and is highly prejudicial where credibility is central to the verdict. (United States v. Sherlin, 1995). Nonetheless, discretion is granted to the trial court to decide whether the probative value of the polygraph evidence outweighs its prejudice. (United States v. Sherlin, 1995, using a modified FRE 403 test). We believe it is likely that many courts will react with disfavor to fMRI lie detection, reasoning that the evidence is about a collateral matter, is a direct comment on the credibility of a particular witness, and is unhelpful to the jury.

Reminder: This paper was published in 2012

Second, many courts will focus on the reliability of the evidence. Daubert’s criteria, especially the “error rates” standard, are formidable and many courts will likely find that fMRI lie detection cannot meet them at this juncture. The Semrau analysis is deep, careful, and compelling and will likely find traction with other courts: the experts cited and quoted in the opinion are considered well-qualified and authoritative. The current limitations of the science as discussed in the opinion are important concerns. As such, the next proponent of the fMRI credibility assessment evidence will have a difficult time encouraging a court to disregard the findings of the Semrau court.

More specifically, the concerns raised in Semrau about the lack of ecological validity will likely be troubling for other courts assessing the evidentiary reliability. The experimental data on fMRI lie detection have been derived from small-scale laboratory studies of “normal” participants and have not been tested either in real-life situations or in populations that deviate from what is considered “normal” in experimental research. Additionally, these data were not derived from paradigms involving a level of risk to the participant that would approximate the risk/benefit ratio of deception in Semrau’s case. As we explained earlier, the Semrau court’s analysis of the shortcomings and limitations of the technology’s problematic “real world” error rate is compelling and we anticipate that most courts using a Daubert-type reliability standard will be inclined to follow Semrau’s reasoning. Reliability must be judged on a case-by-case basis—the “task at hand”—and not globally (Daubert, 1993; Risinger, 2000), so it is conceivable that another litigant could make a more compelling showing in the courtroom about the reliability of fMRI lie detection. Nonetheless, it is currently difficult to separate the state of the science from any individual case.

Reminder: This paper was published in 2012

In addition to concerns that the evidence was not sufficiently reliable, the proposed evidence in Semrau was not a good “fit” with the questions at issue because the research studies could not be meaningfully applied to the truthfulness of the witness on the stand. The concept of “fit” considers whether the proposed evidence is relevant to resolving a fact in issue. In cases involving scientific evidence, Daubert recognizes that “scientific validity for one purpose is not necessarily scientific validity for other, unrelated purposes” (Daubert, 1993). The relevance of fMRI lie detection is inextricably tied to its reliability and FRE 702 requires a “valid scientific connection to the pertinent inquiry as a precondition to admissibility” (Daubert, 1993). Thus, under both a relevance and reliability analysis, fMRI evidence currently falls short of what is required for admissibility.

Third, courts that use the Frye general acceptance test (such as Wilson did) will also likely disfavor the evidence. While a few scientists on the advisory board of Cephos filed affidavits in support of the science, most other neuroscientists involved with the fMRI lie detection research agree that it is not yet ready for forensic application (Spence, 2008). Thus, without new compelling data, a party seeking to prove general acceptance will have difficulty finding credible support within the scientific community. The multi-factor Daubert evidentiary reliability standard likewise uses the general acceptance factor in its analysis, and the lack of general acceptance among scientists in the field may well be critical to courts that follow Daubert.

Reminder: This paper was published in 2012

Fourth, we cannot fully discount the potential problems that the superficial vividness of the evidence poses for fact finders who are unable to grasp the true scientific and statistical complexities of the fMRI technology. Early studies suggested that realistic brain images could influence the jury beyond what the evidence warrants (McCabe & Castel, 2008; Weisberg, Kiel, Goodstein, Rawson, & Gray, 2008), although there has been criticism of those studies (Schauer, 2010a). More recent data suggest that such images are not as overwhelmingly influential to a jury as originally believed. A recent large-scale study with a meta-analysis examined the influence of neuroscience expert testimony and neuroimaging testimony on mock juries determining guilt in a criminal case in which the defendant claimed not to have requisite intent to harm the victim. The authors conclude that “the overwhelming consistent finding has been a lack of any impact of neuroimages on the decisions of our mock jurors” (Schweitzer et al., 2011). In the meta-analysis, the authors did find that a neurological explanation for defendant’s mental state—with or without brain images—was more influential to the jurors than a clinical psychological explanation. While this study is compelling, there is more to be done in the area, a point well explained by the authors.

It is likely the Sixth Circuit will affirm the lower court’s decision in Semrau, since federal courts of appeals review lower court decisions about expert evidence under an abuse of discretion standard (General Electric Co. v. Joiner, 1997). It is also unlikely the Court of Appeals will find that the trial court in Semrau abused its discretion in excluding the proposed testimony under FRE 403. In general, an abuse of discretion will be found only if the trial court’s decision is “arbitrary,” “irrational,” “capricious,” “whimsical,” “fanciful,” or “unreasonable” [and]… the … exercise of its discretion will not be disturbed unless it can be said that ‘no reasonable person would adopt the district court’s view.’ (Nicolas, 2004). The Semrau decision is well reasoned and well-grounded in both facts and science, and it is unlikely a court of appeals will overturn it. Even if Semrau is affirmed, however, the Court of Appeals may choose not to address the issue in depth, simply finding that the court below did not abuse its discretion. If that happens, then the Magistrate Judge’s opinion may not carry much weight with other courts, since it may be considered an opinion limited to the facts of that case. Additionally, even if the Sixth Circuit writes an in-depth opinion on the reliability and admissibility of fMRI lie detection evidence, it will not be binding on other courts outside of the circuit and other federal courts may disregard it. Finally, the Court of Appeals may find that the defendant simply failed to meet the reliability standard in this case, while making no comment about the reliability of the science in general. Thus, the inadmissibility of this evidence is by no means certain in other courts. Yet, we believe the reasoning in Semrau will be persuasive, given the quality of the court’s analysis and its detailed explanation of the current limitations of fMRI lie detection.

Reminder: This paper was published in 2012

However, there are competing arguments that might favor admission of the testimony in future cases. Juries’ subjective assessments of credibility are quite poor and likely worse than the fMRI evidence. The basic fMRI veracity research is sound science of the type envisioned by Daubert: it is peer-reviewed research done by various scientists in quality laboratories under well-controlled conditions (Schauer, 2010b). If admitted, it should be admitted as probabilistic rather than categorical evidence, much the way DNA evidence is admitted. Empirical scholarship suggests that juries do not necessarily overvalue random match probabilities and can make reasonable use of complex material with appropriate instruction (Nance & Morris, 2002, 2005). Thus, fMRI lie detection evidence, which would present less robust statistical significance than DNA evidence, may also not be overvalued by the jury. Additionally, the fMRI veracity research is far better experimentally grounded than the commonly admitted individualization evidence (fingerprints, handwriting, tool-marks, etc.) roundly criticized by the NRC Report (2009). Finally, other forms of neuroimaging, such as nuclear medicine (PET and SPECT) evidence, are often admitted in civil and criminal trials for various purposes (Rushing, Pryma, & Langleben, 2012), often without proof of meeting Daubert’s reliability standard (Moriarty, 2008).

Criminal defendants, however, may be able to introduce the evidence in certain types of proceedings. Due to constitutional rights, statutory enactments, and concerns over wrongful convictions, fMRI credibility assessment testimony might be admissible without meeting either the Frye or Daubert standards: in the penalty phases of capital cases, where defendants have a constitutional right to present mitigating evidence (Smith v. Spivak, 2010); or to support a claim of post-conviction innocence where there is other, newly-discovered evidence.

Reminder: This paper was published in 2012

In capital cases, courts frequently permit defendants to introduce a variety of evidence (including neuroscience) to prove brain damage or mental impairment without stringent proof of reliability (Moriarty, 2008). For example, courts have admitted PET and SPECT scans during the penalty phase of capital cases to establish the defendant’s mental impairment, even when such evidence may not rise to the level of evidentiary reliability. The Supreme Court has consistently affirmed constitutional protections for defendants to introduce mitigating evidence in penalty hearings (McKoy v. North Carolina, 1990). “[S]tates cannot limit the sentencer’s consideration of any relevant circumstances that could cause it to decline to impose the [death] penalty” (McCleskey v. Kemp, 1987). More particularly, the juror may “not be precluded from considering, as a mitigating factor, any aspect of a defendant’s character or record and any of the circumstances of the offense that the defendant proffers as a basis for a sentence less than death” (Penry v. Lynaugh, 1989). A defendant may be able to make a compelling case that fMRI lie detection will meet this foregoing standard.

Although the Federal Rules of Evidence do not apply in sentencing proceedings, some courts have required proof of the reliability of evidence admitted in sentencing (United States v. Smith, 2010). This reliability requirement has been mentioned in capital case penalty hearings upholding the exclusion of polygraph evidence (United States v. Fulks, 2006). Given that the only cases addressing fMRI evidence of lie detection have found it both unreliable and not generally accepted, courts may not be receptive to the testimony even in the penalty phase. However, in light of the often lax standards for evidentiary reliability in the penalty phase, the frequent admission of nuclear medicine evidence in these hearings, and the strong constitutional support for defendants’ right to introduce mitigating evidence, it is possible that fMRI lie detection evidence will gain a foothold in the courtroom in this manner. For example, a court might permit fMRI evidence that the defendant is being truthful when he expresses remorse about a crime or denies remembering a crime because he was intoxicated. It is thus conceivable that either a trial court will permit such evidence or an appellate court will find an abuse of discretion where a trial court refused to allow such evidence.

Reminder: This paper was published in 2012

One court has already admitted fMRI evidence relevant to another concern in a penalty hearing. In a 2009 death penalty case in Illinois, State v. Dugan, the defense introduced expert testimony during the penalty phase that Dugan suffered from psychopathy, arguing that it affected defendant’s ability to control his impulse to kill (Hughes, 2010). The trial court allowed the expert to discuss the fMRI scans taken of Dugan’s brain as additional proof of the defendant’s psychopathy. The court also permitted expert testimony to help establish that Dugan’s psychopathy should make him less culpable. The trial court allowed the expert to explain the scans and to use diagrams of the brain, but did not permit him to use the actual fMRI images of Dugan’s brain activity. Despite the admission of such expert testimony, Dugan received the death penalty. However, there was a signed verdict form discovered after the sentencing indicating that the jury actually intended to render a verdict of life (Barnum and St. Clair, 2009). If the jury did originally decide not to impose the death penalty, it suggests the testimony was influential. However, Dugan’s appeal on this issue was dropped when Illinois abolished the death penalty (Barnum, 2009), so the issue remains unresolved.

fMRI lie detection evidence also has the potential to be admitted post-trial in a compelling case of claimed innocence. In Harrington v. State (2003), a trial court permitted testimony from an expert who testified about “brain fingerprinting”—a form of EEG that claims to be able to determine whether a person recognizes a word or image. Although brain fingerprinting has been roundly criticized (Rosenfeld, 2005), the trial court in that case heard testimony from Dr. Farwell, who testified that defendant’s brain waves were consistent with his claims of innocence and his alibi. The trial court ultimately denied Harrington’s claims, believing them time barred, but the Supreme Court of Iowa reversed, holding that the defendant was entitled to a new trial. Upon reviewing the record de novo and considering all the circumstances, the court’s confidence in the soundness of the defendant’s conviction was “significantly weakened.” Although the Supreme Court of Iowa mentioned Dr. Farwell’s testimony in a footnote, it neither commented on the appropriateness of its admission nor relied upon it in its decision. It is difficult, however, for defendants to get a new trial after conviction and appeal (Griffin, 2009), and other defendants who sought to hire Farwell met with judicial resistance (Moriarty, 2008). However, another court in a similar circumstance might be more impressed by fMRI evidence, which is based upon far more reliable science than brain fingerprinting (Rosenfeld, 2005; Schauer, 2010b).

Reminder: This paper was published in 2012

The Current State of Scientific Concerns: What Needs to Be Done

Irrespective of which party seeks to introduce the testimony or in what circumstances the proposed testimony is presented, the published indicia of accuracy and reliability of fMRI lie detection are not sufficient for the courtroom. The problem posed and answered here is how to bridge the gap between the basic studies done to date and a requisite standard of evidentiary reliability.

Under certain controlled laboratory conditions, endorsed lie and truth were distinguished in individual subjects with 76% to 90% accuracy (Ganis, Rosenfeld, Meixner, Kievit, & Schendan, 2011; Langleben et al., 2005). These findings have been moderated by two recent studies. In the first, Kozel et al. (2009) used a sequence of two deception paradigms generating tasks that involved denying mock crimes. The first mock crime was the scenario from Kozel et al.’s earlier study (2005), in which participants pretended to steal a watch or a ring. fMRI was able to correctly classify 25 out of 36 (69%) participants. Those participants whose lies were correctly identified then committed another mock crime and were compared with a control group that did not commit any mock crimes. All participants correctly identified on the first mock crime task were also identified on the second task. However, of the control group, only 5 out of 15 were correctly identified, yielding 100% sensitivity but only 33% specificity.
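The percentages in the paragraph above follow directly from the reported counts. The short Python sketch below is an editorial illustration, not part of the paper: the helper name `rate` is hypothetical, and it assumes that all 25 correctly classified participants went on to the second task, which the text implies but does not state as a count.

```python
def rate(correct: int, total: int) -> float:
    """Share of cases classified correctly, as a float in [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total

# Counts as reported in the text for Kozel et al. (2009).
first_task_accuracy = rate(25, 36)  # participants correctly classified on the first mock crime

# Assumption: the 25 correctly classified participants all did the second task,
# and all of them were flagged again (the text says "all").
sensitivity = rate(25, 25)

# Controls committed no mock crime; only 5 of 15 were correctly cleared.
specificity = rate(5, 15)
false_positive_rate = 1 - specificity

print(f"first-task accuracy: {first_task_accuracy:.0%}")
print(f"sensitivity:         {sensitivity:.0%}")
print(f"specificity:         {specificity:.0%}")
print(f"false positive rate: {false_positive_rate:.0%}")
```

Read this way, the second task flagged every participant who had committed the mock crime, but it also wrongly flagged about two thirds of the innocent controls.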

Reminder: This paper was published in 2012

Another study using a within-subject design and a sophisticated non-parametric analysis (Davatzikos, 2005) reported a classification accuracy of 100% (Ganis et al., 2011), although researchers found it to be reduced to 33% when participants used hand movements as countermeasures. These diverse scenarios, fMRI designs, and data analysis approaches do not allow a direct comparison or an estimate of the overall error rates of the technology. Moreover, they raise the question whether overall error rates are a meaningful variable or whether error rates for each testing scenario need to be evaluated separately.

Importantly, the group differences between lie and truth consistently involved the lateral and inferior prefrontal and posterior parietal cortices and appear unaffected by gender, handedness and language. While this is a fairly advanced state of basic science for a topic in behavioral fMRI research, legitimate forensic use requires substantially more validation. The major issues are validation in ecologically valid situations, where (1) stakes are higher; (2) the more significant potential confounds (subject’s age, medical condition, culture) are accounted for (Bizzi et al., 2009; Langleben, 2008; Simpson, 2008; Spence, 2008); and (3) the effects of motor and cognitive countermeasures are evaluated in a deliberate fashion.

Reminder: This paper was published in 2012

Finally, while the inherent accuracy of lie detection within an individual subject is a prerequisite for further translational research, understanding the error rate of a test is not complete until its positive and negative predictive power are also known. The accuracy of discrimination between two conditions within subjects is not equivalent to the probability of detecting liars in a cohort containing liars and truth tellers, with truth-tellers being a majority. Though studies have begun to address these gaps (Abe, et al., 2009; Ganis, Kosslyn, Stose, Thompson, & Yurgelun-Todd, 2003; Ganis, Rosenfeld, Meixner, et al., 2011; Kaylor-Hughes, et al., 2010; Kozel, et al., 2009; Mildner, Zysset, Trampel, Driesel, & Moller, 2005; Nunez, Casey, Egner, Hare, & Hirsch, 2005), comprehensive answers to the translational questions require a more robust effort.
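The difference between test accuracy and predictive power can be made concrete with Bayes' rule. The Python sketch below is an editorial illustration, not part of the paper: the function name `predictive_values` is hypothetical, and the sensitivity, specificity, and share of liars used in the example are assumed values chosen only to show the effect of a cohort in which truth-tellers are the majority.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a cohort.

    prevalence is the share of the cohort that is actually lying.
    PPV: probability that a person flagged as lying is lying.
    NPV: probability that a person cleared as truthful is truthful.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence

    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv

# Hypothetical scenario: a test that is 90% sensitive and 90% specific,
# applied to a cohort in which only 10% of people are lying.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.90, prevalence=0.10)
print(f"PPV: {ppv:.2f}, NPV: {npv:.2f}")
```

Under these assumed numbers, roughly half of the people flagged as liars would in fact be telling the truth, even though the test separates lie from truth correctly 90% of the time. This is why within-subject accuracy alone does not answer the error rate question.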

Several technical aspects of cognitive fMRI experiments have direct forensic relevance and raise additional questions that researchers might develop. First, BOLD fMRI, used in all fMRI studies of deception, is one of many fMRI techniques and fMRI itself is one of many approaches available on most high-field MRI scanners. Conceivably, other fMRI approaches could be superior to BOLD in lie detection. Second, BOLD fMRI describes changes in regional brain activity over time rather than providing an absolute measure of local brain activity. Consequently, “lie detection” using BOLD fMRI involves interpretation of the BOLD fMRI signal differences between test and comparison questions; the questions used are thus critical to the result. Third, it is unclear whether there is a brain fMRI pattern specific to deception, and at least some of the studies indicate that the pattern of deception is specific to the experimental paradigm used to generate it. Though the left prefrontal cortex is a leading candidate for a region specifically activated during deception (Spence & Kaylor-Hughes, 2008; Spence, Kaylor-Hughes, Brook, Lankappa, & Wilkinson, 2008), until these data are clinically validated, we cannot assume that fMRI patterns and error rates will generalize between deception tasks with different sequence and content of target or comparison questions. fMRI discrimination between lie and truth is possible without knowing whether there is a deception-specific fMRI pattern, as long as the difference between lie and truth in a specific questioning format (i.e. CIT) is known and reliable. This reliance of fMRI based lie detection on discrimination between two behavioral conditions (lie and a known truth or other baseline) generated by a pre-set question format allows the translational studies of clinical relevance to proceed without waiting for the outcomes of the search for the “lie center” in the brain. This question is part of the debate about localized vs. distributed functions in the brain that dates back to the nineteenth century and may continue well after the determination of the utility and scope of the potential use of fMRI for lie detection is complete. As an analogy, we use antidepressant drugs extensively, without knowing their exact mechanisms of action.

Reminder: This paper was published in 2012

Though the basic cognitive neuroscience study of deception is clinically important in the long run, the critical question of error rates and other translational questions described earlier can and should be answered independently of the basic research on the mechanisms, since they will determine the level of public interest in the entire field of fMRI based lie detection. Similarly, though the interaction among memory, emotion and deception is important academically (Phelps, 2009), and for the comprehensive understanding of the countermeasures to lie detection, the translational studies can proceed ahead of or simultaneously with basic research. It is also likely that many of the basic science questions on the mechanisms of deception could be incorporated into clinical trials with no added costs.

Scholars continue to discuss factors affecting admissibility related to both scientific and legal considerations. For example, Shen and Jones have focused on the design of the tasks, the ecological and external validity of the conditions, and concerns about statistical methods and group-data averaging implications (Shen and Jones, 2011). Other voiced concerns include data interpretations and the problems of ecological validity (Kanwisher, 2009), as well as the various juridical concerns that neuroscience lie detection—like other forms of lie detection—poses for courts (Rakoff, 2009; Imwinkelried, 2011).

Reminder: This paper was published in 2012

While we believe the concerns raised in Semrau address the primary considerations related to legal reliability, it is also worth noting that complications arise from discrepancies in the meaning of crucial terms such as validity and reliability between law and science. For example, in medicine and biostatistics, the term “validity” refers to the relevance of the test. That is to say, whether the test actually measures what it purports to. For example, to determine whether fMRI lie detection is a valid test of deception, one would ask whether the brain activation detected by fMRI during a deception task is indeed related to deception. The term “reliability” refers to reproducibility of the test results when the test is repeated. With fMRI lie detection, this would mean that the same regions of the brain repeatedly show activation when presented with the same question within a single session and across several different sessions.

Courts and litigants, however, do not assign the same meaning to reliability or use it with the scientific level of precision. For example, when lawyers argue about the “reliability” of expert evidence, they debate whether the testimony is sufficiently “trustworthy” to constitute appropriate courtroom evidence; they rarely are referring to its reproducibility. A colorful example of the law’s interpretation of reliability is found in Justice Scalia’s concurrence in Kumho Tire Co., Inc. v. Carmichael (1999), where he notes that the court has the discretion “to choose among reasonable means of excluding expertise that is fausse and science that is junky.”

Reminder: This paper was published in 2012

In Daubert v. Merrell Dow Pharmaceuticals, Inc., the Supreme Court comments that “to qualify as ‘scientific knowledge,’ an inference or assertion must be derived by the scientific method. Proposed testimony must be supported by appropriate validation—i.e., ‘good grounds,’ based on what is known. In short, the requirement that an expert’s testimony pertain to ‘scientific knowledge’ establishes a standard of evidentiary reliability.” In footnote 9 following the quote, the court explains its understanding of the distinction between reliability and validity:

We note that scientists typically distinguish between “validity” (does the principle support what it purports to show?) and “reliability” (does application of the principle produce consistent results?)…. Although “the difference between accuracy, validity, and reliability may be such that each is distinct from the other by no more than a hen’s kick,” … our reference here is to evidentiary reliability—that is, trustworthiness…. In a case involving scientific evidence, evidentiary reliability will be based upon scientific validity.

Reminder: This paper was published in 2012

Thus the court defines “legal reliability” in terms of “scientific validity.” While this muddling of the terms may have been intentional, it is equally probable that the court was aiming at the concept of validity: does the test actually do what it purports to do? Analyzing this standard in terms of the legal reliability of fMRI lie detection, the question is the same: Does the fMRI test determine whether a given person is or is not lying? The only answer that current data can provide is that in a controlled laboratory setting fMRI can identify deceptive responses with 71% to greater than 90% accuracy. Is that enough for “legal reliability”? We do not think so. Without knowing the positive and negative predictive power of the test, there is no accurate way to respond to Daubert’s “known error rate” inquiry. This science is currently in the area focused on in the Joiner court, where the court remarked that there may be “simply too great an analytic gap between the data and the opinion proffered” (General Electric Co. v. Joiner, 1997). Until properly controlled trials are done, the science remains in that “analytic gap.” But such current concerns about fMRI lie detection are not fatal to the endeavor—rather, the science is in its nascent form and requires time and funding to better define its clinical potential. Similar critiques were leveled at early studies conducted on eyewitness identification, which, after much continued research, now qualifies as scientifically reliable evidence (Cutler and Wells, 2009; Leippe, 1995). Despite the need for a good method to detect deception, we do not have one, and “the research should vigorously explore alternatives to the polygraph, including functional brain imaging” (Raichle, 2009).

A major concern with fMRI lie detection is the looming problem that subsequent studies will prove the early studies wrong, a possible outcome in all developing research. The danger of admitting scientific evidence before it is proven to be sufficiently reliable and valid is by now well known. For example, Garrett and Neufeld examined the trial transcripts of 137 exonerated defendants and concluded that approximately 60% of those trials included flawed science (Garrett and Neufeld, 2009). The NRC Report concludes that “no forensic method other than nuclear DNA analysis has been rigorously shown to have the capacity to consistently and with a high degree of certainty support conclusions about … ‘matching’ of an unknown item of evidence to a specific known source” (NRC, 2009). These forms of forensic evidence include fingerprints, toolmarks, handwriting, bitemarks and hair comparison, often the critical evidence in criminal trials.

Reminder: This paper was published in 2012

The Report finds that the interpretation of forensic evidence is not always based on scientific studies to determine its validity. There is no body of research on the limits and measures of performance or the problems of both bias and variability among those performing the analysis and no rigorous protocols to guide the subjective interpretations. The Report goes so far as to say “[t]he Law’s greatest dilemma in its heavy reliance on forensic evidence, however, concerns the question of whether—and to what extent—there is any science in any given ‘forensic science’ discipline.” (NRC, 2009). Despite the scathing critique of such forensic science, it continues to flow into the courtroom (Moriarty, 2010). For example, recent decisions have upheld the admission of both fingerprint and handwriting comparison, despite recognition of the NRC Report’s criticisms. (United States v. Love, 2011) (fingerprint comparison); (Pettus v. United States, 2012) (handwriting).

To date, even at this early stage, the fMRI lie detection research is far better-grounded than much of what passes for forensic science, as the more than two dozen peer-reviewed articles on the subject establish. Yet even if fMRI lie detection is better than much of forensic science, we do not believe it is ready for the courtroom. Such potentially powerful testimony as fMRI lie detection should not be admissible without better proof of validity and reliability. The courts are now grappling with forensic science that has been admitted without adequate proof of reliability; we should not repeat this error with fMRI lie detection.

Reminder: This paper was published in 2012

The Policy Analysis

Though fMRI lie detectors are not ready for legal application, we believe that fMRI offers a theoretical possibility of improvement over current means of credibility assessment and could satisfy the yet unmet needs of the legal, defense, and law enforcement communities (NRC, 2009). Objective means for detecting deception have a high potential social benefit. Moreover, fMRI studies of deception have provided important scientific insights into the role of deception in cognition (Greene & Paxton, 2009; Langleben et al., 2002) that are relevant to such diverse topics as morality, drug addiction, and treatment non-adherence in chronic medical illness. Thus, the topic is well worth pursuing with both translational forensic and basic research. Therefore, it is in the public interest to guide the development of fMRI lie detection technology, rather than leave it to other stakeholders, such as for-profit companies.

Though companies offering commercial MRI veracity testing seem to promise more than they can deliver, we do not believe that new legislation is needed to regulate their activity or the admissibility of their data as evidence. First, the size and scope of these companies is exceedingly small. Second, despite substantial problems of reliability and jurisprudential concerns about the polygraph, there has been no major movement to legislatively ban its use in all circumstances, except for non-Government pre-employment testing (OTA, 1990) and other limited categories, and certainly not to prohibit it as a category of evidence. Third, there has been no apparent movement to enact an FRE provision similar to Military Rule of Evidence 707, which bans polygraph evidence, despite the Supreme Court upholding the constitutionality of MRE 707 (Scheffer, 1998). Finally, there is likely little political interest in championing a legislative prohibition on fMRI, given the current state of political affairs and the more critical public interest concerning the substantial shortcomings of forensic science currently in use. Rather than focus on regulation, we propose to use science to pull fMRI lie detection out of limbo. Specifically, practical legal analysis and comprehensive translational experimental data are needed to resolve the remaining questions of fMRI veracity testing.

Reminder: This paper was published in 2012

The most important missing piece in the puzzle is Daubert’s “known error rate” standard. Determining the error rates for fMRI-based lie detection requires validating the method in settings that convincingly approximate the real-life situations in which legally significant deception takes place, in terms of the risk/benefit ratio, the relevant demographics, and the prevalence of the behavior in question.

Clinical validation of a test is an expensive enterprise usually performed by commercial interests. Under the medical model of drug and device development, controlled clinical trials are required to determine whether the device is efficacious and superior to existing alternatives and to determine the error rates in the target populations. Applied to fMRI lie detection, such trials would include testing the technology in key target populations and age groups under deception scenarios with various levels of risk and benefit. This implies that some of the trials would have to hold the deception scenario constant while testing the effect of a demographic variable on the outcomes, while others would have to hold the demographic constant and manipulate the experimental scenario or task. The relatively large number of variables is what is likely to require the large overall number of participants, though the number required for each study could be relatively small (50–100). Continuing the parallel to medical test development, the incidence of spontaneous deception in the target populations is variable and rather low. Baldessarini et al. (1983) elaborated on the potential clinical validation of the Dexamethasone Suppression Test (DST). In a research setting, DST had 70% sensitivity and 95% specificity for diagnosing depression. In Baldessarini’s example, the predictive value of a positive test (PPV) was 93% in the research sample that had a 50% prevalence of the disease (100 patients with depression and 100 healthy controls). In a specialty clinic, where the prevalence of depression was 10%, the PPV of the test declined to 63% and in a primary care setting, with sample of 1000 and disease prevalence of 1%, the PPV became a dismal 12% (Baldessarini et al., 1983).
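The dependence of PPV on prevalence in the DST illustration follows directly from Bayes' rule. Below is a minimal sketch in Python, not taken from the paper: the function name `predictive_values` is a hypothetical helper, and the only inputs are the sensitivity (70%), specificity (95%) and the three prevalence levels quoted above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test, using Bayes' rule.

    All three arguments are fractions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity from the DST illustration; prevalence levels
# for the research sample, the specialty clinic, and primary care.
for prevalence in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(0.70, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these inputs the formula gives a PPV of about 93% at 50% prevalence, about 61% at 10%, and about 12% at 1%. The middle value is slightly below the 63% quoted above; that small gap may come from rounding or from details of the original example that are not reproduced here.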

Reminder: This paper was published in 2012

We draw three conclusions from this illustration. First, screening settings are more demanding on test accuracy and it is unlikely that fMRI or any other lie detector, including the polygraph, will ever reach the positive predictive power sufficient for screening for deception among large groups of mostly innocents. Second, fMRI based technology may be useful in the forensic settings where the prevalence of deception is much higher than in the general population. Third, a series of properly powered and controlled prospective studies (i.e. clinical trials) would be required to confirm or disprove this hypothesis. Such studies would be adequately powered to include a few target participants (liars) mixed into a proportionally large number of honest participants. This would permit meaningful calculations of the error rates, including within-subject accuracy and predictive values. Despite the ethical challenges such trials may pose, forensic functional imaging studies are not inconceivable in both normal and pathological populations (Fullam, McKie, & Dolan, 2009; Hakun et al., 2009; Kozel & Trivedi, 2008; Yang et al., 2007). Another way of estimating new technology’s efficacy is a “head-to-head” comparison between fMRI and the polygraph. Finally, mathematical modeling could help extrapolate findings. Such studies would involve hundreds of participants and could cost between 5 and 15 million dollars, a price tag below an average pharmaceutical company study. Though a recommendation for more research may seem too general, guiding fMRI lie detection research toward socially beneficial and conclusive findings is unlikely to occur without targeted policy.
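To see why the number of liars in such a trial matters for a "known error rate," it can help to put a confidence interval around an estimated sensitivity. The sketch below uses the Wilson score interval, which is one common choice for a binomial proportion and is an assumption made here, not a method the paper specifies. The function name `wilson_interval` and the counts (18 of 20, 180 of 200) are hypothetical and chosen only for illustration.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    z=1.96 corresponds to a two-sided 95% interval under the normal
    approximation.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    return centre - half_width, centre + half_width


# Hypothetical counts: the same 90% observed detection rate, measured on
# 20 liars versus 200 liars.
for detected, liars in ((18, 20), (180, 200)):
    low, high = wilson_interval(detected, liars)
    print(f"{detected}/{liars} detected: 95% interval {low:.1%} to {high:.1%}")
```

With 18 of 20 liars detected, the interval runs from roughly 70% to 97%, which is too wide to state an error rate with much confidence. The same observed rate on 200 liars gives a much narrower interval. This is one way to read the estimate that such studies would need hundreds of participants.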

In clinical development terms, the fMRI lie detector is stuck between Phase I and Phase II clinical trials, with the commercial start-ups lacking the capacity to proceed to Phase III, a common situation with compounds or devices of unclear commercial value. For devices with clear public health interest, such as vaccines for drug addiction, the United States National Institutes of Health (NIH) have often bridged the funding gap. Despite a pivotal role of deception in a range of personality disorders, drug and alcohol abuse and treatment non-adherence, so far NIH has not recognized deception as a health issue. United States defense and intelligence agencies have funded research in this area, but its results have been slow to appear in scientific literature (Dedman, 2009; Moreno, 2006; Stern, 2003) and may be subject to non-scientific biases similar to those that afflicted the Department of Defense-sponsored polygraph research. A $5 million congressional earmark in the 2004 and 2005 defense budgets funded the Center for Advanced Technologies for Deception Detection (CATDD) at the University of South Carolina at Columbia (Hickman, 2005). At the time of this writing, we were unable to identify peer-reviewed publications on lie detection from CATDD. The MacArthur Foundation’s $10M Law and Neuroscience Project has produced some important basic data on lie detection (Greene & Paxton, 2009) with ethical and legal analysis, but has not addressed the translational questions (Gazzaniga, 2008). Thus, no group has been able to spur the program of translational research outlined above, while the clinical nature and relatively large scope put such a project outside of the purview of the National Science Foundation.

Reminder: This paper was published in 2012

Conclusion

In conclusion, we believe that at the present stage of development, the most important policy intervention in the field of brain-based lie detectors is a public funding initiative leading to a peer-reviewed translational research program with a special emphasis on a series of multicenter clinical trials to determine the error rates of the technique, the sensitivity to countermeasures, the effect of high benefit to risk ratios, the relative accuracy compared to polygraph and the effects of age, gender, common pharmacological agents and cognitive status. The specificity of any given pattern of brain activity to deception is likely to be addressed as a byproduct of the studies described above. Considering a multitude of stakeholders, the charged and controversial nature of the topic and the potential societal impact of this technology, a collaboration of several agencies may be required to create a funding mechanism that could impartially assess and guide the development of forensic fMRI technology.

7 comments

Comments sorted by top scores.

comment by jbash · 2024-03-16T22:47:03.430Z · LW(p) · GW(p)

It's actually not just about lie detection, because the technology starts to shade over into outright mind reading.

But even simple lie detection is an example of a class of technology that needs to be totally banned, yesterday[1]. In or out of court and with or without "consent"[2]. The better it works, the more reliable it is, the more it needs to be banned.

If you cannot lie, and you cannot stay silent without adverse inferences being drawn, then you cannot have any secrets at all. The chance that you could stay silent, in nearly any important situation, would be almost nil.

If even lie detection became widely available and socially acceptable, then I'd expect many, many people's personal relationships to devolve into constant interrogation about undesired actions and thoughts. Refusing such interrogation would be treated as "having something to hide" and would result in immediate termination of the relationship. Oh, and secret sins that would otherwise cause no real trouble would blow up people's lives.

At work, you could expect to be checked for a "positive, loyal attitude toward the company" on as frequent a basis as was administratively convenient. It would not be enough that you were doing a good job, hadn't done anything actually wrong, and expected to keep it that way. You'd be ranked straight up on your Love for the Company (and probably on your agreement with management, and very possibly on how your political views comported with business interests). The bottom N percent would be "managed out".

Heck, let's just have everybody drop in at the police station once a month and be checked for whether they've broken any laws. To keep it fair, we will of course have to apply all laws (including the stupid ones) literally and universally.

On a broader societal level, humans are inherently prone to witch hunts and purity spirals, whether the power involved is centralized or decentralized. An infallible way to unmask the "witches" of the week would lead to untold misery.

Other than wishful thinking, there's actually no reason to believe that people in any of the above contexts would lighten up about anything if they discovered it was common. People have an enormous capacity to reject others for perceived sins.

This stuff risks turning personal and public life into utter hell.


  1. You might need to make some exceptions for medical use on truly locked-in patients. The safeguards would have to be extreme, though. ↩︎

  2. "Consent" is a slippery concept, because there's always argument about what sorts of incentives invalidate it. The bottom line, if this stuff became widespread, would be that anybody who "opted out" would be pervasively disadvantaged to the point of being unable to function. ↩︎

Replies from: TrevorWiesinger
comment by trevor (TrevorWiesinger) · 2024-03-16T23:16:39.344Z · LW(p) · GW(p)

Yes, this is why I put "decentralized" in the title even though it doesn't really fit. What I was going for with the post is that you read it yourself, except whenever the author writes about law, you think for yourself about stacking the various applications that you care about (not courts) with the complex caveats that the author was writing about (while they were thinking about courts). Ideally I would have distilled it as the paper is a bit long.

This credibly demonstrates that the world we live in is more flexible than it might appear. And on the macro-civilizational scale, this particular tech looks like it will place honest souls higher-up on net, which everyone prefers. People can establish norms of remaining silent on particular matters, although the process of establishing those norms will be stacked towards people who can honestly say "I think this makes things better for everyone", "I think this is a purity spiral" and away from those who can't.

At work, you could expect to be checked for a "positive, loyal attitude toward the company" on as frequent a basis as was administratively convenient. It would not be enough that you were doing a good job, hadn't done anything actually wrong, and expected to keep it that way. You'd be ranked straight up on your Love for the Company (and probably on your agreement with management, and very possibly on how your political views comported with business interests). The bottom N percent would be "managed out".

This is probably already happening [LW · GW].

comment by Slapstick · 2024-03-16T04:59:14.114Z · LW(p) · GW(p)

high-trust friend groups

I'm having a hard time imagining a scenario in which I would find this valuable in my friend groups. If I were ever unsure whether I could trust the word of a friend on an important matter, I'd think that would represent deeper issues than a mere lack of information a scan of their brain could provide. Perhaps I'm naive or particular in some way in how I filter people.

Do you have examples for how this would aid friendships? Or the other domains you mentioned?

I could see it being very valuable but I also find the idea very frightening, and I am not someone who lies.

Replies from: Viliam, romeostevensit
comment by Viliam · 2024-03-17T09:48:27.857Z · LW(p) · GW(p)

The traditional technology used for similar purposes in some cultures is alcohol. The idea is that as alcohol impairs thinking, it impairs the ability to lie convincingly even more. Especially considering that even if one drunk person lies successfully to another drunk person, the next day the other person can reflect on the parts they remember with a sober mind.

Thus, alcohol is an imperfect lie detector with a few harmful side effects; and in cultures where it is popular, groups of friends do this together, and conspicuously avoiding it will provide evidence against your sincerity.

If I were ever unsure whether I could trust the word of a friend on an important matter, I'd think that would represent deeper issues than a mere lack of information a scan of their brain could provide.

Friendships exist on a scale. If you switch from "a stranger" to "100% trusted person" too quickly, you probably have some unpleasant surprises waiting for you in the future. Also, friendship is not transitive, and sometimes you need to know whether you can trust a friend of a friend (even when your friend says "yes"). I know some people whom I trust, but I definitely do not trust their judgment about other people.

Replies from: Slapstick
comment by Slapstick · 2024-03-17T17:24:47.198Z · LW(p) · GW(p)

I am sceptical about the role of alcohol you describe and dynamics around it as a form of lie detector, but I know there's a range of social dynamics I haven't necessarily been exposed to in my culture.

I have been in various groups that heavily drink on occasion, but I've never seen any evidence of people being viewed as having something to hide were they not to drink.

I think alcohol might make people more honest, but usually about things they already wanted to divulge and held back for lack of the courage or sense of emotional intimacy that alcohol can provide. It's hard for me to imagine alcohol playing a similar role to a lie detector for significant factual information people strongly want to hide.

Could you offer any examples of where a real lie detector would be valuable in friendships or potential friendships?

A lot of the things I might want to know seem challenging to address via a lie detector, e.g. "Will you do anything violent, steal, or intentionally damage my property?" People likely to do those things might honestly intend not to.

I could see it potentially being useful for people having sex more on the casual side.

comment by romeostevensit · 2024-03-16T22:00:37.191Z · LW(p) · GW(p)

Every subculture I've participated in has lowkey bad actors. The harms this causes are underrated imo.

Replies from: TrevorWiesinger
comment by trevor (TrevorWiesinger) · 2024-03-16T22:36:40.928Z · LW(p) · GW(p)

There are bad actors who infiltrate, deceptively align, move laterally, and purge talented people (see Geeks, Mops, and Sociopaths), but I think that trust is a bigger issue.

High-trust environments don't exist today in anything with medium or high stakes, and if they did then "sociopaths" would be able to share their various talents without being incentivized to hurt anyone, geeks could let more people in without worrying about threats, and people could generally evaluate each other and find the place where their strengths resonate with others.

That kind of wholesome existence is something that we've never seen on Earth, and we might be able to reach out and grab it (if we're already in an overhang for decentralized lie detectors).