Why we need better science, example #6,281

post by lukeprog · 2011-12-10T23:25:39.363Z · LW · GW · Legacy · 24 comments

Avorn (2004) reports:

In a former British colony, most healers believed the conventional wisdom that a distillation of fluids extracted from the urine of horses, if dried to a powder and fed to aging women, could act as a general tonic, preserve youth, and ward off a variety of diseases. The preparation became enormously popular throughout the culture, and was widely used by older women in all strata of society. Many years later modern scientific studies revealed that long-term ingestion of the horse-urine extract was useless for most of its intended purposes, and that it caused tumors, blood clots, heart disease, and perhaps brain damage.

The former colony is the United States; the time is now; the drug is the family of hormone replacement products that include Prempro and Premarin (manufactured from pregnant mares' urine, hence the latter's name). For decades, estrogen replacement in postmenopausal women was widely believed to have "cardio-protective" properties; other papers in respected medical journals reported that the drugs could treat depression and incontinence, as well as prevent Alzheimer's disease. The first large, well-conducted, controlled clinical trial of this treatment in women was not published until 1998: it found that estrogen replacement actually increased the rate of heart attacks in the patients studied. Another clinical trial published in 2002 presented further evidence that these products increased the risk of heart disease, stroke, and cancer. Further reports a year later found that rather than preventing Alzheimer's disease, the drugs appeared to double the risk of becoming senile.

Armstrong (2006) adds:

The treatment seemed to work because those who used the drug tended to be healthier than those who did not. This was because it was used by people who were more interested in taking care of their health.


comment by Scott Alexander (Yvain) · 2011-12-11T12:59:26.746Z · LW(p) · GW(p)

Though keep in mind that it's much more complicated than "Haha, some idiots were drinking chemicals from horse urine but now we know better."

HRT is "undoubtedly" effective for hot flashes, osteoporosis, and atrophic vaginitis. It's probably effective against some cancers. It may be effective against heart disease and dementia when taken in the first decade after menopause, but not after (the women in the study you mention were mostly in their 60s). However, all of the risks mentioned here are very real and so HRT is not indicated except when people have premature menopause or unusually severe menopausal symptoms (these people should talk to their doctors about it)

Also, why title this "Why We Need Better Science"? Sounds to me like we have great science, we just need people to hold their horses (pun not intended) until the results come in instead of chasing after every therapeutic fad.

comment by [deleted] · 2011-12-10T23:39:24.768Z · LW(p) · GW(p)

In addition to paying less attention to things that aren't randomized controlled trials, we should pay more attention to things that are. Example.

Replies from: buybuydandavis
comment by buybuydandavis · 2011-12-11T03:55:23.309Z · LW(p) · GW(p)

I've been surprised at how much deference is given to randomized clinical trials here. Particularly for hormone replacement - it's bad inference, and bad decision theory.

Since when are Bayesians enamored of the results of trials against a null hypothesis that throws away almost all the relevant information about you, and leaves the decision about treatment for you up to the results of a 95% confidence interval on people who are not you, for whom you've also thrown away (or never collected) most of the relevant data?

Of course it depends on the nature of the malady you're trying to treat, but the best indicator of whether something works for you is a trial on you. If the treatment is generally safe, try it and see. Measure relevant variables. See how it works on the symptoms. People doing HRT generally aim to keep their levels within reference ranges, using the actual hormones that already exist in our bodies, instead of horse hormones. Too much or too little testosterone may be bad, but it's not like my body has never had to deal with it before.

Maybe it's just because I'm a little older than the mean here, but I don't think I have decades to wait for decades of longitudinal studies on life extension strategies, which probably won't even get decent funding until a significant mass of people have used them for a decade without any such studies to rely on. If I want to be around for escape velocity, I think I've got to get moving now. I don't use a 10 year old cell phone. I'm not going to limit myself to 10 year old medical technology either.

Replies from: Yvain, DanielLC
comment by Scott Alexander (Yvain) · 2011-12-11T12:46:17.831Z · LW(p) · GW(p)

Absent any other prior, why would you use anything other than "My body will react to hormones the same way most other people's bodies react to hormones"?

And you can't self-experiment on risk of a heart attack. Your only endpoint is "I had a heart attack" or "I didn't have a heart attack", and even if you don't mind getting your experimental result exactly one instant too late to help you, with a sample size of one you can't draw any conclusions about whether taking HRT for ten years contributed to your heart attack or not.

And probably the most important reason is that medicine is weird. Even when the smartest people try to predict results that should be obvious, they very often get them wrong. "Based on what I know about the body, this sounds like it should work" is the worst reason to do anything. I know that sounds contrary to Bayes, but getting burned again and again by things that sound like they should work has recalibrated me on this one.

If you're saying that you have unusual incentives here - eg that you value the possibility of adding to your natural lifespan enough that you're willing to accept a small risk of subtracting from it and a large risk that you're wasting time and money, that's fair enough.

Replies from: wedrifid, Eugine_Nier, buybuydandavis
comment by wedrifid · 2011-12-11T14:01:15.144Z · LW(p) · GW(p)

"Based on what I know about the body, this sounds like it should work" is the worst reason to do anything.

"Because this hasn't worked for any of the thousands of people who have tried it before this is almost certainly going to work for me!"

comment by Eugine_Nier · 2011-12-12T01:41:32.971Z · LW(p) · GW(p)

And probably the most important reason is that medicine is weird. Even when the smartest people try to predict results that should be obvious, they very often get them wrong. "Based on what I know about the body, this sounds like it should work" is the worst reason to do anything. I know that sounds contrary to Bayes, but getting burned again and again by things that sound like they should work has recalibrated me on this one.

Reality isn't weird. What this means is that you know less about the body than you think you do.

Replies from: Yvain, buybuydandavis
comment by Scott Alexander (Yvain) · 2011-12-12T15:25:15.960Z · LW(p) · GW(p)

Well, "reality isn't weird" can mean a couple of different things. "Weird" is a two-part predicate like "sexiness"; things are only weird in reference to some particular mind's preconceptions. Even Yog-Sothoth doesn't seem weird to his own mother.

But if we use the word "weird" as a red flag to tell others that they can expect to be surprised or confused when entering a certain field, as long as we can predict that their minds and preconceptions work somewhat like ours, it's a useful word.

I think Eliezer's "reality is not weird" post was just trying to say that we can't blame reality for being weird, or expect things to be irreducibly weird even after we challenge our preconceptions. I don't think Eliezer was saying that we can't describe anything as "weird" if it actually exists; after all, he himself has been known to describe certain potential laws of physics as weird.

(man, basing an argument on the trivial word choices of a venerated community leader spotted in an old archive makes me feel so Jewish)

Replies from: Eugine_Nier, TheOtherDave
comment by Eugine_Nier · 2011-12-13T07:50:03.643Z · LW(p) · GW(p)

I think Eliezer's "reality is not weird" post was just trying to say that we can't blame reality for being weird,

But one can blame a theory for finding reality weird. In particular, you seem to be using "weird" to mean "frequently behaves in ways that don't agree with our models". That should cause you to lower your confidence in the models.

comment by TheOtherDave · 2011-12-12T17:10:44.211Z · LW(p) · GW(p)

basing an argument on the trivial word choices of a venerated community leader spotted in an old archive makes me feel so Jewish

"Yes: that too is the tradition."

comment by buybuydandavis · 2011-12-12T08:11:31.798Z · LW(p) · GW(p)

And reality knows more. That's why I advocate checking with reality.

comment by buybuydandavis · 2011-12-12T08:10:16.953Z · LW(p) · GW(p)

Absent any other prior, why would you use anything other than "My body will react to hormones the same way most other people's bodies react to hormones"?

First, because I am not absent other informational priors. I have a lifetime of informational priors about my own body. I also have access to pubmed, wikipedia, my 23andme genomic data, my personal medical history, my family's medical history, and lab testing services that can take accurate measurements of me.

There are no clinical trials that have controlled for that information.

Second, because I know others are not identical to me. Basing my choices solely on some statistical outcome on a pool of patients where I have none of that kind of information, and indeed the doctors involved didn't take that information, and didn't factor it into their solutions for their patients, strikes me as throwing out most all of my relevant data and trusting the results produced by a blind man with a shotgun.

Moreover, refusing to experiment on yourself is to refuse to look at reality and take actual data about the system you're interested in - you. That's poor decision theory, poor inference, and poor problem solving.

Yes, medicine is weird. Therefore, instead of thinking that you have it all worked out, or that a clinical trial has it all worked out for you, the rational thing to do is to evaluate options that might work, their costs and risks, try things, take measurements, update your model based on that additional data, and try again. Sure, if there are clinical trials, avail yourself of that information as well. Nice place to find candidate treatments. But you're deluded if you think a positive result means it will assuredly work for you, and you're deluded if you think a negative result, a "failure to reject", means it won't. At a minimum, if the trial didn't have a crossover study, it hasn't ruled out that the treatment is a perfect cure for some subset of people with the problem.

Any decent doctor I've had has basically said that all treatments are experiments for a particular person - maybe it will work for you, maybe not.

I don't know that I have incentives any different from anyone else with a malady. I wish to get better. I recognize that there are risks involved in the attempts to get better. What doctors fail to appreciate, probably because it's not really their problem, is that doing nothing also has a cost - the likely continuance of my malady.

We don't limit our pool of potential solutions to our problems to solutions "validated" by double blinded placebo controlled trials in any other aspect of life, because it isn't rational to do so. It's not rational for medical problems either.

Replies from: Yvain, Vladimir_Nesov, Vladimir_Nesov
comment by Scott Alexander (Yvain) · 2011-12-12T15:05:26.577Z · LW(p) · GW(p)

It's not about a positive result meaning something will "assuredly work for you". Only a Sith deals in absolutes. It's about cost-benefit analysis.

To give an example, no reasonable person would self-experiment to see if cyanide cures their rash. Although there's a distant probability your body has some wildly unusual reaction to cyanide in which it cures rashes, it's much more likely that cyanide will kill you, the same way it kills everyone else. Although it might be worth a shot if cyanide had no downside, we have very strong evidence that on average it has a very large downside.

The same is true of HRT. People were using it to improve their cardiovascular health. We found that, on average, it decreases cardiovascular health. You can still try using it on the grounds that it might paradoxically increase yours, but on average, you will lose utility.

Consider the analogy to a lottery. You have different numbers than everyone else does. Just because someone else lost the lottery with their numbers, doesn't mean you will lose the lottery with your numbers. But if we study all lottery participants for ten years and find that on average they lose money, then unless you have a specific reason to think your numbers are better than everyone else's (not just different), you should expect to lose money too.

Now things would be different with a treatment with no downside (like eating a lot of some kind of food, or taking a safe and cheap supplement) - as long as you don't mind the loss of time and money you can experiment all you want with those (though I still think you'd have trouble with bias and privileging the hypothesis, and that a rational person wouldn't find a lot of these harmless self-experiments worth the time and the money at all). And things would be different if the potential benefit and potential harm had different levels of utility for you: for example, if you wanted to cure your joint pain so badly you didn't mind risking heart attack as a side effect. I think this is what you're aiming at in your post above, and for those cases, I agree with you.

But when you're taking a treatment like HRT which is intended to prevent heart attacks, but actually on average increases heart attacks, then shut up and multiply.

Also, don't call it "self-experimentation" when you're talking about preventing cardiovascular disease, since you never end up with any usable self-data (as opposed to, say, self-experimenting with medication for joint pain, where you might get a strong result of your joint pain disappearing that you can trace with some confidence to the medication). Call it what it is - gambling.

comment by Vladimir_Nesov · 2011-12-12T16:27:23.619Z · LW(p) · GW(p)

We don't limit our pool of potential solutions to our problems to solutions "validated" by double blinded placebo controlled trials in any other aspect of life, because it isn't rational to do so.

Wrong. We don't do it because either there are no publications that answer our questions, so that we have to use something else, or because we have a more convenient method that works. Please don't appeal to "rationality".

Replies from: buybuydandavis
comment by buybuydandavis · 2011-12-13T03:22:07.840Z · LW(p) · GW(p)

Wrong. We don't do it because either there are no publications that answer our questions, so that we have to use something else, or because we have a more convenient method that works. Please don't appeal to "rationality".

I don't want to argue about who "we" is. I don't so limit myself. YMMV.

I accurately identified a failing strategy of finding solutions. I see no reason not to accurately identify a failure in rationality as such, and every reason to do so.

comment by Vladimir_Nesov · 2011-12-12T16:25:17.516Z · LW(p) · GW(p)

Second, because I know others are not identical to me. Basing my choices solely on some statistical outcome on a pool of patients where I have none of that kind of information, and indeed the doctors involved didn't take that information, and didn't factor it into their solutions for their patients, strikes me as throwing out most all of my relevant data and trusting the results produced by a blind man with a shotgun.

Information is relevant only to the extent you can use it. How specifically can you use it to improve on prior provided by studies, and why would that modified estimate be an improvement? (Every improvement is a change, but not every change is an improvement.)

comment by DanielLC · 2011-12-11T04:29:57.432Z · LW(p) · GW(p)

Since when are Bayesians enamored of the results of trials against a null hypothesis that throws away almost all the relevant information about you, and leaves the decision about treatment for you up to the results of a 95% confidence interval on people who are not you, for whom you've also thrown away (or never collected) most of the relevant data?

You say that like it isn't evidence, rather than simply being less powerful than it could have been. There is only a 5% chance of getting a false positive with a 95% confidence interval. Ignoring additional evidence will not change that.

but the best indicator of whether someone works for you is a trial on you.

It's hard to tell if something's working if you use it on one person with no control group. You can't tell which symptoms are caused by the drug, or to what degree.

Maybe it's just because I'm a little older than the mean here, but I don't think I have decades to wait for decades of longitudinal studies on life extension strategies

You certainly can't find life-extension stuff that way. By the time you know how well it works, it's too late to decide whether or not to use it.

Replies from: FAWS, buybuydandavis
comment by FAWS · 2011-12-11T10:36:30.203Z · LW(p) · GW(p)

You say that like it isn't evidence, rather than simply being less powerful than it could have been. There is only a 5% chance of getting a false positive with a 95% confidence interval. Ignoring additional evidence will not change that.

But that does not tell you how likely a given positive result is to be a false positive; you'd also need to know what fraction of (implicitly) tested hypotheses are true.
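FAWS's base-rate point can be sketched with a small calculation. The alpha, power, and base-rate figures below are illustrative assumptions, not numbers from the thread:

```python
# The probability that a "statistically significant" result is a false
# positive depends on the base rate of true hypotheses among those tested,
# not just the 5% alpha level. Assumed: alpha = 0.05, power = 0.80.

def false_positive_share(base_rate, alpha=0.05, power=0.80):
    """Fraction of positive results that are false, given the fraction
    of tested hypotheses that are actually true (base_rate)."""
    true_positives = base_rate * power
    false_positives = (1 - base_rate) * alpha
    return false_positives / (true_positives + false_positives)

for base_rate in (0.5, 0.1, 0.01):
    print(f"base rate {base_rate:>4}: "
          f"{false_positive_share(base_rate):.0%} of positives are false")
```

Under these assumptions, if only 1 in 100 tested hypotheses is true, most significant results are false positives, even though alpha is held at 5%.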

Replies from: DanielLC
comment by DanielLC · 2011-12-12T04:54:23.224Z · LW(p) · GW(p)

If it's on the border of the confidence interval, the probability of false positive is 50%. If it's twice as far as that, it's 5%. That should give an okay idea of where the range is for that. I'd much prefer something with a 99% confidence interval, but a 95% one is still pretty good. If the effect is just barely statistically significant, the odds ratio is still 10:1.

I'm not sure how likely it is for their hypothesis to be true, but it's likely enough for them to risk spending money checking.

comment by buybuydandavis · 2011-12-12T08:21:50.549Z · LW(p) · GW(p)

You say that like it isn't evidence, rather than simply being less powerful than it could have been. There is only a 5% chance of getting a false positive with a 95% confidence interval. Ignoring additional evidence will not change that.

There are so many things wrong with this paragraph, but I'll limit my comments.

Where did I state or imply that the results of a clinical trial are not evidence?

What epistemic claims do you think you're justified in making about an individual based on a failure to reject a null hypothesis for a sample statistic for a population for a particular treatment regimen?

I would note that, if you're a Frequentist, failure to reject a hypothesis is technically, literally, no evidence of anything. But I'm not a Frequentist, so I'm free to actually use the data collected in this "failure" for my inferences; but generally, failures to reject at 95% prove little.

Ignoring additional evidence will not change that.

And ignoring the evidence that something works for you will not change whether it works for you either.

It's hard to tell if something's working if you use it on one person with no control group.

Really? I can't sleep. I take a pill, and I fall asleep. I don't take a pill, and I don't fall asleep. How many trials of that do you need to be convinced?
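The "how many trials" question has a back-of-the-envelope answer via a sign-test-style calculation; treating each pill/no-pill night as a fair coin under the null hypothesis is a simplifying assumption:

```python
# If the pill does nothing, each "pill night" is as likely to end in sleep
# as a "no-pill night". Ask how improbable an unbroken run of consistent
# nights is under that 50/50 chance model. Numbers are illustrative.

def chance_of_perfect_run(n):
    """Probability that a 50/50 process produces n consistent results in n tries."""
    return 0.5 ** n

for n in (3, 5, 8):
    print(f"{n} consistent nights: p = {chance_of_perfect_run(n):.3f} under pure chance")
```

On this sketch, five consistent nights already push the chance explanation down to about 3%, which is roughly the evidential bar a 95% trial uses.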

You certainly can't find life-extension stuff that way. By the time you know how well it works, it's too late to decide whether or not to use it.

You can't find a lot of solutions to medical problems that way, because there is an ocean of problems for which no one has the financial interest to spend millions of dollars on a clinical trial.

You admit that you can't find life extension solutions that way. Maybe you'll equally admit that the same applies to a host of other solutions. Whatever you include in the class of "not amenable to solution by clinical trial", what is your solution in those cases? Do nothing?

Replies from: DanielLC
comment by DanielLC · 2011-12-12T20:17:02.396Z · LW(p) · GW(p)

Where did I state or imply that the results of a clinical trial are not evidence?

You certainly didn't state it, but I got the impression that you don't think you should pay much attention to clinical trials. Perhaps I inferred what wasn't there.

but generally, failures to reject at 95% proves little.

If failure to reject the null hypothesis proves little, then you're unlikely to reject it even if it's false, which raises the question of why you did the study in the first place. You clearly should have made a big enough study to actually notice things.

Really? I can't sleep. I take a pill, and I fall asleep. I don't take a pill, and I don't fall asleep. How many trials of that do you need to be convinced?

I guess in that case (and many others, such as pain relief) it would work. The example in the original comment was life extension, where it clearly would not. Even so, you could rely on studies to tell you how likely it is to work, and how well it will work if it does. You also can't just rely on testing them yourself, given that only a tiny fraction of things help you sleep, but the same goes for studies.

You admit that you can't find life extension solutions that way.

I meant that you can't find them by testing them on yourself, since you can only do it once, unlike something to help you sleep. You can find them by clinical trials. You just saw an example of that.

comment by Grognor · 2011-12-11T11:30:58.054Z · LW(p) · GW(p)

A huge collection of anecdotes about pseudoscience ruining lives: http://whatstheharm.net/

Replies from: Jayson_Virissimo
comment by Jayson_Virissimo · 2011-12-11T13:12:36.604Z · LW(p) · GW(p)

A huge collection of anecdotes about pseudoscience ruining lives: http://whatstheharm.net/

Know of any collection of anecdotes about science ruining lives?

comment by MatthewBaker · 2011-12-11T02:57:14.157Z · LW(p) · GW(p)

So sad when science is failed by its warriors :(