1001 PredictionBook Nights
post by gwern · 2011-10-08T16:04:53.708Z · LW · GW · Legacy · 49 comments
I explain what I've learned from creating and judging thousands of predictions on personal and real-world matters: the challenges of maintenance, the limitations of prediction markets, the interesting applications to my other essays, skepticism about pundits and unreflective persons' opinions, my own biases like optimism & planning fallacy, 3 very useful heuristics/approaches, and the costs of these activities in general.
Plus an extremely geeky parody of Fate/Stay Night.
This essay exists as a large section of my page on prediction markets on gwern.net: http://www.gwern.net/Prediction%20markets#1001-predictionbook-nights
49 comments
Comments sorted by top scores.
comment by taw · 2011-10-08T13:52:25.380Z · LW(p) · GW(p)
Another problem:
After the fall of Tripoli, with 90% of Libya under NTC control, the Intrade contract "Muammar al-Gaddafi to no longer be leader of Libya before midnight ET 31 Aug 2011" kept trading lower each day Intrade did not close it as done, since it had effectively become a contract on "Intrade will not fuck up the judging".
There was zero disagreement about the facts and predictions of facts on the ground; everybody agreed that Gaddafi's government had fallen, so it turned into very heavy trading on Intrade's sanity, and people estimated the chance of a sane judgment at ~67%.
Intrade judged sanely that time, but with so much distrust in the judges, prediction markets on anything less clear-cut than election results simply cannot work.
In one of the previous threads about this, SilasBarta had an example where Intrade did fuck up, so I doubt this is just a fluke.
Replies from: Jayson_Virissimo
↑ comment by Jayson_Virissimo · 2011-10-18T14:49:00.975Z · LW(p) · GW(p)
Japan to announce it has acquired a nuclear weapon before midnight ET on 31 Dec 2011: 11% chance
Japan to announce it has acquired a nuclear weapon before midnight ET on 31 Dec 2012: 15% chance
Japan to announce it has acquired a nuclear weapon before midnight ET on 31 Dec 2013: 5% chance
This was found here at 07:45, 10/18/2011.
Replies from: gwern
↑ comment by gwern · 2011-10-18T15:19:46.272Z · LW(p) · GW(p)
Yes, that's pretty amusing. On the other hand, remember that the further in the future a contract is, the more mispriced you can expect it to be - Intrade does not pay interest or otherwise compensate you for opportunity cost. (I believe this was covered in one of the footnotes.)
comment by Psychosmurf · 2011-10-10T19:01:35.712Z · LW(p) · GW(p)
This detachment itself seems to help accuracy; I was struck by a psychology study demonstrating that not only are people better at falsifying theories put forth by other people, they are better at falsifying when pretending it is held by an imaginary friend!
I think we've just derived a new heuristic. Pretend that your beliefs are held by your imaginary friend.
Replies from: gwern, Kaj_Sotala, lessdazed
↑ comment by gwern · 2011-10-10T20:45:01.607Z · LW(p) · GW(p)
I agree. When I first read the essay, I thought to myself: so that is why 'rubber-duck debugging' works!
Replies from: gwern
↑ comment by Kaj_Sotala · 2011-10-10T21:43:11.535Z · LW(p) · GW(p)
An explanation of why this works.
Short version: suppose that reasoning in the sense of "consciously studying the premises of conclusions and evaluating them, as well as generating consciously understood chains of inference" evolved mainly to persuade others of your views. Then it's only to be expected that we will only study and generate theories at a superficial level by default, because there's no reason to waste time evaluating our conscious justifications if they aren't going to be used for anything. If we do expect them to be subjected to closer scrutiny by outsiders, then we're much more likely to actually inspect the justifications for flaws, so that we'll know how to counter any objections the others will bring up.
Replies from: BrandonReinhart
↑ comment by BrandonReinhart · 2011-10-12T05:38:17.827Z · LW(p) · GW(p)
An exercise we ran at minicamp -- which seemed valuable, but requires a partner -- is to take a position and argue for it for some time. Then, at some interval, you switch and argue against the position (while your partner defends it). I used this once at work, but haven't had a chance to since. The suggestion to swap sides mid-argument surprised the two people arguing, but did lead to a more effective discussion.
The exercise sometimes felt forced if the topic was artificial and veered too far off course, or if one side was simply convinced and felt that further artificial defense was unproductive.
Still, it's a riff on this theme.
comment by xv15 · 2011-10-08T23:11:56.480Z · LW(p) · GW(p)
As we evaluate predictions for accuracy, one thing we should all be hyper-aware of is that predictions can affect behavior. It is always at least a little bit suspect to evaluate a prediction simply for accuracy, when its goal might very well have been more than accuracy.
If I bet you $100 on even odds that I'm going to lose 10 lbs this month, it doesn't necessarily indicate that I think the probability of it happening is > 50%. Perhaps this bet increases the probability of it happening from 10% to 40%, and maybe that 30% increase in probability is worth the expected $20 cost. More generally, even with no money on the table, such a bet might motivate me, and so it cannot be inferred from the fact that I made the bet that I actually believe the numbers in the bet.
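For concreteness, here is a toy check of those (purely hypothetical) numbers:

```python
# A $100 even-odds bet on losing 10 lbs, where making the bet itself is
# assumed to raise the chance of success from 10% to 40% (the hypothetical above).
p_win = 0.40
expected_value = p_win * 100 - (1 - p_win) * 100  # win $100 or lose $100
print(expected_value)  # -20.0: the "expected $20 cost" paid for the motivation
```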
Or let's talk about the planning fallacy. Instrumentally, it might make sense to hold the belief that you'll get your project done before the deadline, if that sort of thinking motivates you to finish it earlier than you would if you were completely resigned to finishing it late from the get-go. It might even be detrimental to your project-finishing goals to become enlightened about the reality.
And of course, predictions can affect people other than the predictor. It is ludicrous to look at the public predictions of the Fed chairman and call him a fool when he turns out to be wrong.
Sometimes a prediction is a prediction. But this is definitely something to keep in mind. And gwern, given that you have all this data now, you might find it interesting to see if there are any systematic differences between predictions that do and don't have any significant effect on behavior.
comment by Jayson_Virissimo · 2011-10-10T15:48:10.996Z · LW(p) · GW(p)
Okay, I'm sold. See you on PredictionBook...
comment by torekp · 2011-10-10T22:44:28.522Z · LW(p) · GW(p)
I am considerably more skeptical of op-eds and other punditry, after tracking the rare clear predictions they made
In the case of a few well-studied pundits you should examine the evidence gathered by other prediction trackers. Some pundits are well outside the dumb luck range on a ten-point scale:
The best? Paul Krugman with a PVS of 8.2 (You can see a screenshot of his score sheet to the right. Note: Score sheets for each of the pundits are in the full text document).
The worst? Cal Thomas, with a PVS of -8.7 (You read that right. Negative eight point seven...).
Kinda surprising to me that you can beat dumb luck in inaccuracy. I hope they do a followup.
Replies from: Eugine_Nier, gwern, JoshuaZ
↑ comment by Eugine_Nier · 2011-10-11T04:21:01.238Z · LW(p) · GW(p)
Since the study focused on the period around the 2008 elections, which the Democrats won at nearly all levels, and since most pundits tend to be biased towards believing that what they wish would happen will happen, it's not surprising that liberals' predictions did better and some conservatives scored worse than random. I suspect we'd see the trend go the other way for, say, predictions about the 2010 midterms. The fundamental problem is that the predictions weren't independent.
Replies from: torekp, Will_Sawin
↑ comment by torekp · 2011-10-12T00:17:57.544Z · LW(p) · GW(p)
Since the correlation between liberalism and correctness was weak, most pundits probably wouldn't gain or lose much score in a more politically average year. In Krugman's case, for example, most of the scored predictions were economic, not political, forecasts. In Cal Thomas's case, however, your explanation might basically work.
Replies from: Eugine_Nier
↑ comment by Eugine_Nier · 2011-10-12T03:23:06.420Z · LW(p) · GW(p)
True; of course, in Krugman's case I suspect most of his predictions amounted to predicting that the financial crisis was going to be really bad, and thus were also correlated.
Replies from: gjm
↑ comment by gjm · 2012-04-20T09:37:40.422Z · LW(p) · GW(p)
Another LW discussion of Krugman's alleged accuracy pointed both here and to a spreadsheet with the actual predictions. About half of his predictions did indeed amount to saying that the financial crisis was going to be really bad. There were some political ones too but they weren't of the "my team will win" form, and he did well on those as well.
↑ comment by Will_Sawin · 2011-10-11T04:41:33.409Z · LW(p) · GW(p)
In particular, one should be skeptical of having lots of people who consistently do worse than average.
I think, though, that it would, in fact, be worthwhile to do the analysis combining 2008 and 2010. I think Paul Krugman had already started panicking by then.
More interesting might be to see how much data it takes for prediction markets to beat most/all pundits.
Replies from: gwern, wedrifid
↑ comment by wedrifid · 2011-10-11T05:49:24.055Z · LW(p) · GW(p)
In particular, one should be skeptical of having lots of people who consistently do worse than average.
Outliers? That's actually what I would expect. People with superior prediction skills can become significantly positive. The same people could use their information backwards to become significantly negative, but it is damn hard to reliably and significantly lose to a vaguely efficient market if you are stupid (or uninformed).
Replies from: Will_Sawin
↑ comment by Will_Sawin · 2011-10-12T05:15:13.328Z · LW(p) · GW(p)
Sorry, I should have said "worse than random". To do worse than random, one would have to take a source of good predictions and twist it into a source of bad ones. The only plausible explanation I could think of for this is that you know a group of people who are good at predicting and habitually disagree with them. It seems like there should be far fewer such people than there are legitimately good predictors.
It's easy to lose to an efficient market if you're not playing the efficient market's games. If you take your stated probability and the market's implied probability and make a bet somewhere in between, you are likely to lose money over time.
Replies from: wedrifid, gjm
↑ comment by wedrifid · 2011-10-12T06:04:30.843Z · LW(p) · GW(p)
Sorry, I should have said "worse than random".
We are in complete agreement, and I should have been explicit and said I was refining a detail on an approximately valid point!
It seems like there should be far fewer such people than there are legitimately good predictors.
And it seems like those that do exist should have less money to be betting on markets! If not then it would seem like the other group is making some darn poor strategic predictions regarding the rest of their life choices!
It's easy to lose to an efficient market if you're not playing the efficient market's games.
Yes, like it is easy for a thief to get all my jewelry if I break into his house and put it on the table. Which I suppose is the sort of thing they do on Burn Notice to frame the bad guys for crimes. Which makes me wonder if it would be possible to frame someone for, say, insider trading or industrial espionage by losing money to someone such that their windfall is suspicious.
Replies from: Will_Sawin
↑ comment by Will_Sawin · 2011-10-12T06:47:53.978Z · LW(p) · GW(p)
My point is that you're losing in a context of prediction accuracy, not losing money.
↑ comment by gjm · 2012-04-20T09:40:28.002Z · LW(p) · GW(p)
that you know a group of people who are good at predicting and habitually disagree with them.
It seems to me that this is exactly the sort of thing that can really happen in politics. Suppose you have two political parties, the Greens and the Blues, and that for historical reasons it happens that the Greens have adopted some ways of thinking that actually work well, and the Blues make it their practice to disagree with everything distinctive that the Greens say.
(And it could easily happen that there are more Blues than Greens, in which case you'd get lots of systematically bad predictors.)
↑ comment by gwern · 2011-10-11T20:15:35.220Z · LW(p) · GW(p)
Yes, I remember that study - it wasn't as long term as I would like, and I always wonder about the quality of a study conducted by students, but it was interesting anyway.
Replies from: None
↑ comment by [deleted] · 2011-10-11T20:19:31.291Z · LW(p) · GW(p)
The last time I cited this study, I remember that their sample size was well under thirty for each of their pundits. At that level, what's the point of statistics?
Replies from: gwern
↑ comment by gwern · 2012-08-25T23:29:13.533Z · LW(p) · GW(p)
If the effect size is large enough, 30 observations is plenty & enough to do stats on. Go through a power calculation sometime with, say, d=0.7.
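A minimal sketch of such a power calculation, assuming a two-sided two-sample t-test and using statsmodels (any power calculator would do):

```python
# Power calculation for Cohen's d = 0.7 at alpha = 0.05, two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed for 80% power:
n_needed = analysis.solve_power(effect_size=0.7, alpha=0.05, power=0.8)
print(f"n per group for 80% power: {n_needed:.1f}")  # roughly 33

# Power actually achieved with ~30 observations per group:
achieved = analysis.solve_power(effect_size=0.7, nobs1=30, alpha=0.05)
print(f"power with n=30 per group: {achieved:.2f}")  # a bit under 0.8
```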
↑ comment by JoshuaZ · 2011-10-10T23:01:05.814Z · LW(p) · GW(p)
Kinda surprising to me that you can beat dumb luck in inaccuracy.
It shouldn't be. Assume that your pundits in general do no better than chance. In a large sample, some of them are going to do really badly. Even if your pool is on average better than chance, one should still expect a few to do much worse.
That said, even given that, -8.7 by their metric looks really bad.
According to that study, being a lawyer by training was one of the things that caused predictors to do badly. Note that Cal Thomas doesn't fall into that category.
comment by beoShaffer · 2011-10-08T03:42:15.321Z · LW(p) · GW(p)
This post inspired me to actually create a PredictionBook account. So far I only have one prediction, but it's a start.
comment by Eugine_Nier · 2011-10-11T04:50:06.692Z · LW(p) · GW(p)
I was thinking about what's a good way to measure how well-calibrated you are. The most obvious way is to say you're well calibrated if, e.g., 70% of your predictions at the 70% confidence level are correct; however, that implicitly assumes your predictions are independent. You can try getting around this by making lots of predictions in different areas; however, this leaves open the possibility that you might be differently calibrated in different areas.
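A minimal sketch of that obvious check, assuming (per the caveat above) that the predictions are independent; `my_predictions` is hypothetical data, not anything PredictionBook exports:

```python
# Group judged predictions by stated confidence and compare the observed
# frequency of correct outcomes in each bucket to the stated probability.
from collections import defaultdict

# Hypothetical (stated_probability, came_true) pairs:
my_predictions = [(0.7, True), (0.7, False), (0.7, True), (0.9, True), (0.6, False)]

buckets = defaultdict(list)
for stated, came_true in my_predictions:
    buckets[round(stated, 1)].append(came_true)  # bucket to the nearest 10%

for stated in sorted(buckets):
    outcomes = buckets[stated]
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%}: observed {observed:.0%} over {len(outcomes)} predictions")
```

The same loop can be run separately per topic area to check the second worry, that calibration differs across areas.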
comment by Craig_Heldreth · 2011-10-08T19:16:18.613Z · LW(p) · GW(p)
There is a lot of good stuff to think about in this article. I especially like the poker post on "this is what 5% feels like", as it is in line with a number of things I have enjoyed doing to calibrate my own intuition. Some of those items posted on PredictionBook confuse me, however.
Example: "‘I predict…a major earthquake, magnitude seven or more will happen within twenty kilometers of a line from nagano city, nagano to matsudo, chiba’ posted by gwern." And then you predict 100% against. (Ignoring the conventional wisdom that 1.0 is not a probability) I wonder what you are doing with such an item, which seems like a waste of keystrokes and brain cycles. Can you go into (a little more) what you are doing there?
Replies from: gwern
↑ comment by gwern · 2011-10-08T19:44:32.639Z · LW(p) · GW(p)
PB doesn't allow you to put in decimals, so it's sort of understood that 100% simply means >99.5% or so. There I simply noticed a stupid prediction and decided to note it down; easy predictions are relaxing, and I had just put in scores of Eva predictions.
comment by JoshuaZ · 2011-10-12T13:25:19.155Z · LW(p) · GW(p)
A related issue I seem to be having on PredictionBook: judging from my past predictions, I'm generally underconfident by quite a bit. According to the outcome graph, no matter what percentage I estimate for something to happen, as long as I am giving it more than 50%, it seems to happen around 88% of the time. Slightly more for the >90% section, and similarly for the rounding to 100%, so my probabilities are at least weakly correlated in the right direction. Any thoughts on what I can do to better calibrate myself?
Replies from: gwern
↑ comment by gwern · 2011-10-12T15:03:19.479Z · LW(p) · GW(p)
You don't have very many predictions judged, so I'm not sure how reliable your worry is - with only 8-11 predictions in each decile, it's quite possible you just got lucky. Assuming that's not the case, you could try mechanically bumping up every prediction by 10% and see what happens.
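A minimal sketch of that mechanical adjustment (the 10-point shift and the 99% cap are just the obvious choices here, not anything PredictionBook itself does):

```python
# Shift every stated probability (in %) up by 10 points, capped below 100%,
# then re-check calibration on the adjusted numbers.
def bump(percent, shift=10, cap=99):
    return min(percent + shift, cap)

print(bump(70))  # 80
print(bump(95))  # 99
```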
Replies from: Eugine_Nier, JoshuaZ
↑ comment by Eugine_Nier · 2011-10-12T23:24:43.108Z · LW(p) · GW(p)
Also, I notice there are many sets of predictions of the form "X will happen in 1 month/2 months/1 year/...", with a separate prediction for each time period. How are these scored? Since these types of predictions are highly correlated, scoring them individually can cause people to appear over- or under-confident.
Replies from: gwern
↑ comment by gwern · 2011-10-13T01:50:12.507Z · LW(p) · GW(p)
Is there any reason to think the sets won't balance out eventually?
Replies from: Eugine_Nier
↑ comment by Eugine_Nier · 2011-10-15T02:47:11.904Z · LW(p) · GW(p)
They will, for very large values of eventually.
comment by Paul Crowley (ciphergoth) · 2011-10-08T07:34:50.557Z · LW(p) · GW(p)
Footnotes beyond 10 have gone missing.
Very interesting stuff - thank you!
Replies from: gwern
comment by JoshuaZ · 2011-10-08T02:27:29.648Z · LW(p) · GW(p)
The link under fee changes does not work.
Otherwise this seems like an interesting summary. I've been thinking of making a similar post with my experience on PredictionBook, but you've been a lot more systematic about your predictions.
Replies from: saturn
comment by soreff · 2012-01-01T01:40:33.704Z · LW(p) · GW(p)
1514 public predictions? Gwern, you don't just have more courage than I have. You have orders of magnitude more courage than I have.
Replies from: gwern
↑ comment by gwern · 2012-01-01T03:38:42.655Z · LW(p) · GW(p)
I'm actually up to 1889 as of tonight.
I'm a lot more comfortable doing it because now I have evidence that I'm actually pretty decent at it. (For example, on the Good Judgement Project, I finished the year 50th out of the 206 in my specific group, despite typoing at least 1 entry.)
comment by taw · 2011-10-08T14:20:04.219Z · LW(p) · GW(p)
Footnotes like that are a horrible idea from a usability point of view. This is the wrong medium. Don't we have some kind of spoiler tags?
Replies from: gwern