PredictionBook.com - Track your calibration
post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-10-14T00:08:43.863Z · 53 comments
Our hosts at Tricycle Developments have created PredictionBook.com, which lets you make predictions and then track your calibration - see whether things you assigned a 70% probability happen 7 times out of 10.
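As a minimal sketch of what such a calibration check computes (the example data here is made up, and PredictionBook's own bucketing may differ):

```python
# Bucket judged predictions by stated confidence, then compare each
# bucket's stated probability to the fraction of its predictions that
# actually came true. (Example data is invented for illustration.)
from collections import defaultdict

judged = [(0.7, True), (0.7, True), (0.7, False), (0.7, True),
          (0.9, True), (0.9, True), (0.5, False), (0.5, True)]

buckets = defaultdict(list)
for prob, came_true in judged:
    buckets[round(prob, 1)].append(came_true)

for prob in sorted(buckets):
    outcomes = buckets[prob]
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {prob:.0%}: {hit_rate:.0%} came true ({len(outcomes)} judged)")
# Well calibrated = each bucket's hit rate is close to its stated probability.
```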
The major challenges with a tool like this are (a) coming up with good short-term predictions to track, and (b) maintaining your will to keep on tracking yourself even if the results are discouraging, as they probably will be.
I think the main motivation to actually use it would be rationalists challenging each other to put a prediction on the record and track the results - I'm going to try to remember to do this the next time Michael Vassar says "X%" and I assign a different probability. (Vassar would have won quite a few points for his superior predictions of Singularity Summit 2009 attendance - I was pessimistic; Vassar was accurate.)
53 comments
Comments sorted by top scores.
comment by Cyan · 2009-10-14T14:02:27.793Z · LW(p) · GW(p)
maintaining your will to keep on tracking yourself even if the results are discouraging, as they probably will be...
I predict with probability 0.95 that my 95% intervals will contain the quantity I'm estimating around 50% of the time.
comment by Jack · 2009-10-19T04:56:53.370Z · LW(p) · GW(p)
Eliezer, you ought to be ashamed of yourself!
comment by CannibalSmith · 2009-10-14T11:05:42.365Z · LW(p) · GW(p)
The sun will rise tomorrow morning
( 80% confidence; 5 wagers; 1 comment )
O_o
↑ comment by Jayson_Virissimo · 2011-10-31T07:30:14.156Z · LW(p) · GW(p)
- David Hume, fence-sitting 236 years ago.
↑ comment by gwern · 2011-10-31T14:48:21.749Z · LW(p) · GW(p)
Very funny, but to be fair, he would've given it a higher probability*, even if he might not have reinvented Laplace's law of succession and been able to give the sun a 1/1826250 chance of not rising.
* Remember, the empiricism vs rationalism debate could be summed up as 'is infinite certainty possible?'
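(For reference, a sketch of the rule of succession itself; the day count below is an assumption, which is why the denominator comes out slightly different from the 1/1826250 figure above.)

```python
# Laplace's rule of succession: after s successes in n trials, with a
# uniform prior on the underlying success rate,
# P(success on the next trial) = (s + 1) / (n + 2).
def rule_of_succession(successes: int, trials: int) -> float:
    return (successes + 1) / (trials + 2)

# Laplace's worked example: the sun has risen on every day of ~5000
# years of recorded history (the exact day count is an assumption).
n = 5000 * 365
p_fail = 1 - rule_of_succession(n, n)  # = 1 / (n + 2)
print(f"P(sun fails to rise tomorrow) = 1/{round(1 / p_fail)}")  # about 1/1825002
```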
comment by Jonathan_Graehl · 2009-10-14T01:41:06.422Z · LW(p) · GW(p)
Calibration may be achievable by a general procedure of making and testing (banded) predictions, but I wouldn't trust anyone's calibration in a particular domain on evidence of calibration in another.
In other words, people will have studied the accuracy of only some of their maps.
↑ comment by gwern · 2011-08-17T22:36:09.445Z · LW(p) · GW(p)
Do you have any evidence for this? I don't remember any strongly domain-specific results in Tetlock's study, in the book I read about calibration in business, or in any other studies. Nor does Wikipedia mention anything except domain experts being overconfident (as opposed to people being random outside their domain even when supposedly calibrated, as you imply), which is fixable with calibration training.
And this is what I would expect given that the question is not about accuracy (one would hope experts would win in a particular domain) but about calibration - why can't one accurately assess, in general, one's ignorance?
(I have >1100 predictions registered on PB.com and >=240 judged so far; I can't say I've noticed any especial domain-related correlations.)
↑ comment by Jonathan_Graehl · 2011-08-18T06:02:25.264Z · LW(p) · GW(p)
p.s. that's a lot of predictions :)
↑ comment by lessdazed · 2011-08-18T07:10:30.604Z · LW(p) · GW(p)
How many would you have thought gwern had?
↑ comment by Jonathan_Graehl · 2011-08-18T07:23:32.804Z · LW(p) · GW(p)
I found this question puzzling, and difficult to answer (I'm sleep deprived). Funny joke if you were sneakily trying to get me to make a prediction.
Unfortunately I'm pretty well anchored now.
I'd expect LW-haunters who decide to make predictions at PB.com to make 15 on the first day and 10 in the next year (with a mode of 0).
↑ comment by Jonathan_Graehl · 2011-08-18T06:00:56.744Z · LW(p) · GW(p)
Your point regarding the overconfidence of most domain experts is a strong one. I've updated :) This is not quite antipodal to the incompetent most overestimating their percentile competence (Dunning-Kruger).
I was merely imagining, without evidence, that some of the calibration training would be general and some would be domain-specific. Certainly you'd learn to calibrate, in general. You just wouldn't automatically be calibrated in all domains. Obviously, if you've optimized on your expertise in a domain (or worse: on getting credit for a single bold overconfident guess), then I don't expect you to have optimized your calibration for that domain. In fact, I have only a weak opinion about whether domain experts should be better or worse calibrated on average in their natural state. I'm guessing they'll overly signal confidence (to their professional+status benefit) more so than that they're really more overconfident (when it comes to betting their own money).
↑ comment by gwern · 2011-08-18T16:31:01.541Z · LW(p) · GW(p)
Fortunately, Dunning-Kruger does not seem to be universal (not that anyone who would understand or care about calibration would also be in the stupid-enough quartiles in the first place).
Certainly you'd learn to calibrate, in general. You just wouldn't automatically be calibrated in all domains.
Again, I don't see why I couldn't. All I need is a good understanding of what I know, and then anytime I run into predictions on things I don't know about, I should be able to estimate my ignorance and adjust my predictions closer to 50% as appropriate. If I am mistaken, well, in some areas I will be underconfident and in some overconfident, and they balance out.
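A toy version of that adjustment - purely illustrative; the 'familiarity' knob is invented and nothing like it exists in PB:

```python
# Shrink a naive probability toward 50% in proportion to how ignorant
# you judge yourself to be in the domain. familiarity = 1 means full
# trust in the naive estimate; familiarity = 0 means total ignorance.
def adjust_for_ignorance(naive_prob: float, familiarity: float) -> float:
    return 0.5 + familiarity * (naive_prob - 0.5)

print(adjust_for_ignorance(0.9, 1.0))   # home turf: stays 0.9
print(adjust_for_ignorance(0.9, 0.25))  # mostly ignorant: 0.6
print(adjust_for_ignorance(0.9, 0.0))   # no idea: 0.5
```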
↑ comment by Jonathan_Graehl · 2011-08-18T21:23:46.848Z · LW(p) · GW(p)
If there's a single thing mainly responsible for making people poor estimators of their numerical certainty (judged against reality), then you're probably right. For example, it makes sense for me to be overconfident in my pronouncements if I want people to listen to me, and there's little chance of me being caught in my overconfidence. This motivation is strong and universal. But I can learn to realize that I'm effectively lying (everyone does it, so maybe I should persist in most arenas), and report more honestly and accurately, if only to myself, after just a little practice in the skill of soliciting the right numbers for my level of information about the proposition I'm judging.
I have no data, so I'll disengage until I have some.
↑ comment by JoshuaZ · 2011-08-17T23:32:07.094Z · LW(p) · GW(p)
(I have >1100 predictions registered on PB.com and >=240 judged so far; I can't say I've noticed any especial domain-related correlations.)
Note that there are some large classes of predictions which by nature will strongly cluster and won't show up until a fair bit in the future. For example, there are various AI-related predictions going about 100 years out. You've placed bets on 12 of them by my count. They strongly correlate with each other (for example, general AI by 2018 and general AI by 2030). For that sort of issue it is very hard to notice domain-related correlation when almost nothing in the domain has reached its judgement date yet. There are other issues with this sort of thing as well, such as a variety of the long-term computational complexity predictions (I'm ignoring here the Dick Lipton short-term statements, which everyone seems to think are just extremely optimistic). Have there been enough different domains with a lot of questions that one could notice domain-specific correlations?
comment by [deleted] · 2012-10-14T22:22:31.823Z · LW(p) · GW(p)
Hi. Who do I go to to request that PredictionBook add an option to hide other people's estimates until after you've entered your own? I'm anchoring on other people's estimates, and it's preventing me from using the site to calibrate myself without generating a lot of my own predictions.
↑ comment by gwern · 2012-10-15T00:01:39.381Z · LW(p) · GW(p)
You'd go to the PredictionBook GitHub repo to open a bug report; but PB is mostly in maintenance mode, so unless you're a Ruby programmer...
comment by rwallace · 2009-10-14T10:39:26.883Z · LW(p) · GW(p)
+1 Interesting! I've put in a prediction... and also pressed the wrong button on somebody else's prediction (for which the time hasn't elapsed yet) and marked it judged right; hopefully clicking Unknown undoes that...
The advantage of a site like this having been brought to the attention of geeks is that there are at least a few predictions listed to which my answer isn't "how the heck would I know?" :)
↑ comment by rwallace · 2009-10-14T10:46:34.036Z · LW(p) · GW(p)
Seems like a few other people have been doing the pressing-the-wrong-button thing, if I'm now understanding the user interface correctly? I've tried setting some of those still-in-the-future predictions to unknown; hopefully that's the right thing to do. If so, would it be possible to change the user interface to avoid this error?
↑ comment by Emile · 2009-10-14T14:53:36.700Z · LW(p) · GW(p)
Same here - once I entered a percentage, I wasn't sure which button to press; I hesitated between "right" (meaning the percentage I was giving was my confidence that it was right) and "my 2 cents" (which I thought only applied when you entered a comment). I selected "right", which was wrong.
The interface needs a bit of polishing.
comment by gwern · 2010-07-30T06:35:55.288Z · LW(p) · GW(p)
I have spent some time extracting bets and predictions from Long Bets under my account.
So far I have all open bets with fixed dates imported, and roughly a third of the predictions.
It would be nice if a bunch of LWers could go in and put down their own probabilities. It's true that most of them don't come due any time soon - looking at upcoming predictions, I see the first Long Bet item coming up in 5 months, followed by 2 or 3 within half a year after that. But the more people who use it, the more short-term predictions there will be, and the more useful it becomes. The rich get richer, etc.
↑ comment by gwern · 2010-08-02T07:06:20.664Z · LW(p) · GW(p)
I've finished importing all the sensible bets and predictions from LB. I suppose the next target is Wrong Tomorrow, and then I'll turn to Intrade.
(I see next to no contributions from LWers. This disappoints me; do we all think we are well-calibrated or what?)
↑ comment by gwern · 2010-08-11T05:59:40.043Z · LW(p) · GW(p)
Wrong Tomorrow turns out to be something like half or more expired predictions which haven't been judged (and only the moderators/admins can do that, so...). Imported the outstanding predictions faster than I expected, so that's done.
Next is Intrade and our 2010 predictions, unless anyone has other suggestions.
comment by kess3r · 2009-10-14T23:22:54.518Z · LW(p) · GW(p)
This is pure awesome. Finally something has been done! This is akin to the MythBusters going on TV and doing science instead of just talking about how awesome science is.
Apologies for my little rant above.
As for the site itself, other than being awesome, it needs a few tweaks. There is no place to discuss the site itself and possible improvements to it. Also, I wish there was a feature to hide the result until after I vote.
comment by DanArmak · 2009-10-14T01:11:52.835Z · LW(p) · GW(p)
This is quite interesting & exciting.
Are they planning on adding features relevant to a prediction market (apart from betting money)? E.g., tracking bettor reputation/score based on success or transitive trust; or tracking the overall predicted value of a prediction with many bets, weighted by the success/reputation/... of the bettors.
↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-10-14T02:13:21.969Z · LW(p) · GW(p)
Whether they add features will depend on whether people seem interested in using it, they say.
↑ comment by matt · 2009-10-14T04:53:13.179Z · LW(p) · GW(p)
Official answer: Eliezer's right. If we see traffic growing we'll invest in further development.
We can think of many things we could do to make the site better… but the users who currently use it don't use it enough, and if they tell their friends about it, their friends don't become regular users (often enough).
Hosting the current code is very cheap and easy, so the site's in little danger of being shut down, but we won't be developing it further unless you guys and gals (and your friends, and their friends) pile on the love.
↑ comment by kess3r · 2009-10-15T22:41:41.704Z · LW(p) · GW(p)
Just out of curiosity, are you a startup, a non profit or a guy doing a side project?
I predict the site's userbase will not explode overnight but will escalate in the shape of a hockey stick. That's how these things usually happen. You will have to keep improving it even while the userbase is still low, otherwise people will think the site is dying and they will stop showing up. Interesting things need to already be happening on the site before a larger audience will keep coming back to it, not vice versa.
Also, you need to add documentation no matter how simple and intuitive you think the site's features are. They don't seem as intuitive from the outside. By 'documentation' I mean a short and EXPLICIT description of what each feature does. I like the 'help' button near the timeframe for the prediction. You could add help buttons next to everything. Also, an FAQ would be nice.
Overall I think the site has great potential. Keep up the good work.
↑ comment by matt · 2009-10-16T04:35:57.224Z · LW(p) · GW(p)
Just out of curiosity, are you a startup, a non profit or a guy doing a side project?
We're Investling, which is a handful of startups and an IT consultancy. We're for-profit, with some non-profit projects on the side (in part because we'll make more profits if we can help save the world from surprise conversion to paperclips). The majority of our non-profit work is SIAI-related.
I predict the site's userbase will not explode overnight but will escalate in the shape of a hockey stick. […]
Some projects follow that pattern. Some projects never hockey-stick. How can you tell which curve you're riding?
We have many projects running: some have maintained exponential growth since we became involved; some are too young to judge; and some are on the low end of a curve that may be a hockey stick and may just be a project that doesn't have any legs. I very much hope that the LW crowd will latch on to PBook (keep coming back, tell your friends, etc.). If you do (we do - several of us are very keen LWers) and we see traffic growing, we'll flood more resources into the project. If it languishes we'll continue to host it and may even open source it, but it seems more sensible to flood our resources into projects that are winning. I really don't want to see PBook die, but I'm trying to count warm fuzzies consciously.
Also, you need to add documentation […]
We know the documentation is sparse (or, more precisely, the user interface isn't intuitive - documentation is evidence of a UI failure and good design is self-documenting). If you guys are still around in 14 days we should talk about more dev resources.
↑ comment by thomblake · 2009-10-16T12:55:24.307Z · LW(p) · GW(p)
good design is self-documenting
Yes yes yes. Four times yes.
and we see traffic growing
Right now the UI is so slow / bad that I couldn't see myself using it.
↑ comment by anonym · 2009-10-18T19:31:27.665Z · LW(p) · GW(p)
Agreed on the UI being incredibly confusing (and slow).
In terms of usability, if they just moved the judgment buttons down below, added text like "Render final judgment on this prediction" to make it obvious what judgment does, and changed "My 2 cents" to "Submit Estimate" or something like that, it would be a huge improvement over the current design. These sorts of very minor cosmetic UI changes would be trivial to make.
↑ comment by gwern · 2010-07-29T06:03:28.269Z · LW(p) · GW(p)
I just signed up and did a bunch of predictions. Here are my initial impressions:
The majority of our non-profit work is SIAI-related.
1) A tool like PB is like spaced-repetition flashcard programs or writing Wikipedia articles - a long-term tool. Some benefits appear quickly, yes, but the bulk of the benefits arrive over years or decades. (PB is somewhat like Long Bets.)
As the saying goes, "In the long run, the utility of all non-Free software approaches zero. All non-Free software is a dead end." If I invest time in PB, what guarantee do I have that I will be able to get my data** out of PB when* it dies, especially for topics I didn't write? Are you guys going to license the content under a CC license?
(You should do it early, while there still isn't too much content - once Wikipedia got large, it took years and years and a unique one-time exemption by the FSF to liberate its content from the GFDL into a CC license.)
* And it will die eventually. Every site either dies or evolves out of recognition.
** My data is vastly more important to me than the website software. If I had to, I could run a personal PB in just a flat text file, after all.
2) Comments are ridiculously constrained. I dunno if you guys were trying for some sort of auto-Twitter compatibility, but it's really annoying. If you need to dump comments on Twitter and they're too long, then just truncate them.
3) I just judged a Michael Jackson-related prediction wrong, with a citation that the predicted event happened in the wrong year. But in the history section, my comment never appeared!
My current workaround is to make a 0 or 50% prediction (wrong/right), explain my reasoning as best as I can in so short a space, and then separately mark it wrong/right. This is unfair to my score, since obviously I can choose 0 or 100% and always be right.
4) The black boxes on prediction pages (e.g. "Join this prediction") are horrible. I was convinced for the longest time that they were buttons to push, and that they were disabled by some JavaScript pokery, until I went and read the page source.
5) Newlines in comments do not get translated to a space or two in the comment; they get translated to nothing whatsoever.
6) No apparent way to edit 'due dates' for predictions; many unjudged predictions can't be judged at all because they seem to have been created expired.
7) On userpages, the most recent prediction/action gets split in half by the statistics graph in my Firefox; screenshot.
8) Years get interpreted badly. '2029' becomes - somehow - 2 hours from right now, as opposed to 19 years. screenshot
9) The in-browser JS date checker seems to be quite inaccurate. I've been entering all my dates as '1 January 2024' and the like, which it has never validated - but which turn into the right date when actually submitted.
10) The site is slow. And it seems to be on the server itself. I'm the only user right now, and yet predictions can take as much as 10 seconds to enter. I don't understand how it can be so slow, given that a prediction is a 4-tuple of (date, prediction, owner, user-confidence), which probably adds up to less than a kilobyte of data.
↑ comment by matt · 2010-08-03T07:40:27.975Z · LW(p) · GW(p)
It's our intention to open-source the PredictionBook code… and has been for at least six months, but we keep not quite getting around to it. It's also my intention to write a top-level post about why I think PBook isn't getting much traffic (it being slow is only one reason).
Anyone with a reputation on this site who wants access to the code before we get it open-sourced is welcome to contact me directly. The code's on github.com and is written in Ruby on Rails.
(gwern, if you want access send me your promise that you'll behave responsibly and your github username.)
↑ comment by gwern · 2010-08-03T08:41:02.789Z · LW(p) · GW(p)
but we keep not quite getting around to it.
I know the feeling.
It's also my intention to write a top level post about why I think PBook isn't getting much traffic (it being slow is only one reason).
I have my own theories (mostly that people aren't very interested in truth-seeking, pace Hanson, and that the benefits are too long-term, cf. SRS flashcards), but that's just my perspective as a user.
gwern, if you want access send me your promise that you'll behave responsibly and your github username.
Do you mean access to the data? As I said, I'd like to edit the dates on some of the predictions...
I've signed up at http://github.com/gwern
↑ comment by matt · 2010-08-04T10:39:24.428Z · LW(p) · GW(p)
Do you mean access to the data?
No. People have private predictions in there, so I don't think I can in good conscience give you access to anyone's predictions but your own (and giving you access to only your own is about half as much work as properly open-sourcing the project). I mean the code… and you didn't quite send me your promise that you'll behave responsibly yet.
↑ comment by gwern · 2010-08-08T12:55:34.152Z · LW(p) · GW(p)
Well, alright. I see I didn't specifically say 'public data'. That's what I want.
and you didn't quite send me your promise that you'll behave responsibly yet.
I think it's kind of silly to ask for such a promise, but for what it's worth, you have it. (What irresponsible things could I do with just the codebase? I'm no cracker to find security holes and exploit them on the live site.)
comment by Jack · 2009-10-19T16:49:12.360Z · LW(p) · GW(p)
Can someone explain to me what is going on with this prediction given this prediction? I'm not going crazy, right? People are confused.
↑ comment by thomblake · 2009-10-19T20:39:48.889Z · LW(p) · GW(p)
Interesting - by the time I checked this, it looks like there aren't any inconsistent estimates.
↑ comment by Jack · 2009-10-19T21:20:33.230Z · LW(p) · GW(p)
Right now the later prediction has 9 points higher probability than the sooner prediction. I counted two or three cases of individual users posting higher probabilities for the later prediction. Unless they're really confident the first cryonic revival takes place during that decade, they're making a huge mistake. My best explanation is that they just saw a farther-off date and assumed a higher probability of everything...
comment by ektimo · 2009-10-16T18:30:03.816Z · LW(p) · GW(p)
Some of my predictions are of the sort "the stock market will fall 50% tomorrow with 20% odds" (not a real prediction!). If it did happen, I should get huge credit, but would it show up as negative credit since I predicted there was only a 20% chance it would happen? Is there some way to do this kind of prediction with PredictionBook?
I predict this comment will get less than 4 points by Oct. 19 with 75% odds.
↑ comment by gwern · 2009-10-16T19:39:09.596Z · LW(p) · GW(p)
If it did happen, I should get huge credit, but would it show up as negative credit since I predicted there was only a 20% chance it would happen? Is there some way to do this kind of prediction with PredictionBook?
It seems to me like you're asking about 2 different issues. The first is not desiring to be penalized for making low-probability bets; but that should be handled already by low confidences - if you figure it at 1 in 5, then after only a few failed bets things ought to start looking bad for you, but if at 1 in thousands, each failed prediction ought to affect your score very little.
Presumably PredictionBook is offering richer rewards for low-probability successes, just like a 5% share on a prediction market pays out (proportionately) much more than a 95% share would; on net you would do the same.
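To make that concrete, here is how one standard proper scoring rule - the logarithmic score, which is not necessarily what PredictionBook uses - treats such bets:

```python
# Logarithmic score: log(p) if the event happens, log(1 - p) if not;
# higher (less negative) is better. Honest probabilities maximize the
# expected score, so a well-judged 20% long shot is not punished.
import math

def expected_log_score(stated_p: float, true_rate: float) -> float:
    return (true_rate * math.log(stated_p)
            + (1 - true_rate) * math.log(1 - stated_p))

# If crashes like this really happen 20% of the time, stating 20% beats
# both hedging at 50% and dramatic overconfidence at 1%:
for stated in (0.20, 0.50, 0.01):
    print(f"state {stated:.0%}: expected score {expected_log_score(stated, 0.20):.3f}")
# state 20%: -0.500   state 50%: -0.693   state 1%: -0.929
```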
The second issue is that you seem to think that certain events are simply harder to predict better than chance, and that you should be rewarded for going out on a limb? (20% odds on a big market bet tomorrow is much more detailed than the default 1-in-thousands-chance-per-day prediction.)
I don't know what the fair reward here is. If few people are making that prediction at all, then it should be easy to do better than them. In prediction markets, one expects that unpopular markets will be easier to arbitrage and beat - the thicker the market, the more efficient; standard economics. So in a sense, unpopular predictions are their own reward.
But this doesn't prevent making obscure predictions ('will I remember to change my underwear tomorrow?'). Nor would it seem to adequately cover 'big questions' like open scientific puzzles or predictions about technological development (think the union of Long Bets & Intrade). Maybe there could be a bonus for having predictions pay out with confidence levels higher than the average? This would attract well-calibrated people to predictions where others are not informed or are too pessimistic.
comment by UnholySmoke · 2009-10-15T13:24:07.246Z · LW(p) · GW(p)
Cracking idea, like it a lot. Hofstadter would jump for joy, and in his honour: