Ambiguity in Prediction Market Resolution is Harmful

aphyer

post by aphyer · 2022-09-26T16:22:48.809Z · LW · GW · 17 comments

(Disclaimers: I work in the financial industry, though not in a way related to prediction markets. Anything I write here is my opinion and not that of my employer.)

SOMEBODY SET US UP THE BOMB

You've heard of this new 'prediction market' fad, so you decide to try it out. You make a Manifold account and look for interesting questions.

There's a question about the 2022 LessWrong Petrov Day celebration: "What % of Petrov Day will elapse before someone uses the big red button to take down Less Wrong's frontpage?" You navigate to the LW homepage to find out more information...and the site is down! Someone has pressed the button only a couple hours into Petrov Day! If it just went down now, that would mean the market would resolve to 10%...and if it went down a while ago the market would resolve even lower...but the market is currently trading at ~20%.

You're new to the idea of prediction markets, but even you have heard that if a market seems to be doing something silly, you can profit from correcting it. You buy the market down to 10%, and sit happily waiting for your profits.

Huh. It seems like LW admins have put the homepage back up? Apparently there was some kind of bug letting unauthorized users press the button. That's a bit odd? Still, it doesn't seem like it should affect you? Someone did use the big red button to take down LW's homepage only a few hours into Petrov Day, so the market should....

Wait, what? The market is not resolving? Apparently whoever organized it decided that the webpage going down due to someone pressing the red button "didn't count" and is letting the market continue and waiting for a "real" takedown of the webpage.

You are now short a large investment at prices ranging from 10-20%, while the market has responded to this announcement by going back up to ~30%...and it's rising further as the webpage continues not going down.

You thank your lucky stars that this fiasco only involved play money, delete your Manifold account, and resolve never to support real-money prediction markets for any genuine issue.

CURRENT MANIFOLD QUESTIONS

The Petrov Day question above is what prompted me to write this. However, I don't think it's an isolated bad question. I think many other Manifold questions have larger levels of ambiguity. (The Petrov Day question at least only ended up ambiguous due to an unusual event).

A few examples from scrolling down my Manifold search page (apologies to the authors of these. I don't think you're unusually bad people, I'm just picking on you as examples of what I consider a common problem):

"Will Russia use chemical or biological weapons in 2022?"
- This market seems to me much more likely to end up in a state of ambiguity than to resolve cleanly to YES.
- There is no fine print as to what happens if e.g. Ukraine alleges Russian use of chemical weapons, Russia denies it, the UN says 'this is a serious accusation and we are forming an investigative committee', and people form opinions based on their priors. That seems to me far more likely than 'clear verified use of chemical weapons by Russia'.
- EDITED TO ADD (thank you gbear605 for the comment): while the question statement doesn't mention anything, there is a question to the author in the comments about how they will resolve an ambiguous situation. Sadly the answer is extremely subjective, which preserves most of the problem.
"Will Belarus extradite (en masse) Russian males trying to leave the country in November 2022?"
- Fine print says 'According to news from reputable Russian or Belorussian sources'.
- No mention of what counts as 'en masse'. 100 people? 1,000? 10,000?
- If someone asked 'will illegal immigrants enter the US en masse in 2023', the ambiguity would be clear. I would expect people to evaluate that question differently based on their stance on immigration in US politics.
"Will AI outcompete best humans in competitive programming before the end of 2023"
- What 'competitive programming'? Are there specific contests we'll look at?
- What will happen if DeepMind claims their AI to be able to program to a world-class level and yet fails to enter it in any verifiable external competitions (*cough* like what they did with chess *cough*)?

MARKET STRUCTURE

One of the most valuable things a market resolution process can do is be unambiguous. Once reality has happened, it should be easy for someone who looks at reality, and looks at the market, to determine how the market should resolve.

If this is not the case, traders in the market are not trading on their knowledge of reality - they are trading on their knowledge of the market resolution process. And market prices aren't informative about reality - they are informative about the resolution process.
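A rough way to see this is to price a YES share as a sum over possible worlds. The sketch below is illustrative only: the scenario names and all probabilities are made-up assumptions, and `fair_yes_price` is a hypothetical helper, not anything Manifold provides. It splits a trader's beliefs into worlds where the written criteria clearly decide the outcome and a world where the author has to make a judgment call.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Scenario:
    name: str
    probability: float        # trader's belief that reality ends up here
    payout: Optional[float]   # value of one YES share; None = author's judgment call


def fair_yes_price(scenarios: Sequence[Scenario], p_author_says_yes: float) -> float:
    """Expected value of one YES share, given a guess about the author."""
    total = 0.0
    for s in scenarios:
        # In ambiguous worlds the payout is whatever the author decides,
        # so the trader's guess about the author stands in for reality.
        payout = p_author_says_yes if s.payout is None else s.payout
        total += s.probability * payout
    return total


# Hypothetical beliefs about a "will X happen?" market.
beliefs = [
    Scenario("clearly happened", 0.04, 1.0),
    Scenario("alleged, disputed, under investigation", 0.10, None),
    Scenario("clearly did not happen", 0.86, 0.0),
]

for p_author in (0.1, 0.6):
    print(f"P(author resolves YES if disputed)={p_author}: "
          f"fair price ~ {fair_yes_price(beliefs, p_author):.2f}")
```

With these made-up numbers, identical beliefs about the world give a fair price of about 5% or about 10% depending only on the guess about the author.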

(While I don't work in this area directly, I've heard that similar things happen in real markets in the field of distressed-debt investing and credit default swaps - investments in these areas apparently are frequently decided by court cases over the legality of various things, and so serious investing in this area requires serious legal support and careful reading of documents).

The Manifold questions above are related to the real-world issues they ask about. However, if I were trying to trade real money in them, by far the most important thing I would want to determine would be the political leanings/prior views of the authors:

If Ukraine alleges Russian use of chemical weapons, but without hard proof, and the UN says 'we are performing an investigation', what will the author of the first question do?
If Russian news sources make a lot of noise about a small number of extraditions, and hard numbers aren't available, what will the author of the second question do?
If DeepMind claims their AI can program well, but does not exhibit it in any competitions, and does not replace thousands of programmers in their jobs, what will the author of the third question do?

These are real-world questions of their own in a sense. There is real-world research you could do to investigate them (e.g. you could look at the author's prior statements on Russia/Ukraine/AlphaZero). These questions are not what these markets claimed to be about...so what? These questions are what these markets are actually about.

If you put serious money on the Russian-chemical-weapons Manifold market and got hedge funds interested in it (do not do this), the question these hedge funds would actually be researching to decide between 5% and 10% would be 'how sympathetic has the author previously been to Russia/Ukraine'.

This is a problem.

ACTIONS

My recommended courses of action would be:

When writing a market, put a few seconds of time into considering 'what things could happen'.
- Try to arrange your question to be about a narrow factual issue rather than about a vague fuzzy thing/matter of opinion. (There is a reason we have financial markets that pay off based on Google's profits, and not markets that pay off based on Google's impact on society. Even if the latter might be more useful, it is much harder to actually define).
- If you are outsourcing market evaluation to some external source, make it clear which one.
- If part of market evaluation needs to be based in your personal judgment, at least make it clear that this is the case, and make it clear under which circumstances you will resolve the market which way.
When evaluating a market under edge cases, evaluate exactly the written-down question and not the question you feel like you would have written down if you had predicted this exact outcome.
As a trader, you should be very mistrustful of current Manifold markets. This doesn't mean you can't trade in them! Prediction markets are robust to silliness like this - but the way in which they are robust is that your job as a trader is not to predict the real-world outcome, but to predict the psychology of whoever evaluates the market. If you don't think that's your comparative advantage, you should look elsewhere. (Or just screw around with play money. Either way.)
As a Manifold organizer, you should consider this to be a problem. I'm not exactly sure what a good solution to it on your end would be - I haven't thought about it for very long - but as a demo of 'how do prediction markets work' Manifold currently does not fill me with confidence.

17 comments

comment by Matthew Barnett (matthew-barnett) · 2022-09-26T19:07:28.267Z · LW(p) · GW(p)

On Metaculus, people have attempted to solve this problem by having moderators review each question before it goes live. The result has generally been that most vague questions require substantial re-editing before they can appear on the website.

Also, over time, users have become quite skilled at noticing when questions are likely to resolve ambiguously.

Even with this cultural difference between the sites, many questions still resolve ambiguously on Metaculus. It's just really, really hard to say exactly what you mean when predicting the future.

Replies from: sinclair-chen

↑ comment by Sinclair Chen (sinclair-chen) · 2022-09-26T22:06:08.282Z · LW(p) · GW(p)

I think Metaculus's level of verbosity in resolution criteria is bad in that it makes questions longer to write and longer to understand (because it takes longer to read and because it's more complex). Part of the goal of Manifold is to remove trivial inconveniences so that people actually forecast at all, and so that we get markets on literally everything.
I think the synthesis here is to have a subset of high quality markets (clear resolution criteria, clear meta-resolution norms) but still have a fat tail of medium-quality questions.

comment by David Chee (david-chee) · 2022-09-26T21:47:03.859Z · LW(p) · GW(p)

David here from Manifold.

Really cool to see you taking the time to present your thoughts in a way that is constructive whilst not holding back with your criticisms. A lot of people would have just gotten frustrated with their experience on the site and have just moved on, so thank you for this post! We definitely think about these things a lot as we design Manifold and this post could definitely have an impact on where we lean within tough design problems.

First off, let me briefly discuss why we are bullish on using a creator-based resolution mechanism.

Creator reputation. This is definitely the most important point. Regular users quickly learn which markets are run by credible creators who know to set clear criteria. And in the rare instance an unexpected variable occurs making the resolution less clear, these creators will often defer to what the comments agree should be the correct resolution. Admittedly this is not obvious for new users who are interacting with our markets for the first time and are oftentimes trading in good faith that all the markets are reliable. We are aiming to get a reputation system set up at some point, but we haven't had the capacity to do so yet.
Creator resolution facilitates types of questions that otherwise couldn't exist.
- Fun, personal markets that often only the creator will know the results. eg. this one from James Medlock.
- Subjective/recommendation markets.
Creators resolving works >95% of the time. For the majority of markets, there are no issues. Very rarely does a creator's internal bias or ulterior motive affect the resolution. It's unfortunate that the first market you interacted with may be one of the minority and we will be more conscious about ensuring big/featured markets are reliable. I strongly believe if you were to use the site more that you would find a lot of your concerns are exacerbated by your experience with an outlier.
Interpretation of a result is often necessary. As others have alluded to in the comments, there is often conflict between a market that is useful and answers what you want to know vs one that is extremely well-defined in advance. It will always be the case that questions come up where someone has to make a call and it makes sense that the person creating the market is the best person to do so. We have heard some good suggestions regarding crowd-sourcing resolutions so might be open to possible types of implementations of that.

Now that that's out of the way, let's discuss some of the specific criticisms you outlined. The following is more a reflection of my own thoughts and hasn't necessarily been discussed in depth with the team.

The market probability becomes more reflective of the creator's tendencies and bias and not the event itself.
- I will agree that this is definitely true with some of our markets. But I actually think that is okay! One thing that sets Manifold apart from other prediction markets is that we support a wide range of markets that is only possible by giving creators full autonomy. I think it is a good thing that each individual creator is expressed in their market. Not all markets have to provide actionable, informative data even if this may not align with what is traditionally sought after with forecasting. Just to be clear though, one of our primary goals is to facilitate markets that provide actionable and informative discussion and data. But, I suspect as the site grows that it will become more obvious what the intent of each individual market is. There very well could be multiple instances of the same market which will actually have different probabilities based on who is the creator and subtle differences in resolution criteria.
Markets on Manifold are too ambiguous.
- I don't have too much to say on this as I agree with you that our markets are generally too ambiguous and what Jack said covers a lot of my own thoughts. We do encourage creators to try and be as clear as possible, but many people are still learning how to create questions. One of the main purposes of the site is to lower the entry-barrier to prediction markets and allow anyone to create one easily so it isn't a surprise a large number of markets aren't that great or clear. Fortunately, there is already a clear culture developing where users will comment when they see ambiguity or a possible loophole in a market. And I think you would find that our big serious markets tend to be fairly unambiguous at this point.
Manifold is untrustworthy.
- I'm not sure I agree with this but that may definitely be my bias haha. I do think though that users should regard every market they bet on with mild scepticism and not default to trusting the creator. Now, as a site is it optimal to have a bunch of unreliable content? Maybe not, particularly for the new user experience and I can empathise with why this may cause you to mistrust the site. But it's unrealistic to expect a site that gives its users full autonomy and does minimal curation to have only trustworthy content. But it is something we will need to think about more. Hopefully, it will be mostly solved once we introduce a reputation system of some sort. And from my experience, the number of markets that have been "untrustworthy" can be counted on one hand so far.

Thanks again for taking the time to try out our site (even if it didn't go as you expected) and for doing this write-up. Am happy to discuss things further. Maybe what you've outlined is a bigger problem than we've realised and we need to spend a lot more effort designing the site around it, but I'm not yet convinced. It is good to know though that it is turning some users away.

comment by jackc · 2022-09-26T18:39:54.441Z · LW(p) · GW(p)

I agree that ambiguity is bad, and most Manifold markets are probably too imprecise and ambiguous. My usual style is trying to be fairly precise in the forecasting questions I write, and I definitely second your recommendations!

However, I want to point out that the problem isn't just ambiguity, but really complexity. The more you try to nail down the resolution criteria, the more likely it becomes for there to be a serious mismatch between the top-line summary of the question (the question that you are actually interested in answering) and the detailed decision tree used to resolve it.

Here's a great example with a real-money, CFTC-regulated prediction market "Will student loan debt be forgiven?" on Kalshi. The rules were as unambiguous as possible up-front, specifying what would count as "broad-based debt relief" and things like that - things that people would reasonably think are important details to check. Unfortunately, it turned out that there was a huge detail in the resolution criteria that the market missed - the resolution rules required that the forgiveness be done via Executive Order or law. And in this case, the debt relief was clearly done but it wasn't done by law or executive order. So lots of traders lost lots of real money, and the prediction market clearly failed at its job. Link to the market: https://kalshi.com/events/SDEBT/markets/SDEBT-23JAN01. (Note: the page has been updated after the fact to highlight this issue.)

This is to highlight the point that writing unambiguous resolution criteria that also match the top-line summary question is really really hard! It's not uncommon on Metaculus for there to be a fairly clear subjective answer that differs from the actual resolution because of some issue that most people agree is a mere technicality. So I think that trying to disambiguate is good, but being willing to step back from technicalities is also potentially good. There are advantages to both approaches. I agree with you that the average market on Manifold is way too underspecified and ambiguous, but if you require Metaculus levels of precision then many people wouldn't be willing to spend the effort to ask the questions at all, and that is also a loss.

Replies from: jackc

↑ comment by jackc · 2022-09-26T18:55:07.798Z · LW(p) · GW(p)

Also, on the Petrov day market, let's suppose the question had been "What % of Petrov Day will elapse before someone uses the big red button to take down Less Wrong's frontpage for the rest of the day?" ("for the rest of the day" is the only change.) I would consider this reasonably unambiguous - if LW decides to bring the page back up because it was a "mistake" then it shouldn't resolve yet. But I suspect that people would have bet on that nearly the same as the actual question, and your hypothetical user who saw the site was down at 10% would also have been burned. It's still better to avoid the ambiguity, I agree, but the problem of traders being burned by details is still there even if you avoid ambiguity in the details.

This sort of thing happens in the financial markets too. I'm thinking of all the games that happen over credit-default-swaps, for example. It would be nice if we could magically reduce complexity to reduce the impact of these sorts of issues, but it's a risk market participants are taking on by trading in the markets, and I think the value the markets provide is still clearly worth it (or else people wouldn't be willing to trade in them)!

comment by Martin Randall (martin-randall) · 2022-09-26T17:51:37.512Z · LW(p) · GW(p)

Traders have the option, where a question is ambiguous, of asking the resolver how they would resolve it in some hypothetical scenario. This is true on Manifold as on Metaculus. I find this is normally more profitable for me than trying to get inside the head of the resolver.

There is a separate issue of resolver reputation, where some resolvers have a history of being biased in favor of their own positions, or just getting it wrong. Definitely a weakness of current Manifold.

This post has a lot of good advice that I agree with, thanks for writing it.

Replies from: ChristianKl

↑ comment by ChristianKl · 2022-09-27T11:26:02.173Z · LW(p) · GW(p)

"I find this is normally more profitable for me than trying to get inside the head of the resolver."

If you ask in the comments, then the information about how the reviewer is likely to answer becomes public knowledge. There's more profit to be made if you are able to correctly predict the behavior without it being public knowledge.

Replies from: martin-randall

↑ comment by Martin Randall (martin-randall) · 2022-09-27T14:09:05.579Z · LW(p) · GW(p)

Good point. Sometimes I do both. First I bet based on my attempt to get inside the head of the resolver. Then I ask them a question. When they respond I bet further (in one direction or another) based on their answer. When the market catches up with the new information I can slowly exit the market and bet elsewhere.

You're right that someone else can move the market based on the response first, but the site gives me a small assist: I get a notification when someone replies to my question, whereas nobody (yet) is subscribed to every comment thread. Maybe that will change with more volume. Also, since I asked the question, I probably have already thought about how different answers should move this and other markets.

Someone who is better at psychology-prediction than me would have some better strategies, especially if someone else is betting on the market who thinks they are good at psychology-prediction and is not. There's lots of profit to be made that way, but also lots of loss.

comment by David Chee (david-chee) · 2022-09-27T23:12:20.441Z · LW(p) · GW(p)

FYI the creator of the market decided to resolve the market to N/A.

This means everyone's mana is fully restored as if they never interacted with the market.
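To make the effect of an N/A resolution concrete, here is a minimal sketch of how a single position might settle. It assumes a simplified model in which each YES share pays the resolution value (between 0 and 1), each NO share pays one minus that, and N/A returns what the trader spent. The `settle` function and the numbers are hypothetical, and the model ignores fees and how Manifold's market maker actually prices trades.

```python
from typing import Union

Resolution = Union[float, str]  # a probability in [0, 1], or the string "N/A"


def settle(cost: float, shares: float, side: str, resolution: Resolution) -> float:
    """Mana returned to one position under the simplified model above."""
    if side not in ("YES", "NO"):
        raise ValueError(f"unknown side: {side!r}")
    if resolution == "N/A":
        # Market cancelled: the trader gets back exactly what they put in.
        return cost
    p_yes = float(resolution)
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("resolution must be between 0 and 1")
    per_share = p_yes if side == "YES" else 1.0 - p_yes
    return shares * per_share


# Hypothetical short position: 100 NO shares bought for 85 mana
# (an average price of 15% on the YES side).
cost, shares = 85.0, 100.0
for outcome in (0.10, 1.00, "N/A"):
    payout = settle(cost, shares, "NO", outcome)
    print(f"resolves {outcome}: payout {payout:.1f}, profit {payout - cost:+.1f}")
```

Under these assumptions the position gains a little if the market resolves at 10%, loses its whole stake if the market resolves at 100%, and breaks even on N/A.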

comment by gwillen · 2022-09-26T21:53:34.103Z · LW(p) · GW(p)

Strongly agree about the existence of the problem. It's something I've put a bit of thought into.

One thing I think could help, in some cases, would be to split the market definition into

the question definition, and
the resolution method

And then specify the relationship between them. For example:

Question: How many reported covid cases will there be in the US on [DATE]?

Resolution method: Look at https://covid.cdc.gov/covid-data-tracker/ a week after [DATE] for the reported values for [DATE].

Resolution notes: "Whatever values are reported that day will be used unconditionally." or "If the values change within the following week, the updated values will be used instead." or "The resolution method is a guideline only; ultimately the intent of the question, as interpreted by the question author, overrides."

This will only solve a subset of ambiguous resolutions, but I think it would still be a big help to spell some of these things out more clearly.
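The split described above could be written down as a small data structure. This is only a sketch under stated assumptions: `MarketSpec`, `ResolutionPolicy`, and `resolve` are hypothetical names rather than part of any prediction-market API, the three policies paraphrase the three example resolution notes, and the case counts in the usage example are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResolutionPolicy(Enum):
    FIRST_REPORT_FINAL = "the first reported value is used unconditionally"
    REVISIONS_COUNT = "a value revised within the following week replaces it"
    AUTHOR_INTENT = "the method is a guideline; the author's reading overrides"


@dataclass(frozen=True)
class MarketSpec:
    question: str            # what the market is meant to be about
    resolution_method: str   # where and when the resolver will look
    policy: ResolutionPolicy # how the two relate when they disagree


def resolve(spec: MarketSpec, first_reported: int,
            revised: Optional[int] = None,
            author_judgment: Optional[int] = None) -> int:
    """Pick the resolution value according to the declared policy."""
    if spec.policy is ResolutionPolicy.FIRST_REPORT_FINAL:
        return first_reported
    latest = first_reported if revised is None else revised
    if spec.policy is ResolutionPolicy.REVISIONS_COUNT:
        return latest
    # AUTHOR_INTENT: use the author's call if given, else the latest data.
    return latest if author_judgment is None else author_judgment


spec = MarketSpec(
    question="How many reported covid cases will there be in the US on [DATE]?",
    resolution_method="Look at https://covid.cdc.gov/covid-data-tracker/ "
                      "a week after [DATE] for the reported values for [DATE].",
    policy=ResolutionPolicy.REVISIONS_COUNT,
)
# Invented numbers: 40,000 first reported, later revised to 42,500.
print(resolve(spec, first_reported=40_000, revised=42_500))
```

Because the policy is declared up front, a trader can tell before betting whether a later data revision or the author's opinion can change the outcome.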

comment by tailcalled · 2022-09-26T16:53:24.061Z · LW(p) · GW(p)

Markets should probably have multiple outcomes (which in politically controversial scenarios should ideally also be evaluated by different people with different biases) and allow people to make correlated bets over the outcomes.

comment by Slider · 2022-09-26T21:20:50.921Z · LW(p) · GW(p)

An off-hand attempt at a market mechanism to solve this: have "resolution issue" predictors, like "There will be a newspaper article of length more than 20 cm written by a full-time journalist about biological weapons AND the vague biological weapons question will resolve in the negative".

Replies from: sinclair-chen

↑ comment by Sinclair Chen (sinclair-chen) · 2022-09-26T22:13:52.802Z · LW(p) · GW(p)

Maybe we should just let people bet on N/A similar to Augur (with some stronger norm of resolving N/A in ambiguous cases)

comment by Sinclair Chen (sinclair-chen) · 2022-09-26T20:59:53.543Z · LW(p) · GW(p)

(Engineer at Manifold here.) I largely agree! Letting people make markets on anything means that many people will make poorly operationalized markets. Subjective resolution is good for bets on personal events among friends, which is an important use case, but it is bad for questions with an audience bigger than that.

We need to do a better job of:
1. resolution reliability, like by adding a reputation system or letting creators delegate resolution to more objective individuals / courts.
2. helping users turn their vague uncertainties into objective questions - crucial but less straightforward.
3. surfacing higher quality content

comment by gbear605 · 2022-09-26T18:18:33.532Z · LW(p) · GW(p)

For the "Will Russia use chemical or biological weapons in 2022?" question, the creator provided information about an ambiguous outcome, though it seems very subjective:

"If, when the question closes, there is widespread reporting that Russia did the attacks and there is not much reported doubt, then I will resolve YES. If it seems ambiguous I will either resolve as N/A or at a percentage that I think is reasonable. (Eg. resolve at 33% if I think there’s a 2/3 chance that the attacks were false flag attacks.) This of course would also go the other way if there are supposed Ukrainian attacks that are widely believed to be caused by Russia."

Replies from: aphyer

↑ comment by aphyer · 2022-09-26T18:23:22.047Z · LW(p) · GW(p)

I think there are several similar such markets - the one I was looking at was at https://manifold.markets/Gabrielle/will-russia-use-chemical-or-biologi-e790d5158f6a and lacks such a comment.

EDITED: Ah, you are correct and I am wrong, the text you posted is present, it's just down in the comments section rather than under the question itself. That does make this question less bad, though it's still a bit weird that the question had to wait for someone to ask the creator that (and, again, the ambiguity remains).

I'll update the doc with links to reduce confusion - did not do that originally out of a mix of not wanting to point too aggressively at people who wrote those questions and feeling lazy.

comment by jp · 2022-10-12T12:45:45.854Z · LW(p) · GW(p)

I really buy the argument Sinclair makes about reducing trivial inconveniences here. Let’s make a model.

Ambiguity has two main negative effects, according to me:

Reducing precision in the prediction, because some of the prediction is based around interpretation of the creator’s foibles rather than a “true” resolution of a better-specified question.
Making the forecasters’ lives worse, because they want to be forecasting world events, not guessing as to the creator’s behavior in unclear situations.

Let’s set 1. aside for now. 2. seems like a big deal for sure. But also big is the drag on creating prediction markets imposed by a Metaculus-style process. The way to balance these two seems to me to be a question that hinges on what you want your impact to be. If you’re trying to make the world as good a place as possible, you might have quite a strong preference for there being plenty of markets that can be made with low overhead. If the experience for forecasters is bad enough, then you won’t get predictions on those questions, but my empirical belief is that Manifold is striking a better balance right now than Metaculus.

As for 1., again, we can answer it according to what’s most useful for our goals, and again, I want to claim Manifold is doing well here.

tl;dr There are real tradeoffs here.

I like the post btw!
