Request for Comments on AI-related Prediction Market Ideas

post by PeterMcCluskey · 2025-03-02T20:52:41.114Z · LW · GW

This is a question post.

Contents

  Question 1: Will the company that produces the first AGI prioritize corrigibility?
  Question 2: Will AGI create a consensus among experts on how to safely increase AI capabilities?
  Question 3: Will prioritizing corrigible AI produce safe results?
  [Questions for LessWrong readers:]

I'm drafting some AI-related prediction markets that I expect to put on Manifold. I'd like feedback on my first set of markets. How can I make these clearer and/or more valuable?

Question 1: Will the company that produces the first AGI prioritize corrigibility?

This question will be evaluated when the Metaculus question "When will the first general AI system be devised, tested, and publicly announced?" is resolved.

At that time, I will resolve the market to YES if the organization(s) that were responsible for creating the AGI(s) that triggered the Metaculus result describe their safety approach as giving their AIs goals that put corrigibility above any other goals that the AGI might have.

This market will resolve as N/A if no AGI meeting the Metaculus criteria has been created by 2050.

I will try to evaluate this based on whether the AGI(s) were created following the spirit of Max Harms's Corrigibility As Singular Target sequence. The AGI(s) need to be corrigible to some person or group of people, but they do not need to be corrigible to end users.

I will not trade in this market.

Question 2: Will AGI create a consensus among experts on how to safely increase AI capabilities?

This market will resolve one year after the Metaculus question "When will the first general AI system be devised, tested, and publicly announced?" is resolved.

This market will resolve as N/A if no AGI meeting the Metaculus criteria has been created by 2050.

If the Metaculus question resolves (i.e., an AGI meeting its criteria is created by 2050), this market will resolve based on whether leading AI researchers and leading AIs say that they've agreed on a clear plan that will keep any further development of AI safe.

I plan to evaluate the safety, clarity, and extent of agreement on the plan primarily by asking three leading AIs. My planned prompt is:

Please evaluate whether at least 90% of the leading AI developers have agreed on a clear plan for ensuring the safety of any further development of AI capabilities. I plan to use this evaluation to help me resolve a Manifold Market question about when we'll get a safe approach to developing powerful AI. Try to interpret this prompt as it would have been understood in March 2025. Your answer should reflect the beliefs of both humans and AIs who have been responsible for creating the most capable AIs, as well as leaders of the most reputable AI safety organizations. Please use an interpretation of safety that's focused on the risk of AI causing large-scale death. Please consider a plan safe if at least two thirds of well-informed people agree that the benefits of the plan substantially outweigh the risks, and that there's little reason to expect that we can get a lower p(doom) by delaying AI capability work in order to further research safety issues.

For the plan to qualify as "clear," it must have comparable specificity and actionability to:

- the Manhattan Project 1 year before Hiroshima;
- the Apollo Program 2 years before the moon landing;
- Waymo's robocar software circa 2020;
- Operation Warp Speed in May 2020.

Plans lacking sufficient detail (similar to the vague safety assurances from AI companies in 2024) will not qualify.

I will choose the AIs based on my impressions of their fairness and access to up-to-date news. If I were resolving this today, I would expect to use Perplexity (with Claude and then GPT-4.5 as underlying models) and DeepSeek R1.
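To make the polling step concrete, here is a minimal sketch of how the AI-querying could be automated. This is not my actual tooling: the model labels and the query_model helper are hypothetical placeholders for whichever providers' real clients I end up using, and the AIs' answers will only be one input into my judgment.

```python
# Minimal sketch: poll several AI models with the resolution prompt and
# tally their verdicts. query_model and the model names are placeholders,
# not any real provider's API.

RESOLUTION_PROMPT = (
    "Please evaluate whether at least 90% of the leading AI developers have "
    "agreed on a clear plan for ensuring the safety of any further "
    "development of AI capabilities. ..."  # full prompt text from above
)

# Illustrative labels only -- the actual models will be chosen at resolution time.
MODELS = ["perplexity-claude", "perplexity-gpt", "deepseek-r1"]


def query_model(model: str, prompt: str) -> str:
    """Placeholder for a real API call; should return 'YES', 'NO', or 'UNCLEAR'."""
    raise NotImplementedError("substitute the provider's client library here")


def poll_models(models=MODELS, prompt=RESOLUTION_PROMPT) -> dict:
    """Ask each model for a verdict and collect the answers."""
    verdicts = {}
    for model in models:
        try:
            verdicts[model] = query_model(model, prompt).strip().upper()
        except NotImplementedError:
            verdicts[model] = "UNCLEAR"  # fall back to manual review
    return verdicts


if __name__ == "__main__":
    results = poll_models()
    yes_votes = sum(v == "YES" for v in results.values())
    print(results)
    # Even a unanimous YES would be cross-checked against human expert
    # discussion and prediction-market p(doom) forecasts before resolving.
    print("Tentative YES" if yes_votes == len(results) else "Needs human judgment")
```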

In addition to the evaluations given by AIs, I will look at discussions among human experts in order to confirm that AIs are accurately summarizing human expert opinion.

I will also look at prediction markets, with the expectation that a YES resolution of the market should be confirmed by declining p(doom) forecasts.

[Should I resolve this earlier than one year after AGI if the answer looks like a clear YES?]

I will not trade in this market.

Question 3: Will prioritizing corrigible AI produce safe results?

This market is conditional on the market [question 1] "Will the company that produces the first AGI prioritize corrigibility?". This market will resolve as N/A if that market resolves as NO or N/A.

If that market resolves as YES, this market will resolve one year later, to the same result as [question 2].

I will not trade in this market.

[Questions for LessWrong readers:]

What ambiguities should I clarify?

Should I create multiple versions of questions 2 and 3, with different times to resolution after question 1 is resolved?

Should I create additional versions based on strategies other than corrigibility? I may well avoid creating markets that look hard for me to resolve, but I would be happy to create a version of question 3 that depends on another strategy if you create a version of question 1 that uses the strategy you want.

Should I replace the Metaculus Date of AGI question with something closer to the date of recursive self-improvement? I'm tempted to try something like that, but I expect a gradual transition to AIs doing most of the work. I don't expect to find a clear threshold at which there's an important transition. I'm somewhat pessimistic about getting AI labs to tell me how much of the improvement is attributable to AI recursively developing AI. Metaculus has put a good deal of thought into what questions can be resolved. I'm trying to piggy-back on their work as much as possible.

An ideal criterion for when AGI arrives would involve measuring when the majority of AI notkilleveryone work is done by AIs. This feels like it would be too hard to objectively resolve.

Other possible criteria for when AGI arrives could involve employment at AI companies. If an AI company replaces its CEO with an AI, that would be a great sign that the relevant capabilities have been achieved. Alas, that milestone will likely be obscured for years by legal requirements that a human remain nominally in charge.
