Critique-a-Thon of AI Alignment Plans


post by Iknownothing · 2023-12-05T20:50:07.661Z · ? · GW · 3 comments

Contents
  Benefits
  The Critique-a-Thon is a 3-day event with 2 stages
    Stage 1 - Adding Critiques (up to 16th December)
    Stage 2 - Discussion & Write-Up (December 17th to 18th)
    Prizes
  3 comments

AI-Plans.com will be hosting a Critique-a-Thon from December 16th to 18th, where participants will be proposing and discussing critiques of AI Alignment plans.

Judges include:

Nate Soares, President of MIRI
Ramana Kumar, researcher at DeepMind
Peter S. Park, MIT postdoc at the Tegmark lab
Charbel-Raphaël Segerie, head of AI unit at EffiSciences
Robert Miles, who will be judging Communication

Benefits:

The Critique-a-Thon provides an excellent opportunity to gain critical insights into AI Safety by diving into what makes an alignment plan likely to work or fail.
We will highlight key elements of alignment plans, which will be fed back to the researchers and can be used to improve their plans.
It’s also a great opportunity to get feedback from expert judges.

The Critique-a-Thon is a 3-day event with 2 stages:

Stage 1 - Adding Critiques (up to 16th December)

We’ll be adding critiques to alignment plans on ai-plans.com in two categories: Strengths and Vulnerabilities.

A “Strength” should indicate how a plan is useful as a solution for alignment. A “Vulnerability” should indicate how a plan fails or falls short as a solution.
Prizes will go to those who add the most critiques; critiques voted into a negative score won’t be counted.

You can get started on this stage whenever you like. You could start adding critiques today!
The deadline is December 16th, midnight, GMT.

Prizes
1st place: $100
2nd place: $60
3rd place: $40

Stage 2 - Discussion & Write-Up (December 17th to 18th)

We’ll split into pairs. Each pair will select an alignment plan that has had critiques proposed. One person will make the case for, and the other the case against, a proposed strength/vulnerability being true and accurate.
Then, the next day, we'll swap sides and finish with a write-up of the proposed strength/vulnerability.
See previous winning critiques for examples:

August Critique-a-thon

September Critique-a-Thon

These discussions will help refine the critiques and strengthen the arguments, and swapping sides reduces any disadvantage from a lack of prior knowledge of a topic.
If your opponent knows a lot more about the topic and makes points you hadn’t thought of, you can use them yourself the next day and see what the counters are. You can then decide whether they belong in your write-up, which is what will be judged.
These discussions will take place on the Discord.

Join here 👉 https://discord.gg/aGVtu5JyjJ

Prizes

1st place: $400
2nd place: $250
3rd place: $150

3 comments

Comments sorted by top scores.

comment by momom2 (amaury-lorin) · 2023-12-05T21:25:53.763Z · ? · GW

Epistemic status: Had a couple conversations on AI Plans with the founder, participated in the previous critique-a-thon. I've helped AI Plans a bit before, so I'm probably biased towards optimism.

Neglectedness: Very neglected. AI Plans wants to become a database of alignment plans which would allow quick evaluation of whether an approach is worth spending effort on, at least as a quick sanity check for outsiders. I can't believe it didn't exist before! Still very rough and unusable for that purpose for now, but that's what the critique-a-thon is for: hopefully, as critiques accumulate and more votes are fed into the system, it will become more useful.

Tractability: High. It may be hard to make winning critiques, but considering the current state of AI Plans, it's very easy to make an improvement. If anything, you can filter out the obvious failures.

Impact: I'm not as confident here. If AI Plans works as intended, it could be very valuable to allocate funds more efficiently and save time by figuring out which approaches should be discarded. However, it's possible that it will just fail to gain steam and become a stillborn project. I've followed it for a couple months, and I've been positively surprised several times, so I'm pretty optimistic.

The bar to entry is pretty low; if you've been following AIS blogs or forums for several months, you probably have something to contribute. It's very unlikely you'll have a negative impact.
It may also be an opportunity for you to discuss with AIS-minded people and check your opinions on a practical problem; if you feel like an armchair safetyist and are tired of being one, this is the occasion to level up.
Another way to think about it is that engagement was very low in the previous critique-a-thon, so if you have a few hours to spare, you can make some easy money and fuzzies even if you're not sure about the value in utilons.

comment by Iknownothing · 2023-12-05T20:54:24.788Z · ? · GW

Hi, I'm Kabir Kumar, the founder of AI-Plans.com. I'm happy to answer any questions you might have about the site or the Critique-a-Thon!

comment by Iknownothing · 2023-12-07T17:03:06.190Z · ? · GW

Update: Rob Miles will also be judging some critiques! He'll be judging Communication!
