No Clickbait - Misalignment Database
post by Kabir Kumar (kabir-kumar-1) · 2024-02-18T05:35:44.078Z · LW · GW · 10 comments
This is a database of cases of Misalignment - classified by Type of Misalignment, Type of AI, etc.
Link to add more:
https://docs.google.com/forms/d/e/1FAIpQLSfE7ZeSV6W_YmKYrgy7BaiFKj90dBJ2qDUaYXzbpi_ILEs9sQ/viewform?usp=sf_link
Link to the DB: https://docs.google.com/spreadsheets/d/1uXzWavy1mS0X-uQ21UPWHlAHjXFJoWWlN62EyKAoUmA/edit?usp=sharing
Made it last week.
Currently there are 115 entries, 62 of which are from the Specification Gaming database made by DeepMind: https://deepmindsafetyresearch.medium.com/specification-gaming-the-flip-side-of-ai-ingenuity-c85bdb0deeb4
For some reason, as far as I know, this is the first public database like this.
The closest that I know of are the Specification Gaming database and the AI Incident Database (https://incidentdatabase.ai/).
For a community that's supposed to be made up of science fans, I'm pretty baffled that something as basic as this doesn't already exist, among many, many other things.
If you know of any cases, please add them.
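For anyone who wants to work with the entries programmatically, here is a minimal Python sketch of tallying cases by category. It makes several assumptions: that the sheet can be downloaded as CSV through Google Sheets' usual `export?format=csv` URL (this only works while the sheet stays publicly viewable), and that there is a column named "Type of Misalignment". The real column headers may differ, and the sample rows below are placeholders, not entries from the database.

```python
import csv
import io
from collections import Counter

# Sheet ID copied from the DB link in the post.
SHEET_ID = "1uXzWavy1mS0X-uQ21UPWHlAHjXFJoWWlN62EyKAoUmA"


def csv_export_url(sheet_id: str) -> str:
    """Build the CSV export URL for a publicly viewable Google Sheet."""
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"


def tally_by_column(csv_text: str, column: str) -> Counter:
    """Count rows per value of `column`, skipping rows where it is blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


if __name__ == "__main__":
    # Placeholder rows for illustration only; the column names are assumed.
    sample = (
        "Title,Type of Misalignment,Type of AI\n"
        "placeholder case 1,Specification Gaming,RL agent\n"
        "placeholder case 2,Specification Gaming,RL agent\n"
        "placeholder case 3,Goal Misgeneralization,RL agent\n"
    )
    print(csv_export_url(SHEET_ID))
    print(tally_by_column(sample, "Type of Misalignment"))
```

To run it on the real data, download the text at the URL returned by `csv_export_url(SHEET_ID)` and pass it to `tally_by_column` with whatever the actual column header turns out to be.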
Edits:
Added link to the DB: https://docs.google.com/spreadsheets/d/1uXzWavy1mS0X-uQ21UPWHlAHjXFJoWWlN62EyKAoUmA/edit?usp=sharing
Made more clear what's DB, what's form.
10 comments
Comments sorted by top scores.
comment by sudhanshu_kasewa · 2024-02-23T16:27:46.205Z · LW(p) · GW(p)
It might be worth (someone) writing out what is meant by each kind of misalignment category, as used in the db. Objective misalignment, specific gaming, value misalignment all seem overlapping, and I'm not at all sure what physical misalignment is supposed to be pointing to.
Replies from: kabir-kumar-1↑ comment by Kabir Kumar (kabir-kumar-1) · 2024-02-26T17:51:31.238Z · LW(p) · GW(p)
for sure. right now it's just a google form and google sheets. would you be interested in taking charge of this?
Replies from: sudhanshu_kasewa↑ comment by sudhanshu_kasewa · 2024-03-08T12:13:37.639Z · LW(p) · GW(p)
No, this is not something I can undertake -- however, the effort itself need not be very complicated. You've already got a list of Misalignment types in the form: create a google doc with definitions/descriptions of each of these, and put a link to that doc in this question.
comment by quetzal_rainbow · 2024-02-20T09:50:25.016Z · LW(p) · GW(p)
There is only a link to add a database entry, with no link to view the database itself.
Replies from: kabir-kumar-1↑ comment by Kabir Kumar (kabir-kumar-1) · 2024-02-20T11:50:19.759Z · LW(p) · GW(p)
Ah, sorry, here's the link! https://docs.google.com/spreadsheets/d/1uXzWavy1mS0X-uQ21UPWHlAHjXFJoWWlN62EyKAoUmA/edit?usp=sharing
Thank you for pointing that out, also added it to the post!
comment by iva · 2024-02-20T09:31:30.205Z · LW(p) · GW(p)
I think you copy-pasted the wrong link - the first link leads to a form one can use to add an example, not to the list of examples.
Replies from: kabir-kumar-1↑ comment by Kabir Kumar (kabir-kumar-1) · 2024-02-20T19:06:01.635Z · LW(p) · GW(p)
Thank you, I've labelled that as the form link now and added the DB link.
comment by Kabir Kumar (kabir-kumar-1) · 2024-02-20T08:51:50.698Z · LW(p) · GW(p)
Updated to 115.
comment by Tianyi (Alex) Qiu (TianyiQ) · 2024-02-20T13:29:08.699Z · LW(p) · GW(p)
There's also the goal misgeneralization database by DeepMind, in parallel to the misspecification one: blogpost, database.
Replies from: kabir-kumar-1↑ comment by Kabir Kumar (kabir-kumar-1) · 2024-02-20T19:03:01.185Z · LW(p) · GW(p)
Thank you! I'll add those as well!