Mechanism Design for AI Safety - Reading Group Curriculum
post by Rubi J. Hudson (Rubi) · 2022-10-25T03:54:20.777Z
3 comments
comment by Joel Becker (joel-becker) · 2022-12-27T21:25:16.281Z
Interesting, thank you! Has the group gotten around to discussing something like "lessons from contract theory or corporate governance for factored cognition-style proposals" at all?
reply by Rubi J. Hudson (Rubi) · 2022-12-28T19:15:09.158Z
Not yet! We're now meeting on a monthly schedule, and there has only been one meeting since completing the list here. I'll look into finding a relevant paper on the subject, but if you have any recommendations, please let me know.
comment by jacob_cannell · 2022-10-25T19:53:19.058Z
General question: mechanism design typically has to cope with unknown agent beliefs and values, and so relies on truth-incentivizing mechanisms (proper scoring rules, strategyproof auctions, and the like).
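As a minimal illustration of what "truth-incentivizing" means here, the sketch below uses a proper scoring rule (the quadratic/Brier score): an agent paid according to it maximizes its expected score exactly when it reports its true belief. The numbers and function names are just illustrative.

```python
import numpy as np

def brier_score(report: float, outcome: int) -> float:
    """Quadratic (Brier-style) scoring rule; higher is better.
    `report` is the stated probability of the event; `outcome` is 0 or 1."""
    return 1.0 - (outcome - report) ** 2

def expected_score(report: float, true_belief: float) -> float:
    """Expected score for an agent whose true belief is `true_belief`
    but who reports `report`."""
    return (true_belief * brier_score(report, 1)
            + (1 - true_belief) * brier_score(report, 0))

# The expected score is maximized at report == true_belief, so honest
# reporting is the agent's best response (the rule is "proper").
reports = np.linspace(0, 1, 101)
best = reports[int(np.argmax([expected_score(r, 0.7) for r in reports]))]
print(best)  # -> 0.7
```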
Has there been much/any work on using ZK proof techniques to prove some of these properties, to make coordination easier? (i.e., a proof that agent X is the result of some program P using at least some amount of compute, trained on dataset D with optimizer O and utility function U, etc.)
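I don't know of a complete system that does this, but as a sketch of the ingredients: the trainer could first publish binding hash commitments to the training artifacts, and a ZK proof would then attest, without revealing the artifacts, that running program P on the committed dataset with the committed optimizer actually produces the committed weights. The snippet below shows only the commitment layer, which is binding but not zero-knowledge on its own; all artifact names and values are hypothetical placeholders.

```python
import hashlib
import json

def commit(data: bytes) -> str:
    """Hash commitment to a training artifact (binding, but not ZK)."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical provenance record published by the trainer. A ZK proof
# system would then prove statements about the preimages of these
# commitments (e.g., "weights = P(dataset, optimizer) after N steps")
# without revealing the dataset or weights themselves.
provenance = {
    "dataset": commit(b"...raw dataset bytes..."),
    "optimizer": commit(json.dumps({"name": "adam", "lr": 3e-4}).encode()),
    "weights": commit(b"...final checkpoint bytes..."),
    "claimed_steps": 100_000,  # compute claim; needs separate attestation
}
print(json.dumps(provenance, indent=2))
```

The hard part, proving the training computation itself in zero knowledge, is well beyond this sketch and, as far as I know, an open problem at modern training scales.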