Mechanism Design for AI Safety - Reading Group Curriculum

post by Rubi J. Hudson (Rubi) · 2022-10-25T03:54:20.777Z · LW · GW · 3 comments


Comments sorted by top scores.

comment by Joel Becker (joel-becker) · 2022-12-27T21:25:16.281Z · LW(p) · GW(p)

Interesting, thank you! Has the group gotten around to discussing something like "lessons from contract theory or corporate governance for factored cognition-style proposals" at all?

Replies from: Rubi
comment by Rubi J. Hudson (Rubi) · 2022-12-28T19:15:09.158Z · LW(p) · GW(p)

Not yet! We're now meeting on a monthly schedule, and there has only been one meeting since completing the list here. I'll look into finding a relevant paper on the subject, but if you have any recommendations, please let me know.

comment by jacob_cannell · 2022-10-25T19:53:19.058Z · LW(p) · GW(p)

General question: mechanism design typically must cope with unknown agent beliefs/values, and thus relies on truth-incentivizing mechanisms, etc.
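
For a concrete anchor on "truth-incentivizing mechanisms," here is a minimal sketch of the canonical example, a sealed-bid second-price (Vickrey) auction, in which reporting your true value is a dominant strategy. The bidder names and values are purely illustrative.

```python
# Minimal sketch of a truth-incentivizing mechanism: a sealed-bid
# second-price (Vickrey) auction. The winner pays the runner-up's bid,
# not their own, which makes truthful reporting a dominant strategy.

def vickrey_auction(bids):
    """bids: dict mapping bidder name -> reported value.
    Returns (winner, price), where price is the second-highest bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1]  # winner pays the second-highest report
    return winner, price

# Truthful reporting is optimal: overbidding risks winning at a price
# above your true value; underbidding risks losing an item you value.
bids = {"alice": 10, "bob": 7, "carol": 4}
print(vickrey_auction(bids))  # ('alice', 7)
```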

Has there been much (or any) work on using ZK-proof techniques to prove some of these properties, to make coordination easier? (i.e., a proof that I, agent X, am the result of some program P using at least some amount of compute: training on dataset D with optimizer O and utility function U, etc.)
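
To pin down what such a proof would even assert, here is a minimal non-zero-knowledge sketch: a plain hash commitment binding together the claimed training inputs. All names and payloads here are illustrative assumptions, and a real system would need an actual ZK proof (e.g., a SNARK) that the deployed weights satisfy the committed training relation without revealing the inputs.

```python
# Non-ZK sketch of the *statement* such a proof would bind together:
# a hash commitment to (program P, dataset D, optimizer O, utility U).
# This is NOT zero-knowledge; it only fixes the claim in advance. A real
# system would use a ZK proof to show "these weights came from running
# P on D with O and U, using at least C compute" without revealing the
# committed inputs. All names and payloads below are illustrative.
import hashlib

def commit(*parts: bytes) -> str:
    """Chain SHA-256 hashes of each part into a single commitment."""
    h = hashlib.sha256()
    for p in parts:
        h.update(hashlib.sha256(p).digest())
    return h.hexdigest()

# The trainer publishes the commitment before training...
c = commit(b"program P", b"dataset D", b"optimizer O", b"utility U")
# ...and a ZK proof would later show the deployed agent is consistent
# with the committed (P, D, O, U), plus a lower bound on compute used.
print(c)
```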