Darklight's Shortform

post by Darklight · 2021-02-20T15:01:33.890Z · LW · GW · 2 comments

comment by Darklight · 2021-02-20T15:01:34.314Z · LW(p) · GW(p)

So, I had a thought. The glory system idea that I posted about earlier, if it leads to a successful, vibrant democratic community forum, could actually serve as a kind of dataset for value learning. If each post has a number attached to it that indicates the aggregated approval of human beings, that number can serve as a rough proxy for a kind of utility or Coherent Aggregated Volition.

Individual examples will probably be quite noisy, but averaged across a large number of posts, the forum could function as a real-world dataset, with the post content as the input and the post's vote tally as the output label. You could then train a supervised learning classifier or regressor on it and use that model to guide a Friendly AI, like a trained conscience.
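
To make the "post content in, vote tally out" setup concrete, here is a minimal sketch of a regressor trained that way. Everything in it is an assumption made for illustration: the toy posts and their scores are invented, the bag-of-words features and plain stochastic gradient descent stand in for whatever model would actually be used, and the names `featurize`, `train`, and `predict` are hypothetical.

```python
from collections import Counter

def featurize(text):
    """Bag-of-words features: lowercase token -> count."""
    return Counter(text.lower().split())

def predict(weights, bias, text):
    """Predicted vote tally for a post under a linear model."""
    feats = featurize(text)
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in feats.items())

def train(posts, epochs=200, lr=0.05, l2=0.01):
    """Fit the linear model with SGD on squared error.

    posts: list of (text, vote_tally) pairs.
    l2: small weight decay, applied to the tokens of each visited post.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, score in posts:
            feats = featurize(text)
            error = predict(weights, bias, text) - score
            bias -= lr * error
            for tok, n in feats.items():
                w = weights.get(tok, 0.0)
                weights[tok] = w - lr * (error * n + l2 * w)
    return weights, bias

# Hypothetical toy data: (post content, aggregated vote tally).
POSTS = [
    ("thank you for the careful and kind explanation", 12.0),
    ("kind and helpful summary of the evidence", 9.0),
    ("careful argument with helpful sources", 10.0),
    ("you are all idiots", -6.0),
    ("idiots and spam everywhere", -8.0),
    ("buy cheap spam now", -7.0),
]

weights, bias = train(POSTS)

# The learned score could then be used to rank candidate outputs.
approved_like = predict(weights, bias, "kind helpful explanation")
disapproved_like = predict(weights, bias, "idiots spam")
print(approved_like, disapproved_like)
```

A real version would need far more data and a much stronger text model; the sketch only shows the shape of the supervised problem, where the vote tally plays the role of the label.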

This admittedly would not be provably Friendly, but as a line of attack on the value learning problem, it is relatively straightforward to implement and probably more feasible in the short run than anything else I've encountered.

Replies from: Darklight
comment by Darklight · 2021-02-20T15:09:01.110Z · LW(p) · GW(p)

Another thought is that maybe Less Wrong itself, if it were to expand in size and become large enough to roughly represent humanity, could be used as such a dataset.