post by Darklight
comment by Darklight ·
2021-02-20T15:01:34.314Z
So, I had a thought. The glory system idea that I posted about earlier, if it leads to a successful, vibrant democratic community forum, could actually serve as a kind of dataset for value learning. If each post has a number attached to it that indicates the aggregated approval of human beings, that number can serve as a rough proxy for a kind of utility or Coherent Aggregated Volition.
Individual examples will probably be quite noisy, but averaged across a large number of posts, the forum could function as a real-world dataset, with the post content being the input and the post's vote tally being the output label. You could then train a supervised learning classifier or regressor on it and use that model to guide a Friendly AI, like a trained conscience.
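To make the setup concrete, here is a minimal sketch of the regressor version, assuming a bag-of-words representation and a linear model fit by stochastic gradient descent. The posts and vote tallies are made-up toy data, and `featurize`, `train`, and `predict` are hypothetical names for illustration; a real attempt would need a far larger dataset and a much better text encoder.

```python
from collections import Counter

def featurize(text):
    """Bag-of-words counts; a crude stand-in for a real text encoder."""
    return Counter(text.lower().split())

def predict(model, text):
    """Predicted vote tally for a piece of text under a linear model."""
    weights, bias = model
    feats = featurize(text)
    return bias + sum(weights.get(word, 0.0) * n for word, n in feats.items())

def train(posts, scores, epochs=200, lr=0.05):
    """Fit word weights by SGD on squared error (post text -> vote tally)."""
    weights, bias = {}, 0.0
    data = [(featurize(p), s) for p, s in zip(posts, scores)]
    for _ in range(epochs):
        for feats, target in data:
            pred = bias + sum(weights.get(w, 0.0) * n for w, n in feats.items())
            err = pred - target
            bias -= lr * err
            for w, n in feats.items():
                weights[w] = weights.get(w, 0.0) - lr * err * n
    return weights, bias

# Toy data, invented for illustration only: post content is the input,
# the aggregated vote tally is the label.
posts = [
    "thank you for the careful explanation",
    "helpful summary with clear sources",
    "everyone here is stupid",
    "spam buy cheap pills now",
]
scores = [12, 9, -7, -10]

model = train(posts, scores)

# The "trained conscience" use: score candidate outputs before acting on them.
candidates = ["clear and careful explanation", "cheap spam stupid"]
ranked = sorted(candidates, key=lambda c: predict(model, c), reverse=True)
```

The last two lines show the intended use: the learned scorer ranks candidate outputs by predicted approval, and the system being guided would prefer the higher-ranked ones.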
This admittedly would not be provably Friendly, but as a line of attack on the value learning problem, it is relatively straightforward to implement and probably more feasible in the short run than anything else I've encountered.
Replies from: Darklight
↑ comment by Darklight ·
2021-02-20T15:09:01.110Z
Another thought is that maybe Less Wrong itself, if it were to expand in size and become large enough to roughly represent humanity, could be used as such a dataset.