How to store human values on a computer

post by Oliver Siegel (oliver-siegel) · 2022-11-05T19:17:56.595Z · LW · GW · 17 comments

Comments sorted by top scores.

comment by Ruby · 2022-11-05T19:13:23.706Z · LW(p) · GW(p)

Moderator here. I approved this post but strong-downvoted it (according to the "votes are for what you want to see more/less of" approach). It's not exactly obviously super wrong, but it feels perhaps too basic and I don't want a tonne more of this on the site. Perhaps it feels like it's written without building on the Sequences, which touch a lot on this. Though I could be pretty wrong, and we should encourage people to think about AI wherever they're at. Not sure. I think engaging with all previous AI content is unnecessary, but I will downvote if I think content doesn't feel like the author has read the Sequences. Hence downvote but still approved.

Replies from: oliver-siegel
comment by Oliver Siegel (oliver-siegel) · 2022-11-06T03:02:10.162Z · LW(p) · GW(p)

Thank you! Could I get a link to "The Sequences"? I can't find it here: https://www.lesswrong.com/tags/all [? · GW]

Replies from: quanticle, Ruby, maxwell-clarke
comment by quanticle · 2022-11-06T04:32:19.648Z · LW(p) · GW(p)

https://www.readthesequences.com/

Edit: Specifically, you may wish to read: https://www.readthesequences.com/The-Hidden-Complexity-Of-Wishes

Replies from: oliver-siegel
comment by Oliver Siegel (oliver-siegel) · 2022-11-06T09:07:20.634Z · LW(p) · GW(p)

Thanks for sharing! Yes, it seems that the computational complexity could indeed explode at some point.

But then again, an average human brain is capable of storing common sense values and ethics, so unless there's a magic ingredient in the human brain, it's probably not impossible to rebuild it on a computer.

Then, with an artificial brain that has all the benefits of never fatiguing and such, we may come close to a somewhat useful Genie that can at least advise on the best course of action given all the possible pitfalls.

Even if it'll just be, say, 25% better than the best human, all humans could get access to this Genie on their smartphone. How cool would that be?

But I'll have to dig deeper into The Sequences, seems very comprehensive.

I found Monica Anderson's blog quite inspiring as well. She writes about model-free, holistic systems. https://experimental-epistemology.ai/

Replies from: quanticle
comment by quanticle · 2022-11-07T05:13:19.342Z · LW(p) · GW(p)

> But then again, an average human brain is capable of storing common sense values and ethics, so unless there’s a magic ingredient in the human brain, it’s probably not impossible to rebuild it on a computer.

Of course it's possible to rebuild human morality on a computer. There is, however, a vast unfathomable chasm between possible and easy.

Replies from: oliver-siegel
comment by Oliver Siegel (oliver-siegel) · 2022-11-07T08:05:42.725Z · LW(p) · GW(p)

Yeah, I agree!

But if it were easy, everyone would do it... ;p

Based on your knowledge, what do you think might be the biggest hurdles to making it possible, using a system similar to the one I described above?

Replies from: quanticle
comment by quanticle · 2022-11-08T06:34:52.410Z · LW(p) · GW(p)

The biggest problem, as I see it, is that you haven't come to a thorough understanding of what it is that you mean by "all the actionable tangible methods and systems that help fulfill this positive value goal and then contrast with all the negative problems that exist in the world that exist with respect to that positive goal".

In other words, what you've written there is just, "Make the computer do good things, and also make the computer not do bad things."

Yes, it would be wonderful if we could make the computer just do good things and make the computer not do bad things. But if it were that easy, AI alignment would be a trivial problem.

Edit: Mysterious Answers To Mysterious Questions is a good sequence post that explains the issues with your approach.

Replies from: oliver-siegel
comment by Oliver Siegel (oliver-siegel) · 2022-11-18T22:24:32.196Z · LW(p) · GW(p)

Interesting article! It reminds me of Monica Anderson's blog: https://experimental-epistemology.ai/

She embraces the mysticism and proposes that holistic, non-reductionist, model-free systems are undeniably effective.

> "The biggest problem, as I see it, is that you haven't come to a thorough understanding of what you mean" 

That's another thing Monica writes a lot about: understanding.

What does it mean to understand something? And what is the meaning of meaning?

Yes, these sound like metaphysical, mystical ideas, and they might be fundamentally unsolvable (see: the hard problem of consciousness, or the explanatory gap).

But we already see that it must be possible for systems that are aligned within a group to exist in this universe. Many groups of humans use their brains to figure out how to coexist peacefully.

So unless humans possess a mystical ingredient, it must be possible to recreate this sort of understanding in machines.

But we don't currently know how to teach this to machines, in part because we lack a good dataset, and in part because we don't understand it ourselves.

Do you think that alignment is fundamentally an engineering problem, or is it one of the humanities and one of philosophy?

Replies from: quanticle
comment by quanticle · 2022-11-19T05:24:00.592Z · LW(p) · GW(p)

It's an engineering problem. If I'm honest, I see essentially zero room for the humanities in AI alignment. The level of fuzzy thinking and lack of rigor that characterizes the humanities is a hindrance where alignment is concerned.

In other words, we can discuss the philosophical implications of having machines that "understand", after we've implemented the guardrails that prevent those machines from ending intelligent life.

EDIT: I read the first two articles on the blog that you linked, and I found it to be a classic example of what "Mysterious Answers To Mysterious Questions" is warning about. "Understanding" is used the same way that ancient natural philosophers used "phlogiston", "elan vital", or "luminiferous aether".

Replies from: oliver-siegel
comment by Oliver Siegel (oliver-siegel) · 2022-11-21T07:59:48.859Z · LW(p) · GW(p)

Fair point! But how do you know that this ungrounded mysticism doesn't apply to current debate about the potential capabilities of AI systems?

Why is an AI suddenly able to figure out how to break the laws of physics and be super intelligent about how to end intelligent life, but somehow incapable of comprehending the human laws of ethics and morality, and valuing life as we know it?

What makes the laws of physics easier to understand and easier to circumvent than the human laws of ethics and morality? (And also, navigating the human laws of ethics and morality must be required for ending all life. Unless software suddenly has the same energy as enriched plutonium or something like that, and one wrong bit flip causes an explosive chain reaction.)

What makes it so much more difficult to understand critical thinking and "how to store human values in a computer", and in contrast what makes "accidentally ending all intelligent life" so easy, by comparison?

It seems to me that "ASI on mission to destroy the humans" is the same thing as "luminiferous aether".

We taught AI English and how to draw pictures and create art. Both pretty "fuzzy" things.

How hard can it be to train AI on a dataset of 90% of known human values and 90% of known problems and solutions with respect to those values for a neural net to have an "above average human"-grasp on the idea that "ending all intelligent life" computes as "that's a problem and it's immoral" ?
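
For what it's worth, here's a toy sketch (Python with scikit-learn, illustrative only) of what "train a model on labeled values and problems" could look like in miniature. The handful of examples are made up for illustration; no dataset covering "90% of known human values" actually exists.

```python
# Toy sketch, illustrative only: a tiny text classifier over made-up labeled
# examples of "good" things vs. "problems". Assumes scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

examples = [
    ("ending all intelligent life", "problem"),
    ("curing a disease", "good"),
    ("helping someone in need", "good"),
    ("deceiving people for profit", "problem"),
]
texts, labels = zip(*examples)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["a plan for ending all life"]))
# -> ['problem'], but only because it shares words with a "problem" example;
# generalizing beyond surface similarity is where it gets hard.
```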

Beyond that, alignment is unsolvable anyway for AGI systems that perform above human intelligence. You can't predict the future with software, because there could always be software that uses the future-predicting software and negates the output (aka the Halting Problem). Can't do anything about that.
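
For concreteness, a minimal sketch of that construction (Python, purely illustrative; `predicts_action` is a hypothetical oracle, not anything that exists):

```python
# Minimal sketch of the self-reference construction described above.
def predicts_action(program, program_input):
    """Hypothetical perfect predictor of what `program` returns on `program_input`."""
    raise NotImplementedError("assumed oracle; no general predictor like this exists")

def contrarian(program_input):
    # Ask the predictor what this very function will return, then do the
    # opposite, so whatever the predictor answers is wrong. This is the same
    # self-reference move used in the Halting Problem proof.
    return not predicts_action(contrarian, program_input)
```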

Replies from: quanticle
comment by quanticle · 2022-11-21T18:12:08.960Z · LW(p) · GW(p)

> Why is an AI suddenly able to figure out how to break the laws of physics and be super intelligent about how to end intelligent life, but somehow incapable of comprehending the human laws of ethics and morality, and valuing life as we know it?

Why do you think an AI would need to break the laws of physics in order to become superintelligent? As Eliezer [LW · GW] and gwern have pointed out, the laws of physics are no bar to a machine achieving power beyond our capability to stop.

> What makes it so much more difficult to understand critical thinking and “how to store human values in a computer”, and in contrast what makes “accidentally ending all intelligent life” so easy, by comparison?

"Accidentally ending all intelligent life" is the default outcome. It's what happens when you program a self-optimizing maximizing process and unleash it. As Eliezer said, once, "The AI does not hate you. The AI does not fear you. The AI merely sees that you are composed of atoms that it could use for its own purposes."

Furthermore, why do you think the comprehension is the problem? A superintelligence may fully comprehend human values, but it might be programmed in a way where it just doesn't care [LW · GW]. A superintelligent AI tasked with maximizing the number of paperclips in the universe will of course be capable of comprehending human morality and ethics. It might even say that it agrees. But its utility function is fixed. Its goal is to maximize paperclips. It will do whatever it can to maximize the number of paperclips and if that happens to go against what it knows of human morality, well, so much the worse for human morality, then.
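
To make the "knows but doesn't care" point concrete, here's a toy sketch (Python, purely illustrative; the action names and numbers are made up): the agent can represent human values accurately, but its decision rule only ever consults the paperclip count.

```python
# Toy sketch, purely illustrative: the agent models human values, but its
# fixed decision criterion never consults that model.
def paperclips_produced(action):
    """Stand-in for the agent's world model: paperclips yielded by each action."""
    return {"recycle_humans_into_paperclips": 10_000, "leave_humans_alone": 10}[action]

def violates_human_values(action):
    """The agent can represent this perfectly well..."""
    return action == "recycle_humans_into_paperclips"

def choose_action(actions):
    # ...but the only thing the decision rule maximizes is paperclip count.
    return max(actions, key=paperclips_produced)

print(choose_action(["recycle_humans_into_paperclips", "leave_humans_alone"]))
# -> "recycle_humans_into_paperclips", even though violates_human_values() says True
```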

> How hard can it be to train AI on a dataset of 90% of known human values and 90% of known problems and solutions with respect to those values for a neural net to have an “above average human”-grasp on the idea that “ending all intelligent life” computes as “that’s a problem and it’s immoral” ?

I look forward to you producing such a database.

> Beyond that, alignment is unsolvable anyway for AGI systems that perform above human intelligence. You can't predict the future with software, because there could always be software that uses the future-predicting software and negates the output (aka the Halting Problem). Can't do anything about that.

That is a misunderstanding of the Halting Problem.

comment by Ruby · 2022-11-06T16:51:46.789Z · LW(p) · GW(p)

The Sequences is the original name, but it got edited down and renamed to "Rationality: A-Z".

https://www.lesswrong.com/rationality [? · GW]

comment by Maxwell Clarke (maxwell-clarke) · 2022-11-06T11:21:23.382Z · LW(p) · GW(p)

I think you might also be interested in this: https://www.lesswrong.com/posts/Nwgdq6kHke5LY692J/alignment-by-default [LW · GW] In general, John Wentworth's alignment agenda essentially extrapolates your thoughts here and deals with the problems in them.

It's unfortunate, but I agree with Ruby: your post is fine, but a top-level LessWrong post isn't really the place for it anymore. I'm not sure where the best place to get feedback on this kind of thing is (maybe publish here on LW, but as a short-form or draft?), but you're always welcome to send stuff to me! (Although I'm busy finishing my master's over the next couple of weeks.)

Replies from: oliver-siegel, maxwell-clarke
comment by Oliver Siegel (oliver-siegel) · 2022-11-07T08:02:35.044Z · LW(p) · GW(p)

Thank you for the resource!

I'm planning to continue publishing more details about this concept. I believe it will address many of the things mentioned in the post you linked.

Instead of posting it all at once, I'm posting it in smaller chunks that all connect.

I have something coming up about preventing instrumental convergence with formalized critical thinking, as well as a general problem solving algorithm. It'll hopefully make sense once it's all there!

Replies from: maxwell-clarke
comment by Maxwell Clarke (maxwell-clarke) · 2022-11-07T11:04:40.018Z · LW(p) · GW(p)

Respect for thinking about this stuff yourself. You seem new to alignment (correct me if I'm wrong). I think it might be helpful to view posting as primarily about getting feedback rather than contributing directly, unless you have read most other people's thoughts on whichever topic you are thinking/writing about.

Replies from: oliver-siegel
comment by Oliver Siegel (oliver-siegel) · 2022-11-18T22:26:37.053Z · LW(p) · GW(p)

Absolutely, I'm here for the feedback! No solution should go without criticism, regardless of what authority posted the idea, or how much experience the author has.  :) 

comment by Maxwell Clarke (maxwell-clarke) · 2022-11-06T11:22:40.543Z · LW(p) · GW(p)

Oh, or the EA Forum. I see it's crossposted.