Consciousness is irrelevant - instead, solve alignment by asking this question

post by Oliver Siegel (oliver-siegel) · 2023-03-04T22:06:46.424Z · LW · GW · 6 comments

Contents

  How would you explain to someone what the difference is between consciousness and conscience? 
6 comments

I've noticed that there are two flavors of the alignment problem. One is about the technical how-to and practical engineering; the other is about the humanities, social sciences, human behavior, and psychology.

What they both have in common is doomsday scenarios about paperclip maximizers.

I don't know about you, but personally I don't care if an unaligned AI is aware of its experience while turning the universe into paperclips, or if the whole thing happens entirely without consciousness. What I care about is: how can we make AI smart enough to prevent that from happening?

That said, I find, especially in amateur circles, that CONSCIENCE is not just omitted from the discussion; it's actually poorly understood.

Arguably, if we find an algorithm for empathy, we can solve the alignment problem rather easily. Humans seem capable of monitoring their actions for morality and preventing harm to others; how hard can it be to build an artificial brain that does the same?

How would you explain to someone what the difference is between consciousness and conscience? 

6 comments

Comments sorted by top scores.

comment by quanticle · 2023-03-05T23:07:21.654Z · LW(p) · GW(p)

Conscience, as you've defined it, is value alignment. If the AI values the same things that we do, when offered a choice between two courses of action, it will choose the one that serves to enhance human values rather than degrade them. Designing an AI that does this, with no exceptions, is very hard.

Replies from: oliver-siegel
comment by Oliver Siegel (oliver-siegel) · 2023-04-20T00:37:47.333Z · LW(p) · GW(p)

Thank you for your comment!

In your opinion, what's the biggest challenge in feeding a DNN human values and then adjusting its weights and biases so that it doesn't degrade them?

We've taught AI how to speak, and it appears that OpenAI has taught their AI to produce as little offensive content as possible. So it seems feasible, doesn't it?
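For concreteness, here is a minimal sketch of what "feeding a DNN human values" could look like in practice: fitting a small reward model to human preference labels, roughly in the spirit of RLHF. Every name, shape, and datum below is an illustrative assumption, not a description of any lab's actual pipeline.

```python
# Illustrative sketch (assumed setup, not a reference implementation):
# train a tiny "reward model" on pairwise human preference labels.
import torch
import torch.nn as nn

EMBED_DIM = 16  # assumed size of some upstream text embedding


class RewardModel(nn.Module):
    """Scores a response embedding; higher = more in line with labeled values."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


model = RewardModel(EMBED_DIM)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in data: embeddings of responses humans preferred vs. rejected.
preferred = torch.randn(64, EMBED_DIM)
rejected = torch.randn(64, EMBED_DIM)

for step in range(100):
    # Bradley-Terry pairwise loss: push preferred scores above rejected ones.
    loss = -nn.functional.logsigmoid(model(preferred) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Fitting such a scorer is easy; the reply below explains why trusting it, especially out of distribution, is the hard part.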

Replies from: quanticle
comment by quanticle · 2023-04-20T03:33:35.061Z · LW(p) · GW(p)

> We've taught AI how to speak, and it appears that OpenAI has taught their AI to produce as little offensive content as possible.

The problem is that the AI can (and does) lie. Right now, ChatGPT and its ilk are at less-than-superhuman levels of intelligence, so we can catch their lies. But when a superhuman AI starts lying to you, how does one correct for that? If a superhuman AI starts veering off in an unexpected direction, how does one bring it back on track [? · GW]?

@gwern's short story, Clippy, highlights many of the issues with naively training a superintelligent algorithm on human-generated data and expecting that algorithm to pick up human values as a result. Another post to consider is The Waluigi Effect [LW · GW], which raises the possibility that the more you train an agent to say correct, inoffensive things, the more you've also trained a shadow-agent to say incorrect, offensive things.

Replies from: oliver-siegel
comment by Oliver Siegel (oliver-siegel) · 2023-04-21T16:48:45.239Z · LW(p) · GW(p)

Makes perfect sense!

Isn't that exactly why we should develop an artificial conscience, to prevent an AI from lying or having a shadow side? 

A built-in conscience would let the AI know that lying is not something it should do. Also, using a conscience in the AI algorithm would make the AI combat its own potential shadow. It would have knowledge of right and wrong / good and bad, and even a superhuman ability to orient itself toward that which is good and right, rather than being "seduced" by the dark side.
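One way to make "built-in conscience" concrete is architecturally: a separate module that scores each action the AI proposes and vetoes anything below a harm threshold. A minimal sketch under that assumption follows; the judge function here is a placeholder, and, as the reply below points out, building a judge that genuinely tracks human values is the whole problem.

```python
# Illustrative sketch: a "conscience" module that vetoes proposed actions.
# The judge callable is a placeholder; making it actually track human
# values is the open problem, not this plumbing.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    description: str


def conscience_filter(
    propose: Callable[[], Action],
    judge: Callable[[Action], float],  # assumed: higher score = more acceptable
    threshold: float = 0.0,
    max_tries: int = 10,
) -> Optional[Action]:
    """Keep proposing actions until one passes the conscience check."""
    for _ in range(max_tries):
        action = propose()
        if judge(action) >= threshold:
            return action
    return None  # refuse to act rather than act against conscience
```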

Replies from: quanticle
comment by quanticle · 2023-04-22T10:03:49.063Z · LW(p) · GW(p)

Ah, but how do you make the artificial conscience value aligned with humanity? An "artificial conscience" that is capable of aligning a superhuman AI... would itself be an aligned superhuman AI.

Replies from: oliver-siegel
comment by Oliver Siegel (oliver-siegel) · 2023-04-30T16:25:32.693Z · LW(p) · GW(p)

Correct! That's my point with the main post. I don't see anyone discussing conscience; I mostly hear them contemplate consciousness or computability.

As far as how to actually do this goes, I've dropped a few ideas on this site; they should be listed on my profile.