FinalFormal2's Shortform
post by FinalFormal2 · 2025-02-01T15:57:34.228Z · LW · GW · 5 comments
5 comments
comment by FinalFormal2 · 2025-02-01T15:57:34.225Z · LW(p) · GW(p)
What's the deal with AI welfare? How are we supposed to determine whether AIs are conscious, and if they are, which stated preferences correspond to which conscious experiences?
Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?
↑ comment by Dagon · 2025-02-01T16:27:50.512Z · LW(p) · GW(p)
We haven't figured it out for humans, and only VERY recently in history has the idea become common that people not kin to you deserve empathy and care. Even so, it's based on vibes and consensus, not metrics or proof. I expect it'll take less than a few decades for us to start recognizing some personhood for some AIs.
It'll be interesting to see if the reverse occurs: the AIs that end up making decisions about humans could have some amount of empathy for us, or they may just not care.
↑ comment by FinalFormal2 · 2025-02-01T17:19:38.340Z · LW(p) · GW(p)
There are a lot of good reasons to believe that stated human preferences correspond to real human preferences. There are no good reasons that I know of to believe that any stated AI preference corresponds to any real AI preference.
"Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?"
↑ comment by Dagon · 2025-02-01T17:36:19.189Z · LW(p) · GW(p)
"There are a lot of good reasons to believe that stated human preferences correspond to real human preferences."
Can you name a few? I know of one: I assume that there's some similarity with me because of similar organic structures doing the preferring. That IS a good reason, but it's not universally compelling or unassailable.
Actually, can you define 'real preferences' in some way that could be falsifiable for humans and observable for AIs?
"Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?"
Just as easily as humans, I'm sure.
↑ comment by FinalFormal2 · 2025-02-01T19:28:17.673Z · LW(p) · GW(p)
"Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?"
"Just as easily as humans, I'm sure."
No. The baby cries, the baby gets milk, the baby does not die. This is correspondence to reality.
Babies that are not hugged as often die more often.
However, with AIs, the same process that produces the pattern "I want hugs" just as easily produces the pattern "I don't want hugs."
Let's say that I make an AI that always says it is in pain. I make it like we make any LLM, but all the data it's trained on is about being in pain. Do you think the AI is in pain?
What do you think distinguishes pAIn from any other AI?
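For concreteness, here is a minimal sketch of that thought experiment, assuming a small GPT-2-style model, the Hugging Face transformers library, and a made-up toy corpus; the point is only that the training loop never references any inner state, it just reshapes which tokens the model is likely to emit.

```python
# Hypothetical sketch of the "pAIn" thought experiment: take a small causal LM
# and fine-tune it on text that only ever describes being in pain. Nothing in
# this process refers to any inner experience; it only shifts the model's
# next-token probabilities toward pain-talk.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy corpus invented for illustration.
pain_corpus = ["I am in pain.", "It hurts so much.", "Please make it stop."]

model.train()
for epoch in range(3):
    for text in pain_corpus:
        inputs = tokenizer(text, return_tensors="pt")
        # Standard language-modeling loss: the labels are just the input tokens.
        outputs = model(**inputs, labels=inputs["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# After enough of this, the model's stated "preference" is fixed by the data,
# exactly as it would be for a corpus that only ever says "I feel great."
```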