Post by [deleted]

This is a link post for


Comments sorted by top scores.

comment by johnswentworth · 2024-09-20T22:10:54.591Z · LW(p) · GW(p)

We Humans Learn About Our Values

I'd kinda like to wrap this whole section in a thought-bubble, or quote block, or color, or something, to indicate that the entire section is "what it looks like from inside a human's mind". So e.g. from inside my mind, it looks like we humans learn about our values. And then outside that bubble, we can ask "are there any actual 'values' which we're in fact learning about"?

Replies from: David Lorell
comment by David Lorell · 2024-09-20T22:23:13.578Z · LW(p) · GW(p)

Seems accurate to me. This has been an exercise in the initial step(s) of CCC, which indeed consist of "the phenomenon looks this way to me. Does it also look that way to others? Cool. What are we all cottoning on to?"

comment by David Lorell · 2024-09-20T22:04:27.772Z · LW(p) · GW(p)

Indeed, our beliefs-about-values can be integrated into the same system as all our other beliefs, allowing for e.g. ordinary factual evidence to become relevant to beliefs about values in some cases.

Super unclear to the uninitiated what this means. (And therefore threateningly confusing to our future selves.)

Maybe: "Indeed, we can plug 'value' variables into our epistemic models (like, for instance, our models of what brings about reward signals) and update them as a result of non-value-laden facts about the world."

comment by David Lorell · 2024-09-20T22:01:11.743Z · LW(p) · GW(p)

But clearly the reward signal is not itself our values.

Ahhhh

Maybe: "But presumably the reward signal does not plug directly into the action-decision system."?

Or: "But intuitively we do not value reward for its own sake."? 

comment by johnswentworth · 2024-09-20T22:01:06.317Z · LW(p) · GW(p)

in a hand-wavy reinforcement-learning-esque sense

language

comment by johnswentworth · 2024-09-20T22:00:46.508Z · LW(p) · GW(p)

an agent could aim to pursue any values regardless of what the world outside it looks like; “how the external world is” does not tell us “how the external world should be”.

Extremely delicate wording dancing around the "should be" vs "should be according to me" distinction, with embeddedness allowing facts to update "should be according to me" without crossing the is-ought gap... in principle.

Replies from: David Lorell
comment by David Lorell · 2024-09-20T22:20:12.554Z · LW(p) · GW(p)

Wait. I thought that was crossing the is-ought gap. As I think of it, the is-ought gap refers to the apparent type-clash and unclear evidential entanglement between facts-about-the-world and values-an-agent-assigns-to-facts-about-the-world. Also as I think of it, "should be" is always shorthand for "should be according to me", though it possibly means some kind of aggregated thing, which still grounds out in subjective shoulds.

So "how the external world is" does not tell us "how the external world should be" .... except in so far as the external world has become causally/logically entangled with a particular agent's 'true values'. (Punting on what are an agent's "true values" are as opposed to the much easier "motivating values" or possibly "estimated true values." But for the purposes of this comment, its sufficient to assume that they are dependent on some readable property (or logical consequence of readable properties) of the agent itself.)

Replies from: johnswentworth
comment by johnswentworth · 2024-09-20T22:24:30.927Z · LW(p) · GW(p)

facts-about-the-world

Needs jargon

values-an-agent-assigns-to-facts-about-the-world

also needs jargon

logical consequence of readable properties

...

Replies from: David Lorell, David Lorell
comment by David Lorell · 2024-09-20T22:29:46.415Z · LW(p) · GW(p)

wiggitywiggitywact := fact about the world which requires a typical human to cross a large inferential gap.

comment by David Lorell · 2024-09-20T22:27:08.378Z · LW(p) · GW(p)

wact := fact about the world
mact := fact about the mind
aact := fact about the agent more generally

vwact := value assigned by some agent to a fact about the world
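
One way to render those shorthands as literal types, which also makes the "type-clash" framing of the is-ought gap above concrete (purely illustrative; the field names are made up):

```python
# Illustrative types for the shorthands above; nothing here is canonical.
from dataclasses import dataclass

@dataclass
class Wact:          # fact about the world
    claim: str

@dataclass
class Mact:          # fact about the mind
    claim: str

@dataclass
class Aact:          # fact about the agent more generally
    claim: str

@dataclass
class Vwact:         # value assigned by some agent to a fact about the world
    agent: str
    target: Wact
    valence: float
```

A Wact and a Vwact are different types, which is the sense in which "is" and "ought (according to me)" don't directly substitute for one another.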
 

Replies from: johnswentworth
comment by johnswentworth · 2024-09-20T22:30:24.421Z · LW(p) · GW(p)

Spitballing:

  • "local fact" vs "global fact" (to evoke local/global variables)
  • "local fact" vs "interoperable fact"
  • "internal fact" vs "interoperable fact"
  • "fact valence" for the value stuff
comment by David Lorell · 2024-09-20T21:59:15.301Z · LW(p) · GW(p)

It does seem like humans have some kind of physiological “reward”, in a hand-wavy reinforcement-learning-esque sense, which seems to at least partially drive the subjective valuation of things.

Hrm... If this compresses down to "Humans are clearly compelled, at least in part, by what 'feels good'," then I think it's fine. If not, then this is an awkward sentence and we should discuss.

comment by David Lorell · 2024-09-20T21:57:26.367Z · LW(p) · GW(p)

an agent could aim to pursue any values regardless of what the world outside it looks like;

Without knowing what values are, it's unclear that an agent could aim to pursue any of them. The implicit model here is that there is something like a value function in DP (dynamic programming) which, together with the world model, gets passed into the action-decider and drives the agent. But I think we're saying something more general than that.
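
For reference, a toy version of that implicit picture (my reading, not necessarily the post's; the environment, states, and values are all made up): a DP-style value function handed to an action-decider alongside a world model.

```python
# Toy illustration of "value function + world model -> action-decider".
# The environment, states, and values are all made up.

world_model = {                       # state -> {action: predicted next state}
    "home": {"stay": "home", "go_out": "park"},
    "park": {"stay": "park", "go_home": "home"},
}

value_function = {"home": 0.2, "park": 1.0}   # state -> value; a free parameter here

def decide(state, world_model, value_function):
    """Pick the action whose predicted next state the value function rates highest."""
    actions = world_model[state]
    return max(actions, key=lambda a: value_function[actions[a]])

print(decide("home", world_model, value_function))  # "go_out"
```

In this picture the value function is a free parameter: swap it for any other state-to-value mapping and the machinery runs unchanged, which is one way to cash out "an agent could aim to pursue any values regardless of what the world outside it looks like." The point above stands that we probably mean something more general than this.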

comment by David Lorell · 2024-09-20T21:54:08.746Z · LW(p) · GW(p)

but the fact that it makes sense to us to talk about our beliefs

Better terminology for the phenomenon of "making sense" in the above way?

comment by johnswentworth · 2024-09-20T21:53:40.441Z · LW(p) · GW(p)

a “map” of our values

[image]
comment by johnswentworth · 2024-09-20T21:52:01.250Z · LW(p) · GW(p)

guess at their own values

Every time the wording of a sentence implies that there are, in fact, some values which someone has or estimates, I picture the adorable not-so-sneaky elephant.

comment by David Lorell · 2024-09-20T21:51:58.406Z · LW(p) · GW(p)

“learn” in the sense that their behavior adapts to their environment.

I want a new word for this. "Learn" vs "Adapt", maybe: "Learn" means updating symbolic references (maps), while "Adapt" means something like responding to stimuli in a systematic way.
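
A toy contrast between the two senses (illustrative only; both systems are made up):

```python
# "Adapt": behavior shifts in response to stimuli, with no internal map of anything.
class Adapter:
    def __init__(self):
        self.gain = 1.0
    def step(self, feedback):
        # The internal state changes systematically, but nothing here is "about" the world.
        self.gain *= 1.1 if feedback > 0 else 0.9

# "Learn": an internal symbol is maintained *as an estimate of* an external quantity.
class Learner:
    def __init__(self):
        self.estimated_temperature = 20.0   # intended to track the actual temperature
    def step(self, observed_temperature):
        # Update the symbol toward the observation, i.e. improve the map.
        self.estimated_temperature += 0.5 * (observed_temperature - self.estimated_temperature)
```

The Adapter's state tracks its history of feedback but isn't naturally read as a claim about the environment; the Learner's state is exactly such a claim, which is the "map" sense of learning.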

comment by johnswentworth · 2024-09-20T21:51:05.700Z · LW(p) · GW(p)

know our own values

... there's a whole fucking elephant swept under that rug. I can see its trunk peeking out. It's adorable how sneaky that elephant thinks it's being.

Replies from: David Lorell
comment by David Lorell · 2024-09-20T22:12:32.199Z · LW(p) · GW(p)

We have at least one jury-rigged idea! Conceptually. Kind of.

comment by johnswentworth · 2024-09-20T21:48:49.396Z · LW(p) · GW(p)

The internal heuristics or behaviors “learned” by an adaptive system are not necessarily “about” any particular external thing, and don’t necessarily represent any particular external thing

I give up.

Replies from: David Lorell
comment by David Lorell · 2024-09-20T22:11:59.502Z · LW(p) · GW(p)

Yeeeahhh... But maybe it's just awkwardly worded rather than being deeply confused. Like: "The learned algorithms which an adaptive system implements may not necessarily accept, output, or even internally use data(structures) which have any relationship at all to some external environment." Also: what the hell is "reference"?

comment by johnswentworth · 2024-09-20T21:48:31.814Z · LW(p) · GW(p)

Adaptive systems “learn” things, but they don’t necessarily “learn about” things; they don’t necessarily have an internal map of the external territory.

So much screaming

comment by johnswentworth · 2024-09-20T21:48:10.167Z · LW(p) · GW(p)

symbolic representation of the environment, and they update those symbols over time to (hopefully) better match the environment

more scream

Replies from: David Lorell
comment by David Lorell · 2024-09-20T22:08:11.874Z · LW(p) · GW(p)

Seconded. I have extensional ideas about "symbolic representations" and how they differ from... non-representations... but I would not trust this understanding with much weight.

comment by johnswentworth · 2024-09-20T21:47:38.305Z · LW(p) · GW(p)

behavior adapts to their environment

scream

Replies from: David Lorell
comment by David Lorell · 2024-09-20T22:06:45.966Z · LW(p) · GW(p)

Seconded. Comments above.