No77e's Shortform

post by No77e (no77e-noi) · 2023-02-18T12:45:42.224Z · LW · GW · 8 comments

comment by No77e (no77e-noi) · 2023-03-13T08:26:24.319Z · LW(p) · GW(p)

If LLM simulacra resemble humans but are misaligned, that doesn't bode well on the S-risk front.

comment by No77e (no77e-noi) · 2023-03-13T10:27:35.754Z · LW(p) · GW(p)

The Waluigi effect also seems bad for S-risk: "Optimize for pleasure, ..." flips into "Optimize for suffering, ...".

comment by No77e (no77e-noi) · 2023-03-13T08:25:26.711Z · LW(p) · GW(p)

An optimistic way to frame inner alignment is that gradient descent already hits a very narrow target in goal-space, and we just need one last push.

A pessimistic way to frame inner misalignment is that gradient descent already hits a very narrow target in goal-space, and a near miss on human values is exactly the kind of outcome where S-risk could be large.

comment by No77e (no77e-noi) · 2023-03-09T15:51:33.296Z · LW(p) · GW(p)

This community has developed a bunch of good tools for helping resolve disagreements, such as double cruxing. It's a waste that they haven't been systematically deployed for the MIRI conversations. Those conversations could have been more productive, and we could have walked away with a succinct and precise understanding of where the disagreements are and why.

comment by No77e (no77e-noi) · 2023-03-09T15:57:46.643Z · LW(p) · GW(p)

We should run Paul Christiano's debate game with alignment researchers as the debaters instead of ML systems.

comment by No77e (no77e-noi) · 2023-02-26T18:17:20.919Z · LW(p) · GW(p)

Trying to write a reward function, or a loss function, that captures human values seems hopeless.

But if you have some interpretability techniques that let you find human values in some simulacrum of a large language model, maybe that's less hopeless.

It's the difference between constructing something and recognizing it, between proving and checking, between producing and criticizing, and so on...

comment by No77e (no77e-noi) · 2023-02-18T12:45:42.426Z · LW(p) · GW(p)

As a failure mode of specification gaming, agents might modify their own goals. 

As a convergent instrumental goal, agents want to prevent their goals from being modified.

I think I know how to resolve this apparent contradiction, but I'd like to see other people's opinions about it.

comment by No77e (no77e-noi) · 2023-02-18T15:41:30.399Z · LW(p) · GW(p)

Why shouldn't this work? What's the epistemic failure mode being pointed at here?