LessWrong 2.0 Reader

2023 Unofficial LessWrong Census/Survey
Screwtape · 2023-12-02T04:41:51.418Z · comments (81)
[April Fools'] Definitive confirmation of shard theory
TurnTrout · 2023-04-01T07:27:23.096Z · comments (8)
Will alignment-faking Claude accept a deal to reveal its misalignment?
ryan_greenblatt · 2025-01-31T16:49:47.316Z · comments (23)
Thoughts on the AI Safety Summit company policy requests and responses
So8res · 2023-10-31T23:54:09.566Z · comments (14)
Conflict vs. mistake in non-zero-sum games
Nisan · 2020-04-05T22:22:41.374Z · comments (40)
Testing The Natural Abstraction Hypothesis: Project Intro
johnswentworth · 2021-04-06T21:24:43.135Z · comments (41)
Davidad's Bold Plan for Alignment: An In-Depth Explanation
Charbel-Raphaël (charbel-raphael-segerie) · 2023-04-19T16:09:01.455Z · comments (40)
The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda
Cameron Berg (cameron-berg) · 2023-12-18T20:35:01.569Z · comments (22)
2021 AI Alignment Literature Review and Charity Comparison
Larks · 2021-12-23T14:06:50.721Z · comments (28)
You are probably underestimating how good self-love can be
Charlie Rogers-Smith (charlie.rs) · 2021-11-14T00:41:35.011Z · comments (19)
[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)
The Brain is Not Close to Thermodynamic Limits on Computation
DaemonicSigil · 2023-04-24T08:21:44.727Z · comments (58)
Book Review: Working With Contracts
johnswentworth · 2020-09-14T23:22:11.215Z · comments (27)
Make more land
jefftk (jkaufman) · 2019-10-16T11:20:03.381Z · comments (36)
Impossibility results for unbounded utilities
paulfchristiano · 2022-02-02T03:52:18.780Z · comments (109)
Shard Theory: An Overview
David Udell · 2022-08-11T05:44:52.852Z · comments (34)
My understanding of Anthropic strategy
Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2023-02-15T01:56:40.961Z · comments (31)
Worst-case thinking in AI alignment
Buck · 2021-12-23T01:29:47.954Z · comments (18)
What Discovering Latent Knowledge Did and Did Not Find
Fabien Roger (Fabien) · 2023-03-13T19:29:45.601Z · comments (17)
[link] Things that can kill you quickly: What everyone should know about first aid
jasoncrawford · 2022-12-27T16:23:24.831Z · comments (21)
Planes are still decades away from displacing most bird jobs
guzey · 2022-11-25T16:49:32.344Z · comments (13)
How useful is mechanistic interpretability?
ryan_greenblatt · 2023-12-01T02:54:53.488Z · comments (54)
Playing with DALL·E 2
Dave Orr (dave-orr) · 2022-04-07T18:49:16.301Z · comments (118)
When can we trust model evaluations?
evhub · 2023-07-28T19:42:21.799Z · comments (10)
How will we update about scheming?
ryan_greenblatt · 2025-01-06T20:21:52.281Z · comments (20)
[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (14)
A list of core AI safety problems and how I hope to solve them
davidad · 2023-08-26T15:12:18.484Z · comments (29)
Most People Start With The Same Few Bad Ideas
johnswentworth · 2022-09-09T00:29:12.740Z · comments (30)
[link] Tuning your Cognitive Strategies
Raemon · 2023-04-27T20:32:06.337Z · comments (58)
Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon
johnswentworth · 2022-04-15T19:05:46.442Z · comments (128)
[link] Six (and a half) intuitions for KL divergence
CallumMcDougall (TheMcDouglas) · 2022-10-12T21:07:07.796Z · comments (27)
[link] The Social Recession: By the Numbers
antonomon · 2022-10-29T18:45:09.001Z · comments (29)
$20 Million in NSF Grants for Safety Research
Dan H (dan-hendrycks) · 2023-02-28T04:44:38.417Z · comments (12)
Everyday Lessons from High-Dimensional Optimization
johnswentworth · 2020-06-06T20:57:05.155Z · comments (44)
[Beta Feature] Google-Docs-like editing for LessWrong posts
Ruby · 2022-02-23T01:52:22.141Z · comments (26)
You can just spontaneously call people you haven't met in years
lc · 2023-11-13T05:21:05.726Z · comments (21)
On A List of Lethalities
Zvi · 2022-06-13T12:30:01.624Z · comments (50)
Studies On Slack
Scott Alexander (Yvain) · 2020-05-13T05:00:02.772Z · comments (34)
Deepmind's Gato: Generalist Agent
Daniel Kokotajlo (daniel-kokotajlo) · 2022-05-12T16:01:21.803Z · comments (62)
Why I think there's a one-in-six chance of an imminent global nuclear war
Max Tegmark (MaxTegmark) · 2022-10-08T06:26:40.235Z · comments (169)
Towards understanding-based safety evaluations
evhub · 2023-03-15T18:18:01.259Z · comments (16)
Prizes for matrix completion problems
paulfchristiano · 2023-05-03T23:30:08.069Z · comments (52)
Paper-Reading for Gears
johnswentworth · 2019-12-04T21:02:56.316Z · comments (6)
RSPs are pauses done right
evhub · 2023-10-14T04:06:02.709Z · comments (73)
The Coordination Frontier: Sequence Intro
Raemon · 2021-09-04T22:11:00.122Z · comments (22)
[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (21)
[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)
"Can you keep this confidential? How do you know?"
Raemon · 2020-07-21T00:33:27.974Z · comments (43)
Slack matters more than any outcome
Valentine · 2022-12-31T20:11:02.287Z · comments (56)
Announcing ILIAD — Theoretical AI Alignment Conference
Nora_Ammann · 2024-06-05T09:37:39.546Z · comments (18)