LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

[link] Steering Gemini with BiDPO
TurnTrout · 2025-01-31T02:37:55.839Z · comments (5)
A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (5)
I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (23)
Judgements: Merging Prediction & Evidence
abramdemski · 2025-02-23T19:35:51.488Z · comments (5)
Among Us: A Sandbox for Agentic Deception
7vik (satvik-golechha) · 2025-04-05T06:24:49.000Z · comments (4)
Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)
Reviewing LessWrong: Screwtape's Basic Answer
Screwtape · 2025-02-05T04:30:34.347Z · comments (18)
A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)
[link] The Minority Coalition
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (9)
Response to nostalgebraist: proudly waving my moral-antirealist battle flag
Steven Byrnes (steve2152) · 2024-05-29T16:48:29.408Z · comments (29)
LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (6)
LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)
AGI Safety & Alignment @ Google DeepMind is hiring
Rohin Shah (rohinmshah) · 2025-02-17T21:11:18.970Z · comments (19)
[link] Detecting Strategic Deception Using Linear Probes
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-02-06T15:46:53.024Z · comments (9)
[link] I found >800 orthogonal "write code" steering vectors
Jacob G-W (g-w1) · 2024-07-15T19:06:17.636Z · comments (19)
Fake thinking and real thinking
Joe Carlsmith (joekc) · 2025-01-28T20:05:06.735Z · comments (11)
A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (16)
On Dwarkesh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)
[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (79)
2024 Unofficial LessWrong Census/Survey
Screwtape · 2024-12-02T05:30:53.019Z · comments (49)
Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)
Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)
[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-12-03T21:19:42.333Z · comments (7)
How do you deal w/ Super Stimuli?
Logan Riggs (elriggs) · 2025-01-14T15:14:51.552Z · comments (25)
My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (49)
[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (8)
Dialogue introduction to Singular Learning Theory
Olli Järviniemi (jarviniemi) · 2024-07-08T16:58:10.108Z · comments (15)
[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (61)
[link] Against Aschenbrenner: How 'Situational Awareness' constructs a narrative that undermines safety and threatens humanity
GideonF · 2024-07-15T18:37:40.232Z · comments (17)
[link] A short course on AGI safety from the GDM Alignment team
Vika · 2025-02-14T15:43:50.903Z · comments (1)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (7)
Comment on "Death and the Gorgon"
Zack_M_Davis · 2025-01-01T05:47:30.730Z · comments (33)
Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)
C'mon guys, Deliberate Practice is Real
Raemon · 2025-02-05T22:33:59.069Z · comments (25)
Why you should be using a retinoid
GeneSmith · 2024-08-19T03:07:41.722Z · comments (60)
Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (13)
MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)
We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (36)
[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)
Matryoshka Sparse Autoencoders
Noa Nabeshima (noa-nabeshima) · 2024-12-14T02:52:32.017Z · comments (15)
The purposeful drunkard
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-12T12:27:51.952Z · comments (13)
[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)
AI Control May Increase Existential Risk
Jan_Kulveit · 2025-03-11T14:30:05.972Z · comments (13)
Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)
[link] What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit
garrison · 2025-03-06T19:49:02.145Z · comments (0)
Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)
Reasons for and against working on technical AI safety at a frontier AI lab
bilalchughtai (beelal) · 2025-01-05T14:49:53.529Z · comments (12)
Third-wave AI safety needs sociopolitical thinking
Richard_Ngo (ricraz) · 2025-03-27T00:55:30.548Z · comments (23)
OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)