LessWrong 2.0 Reader


AGI Safety & Alignment @ Google DeepMind is hiring
Rohin Shah (rohinmshah) · 2025-02-17T21:11:18.970Z · comments (19)
On Dwarkesh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)
[link] Detecting Strategic Deception Using Linear Probes
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-02-06T15:46:53.024Z · comments (9)
LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)
[link] I found >800 orthogonal "write code" steering vectors
Jacob G-W (g-w1) · 2024-07-15T19:06:17.636Z · comments (19)
Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan (dtch1997) · 2025-04-02T08:51:02.571Z · comments (41)
A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (16)
[link] A short course on AGI safety from the GDM Alignment team
Vika · 2025-02-14T15:43:50.903Z · comments (1)
2024 Unofficial LessWrong Census/Survey
Screwtape · 2024-12-02T05:30:53.019Z · comments (49)
Fake thinking and real thinking
Joe Carlsmith (joekc) · 2025-01-28T20:05:06.735Z · comments (11)
Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)
[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (79)
Dialogue introduction to Singular Learning Theory
Olli Järviniemi (jarviniemi) · 2024-07-08T16:58:10.108Z · comments (15)
How do you deal w/ Super Stimuli?
Logan Riggs (elriggs) · 2025-01-14T15:14:51.552Z · comments (25)
Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)
[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-12-03T21:19:42.333Z · comments (7)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (7)
[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (8)
How training-gamers might function (and win)
Vivek Hebbar (Vivek) · 2025-04-11T21:26:18.669Z · comments (5)
[link] Against Aschenbrenner: How 'Situational Awareness' constructs a narrative that undermines safety and threatens humanity
GideonF · 2024-07-15T18:37:40.232Z · comments (17)
Comment on "Death and the Gorgon"
Zack_M_Davis · 2025-01-01T05:47:30.730Z · comments (33)
Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)
[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (61)
[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)
Matryoshka Sparse Autoencoders
Noa Nabeshima (noa-nabeshima) · 2024-12-14T02:52:32.017Z · comments (15)
Why you should be using a retinoid
GeneSmith · 2024-08-19T03:07:41.722Z · comments (60)
We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (36)
The purposeful drunkard
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-12T12:27:51.952Z · comments (13)
[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)
C'mon guys, Deliberate Practice is Real
Raemon · 2025-02-05T22:33:59.069Z · comments (25)
Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (17)
Third-wave AI safety needs sociopolitical thinking
Richard_Ngo (ricraz) · 2025-03-27T00:55:30.548Z · comments (23)
MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)
[link] What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit
garrison · 2025-03-06T19:49:02.145Z · comments (0)
OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)
Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)
AI Control May Increase Existential Risk
Jan_Kulveit · 2025-03-11T14:30:05.972Z · comments (13)
Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)
Quotes from Leopold Aschenbrenner’s Situational Awareness Paper
Zvi · 2024-06-07T11:40:03.981Z · comments (10)
[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)
Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)
Reasons for and against working on technical AI safety at a frontier AI lab
bilalchughtai (beelal) · 2025-01-05T14:49:53.529Z · comments (12)
Reviewing LessWrong: Screwtape's Basic Answer
Screwtape · 2025-02-05T04:30:34.347Z · comments (18)
The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (18)
Timaeus in 2024
Jesse Hoogland (jhoogland) · 2025-02-20T23:54:56.939Z · comments (1)
[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (4)
It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (69)
The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (13)
[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (35)