LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

Is AI Alignment Enough?
Aram Panasenco (panasenco) · 2025-01-10T18:57:48.409Z · comments (6)

Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)

Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)

[link] The Roots of Progress 2024 in review
jasoncrawford · 2025-01-01T00:02:06.441Z · comments (0)

Early Experiments in Human Auditing for AI Control
Joey Yudelson (JosephY) · 2025-01-23T01:34:31.682Z · comments (0)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

[link] Human-AI Complementarity: A Goal for Amplified Oversight
rishubjain · 2024-12-24T09:57:55.111Z · comments (3)

[link] Impact in AI Safety Now Requires Specific Strategic Insight
MiloSal (milosal) · 2024-12-29T00:40:53.780Z · comments (1)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde (kola-ayonrinde) · 2024-10-30T22:50:45.642Z · comments (0)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

The Foraging (Ex-)Bandit [Ruleset & Reflections]
abstractapplic · 2024-11-14T20:16:21.535Z · comments (3)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

Inferential Game: The Foraging (Ex-)Bandit
abstractapplic · 2024-11-11T16:59:42.058Z · comments (4)

Bay Winter Solstice 2024: song leading auditions
tcheasdfjkl · 2024-11-10T23:59:08.199Z · comments (0)

[link] AI as systems, not just models
Andy Arditi (andy-arditi) · 2024-12-21T23:19:05.507Z · comments (0)

SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

Book Summary: Zero to One
bilalchughtai (beelal) · 2024-12-29T16:13:52.922Z · comments (2)

The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
Sahil · 2024-11-07T05:27:20.276Z · comments (1)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

Will bird flu be the next Covid? "Little chance" says my dashboard.
Nathan Young · 2025-01-07T20:10:50.080Z · comments (0)

[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)

[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)

Action derivatives: You’re not doing what you think you’re doing
PatrickDFarley · 2024-11-21T16:24:04.044Z · comments (0)

Theoretical Alignment's Second Chance
lunatic_at_large · 2024-12-22T05:03:51.653Z · comments (2)

How likely is brain preservation to work?
Andy_McKenzie · 2024-11-18T16:58:54.632Z · comments (3)

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
claudia.biancotti · 2024-11-18T09:38:35.723Z · comments (4)

Mask and Respirator Intelligibility Comparison
jefftk (jkaufman) · 2024-12-07T03:20:01.585Z · comments (5)

Trying Bluesky
jefftk (jkaufman) · 2024-11-17T02:50:04.093Z · comments (17)

[link] Creating Interpretable Latent Spaces with Gradient Routing
Jacob G-W (g-w1) · 2024-12-14T04:00:17.249Z · comments (6)

AI #93: Happy Tuesday
Zvi · 2024-12-04T00:30:06.891Z · comments (2)

[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)

[link] overengineered air filter shelving
bhauth · 2024-11-08T22:04:39.987Z · comments (2)

[link] Introducing the Anthropic Fellows Program
Miranda Zhang (miranda-zhang) · 2024-11-30T23:47:29.259Z · comments (0)

Preface
Allison Duettmann (allison-duettmann) · 2025-01-02T18:59:46.290Z · comments (2)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

Intranasal mRNA Vaccines?
J Bostock (Jemist) · 2025-01-01T23:46:40.524Z · comments (2)

Abstractions are not Natural
Alfred Harwood · 2024-11-04T11:10:09.023Z · comments (21)

On The Rationalist Megameetup
Screwtape · 2024-11-23T09:08:26.897Z · comments (3)

[link] Effective Networking as Sending Hard to Fake Signals
vaishnav92 · 2024-12-12T20:32:24.113Z · comments (2)

Why I Think All The Species Of Significantly Debated Consciousness Are Conscious And Suffer Intensely
omnizoid · 2024-11-20T16:48:44.859Z · comments (5)

You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)

Elevating Air Purifiers
jefftk (jkaufman) · 2024-12-17T01:40:05.401Z · comments (0)

No Electricity in Manchuria
winstonBosan · 2024-11-19T01:11:58.661Z · comments (0)

[link] Social events with plausible deniability
Chipmonk · 2024-11-18T18:25:17.339Z · comments (24)

[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)

[link] debating buying NVDA in 2019
bhauth · 2025-01-04T05:06:54.047Z · comments (0)

[link] A Theory of Equilibrium in the Offense-Defense Balance
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-15T13:51:33.376Z · comments (6)

Alternatives to Masks for Infectious Aerosols
jefftk (jkaufman) · 2024-12-08T14:00:01.670Z · comments (9)

Recent comments

steven-lee on Ten people on the inside

Not Buck, but I think it does, unless of course they Saw Something and decided that safety efforts weren't going to work. The essay seems to hinge on safety people being able to make models safer, which sounds plausible, but I'm sure they already knew that. Given their insider information and their conclusions about their ability to make a positive impact, it seems less plausible that their safety efforts would succeed. Maybe whether or not someone has already quit is an indication of how impactful their safety work is. It also varies by lab, with OpenAI having many safety-conscious quitters but other labs having far fewer (I want to say none, but maybe I just haven't heard of any).

The other thing to think about is whether or not people who quit and claimed it was due to safety reasons were being honest about that. I'd like to believe that they were, but all companies have culture/performance expectations that their employees might not want to meet and quitting for safety reasons sounds better than quitting over performance issues.

matthew-barnett on Capital Ownership Will Not Prevent Human Disempowerment

Are you suggesting that I should base my morality on whether I'll be rewarded for adhering to it? That just sounds like selfishness disguised as impersonal ethics.

To be clear, I do have some selfish/non-impartial preferences. I care about my own life and happiness, and the happiness of my friends and family. But I also have some altruistic preferences, and my commentary on AI tends to reflect that.

thane-ruthenis on What Indicators Should We Watch to Disambiguate AGI Timelines?

Have you looked at samples of CoT of o1, o3, deepseek, etc. solving hard math problems?

Certainly (experimenting with r1's CoTs right now, in fact). I agree that they're not doing the brute-force stuff I mentioned; that was just me outlining a scenario in which a system "technically" clears the bar you'd outlined, yet I end up unmoved (I don't want to end up goalpost-moving).

Though neither are they being "strategic" in the way I expect they'd need to be in order to productively use a billion-token CoT.

Anyhow, this is nice, because I do expect that probably something like this milestone will be reached before AGI

Yeah, I'm also glad to finally have something concrete-ish to watch out for. Thanks for prompting me!

raemon on Ten people on the inside

The question is "are the safety-conscious people effectual at all, and what are their opportunity costs?".

I.e., are the cheap things they can do without stepping on anyone's toes really that helpful on the margin, and better than what they'd be able to do at another company? (I don't know the answer; it depends on the people.)

seth-herd on Ten people on the inside

It does seem to imply that, doesn't it? I respect the people leaving, and I think it does send a valuable message. And it seems very valuable to have safety-conscious people on the inside.

lwlw on Yudkowsky on The Trajectory podcast

"I don't think you'll need to worry about this stuff until you get really far out of distribution." I may sound like I'm just commenting for the sake of commenting but I think that's something you want to be crystal clear on. I'm pessimistic in general and this situation is probably unlikely but I guess one of my worst fears would be creating uberpsychosis. Sounding like every LWer, my relatively out of distribution capabilities made my psychotic delusions hyper-analytic/1000x more terrifying & elaborate than they would have been with worse working memory/analytic abilities (once I started ECT I didn't have the horsepower to hyperanalyze existence as much). I guess the best way to describe it was that I could feel the terror of just how bad -inf would truly be as opposed to having an abstract/detached view that -inf = bad. And I wouldn't want anyone else to go through something like that, let alone something much scarier/worse.

mitchell_porter on Those of you with lots of meditation experience: How did it influence your understanding of philosophy of mind and topics such as qualia?

if you get enough meditative insight you'll transcend the concept of a self

What is the notion of self that you transcend, what does it mean to transcend it, and how does meditation cause this to happen?

daniel-kokotajlo on What Indicators Should We Watch to Disambiguate AGI Timelines?

Have you looked at samples of CoT of o1, o3, deepseek, etc. solving hard math problems? I feel like a few examples have been shown & they seem to involve qualitative thinking, not just brute-force-proof-search (though of course they show lots of failed attempts and backtracking -- just like a human thought-chain would).

Anyhow, this is nice, because I do expect that probably something like this milestone will be reached before AGI (though I'm not sure)

scottalexander on Ten people on the inside

Does this imply that fewer safety people should quit leading labs to protest poor safety policies?

alex_altair on Announcement: Learning Theory Online Course

See also the classic LW post, The Best Textbooks on Every Subject.