LessWrong 2.0 Reader

Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)

OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)

Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)

OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)

Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (14)

[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (3)

On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)

[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)

[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)

[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)

[question] How to get nerds fascinated about mysterious chronic illness research?
riceissa · 2024-05-27T22:58:29.707Z · answers+comments (50)

I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)

[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)

[link] I found >800 orthogonal "write code" steering vectors
Jacob G-W (g-w1) · 2024-07-15T19:06:17.636Z · comments (19)

[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (13)

[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)

[link] RAND report finds no effect of current LLMs on viability of bioterrorism attacks
StellaAthena · 2024-01-25T19:17:30.493Z · comments (14)

[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)

Towards a Less Bullshit Model of Semantics
johnswentworth · 2024-06-17T15:51:06.060Z · comments (44)

It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (68)

Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)

[link] Things You’re Allowed to Do: University Edition
Saul Munn (saul-munn) · 2024-02-06T00:36:11.690Z · comments (13)

Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (188)

[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (56)

Apollo Research 1-year update
Marius Hobbhahn (marius-hobbhahn) · 2024-05-29T17:44:32.484Z · comments (0)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (55)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (1)

2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)

[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

Notes on Dwarkesh Patel’s Podcast with Demis Hassabis
Zvi · 2024-03-01T16:30:08.687Z · comments (0)

[link] Against Aschenbrenner: How 'Situational Awareness' constructs a narrative that undermines safety and threatens humanity
GideonF · 2024-07-15T18:37:40.232Z · comments (17)

[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)

Takeoff speeds presentation at Anthropic
Tom Davidson (tom-davidson-1) · 2024-06-04T22:46:35.448Z · comments (0)

Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)

OpenAI: The Board Expands
Zvi · 2024-03-12T14:00:04.110Z · comments (1)

[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (33)

Everything Wrong with Roko's Claims about an Engineered Pandemic
WitheringWeights (EZ97) · 2024-02-22T15:59:08.439Z · comments (10)

SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)

On attunement
Joe Carlsmith (joekc) · 2024-03-25T12:47:34.856Z · comments (8)

New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper
Zvi · 2024-06-07T11:40:03.981Z · comments (10)

Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)

Just admit that you’ve zoned out
joec · 2024-06-04T02:51:27.594Z · comments (22)

How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (11)

Meaning & Agency
abramdemski · 2023-12-19T22:27:32.123Z · comments (17)

Recent comments

xpym on You are not too "irrational" to know your preferences.

Now it would certainly be tempting to define rationality as something like “only taking actions that you endorse in the long term”, but I’d be cautious of that.

Indeed, and there's another big reason for that - trying to always override your short-term "monkey brain" impulses just doesn't work that well for most people. That's the root of akrasia, which certainly isn't a problem that self-identified rationalists are immune to. What seems to be a better approach is to find compromises, where you develop workable long-term strategies which involve neither unlimited amounts of proverbial ice cream, nor total abstinence.

But I think that quite a few people who care about “health” actually care about not appearing low status by doing things that everyone knows are unhealthy.

Which is a good thing, in this particular case, yes? That's cultural evolution properly doing its job, as far as I'm concerned.

yanling-guo on How Universal Basic Income Could Help Us Build a Brighter Future

It doesn’t make sense to argue about definitions. If you define UBI that way, then that is what UBI means for you. I’m actively pushing for a redefinition of UBI, or reshaping the policy as I said, because I think it’s the right thing to do.

Did I reply in such perfect English that it sounded like it was corrected by ChatGPT? Cheers to my English, which has improved so much! 🥂

the-gears-to-ascension on [bounty $100] Why are there no interesting (1D, 2-state) quantum cellular automata?

What is a concise intro that will teach me everything I need to know for understanding every expression here? I'm also asking Claude, interested in input from people with useful physics textbook taste

q-home on Making a conservative case for alignment

I think there should be more spaces where controversial ideas can be debated. I'm not against spaces without pronoun rules, I just don't think every place should be like this. Also, if we create a space for political debate, we need to really make sure that the norms don't punish everyone who opposes centrism & the right. (Over-sensitive norms like "if you said that some opinion is transphobic you're uncivil/shaming/manipulative and should get banned" might do this.) Otherwise it's not free speech either. It will just produce another Grey or Red Tribe instead of a Red/Blue/Grey debate platform.

I do think progressives underestimate free speech damage. To me it's the biggest issue with the Left. Though I don't think they're entirely wrong about free speech.

For example, imagine I have trans employees. Another employee (X) refuses to use pronouns, in principle (using pronouns is not the same as accepting progressive gender theories). Why? Maybe X thinks my trans employees are living such a great lie that using pronouns is already an unacceptable concession. Or maybe X thinks that even trying to switch "he" & "she" is too much work, and I'm not justified in asking X to do that work because of absolute free speech. Those opinions seem unnecessarily strong and they're at odds with the well-being of my employees and my work environment. So what now? Also, if pronouns are an unacceptable concession, why isn't calling a trans woman by her female name an unacceptable concession?

Imagine I don't believe something about a minority, so I start avoiding words which might suggest otherwise. If I don't believe that gay love can be as true as straight love, I avoid the word "love" (in reference to gay people or to anybody) at work. If I don't believe that women are as smart as men, I avoid the word "master" / "genius" (in reference to women or anybody) at work. It can get pretty silly. Will predictably cost me certain jobs.

sinclair-chen on Sinclair Chen's Shortform

we completely dominate dogs. society treats them well because enough humans love dogs.

I do think that cooperation between people is the origin of religion, and its moral rulesets which create tiny little societies that can hunt stags.

sinclair-chen on Sinclair Chen's Shortform

I definitely think that if I were not conscious then I would not coherently want things. But the fact that conscious minds are the only things that can truly care does not mean that conscious minds are the only things we should terminally care about.

The close circle composition isn't enough to justify Singerian altruism from egoist assumptions, because of the value falloff. With each degree of connection, I love the stranger less.

sinclair-chen on Sinclair Chen's Shortform

I didn't use the word "ethics" in my comment, so are you making a definitional statement, to distinguish between [universal value system] and [subjective value system] or just authoritatively saying that I'm wrong?

Are you claiming moral realism? I don't really believe that. If "ethics" is global, why should I care about "ethics"? Sorry if that sounds callous, I do actually care about the world, just trying to pin down what you mean.

shankar-sivarajan on Why Don't We Just... Shoggoth+Face+Paraphraser?

I suspect the real reason is stopping competitors fine-tuning on o1's CoT, which they also come right out and say:

Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring

anaguma on keltan's Shortform

What signal do we get from DeepSeek continuing to publish?

johnswentworth on leogao's Shortform

the number one spontaneous conversation is "what are you working on" or "what have you done so far", which forces you to re-explain what you're doing & the reasons for doing it to a skeptical & ignorant audience

I'm very curious if others also find this to be the biggest value-contributor amongst spontaneous conversations. (Also, more generally, I'm curious what kinds of spontaneous conversations people are getting so much value out of.)