LessWrong 2.0 Reader

Skills I'd like my collaborators to have
Raemon · 2024-02-09T08:20:37.686Z · comments (9)
Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)
The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (12)
Demystifying "Alignment" through a Comic
milanrosko · 2024-06-09T08:24:22.454Z · comments (19)
Danger, AI Scientist, Danger
Zvi · 2024-08-15T22:40:06.715Z · comments (9)
Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L (LRudL) · 2024-07-08T22:24:38.441Z · comments (28)
New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (64)
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)
SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (16)
On the future of language models
owencb · 2023-12-20T16:58:28.433Z · comments (17)
[link] A Chess-GPT Linear Emergent World Representation
Adam Karvonen (karvonenadam) · 2024-02-08T04:25:15.222Z · comments (14)
[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (56)
Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)
[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)
In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)
Catching AIs red-handed
ryan_greenblatt · 2024-01-05T17:43:10.948Z · comments (22)
Nonlinear’s Evidence: Debunking False and Misleading Claims
KatWoods (ea247) · 2023-12-12T13:16:12.008Z · comments (171)
[link] The Witness
Richard_Ngo (ricraz) · 2023-12-03T22:27:16.248Z · comments (5)
Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)
[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)
[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)
Dreams of AI alignment: The danger of suggestive names
TurnTrout · 2024-02-10T01:22:51.715Z · comments (59)
Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)
[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (8)
I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)
Passages I Highlighted in The Letters of J.R.R. Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (9)
Response to nostalgebraist: proudly waving my moral-antirealist battle flag
Steven Byrnes (steve2152) · 2024-05-29T16:48:29.408Z · comments (29)
LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)
What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)
LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (5)
Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)
Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)
[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)
Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom (Jbloom) · 2024-02-02T06:54:53.392Z · comments (37)
On Dwarkesh's Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)
General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)
[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (24)
[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (8)
Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)
[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)
Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)
A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (16)
[link] The Minority Coalition
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (7)
Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (10)
[link] "Deep Learning" Is Function Approximation
Zack_M_Davis · 2024-03-21T17:50:36.254Z · comments (28)
[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)
[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)
[Valence series] 1. Introduction
Steven Byrnes (steve2152) · 2023-12-04T15:40:21.274Z · comments (14)
Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)
Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)