LessWrong 2.0 Reader

Picking Mentors For Research Programmes
Raymond D · 2023-11-10T13:01:14.197Z · comments (8)

Demystifying "Alignment" through a Comic
milanrosko · 2024-06-09T08:24:22.454Z · comments (19)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)

SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (16)

On the future of language models
owencb · 2023-12-20T16:58:28.433Z · comments (17)

[link] My techno-optimism [By Vitalik Buterin]
habryka (habryka4) · 2023-11-27T23:53:35.859Z · comments (17)

[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)

[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (56)

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)

In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)

Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)

New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (64)

Nonlinear’s Evidence: Debunking False and Misleading Claims
KatWoods (ea247) · 2023-12-12T13:16:12.008Z · comments (171)

[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)

Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)

[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (8)

Dreams of AI alignment: The danger of suggestive names
TurnTrout · 2024-02-10T01:22:51.715Z · comments (59)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)

[link] The Witness
Richard_Ngo (ricraz) · 2023-12-03T22:27:16.248Z · comments (4)

[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)

Response to nostalgebraist: proudly waving my moral-antirealist battle flag
Steven Byrnes (steve2152) · 2024-05-29T16:48:29.408Z · comments (29)

[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)

What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)

[link] A Chess-GPT Linear Emergent World Representation
Adam Karvonen (karvonenadam) · 2024-02-08T04:25:15.222Z · comments (14)

LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (5)

Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L (LRudL) · 2024-07-08T22:24:38.441Z · comments (28)

On Dwarkesh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)

A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (16)

[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (24)

[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (8)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)

General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom (Jbloom) · 2024-02-02T06:54:53.392Z · comments (37)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (10)

Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)

[link] The Minority Faction
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (6)

[Valence series] 1. Introduction
Steven Byrnes (steve2152) · 2023-12-04T15:40:21.274Z · comments (14)

Learning-theoretic agenda reading list
Vanessa Kosoy (vanessa-kosoy) · 2023-11-09T17:25:35.046Z · comments (0)

[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)

[link] "Deep Learning" Is Function Approximation
Zack_M_Davis · 2024-03-21T17:50:36.254Z · comments (28)

[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (19)

Dialogue introduction to Singular Learning Theory
Olli Järviniemi (jarviniemi) · 2024-07-08T16:58:10.108Z · comments (14)

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)

OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)

Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)


Recent comments

l-rudolf-l on Survival without dignity

Thanks for advertising my work, but alas, I think that's much more depressing than this one.

Could make for a good Barbie <> Oppenheimer combo though?

l-rudolf-l on Survival without dignity

Agreed! Transformative AI is hard to visualise, and concrete stories / scenarios feel very lacking (in both disasters and positive visions, but especially in positive visions).

I like when people try to do this - for example, Richard Ngo has a bunch here, and Daniel Kokotajlo has his near-prophetic scenario here. I've previously tried to do it here (going out with a whimper leading to Bostrom's "disneyland without children" is one of the most poetic disasters imaginable - great setting for a story), and have a bunch more ideas I hope to get to.

But overall: the LessWrong bubble has a high emphasis on radical AI futures, and an enormous amount of fiction in its canon (HPMOR, Unsong, Planecrash). I keep being surprised that so few people combine those things.

markusrobam on Explore More: A Bag of Tricks to Keep Your Life on the Rails

I'm curious about the part where you wrote: "You could raise awareness for Leukemia, Dyslexia, or Estonia."
Estonia is a country. Leukemia and Dyslexia are not countries. Was it a typo? Or did you actually want to raise awareness about Estonia?

(I'm from Estonia myself)

Nice article though, thanks!

l-rudolf-l on Survival without dignity

I did not actually consider this, but that is a very reasonable interpretation!

(I vaguely remember reading some description of explicitly flat-out anthropic immortality saving the day, but I can't seem to find it again now)

steve2152 on Inner Alignment in Salt-Starved Rats

No, I don’t recommend reading this post anymore; it has some ideas with little kernels of truth but also lots of errors and confusions. ¯\_(ツ)_/¯

alfred-harwood on Abstractions are not Natural

Thanks for taking the time to explain this. This clears a lot of things up.

Let me see if I understand. So one reason that an agent might develop an abstraction is that it has a utility function that deals with that abstraction (if my utility function is ‘maximize the number of trees’, it’s helpful to have an abstraction for ‘trees’). But the NAH goes further than this and says that, even if an agent had a very ‘unnatural’ utility function which didn’t deal with abstractions (e.g. it was something very fine-grained like ‘I value this atom being in this exact position and this atom being in a different position etc…’), it would still, for instrumental reasons, end up using the ‘natural’ set of abstractions because the natural abstractions are in some sense the only ‘proper’ set of abstractions for interacting with the world. Similarly, while there might be perceptual systems/brains/etc which favour using certain unnatural abstractions, once agents become capable enough to start pursuing complex goals (or rather goals requiring a high level of generality), the universe will force them to use the natural abstractions (or else fail to achieve their goals). Does this sound right?

Presumably it’s possible to define some ‘unnatural’ abstractions. Would the argument be that unnatural abstractions are just in practice not useful, or is it that the universe is such that it’s ~impossible to model the world using unnatural abstractions?

jim-buhler on Winning isn't enough

Without an objective standard of “winning” to turn to, this leaves us searching for new principles that could guide us in the face of indeterminacy. But that’s all for another post.

First time ever I am left hanging by a LW post. Genuinely.

towards_keeperhood on Inner Alignment in Salt-Starved Rats

Hi Steve, I didn't read this post yet and just wanted to ask whether it's still worth reading or whether everything relevant is now better in "incentive learning and dead sea salt experiment"?

oxidize on We can survive

Could I get some constructive criticism about why I'm being downvoted? It would be helpful for the sake of avoiding the same mistakes in the future.

oxidize on We can survive

Correct. It lacks tactical practicality right now, but I think that from a macro-directional perspective, it's sensible to align all of my current actions to that end goal. And I believe there is a huge demand among business-minded intellectuals and ambitious people for a community like this to be created.