LessWrong 2.0 Reader

AI companies aren't really using external evaluators
Zach Stein-Perlman · 2024-05-24T16:01:21.184Z · comments (15)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (51)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (93)

[link] Explore More: A Bag of Tricks to Keep Your Life on the Rails
Shoshannah Tekofsky (DarkSym) · 2024-09-28T21:38:52.256Z · comments (13)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (25)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (30)

SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)

What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (31)

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Rohin Shah (rohinmshah) · 2024-08-20T16:22:45.888Z · comments (33)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (88)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

The ‘strong’ feature hypothesis could be wrong
lewis smith (lsgos) · 2024-08-02T14:33:58.898Z · comments (17)

LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (119)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (14)

Superbabies: Putting The Pieces Together
sarahconstantin · 2024-07-11T20:40:05.036Z · comments (37)

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (69)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (27)

[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (32)

Towards more cooperative AI safety strategies
Richard_Ngo (ricraz) · 2024-07-16T04:36:29.191Z · comments (132)

OpenAI: Fallout
Zvi · 2024-05-28T13:20:04.325Z · comments (25)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (40)

The Sun is big, but superintelligences will not spare Earth a little sunlight
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · comments (141)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (22)

Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (18)

Funny Anecdote of Eliezer From His Sister
Noah Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (6)

[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (5)

Pay Risk Evaluators in Cash, Not Equity
Adam Scholl (adam_scholl) · 2024-09-07T02:37:59.659Z · comments (19)

Maybe Anthropic's Long-Term Benefit Trust is powerless
Zach Stein-Perlman · 2024-05-27T13:00:47.991Z · comments (21)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (34)

The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (64)

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (58)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (100)

Optimistic Assumptions, Longterm Planning, and "Cope"
Raemon · 2024-07-17T22:14:24.090Z · comments (46)

Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (40)

Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (43)

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage
orthonormal · 2024-08-06T02:32:41.364Z · comments (26)

What's Going on With OpenAI's Messaging?
ozziegooen · 2024-05-21T02:22:04.171Z · comments (13)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (49)

My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (73)

Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (35)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (15)

[link] Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen (bayesshammai) · 2024-02-22T23:56:02.318Z · comments (5)

Recent comments

sodium on Sodium's Shortform

I think people see it and think "oh boy I get to be the fat people in Wall-E"

(My friend on what happens if the general public feels the AGI)

wassname on Are the majority of your ancestors farmers or non-farmers?

By sensible, I don't indicate disagreement, but a way of interpreting the question.

jmh on Neutrality

One point I'm not sure about with the idea of neutrality is neutrality of process or of outcome. Or would that distinction not matter to your interests here?

seth-herd on OpenAI Email Archives (from Musk v. Altman)

Good suggestion, thanks and I'll do that.

I'm not commenting on those who are obviously just grinding an axe; I'm commenting on the stance toward "doomers" from otherwise reasonable people. From my limited survey the brand of x-risk concern isn't looking good, and that isn't mostly a result of the amazing rhetorical skills of the e/acc community ;)

martin-randall on Making a conservative case for alignment

I saw the EA Forum's policy. If someone repeatedly and deliberately misgenders on the EA Forum they will be banned from that forum. But you don't need to post on the EA Forum at all in order to be part of the rationalist community. On the provided evidence, it is false that:

You are required to say certain things or you will be excluded from the community.

I want people of all political beliefs, including US conservative-coded beliefs, to feel welcome in the rationalist community. It's important to that goal to distinguish between policies and norms, because changing policies requires a different process to changing norms, and because policies and norms are unwelcoming in different ways and to different extents.

It's because of that goal that I'm encouraging you to change these incorrect/misleading/unclear statements. If newcomers incorrectly believe that they are required to say certain things or they will be excluded from the community, then they will feel less welcome, for nothing. Let's avoid that.

anthonyc on Which things were you surprised to learn are metaphors?

Plus, butter is churned, so it is a few percent air by volume when solid.

atillayasar on AtillaYasar's Shortform

Just because X describes Y in a high-level abstract way doesn't mean studying X is the best way of understanding Y.

Often, the best way is to simply study Y, and studying X just makes you sound smarter when talking about Y.

pointless abstractions: cybernetics and OODA loop

This is based on my experience trying to learn stuff about cybernetics in order to understand GUI tool design for personal use, and to understand the feedback loop that roughly looks like build -> use -> rethink -> let it help you rethink -> rebuild, where me and any LLM instance I talk to (via the GUI) are part of the cybernetic system. Whenever I "loaded cybernetics concepts" into my mind and tried to view GUI design from that perspective, I was just spending a bunch of effort mapping the abstract ideas to concrete things, and then being like, "ok but so what?".

A similar thing happened while looking into the OODA loop, though at least its Wiki page has a nice little flowchart, and it's much more concrete than cybernetics. And you can draw more concrete inspiration about GUI design by thinking about fighter pilot interfaces.

It's also because I often see people using abstract reasoning and, whenever I dig into what they're actually saying, it doesn't make that much sense. Also because of personal experience, where things become way clearer and easier to think about after phrasing them in very concrete and basic ways.

justinpombrio on Which things were you surprised to learn are metaphors?

And English has it backwards. You can see the past, but not the future. The thing which just happened is most clear. The future comes at us from behind.

nicolas-lacombe on On The Rationalist Megameetup

[...] is anyone I met at LO looking for roommates?

there's a channel in the megameetup discord to discuss shared rooming. I think the discord might only be available to registered participants though (you could ask Screwtape in case I'm wrong)

jan-betley on Which things were you surprised to learn are not metaphors?

Oh yeah. How do I know I'm angry? My back is stiff and starts to hurt.