LessWrong 2.0 Reader

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (36)

2023 Survey Results
Screwtape · 2024-02-16T22:24:28.132Z · comments (26)

[link] Vernor Vinge, who coined the term "Technological Singularity", dies at 79
Kaj_Sotala · 2024-03-21T22:14:14.699Z · comments (24)

On Devin
Zvi · 2024-03-18T13:20:04.779Z · comments (30)

[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (20)

Raising children on the eve of AI
juliawise · 2024-02-15T21:28:07.737Z · comments (15)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (14)

Some (problematic) aesthetics of what constitutes good work in academia
Steven Byrnes (steve2152) · 2024-03-11T17:47:28.835Z · comments (12)

The Plan - 2023 Version
johnswentworth · 2023-12-29T23:34:19.651Z · comments (39)

Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (34)

Tips for Empirical Alignment Research
Ethan Perez (ethan-perez) · 2024-02-29T06:04:54.481Z · comments (4)

[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)

Leading The Parade
johnswentworth · 2024-01-31T22:39:56.499Z · comments (30)

[link] Using axis lines for good or evil
dynomight · 2024-03-06T14:47:10.989Z · comments (39)

The Worst Form Of Government (Except For Everything Else We've Tried)
johnswentworth · 2024-03-17T18:11:38.374Z · comments (46)

And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)

LLMs for Alignment Research: a safety priority?
abramdemski · 2024-04-04T20:03:22.484Z · comments (24)

What good is G-factor if you're dumped in the woods? A field report from a camp counselor.
Hastings (hastings-greer) · 2024-01-12T13:17:23.829Z · comments (22)

Deep atheism and AI risk
Joe Carlsmith (joekc) · 2024-01-04T18:58:47.745Z · comments (22)

Read the Roon
Zvi · 2024-03-05T13:50:04.967Z · comments (6)

Processor clock speeds are not how fast AIs think
Ege Erdil (ege-erdil) · 2024-01-29T14:39:38.050Z · comments (55)

The case for training frontier AIs on Sumerian-only corpus
Alexandre Variengien (alexandre-variengien) · 2024-01-15T16:40:22.011Z · comments (14)

Notice When People Are Directionally Correct
Chris_Leong · 2024-01-14T14:12:37.090Z · comments (7)

An even deeper atheism
Joe Carlsmith (joekc) · 2024-01-11T17:28:31.843Z · comments (47)

Updatelessness doesn't solve most problems
Martín Soto (martinsq) · 2024-02-08T17:30:11.266Z · comments (43)

Community Notes by X
NicholasKees (nick_kees) · 2024-03-18T17:13:33.195Z · comments (15)

My experience using financial commitments to overcome akrasia
William Howard (william-howard) · 2024-04-15T22:57:32.574Z · comments (31)

A Shutdown Problem Proposal
johnswentworth · 2024-01-21T18:12:48.664Z · comments (61)

Things I've Grieved
Raemon · 2024-02-18T19:32:47.169Z · comments (6)

[link] Steering Llama-2 with contrastive activation additions
Nina Rimsky (NinaR) · 2024-01-02T00:47:04.621Z · comments (29)

[link] If you weren't such an idiot...
kave · 2024-03-02T00:01:37.314Z · comments (60)

RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (14)

Why I take short timelines seriously
NicholasKees (nick_kees) · 2024-01-28T22:27:21.098Z · comments (29)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (15)

My simple AGI investment & insurance strategy
lc · 2024-03-31T02:51:53.479Z · comments (16)

[link] Anthropic release Claude 3, claims >GPT-4 Performance
LawrenceC (LawChan) · 2024-03-04T18:23:54.065Z · comments (40)

AI Alignment Metastrategy
Vanessa Kosoy (vanessa-kosoy) · 2023-12-31T12:06:11.433Z · comments (12)

Four visions of Transformative AI success
Steven Byrnes (steve2152) · 2024-01-17T20:45:46.976Z · comments (22)

Rationality Research Report: Towards 10x OODA Looping?
Raemon · 2024-02-24T21:06:38.703Z · comments (21)

The Parable Of The Fallen Pendulum - Part 1
johnswentworth · 2024-03-01T00:25:00.111Z · comments (32)

[link] Practically A Book Review: Appendix to "Nonlinear's Evidence: Debunking False and Misleading Claims" (ThingOfThings)
tailcalled · 2024-01-03T17:07:13.990Z · comments (25)

[link] Gender Exploration
sapphire (deluks917) · 2024-01-14T18:57:32.893Z · comments (23)

Social status part 1/2: negotiations over object-level preferences
Steven Byrnes (steve2152) · 2024-03-05T16:29:07.143Z · comments (15)

Being nicer than Clippy
Joe Carlsmith (joekc) · 2024-01-16T19:44:23.893Z · comments (23)

Attitudes about Applied Rationality
Camille Berger (Camille Berger) · 2024-02-03T14:42:22.770Z · comments (18)

The Pareto Best and the Curse of Doom
Screwtape · 2024-02-21T23:10:01.359Z · comments (22)

' petertodd'’s last stand: The final days of open GPT-3 research
mwatkins · 2024-01-22T18:47:00.710Z · comments (15)

A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)

The case for more ambitious language model evals
Jozdien · 2024-01-30T00:01:13.876Z · comments (25)

New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (63)

Side note: this is orthogonal to the point of this post, but the example also makes me think: if I were working on a safe ASI project, I wouldn't mind if another group that had discreetly built safe ASI used it to shut my project down, since my goal is 'ensure the future lightcone is used in a valuable, tragedy-averse way' and not 'gain personal power' or 'have a fun time working on AI' or something. In my morality, it would be naive to be opposed to that shutdown. But to the extent humanity is naive, we can easily do something else in that future to create better present dynamics (as the main text argues).

If there is a group for whom using ASI to make the world robust to risks and free of harm, in a way where its actions don't infringe on ongoing non-violent activities, is problematic, then this post doesn't apply to them: their issue all along was not with the character of the pivotal act, but possibly with something like 'having my personal cosmic significance as a capabilities researcher stripped away by the success of an external alignment project'.

Another disclaimer: This post is about a world in which safely usable superintelligence has been created, but I'm not confident that anyone (myself included) currently has a safe and ready method for creating it. This post shouldn't be read as an endorsement of possible current attempts to do this. I would of course prefer that this civilization were one which could coordinate such that no groups were presently working on ASI, precluding this discourse.

I do this when I find a concept/post/book that I can mine for more thoughts, or one that requires mastery of a conceptual framework.

On Pivotal Acts