LessWrong 2.0 Reader

LLMs for Alignment Research: a safety priority?
abramdemski · 2024-04-04T20:03:22.484Z · comments (24)
[link] Comparative advantage and when to blow up your island
dynomight · 2020-09-12T06:20:36.622Z · comments (39)
0. CAST: Corrigibility as Singular Target
Max Harms (max-harms) · 2024-06-07T22:29:12.934Z · comments (12)
Capital Ownership Will Not Prevent Human Disempowerment
beren · 2025-01-05T06:00:23.095Z · comments (18)
The Information: OpenAI shows 'Strawberry' to feds, races to launch it
Martín Soto (martinsq) · 2024-08-27T23:10:18.155Z · comments (15)
Learning By Writing
HoldenKarnofsky · 2022-02-22T15:50:19.452Z · comments (25)
Trapped Priors As A Basic Problem Of Rationality
Scott Alexander (Yvain) · 2021-03-12T20:02:28.639Z · comments (33)
Brain Efficiency Cannell Prize Contest Award Ceremony
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-07-24T11:30:10.602Z · comments (12)
Supervise Process, not Outcomes
stuhlmueller · 2022-04-05T22:18:20.068Z · comments (9)
Algorithmic Improvement Is Probably Faster Than Scaling Now
johnswentworth · 2023-06-06T02:57:33.700Z · comments (25)
Updating my AI timelines
Matthew Barnett (matthew-barnett) · 2022-12-05T20:46:28.161Z · comments (50)
Redwood Research’s current project
Buck · 2021-09-21T23:30:36.989Z · comments (29)
The unexpected difficulty of comparing AlphaStar to humans
Richard Korzekwa (Grothor) · 2019-09-18T02:20:01.292Z · comments (36)
Why I Am Not in Charge
Zvi · 2021-02-07T18:20:01.333Z · comments (23)
Dark Forest Theories
Raemon · 2023-05-12T20:21:49.052Z · comments (53)
Stop posting prompt injections on Twitter and calling it "misalignment"
lc · 2023-02-19T02:21:44.061Z · comments (9)
[link] That Alien Message - The Animation
Writer · 2024-09-07T14:53:30.604Z · comments (9)
When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (130)
Public beliefs vs. Private beliefs
Eli Tyre (elityre) · 2022-06-01T21:33:32.661Z · comments (30)
Refine: An Incubator for Conceptual Alignment Research Bets
adamShimi · 2022-04-15T08:57:35.502Z · comments (13)
Interpreting Neural Networks through the Polytope Lens
Sid Black (sid-black) · 2022-09-23T17:58:30.639Z · comments (29)
[link] Fields that I reference when thinking about AI takeover prevention
Buck · 2024-08-13T23:08:54.950Z · comments (16)
[link] Transformer Circuits
evhub · 2021-12-22T21:09:22.676Z · comments (4)
My side of an argument with Jacob Cannell about chip interconnect losses
Steven Byrnes (steve2152) · 2023-06-21T13:33:49.543Z · comments (11)
[question] why assume AGIs will optimize for fixed goals?
nostalgebraist · 2022-06-10T01:28:10.961Z · answers+comments (57)
“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)
[link] Nursing doubts
dynomight · 2024-08-30T02:25:36.826Z · comments (23)
We're already in AI takeoff
Valentine · 2022-03-08T23:09:06.733Z · comments (119)
Why Not Just Outsource Alignment Research To An AI?
johnswentworth · 2023-03-09T21:49:19.774Z · comments (50)
Momentum of Light in Glass
Ben (ben-lang) · 2024-10-09T20:19:42.088Z · comments (44)
A brief collection of Hinton's recent comments on AGI risk
Kaj_Sotala · 2023-05-04T23:31:06.157Z · comments (9)
[link] We Found An Neuron in GPT-2
Joseph Miller (Josephm) · 2023-02-11T18:27:29.410Z · comments (23)
The Bayesian Tyrant
abramdemski · 2020-08-20T00:08:55.738Z · comments (21)
Conversational Cultures: Combat vs Nurture (V2)
Ruby · 2019-12-31T20:23:53.772Z · comments (92)
Takeaways from our robust injury classifier project [Redwood Research]
dmz (DMZ) · 2022-09-17T03:55:25.868Z · comments (12)
AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (5)
Sentience matters
So8res · 2023-05-29T21:25:30.638Z · comments (96)
Value Claims (In Particular) Are Usually Bullshit
johnswentworth · 2024-05-30T06:26:21.151Z · comments (18)
Twitter thread on postrationalists
Eli Tyre (elityre) · 2022-02-17T09:02:54.806Z · comments (32)
Don’t ignore bad vibes you get from people
Kaj_Sotala · 2025-01-18T09:20:17.397Z · comments (50)
The Translucent Thoughts Hypotheses and Their Implications
Fabien Roger (Fabien) · 2023-03-09T16:30:02.355Z · comments (7)
[link] Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
Jan_Kulveit · 2025-01-30T17:03:45.545Z · comments (50)
AI Views Snapshots
Rob Bensinger (RobbBB) · 2023-12-13T00:45:50.016Z · comments (61)
[link] The Checklist: What Succeeding at AI Safety Will Involve
Sam Bowman (sbowman) · 2024-09-03T18:18:34.230Z · comments (49)
The Case for Extreme Vaccine Effectiveness
Ruby · 2021-04-13T21:08:39.470Z · comments (37)
MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models"
Rob Bensinger (RobbBB) · 2021-03-05T23:43:54.186Z · comments (13)
Clarifying and predicting AGI
Richard_Ngo (ricraz) · 2023-05-04T15:55:26.283Z · comments (44)
[link] The Goddess of Everything Else - The Animation
Writer · 2023-07-13T16:26:25.552Z · comments (4)
Responses to apparent rationalist confusions about game / decision theory
Anthony DiGiovanni (antimonyanthony) · 2023-08-30T22:02:12.218Z · comments (20)
Irrational Modesty
Tomás B. (Bjartur Tómas) · 2021-06-20T19:38:25.320Z · comments (6)