LessWrong 2.0 Reader

LLMs for Alignment Research: a safety priority?
abramdemski · 2024-04-04T20:03:22.484Z · comments (24)
[link] Comparative advantage and when to blow up your island
dynomight · 2020-09-12T06:20:36.622Z · comments (39)
0. CAST: Corrigibility as Singular Target
Max Harms (max-harms) · 2024-06-07T22:29:12.934Z · comments (12)
Capital Ownership Will Not Prevent Human Disempowerment
beren · 2025-01-05T06:00:23.095Z · comments (18)
The Information: OpenAI shows 'Strawberry' to feds, races to launch it
Martín Soto (martinsq) · 2024-08-27T23:10:18.155Z · comments (15)
Learning By Writing
HoldenKarnofsky · 2022-02-22T15:50:19.452Z · comments (25)
Trapped Priors As A Basic Problem Of Rationality
Scott Alexander (Yvain) · 2021-03-12T20:02:28.639Z · comments (33)
Brain Efficiency Cannell Prize Contest Award Ceremony
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-07-24T11:30:10.602Z · comments (12)
Supervise Process, not Outcomes
stuhlmueller · 2022-04-05T22:18:20.068Z · comments (9)
Algorithmic Improvement Is Probably Faster Than Scaling Now
johnswentworth · 2023-06-06T02:57:33.700Z · comments (25)
Updating my AI timelines
Matthew Barnett (matthew-barnett) · 2022-12-05T20:46:28.161Z · comments (50)
Redwood Research’s current project
Buck · 2021-09-21T23:30:36.989Z · comments (29)
The unexpected difficulty of comparing AlphaStar to humans
Richard Korzekwa (Grothor) · 2019-09-18T02:20:01.292Z · comments (36)
Why I Am Not in Charge
Zvi · 2021-02-07T18:20:01.333Z · comments (23)
Dark Forest Theories
Raemon · 2023-05-12T20:21:49.052Z · comments (53)
Stop posting prompt injections on Twitter and calling it "misalignment"
lc · 2023-02-19T02:21:44.061Z · comments (9)
[link] That Alien Message - The Animation
Writer · 2024-09-07T14:53:30.604Z · comments (9)
When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (130)
Public beliefs vs. Private beliefs
Eli Tyre (elityre) · 2022-06-01T21:33:32.661Z · comments (30)
Refine: An Incubator for Conceptual Alignment Research Bets
adamShimi · 2022-04-15T08:57:35.502Z · comments (13)
Interpreting Neural Networks through the Polytope Lens
Sid Black (sid-black) · 2022-09-23T17:58:30.639Z · comments (29)
[link] Fields that I reference when thinking about AI takeover prevention
Buck · 2024-08-13T23:08:54.950Z · comments (16)
[link] Transformer Circuits
evhub · 2021-12-22T21:09:22.676Z · comments (4)
My side of an argument with Jacob Cannell about chip interconnect losses
Steven Byrnes (steve2152) · 2023-06-21T13:33:49.543Z · comments (11)
[question] why assume AGIs will optimize for fixed goals?
nostalgebraist · 2022-06-10T01:28:10.961Z · answers+comments (57)
“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)
[link] Nursing doubts
dynomight · 2024-08-30T02:25:36.826Z · comments (23)
We're already in AI takeoff
Valentine · 2022-03-08T23:09:06.733Z · comments (119)
Why Not Just Outsource Alignment Research To An AI?
johnswentworth · 2023-03-09T21:49:19.774Z · comments (50)
Momentum of Light in Glass
Ben (ben-lang) · 2024-10-09T20:19:42.088Z · comments (44)
A brief collection of Hinton's recent comments on AGI risk
Kaj_Sotala · 2023-05-04T23:31:06.157Z · comments (9)
[link] We Found An Neuron in GPT-2
Joseph Miller (Josephm) · 2023-02-11T18:27:29.410Z · comments (23)
The Bayesian Tyrant
abramdemski · 2020-08-20T00:08:55.738Z · comments (21)
Conversational Cultures: Combat vs Nurture (V2)
Ruby · 2019-12-31T20:23:53.772Z · comments (92)
Takeaways from our robust injury classifier project [Redwood Research]
dmz (DMZ) · 2022-09-17T03:55:25.868Z · comments (12)
AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (5)
Sentience matters
So8res · 2023-05-29T21:25:30.638Z · comments (96)
Value Claims (In Particular) Are Usually Bullshit
johnswentworth · 2024-05-30T06:26:21.151Z · comments (18)
Twitter thread on postrationalists
Eli Tyre (elityre) · 2022-02-17T09:02:54.806Z · comments (32)
Don’t ignore bad vibes you get from people
Kaj_Sotala · 2025-01-18T09:20:17.397Z · comments (50)
The Translucent Thoughts Hypotheses and Their Implications
Fabien Roger (Fabien) · 2023-03-09T16:30:02.355Z · comments (7)
[link] Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
Jan_Kulveit · 2025-01-30T17:03:45.545Z · comments (50)
AI Views Snapshots
Rob Bensinger (RobbBB) · 2023-12-13T00:45:50.016Z · comments (61)
[link] The Checklist: What Succeeding at AI Safety Will Involve
Sam Bowman (sbowman) · 2024-09-03T18:18:34.230Z · comments (49)
The Case for Extreme Vaccine Effectiveness
Ruby · 2021-04-13T21:08:39.470Z · comments (37)
MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models"
Rob Bensinger (RobbBB) · 2021-03-05T23:43:54.186Z · comments (13)
Clarifying and predicting AGI
Richard_Ngo (ricraz) · 2023-05-04T15:55:26.283Z · comments (44)
[link] The Goddess of Everything Else - The Animation
Writer · 2023-07-13T16:26:25.552Z · comments (4)
Responses to apparent rationalist confusions about game / decision theory
Anthony DiGiovanni (antimonyanthony) · 2023-08-30T22:02:12.218Z · comments (20)
Irrational Modesty
Tomás B. (Bjartur Tómas) · 2021-06-20T19:38:25.320Z · comments (6)