LessWrong 2.0 Reader


A hundredth of a bit of extra entropy
Adam Scherlis (adam-scherlis) · 2022-12-24T21:12:41.517Z · comments (4)
Reflections on my 5-month alignment upskilling grant
Jay Bailey · 2022-12-27T10:51:49.872Z · comments (4)
Three reasons to cooperate
paulfchristiano · 2022-12-24T17:40:01.114Z · comments (14)
Results from a survey on tool use and workflows in alignment research
jacquesthibs (jacques-thibodeau) · 2022-12-19T15:19:52.560Z · comments (2)
An Open Agency Architecture for Safe Transformative AI
davidad · 2022-12-20T13:04:06.409Z · comments (22)
Probably good projects for the AI safety ecosystem
Ryan Kidd (ryankidd44) · 2022-12-05T02:26:41.623Z · comments (31)
MrBeast's Squid Game Tricked Me
lsusr · 2022-12-03T05:50:02.339Z · comments (1)
10 Years of LessWrong
JohnBuridan · 2022-12-30T17:15:17.498Z · comments (2)
«Boundaries», Part 3b: Alignment problems in terms of boundaries
Andrew_Critch · 2022-12-14T22:34:41.443Z · comments (7)
[question] Who are some prominent reasonable people who are confident that AI won't kill everyone?
Optimization Process · 2022-12-05T09:12:41.797Z · answers+comments (54)
AI Safety Seems Hard to Measure
HoldenKarnofsky · 2022-12-08T19:50:07.352Z · comments (6)
On sincerity
Joe Carlsmith (joekc) · 2022-12-23T17:13:09.478Z · comments (6)
The True Spirit of Solstice?
Raemon · 2022-12-19T08:00:30.273Z · comments (31)
[link] Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)
LawrenceC (LawChan) · 2022-12-16T22:12:54.461Z · comments (11)
Proper scoring rules don’t guarantee predicting fixed points
Johannes Treutlein (Johannes_Treutlein) · 2022-12-16T18:22:23.547Z · comments (8)
It's time to worry about online privacy again
Malmesbury (Elmer of Malmesbury) · 2022-12-25T21:05:30.977Z · comments (23)
AI Neorealism: a threat model & success criterion for existential safety
davidad · 2022-12-15T13:42:11.072Z · comments (1)
Can we efficiently explain model behaviors?
paulfchristiano · 2022-12-16T19:40:06.327Z · comments (3)
AGI Timelines in Governance: Different Strategies for Different Timeframes
simeon_c (WayZ) · 2022-12-19T21:31:25.746Z · comments (28)
Systems of Survival
Vaniver · 2022-12-09T05:13:53.064Z · comments (5)
Key Mostly Outward-Facing Facts From the Story of VaccinateCA
Zvi · 2022-12-14T13:30:00.831Z · comments (2)
[link] Summary of a new study on out-group hate (and how to fix it)
DirectedEvolution (AllAmericanBreakfast) · 2022-12-04T01:53:32.490Z · comments (30)
Update on Harvard AI Safety Team and MIT AI Alignment
Xander Davies (xanderdavies) · 2022-12-02T00:56:45.596Z · comments (4)
Verification Is Not Easier Than Generation In General
johnswentworth · 2022-12-06T05:20:48.744Z · comments (27)
[link] Predicting GPU performance
Marius Hobbhahn (marius-hobbhahn) · 2022-12-14T16:27:23.923Z · comments (26)
Notice when you stop reading right before you understand
just_browsing · 2022-12-20T05:09:43.224Z · comments (6)
The Meditation on Winter
Raemon · 2022-12-25T16:12:10.039Z · comments (3)
CIRL Corrigibility is Fragile
Rachel Freedman (rachelAF) · 2022-12-21T01:40:50.232Z · comments (9)
High-level hopes for AI alignment
HoldenKarnofsky · 2022-12-15T18:00:15.625Z · comments (3)
YCombinator fraud rates
Xodarap · 2022-12-25T19:21:52.829Z · comments (3)
[link] Concrete Steps to Get Started in Transformer Mechanistic Interpretability
Neel Nanda (neel-nanda-1) · 2022-12-25T22:21:49.686Z · comments (7)
MIRI's "Death with Dignity" in 60 seconds.
Cleo Nardo (strawberry calm) · 2022-12-06T17:18:58.387Z · comments (4)
My thoughts on OpenAI's alignment plan
Akash (akash-wasil) · 2022-12-30T19:33:15.019Z · comments (3)
[link] Formalization as suspension of intuition
adamShimi · 2022-12-11T15:16:44.319Z · comments (18)
In defense of probably wrong mechanistic models
evhub · 2022-12-06T23:24:20.707Z · comments (10)
Reframing inner alignment
davidad · 2022-12-11T13:53:23.195Z · comments (13)
Air-gapping evaluation and support
Ryan Kidd (ryankidd44) · 2022-12-26T22:52:29.881Z · comments (1)
Announcing: The Independent AI Safety Registry
Shoshannah Tekofsky (DarkSym) · 2022-12-26T21:22:18.381Z · comments (9)
The "Minimal Latents" Approach to Natural Abstractions
johnswentworth · 2022-12-20T01:22:25.101Z · comments (24)
Take 13: RLHF bad, conditioning good.
Charlie Steiner · 2022-12-22T10:44:06.359Z · comments (4)
Nook Nature
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2022-12-05T04:10:37.797Z · comments (18)
My AGI safety research—2022 review, ’23 plans
Steven Byrnes (steve2152) · 2022-12-14T15:15:52.473Z · comments (10)
Positive values seem more robust and lasting than prohibitions
TurnTrout · 2022-12-17T21:43:31.627Z · comments (13)
[link] My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)
Robert_AIZI · 2022-12-27T17:27:02.225Z · comments (0)
Take 7: You should talk about "the human's utility function" less.
Charlie Steiner · 2022-12-08T08:14:17.275Z · comments (22)
Next Level Seinfeld
Zvi · 2022-12-19T13:30:00.538Z · comments (8)
China Covid #4
Zvi · 2022-12-22T16:30:00.919Z · comments (2)
[link] Think wider about the root causes of progress
jasoncrawford · 2022-12-21T20:05:46.986Z · comments (11)
Looking Back on Posts From 2022
Zvi · 2022-12-26T13:20:00.745Z · comments (8)
Applications open for AGI Safety Fundamentals: Alignment Course
Richard_Ngo (ricraz) · 2022-12-13T18:31:55.068Z · comments (0)