LessWrong 2.0 Reader

The Waluigi Effect (mega-post)
Cleo Nardo (strawberry calm) · 2023-03-03T03:22:08.619Z · comments (188)
My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"
Quintin Pope (quintin-pope) · 2023-03-21T00:06:07.889Z · comments (225)
Shutting Down the Lightcone Offices
habryka (habryka4) · 2023-03-14T22:47:51.539Z · comments (93)
Understanding and controlling a maze-solving policy network
TurnTrout · 2023-03-11T18:59:56.223Z · comments (22)
[link] Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky
jacquesthibs (jacques-thibodeau) · 2023-03-29T23:16:19.431Z · comments (296)
The Parable of the King and the Random Process
moridinamael · 2023-03-01T22:18:59.734Z · comments (22)
"Carefully Bootstrapped Alignment" is organizationally hard
Raemon · 2023-03-17T18:00:09.943Z · comments (22)
Discussion with Nate Soares on a key alignment difficulty
HoldenKarnofsky · 2023-03-13T21:20:02.976Z · comments (38)
[link] More information about the dangerous capability evaluations we did with GPT-4 and Claude.
Beth Barnes (beth-barnes) · 2023-03-19T00:25:39.707Z · comments (54)
Deep Deceptiveness
So8res · 2023-03-21T02:51:52.794Z · comments (58)
[link] Actually, Othello-GPT Has A Linear Emergent World Representation
Neel Nanda (neel-nanda-1) · 2023-03-29T22:13:14.878Z · comments (24)
Natural Abstractions: Key claims, Theorems, and Critiques
LawrenceC (LawChan) · 2023-03-16T16:37:40.181Z · comments (20)
An AI risk argument that resonates with NYTimes readers
Julian Bradshaw · 2023-03-12T23:09:20.458Z · comments (14)
GPT-4 Plugs In
Zvi · 2023-03-27T12:10:00.926Z · comments (47)
[link] Anthropic's Core Views on AI Safety
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-03-09T16:55:15.311Z · comments (39)
ChatGPT (and now GPT4) is very easily distracted from its rules
dmcs (dmcsh) · 2023-03-15T17:55:04.356Z · comments (41)
A rough and incomplete review of some of John Wentworth's research
So8res · 2023-03-28T18:52:50.553Z · comments (17)
Acausal normalcy
Andrew_Critch · 2023-03-03T23:34:33.971Z · comments (30)
What Discovering Latent Knowledge Did and Did Not Find
Fabien Roger (Fabien) · 2023-03-13T19:29:45.601Z · comments (16)
A stylized dialogue on John Wentworth's claims about markets and optimization
So8res · 2023-03-25T22:32:53.216Z · comments (22)
The salt in pasta water fallacy
Thomas Sepulchre · 2023-03-27T14:53:07.718Z · comments (38)
[link] What would a compute monitoring plan look like? [Linkpost]
Akash (akash-wasil) · 2023-03-26T19:33:46.896Z · comments (9)
Inside the mind of a superhuman Go model: How does Leela Zero read ladders?
Haoxing Du (haoxing-du) · 2023-03-01T01:47:20.660Z · comments (8)
Why Not Just... Build Weak AI Tools For AI Alignment Research?
johnswentworth · 2023-03-05T00:12:33.651Z · comments (17)
AI: Practical Advice for the Worried
Zvi · 2023-03-01T12:30:00.703Z · comments (43)
Towards understanding-based safety evaluations
evhub · 2023-03-15T18:18:01.259Z · comments (16)
[link] GPT-4
nz · 2023-03-14T17:02:02.276Z · comments (149)
Comments on OpenAI's "Planning for AGI and beyond"
So8res · 2023-03-03T23:01:29.665Z · comments (2)
[link] Dan Luu on "You can only communicate one top priority"
Raemon · 2023-03-18T18:55:09.998Z · comments (18)
Remarks 1–18 on GPT (compressed)
Cleo Nardo (strawberry calm) · 2023-03-20T22:27:26.277Z · comments (35)
POC || GTFO culture as partial antidote to alignment wordcelism
lc · 2023-03-15T10:21:47.037Z · comments (11)
Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent
ArthurB · 2023-03-09T09:26:25.383Z · comments (32)
Why I’m not into the Free Energy Principle
Steven Byrnes (steve2152) · 2023-03-02T19:27:52.309Z · comments (48)
[link] Against LLM Reductionism
Erich_Grunewald · 2023-03-08T15:52:38.741Z · comments (17)
The Translucent Thoughts Hypotheses and Their Implications
Fabien Roger (Fabien) · 2023-03-09T16:30:02.355Z · comments (7)
Good News, Everyone!
jbash · 2023-03-25T13:48:22.499Z · comments (23)
Conceding a short timelines bet early
Matthew Barnett (matthew-barnett) · 2023-03-16T21:49:35.903Z · comments (16)
[link] [Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy
Vika · 2023-03-07T11:55:01.131Z · comments (13)
We have to Upgrade
Jed McCaleb (jed-mccaleb) · 2023-03-23T17:53:32.222Z · comments (35)
[link] FLI open letter: Pause giant AI experiments
Zach Stein-Perlman · 2023-03-29T04:04:23.333Z · comments (123)
Why Not Just Outsource Alignment Research To An AI?
johnswentworth · 2023-03-09T21:49:19.774Z · comments (47)
High Status Eschews Quantification of Performance
niplav · 2023-03-19T22:14:16.523Z · comments (36)
How bad a future do ML researchers expect?
KatjaGrace · 2023-03-09T04:50:05.122Z · comments (7)
ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so
Christopher King (christopher-king) · 2023-03-15T00:29:23.523Z · comments (22)
[link] Manifold: If okay AGI, why?
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-03-25T22:43:53.820Z · comments (37)
[link] Here, have a calmness video
Kaj_Sotala · 2023-03-16T10:00:42.511Z · comments (15)
GPT can write Quines now (GPT-4)
Andrew_Critch · 2023-03-14T19:18:51.903Z · comments (30)
"Liquidity" vs "solvency" in bank runs (and some notes on Silicon Valley Bank)
rossry · 2023-03-12T09:16:45.630Z · comments (27)
The Overton Window widens: Examples of AI risk in the media
Akash (akash-wasil) · 2023-03-23T17:10:14.616Z · comments (24)
Want to predict/explain/control the output of GPT-4? Then learn about the world, not about transformers.
Cleo Nardo (strawberry calm) · 2023-03-16T03:08:52.618Z · comments (26)