LessWrong 2.0 Reader

AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (7)
UDT shows that decision theory is more puzzling than ever
Wei Dai (Wei_Dai) · 2023-09-13T12:26:09.739Z · comments (51)
Thoughts on sharing information about language model capabilities
paulfchristiano · 2023-07-31T16:04:21.396Z · comments (34)
If interpretability research goes well, it may get dangerous
So8res · 2023-04-03T21:48:18.752Z · comments (10)
[link] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
likenneth · 2023-06-11T05:38:35.284Z · comments (4)
Evolution provides no evidence for the sharp left turn
Quintin Pope (quintin-pope) · 2023-04-11T18:43:07.776Z · comments (62)
[link] Sam Altman fired from OpenAI
LawrenceC (LawChan) · 2023-11-17T20:42:30.759Z · comments (75)
Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (79)
Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds
1a3orn · 2023-04-04T17:39:39.720Z · comments (35)
Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk
1a3orn · 2023-11-02T18:20:29.569Z · comments (79)
Consciousness as a conflationary alliance term for intrinsically valued internal experiences
Andrew_Critch · 2023-07-10T08:09:48.881Z · comments (46)
Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (55)
My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (186)
Twiblings, four-parent babies and other reproductive technology
GeneSmith · 2023-05-20T17:11:23.726Z · comments (32)
Grant applications and grand narratives
Elizabeth (pktechgirl) · 2023-07-02T00:16:25.129Z · comments (20)
The basic reasons I expect AGI ruin
Rob Bensinger (RobbBB) · 2023-04-18T03:37:01.496Z · comments (73)
Updates and Reflections on Optimal Exercise after Nearly a Decade
romeostevensit · 2023-06-08T23:02:14.761Z · comments (55)
Labs should be explicit about why they are building AGI
peterbarnett · 2023-10-17T21:09:20.711Z · comments (16)
Transcript and Brief Response to Twitter Conversation between Yann LeCunn and Eliezer Yudkowsky
Zvi · 2023-04-26T13:10:01.233Z · comments (50)
Announcing Timaeus
Jesse Hoogland (jhoogland) · 2023-10-22T11:59:03.938Z · comments (15)
The other side of the tidal wave
KatjaGrace · 2023-11-03T05:40:05.363Z · comments (79)
Thinking By The Clock
Screwtape · 2023-11-08T07:40:59.936Z · comments (27)
[link] Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen (bayesshammai) · 2024-02-22T23:56:02.318Z · comments (5)
[link] Daniel Kahneman has died
DanielFilan · 2024-03-27T15:59:14.517Z · comments (11)
[link] Large Language Models will be Great for Censorship
Ethan Edwards · 2023-08-21T19:03:55.323Z · comments (14)
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
JanB (JanBrauner) · 2023-09-28T18:53:58.896Z · comments (37)
Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (17)
What will GPT-2030 look like?
jsteinhardt · 2023-06-07T23:40:02.925Z · comments (42)
A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX
jacobjacob · 2023-09-01T04:03:41.067Z · comments (23)
This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (33)
There should be more AI safety orgs
Marius Hobbhahn (marius-hobbhahn) · 2023-09-21T14:53:52.779Z · comments (25)
The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (63)
The ‘ petertodd’ phenomenon
mwatkins · 2023-04-15T00:59:47.142Z · comments (50)
My tentative best guess on how EAs and Rationalists sometimes turn crazy
habryka (habryka4) · 2023-06-21T04:11:28.518Z · comments (106)
Introducing Alignment Stress-Testing at Anthropic
evhub · 2024-01-12T23:51:25.875Z · comments (23)
re: Yudkowsky on biological materials
bhauth · 2023-12-11T13:28:10.639Z · comments (30)
"Humanity vs. AGI" Will Never Look Like "Humanity vs. AGI" to Humanity
Thane Ruthenis · 2023-12-16T20:08:39.375Z · comments (34)
[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (98)
[link] OpenAI API base models are not sycophantic, at any size
nostalgebraist · 2023-08-29T00:58:29.007Z · comments (19)
OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)
Another medical miracle
Dentin · 2023-06-25T20:43:45.493Z · comments (45)
A report about LessWrong karma volatility from a different universe
Ben Pace (Benito) · 2023-04-01T21:48:32.503Z · comments (7)
[link] I still think it's very unlikely we're observing alien aircraft
dynomight · 2023-06-15T13:01:27.734Z · comments (68)
LLMs Sometimes Generate Purely Negatively-Reinforced Text
Fabien Roger (Fabien) · 2023-06-16T16:31:32.848Z · comments (11)
Feedbackloop-first Rationality
Raemon · 2023-08-07T17:58:56.349Z · comments (65)
[link] Toward a Broader Conception of Adverse Selection
Ricki Heicklen (bayesshammai) · 2024-03-14T22:40:57.920Z · comments (61)
AI as a science, and three obstacles to alignment strategies
So8res · 2023-10-25T21:00:16.003Z · comments (79)
Architects of Our Own Demise: We Should Stop Developing AI
Roko · 2023-10-26T00:36:05.126Z · comments (74)
On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (17)
[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (22)