LessWrong 2.0 Reader

We're Not Ready: thoughts on "pausing" and responsible scaling policies
HoldenKarnofsky · 2023-10-27T15:19:33.757Z · comments (33)
[link] The Lighthaven Campus is open for bookings
habryka (habryka4) · 2023-09-30T01:08:12.664Z · comments (18)
GPT-4 Plugs In
Zvi · 2023-03-27T12:10:00.926Z · comments (47)
AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (7)
Cognitive Emulation: A Naive AI Safety Proposal
Connor Leahy (NPCollapse) · 2023-02-25T19:35:02.409Z · comments (45)
If interpretability research goes well, it may get dangerous
So8res · 2023-04-03T21:48:18.752Z · comments (10)
Thoughts on sharing information about language model capabilities
paulfchristiano · 2023-07-31T16:04:21.396Z · comments (34)
UDT shows that decision theory is more puzzling than ever
Wei Dai (Wei_Dai) · 2023-09-13T12:26:09.739Z · comments (51)
[link] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
likenneth · 2023-06-11T05:38:35.284Z · comments (4)
Evolution provides no evidence for the sharp left turn
Quintin Pope (quintin-pope) · 2023-04-11T18:43:07.776Z · comments (62)
[link] Sam Altman fired from OpenAI
LawrenceC (LawChan) · 2023-11-17T20:42:30.759Z · comments (75)
Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds
1a3orn · 2023-04-04T17:39:39.720Z · comments (35)
Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk
1a3orn · 2023-11-02T18:20:29.569Z · comments (79)
Consciousness as a conflationary alliance term for intrinsically valued internal experiences
Andrew_Critch · 2023-07-10T08:09:48.881Z · comments (46)
AI alignment researchers don't (seem to) stack
So8res · 2023-02-21T00:48:25.186Z · comments (40)
Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (55)
Grant applications and grand narratives
Elizabeth (pktechgirl) · 2023-07-02T00:16:25.129Z · comments (20)
Twiblings, four-parent babies and other reproductive technology
GeneSmith · 2023-05-20T17:11:23.726Z · comments (32)
The basic reasons I expect AGI ruin
Rob Bensinger (RobbBB) · 2023-04-18T03:37:01.496Z · comments (73)
Transcript and Brief Response to Twitter Conversation between Yann LeCun and Eliezer Yudkowsky
Zvi · 2023-04-26T13:10:01.233Z · comments (50)
Updates and Reflections on Optimal Exercise after Nearly a Decade
romeostevensit · 2023-06-08T23:02:14.761Z · comments (55)
Labs should be explicit about why they are building AGI
peterbarnett · 2023-10-17T21:09:20.711Z · comments (16)
Announcing Timaeus
Jesse Hoogland (jhoogland) · 2023-10-22T11:59:03.938Z · comments (15)
Thinking By The Clock
Screwtape · 2023-11-08T07:40:59.936Z · comments (27)
The other side of the tidal wave
KatjaGrace · 2023-11-03T05:40:05.363Z · comments (79)
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
JanB (JanBrauner) · 2023-09-28T18:53:58.896Z · comments (37)
[link] Large Language Models will be Great for Censorship
Ethan Edwards · 2023-08-21T19:03:55.323Z · comments (14)
What will GPT-2030 look like?
jsteinhardt · 2023-06-07T23:40:02.925Z · comments (42)
EigenKarma: trust at scale
Henrik Karlsson (henrik-karlsson) · 2023-02-08T18:52:24.490Z · comments (50)
A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX
jacobjacob · 2023-09-01T04:03:41.067Z · comments (23)
[link] Anthropic's Core Views on AI Safety
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-03-09T16:55:15.311Z · comments (39)
The ‘ petertodd’ phenomenon
mwatkins · 2023-04-15T00:59:47.142Z · comments (50)
My tentative best guess on how EAs and Rationalists sometimes turn crazy
habryka (habryka4) · 2023-06-21T04:11:28.518Z · comments (106)
There should be more AI safety orgs
Marius Hobbhahn (marius-hobbhahn) · 2023-09-21T14:53:52.779Z · comments (25)
"Humanity vs. AGI" Will Never Look Like "Humanity vs. AGI" to Humanity
Thane Ruthenis · 2023-12-16T20:08:39.375Z · comments (34)
What a compute-centric framework says about AI takeoff speeds
Tom Davidson (tom-davidson-1) · 2023-01-23T04:02:07.672Z · comments (29)
re: Yudkowsky on biological materials
bhauth · 2023-12-11T13:28:10.639Z · comments (30)
[link] OpenAI API base models are not sycophantic, at any size
nostalgebraist · 2023-08-29T00:58:29.007Z · comments (19)
ChatGPT (and now GPT4) is very easily distracted from its rules
dmcs (dmcsh) · 2023-03-15T17:55:04.356Z · comments (41)
Another medical miracle
Dentin · 2023-06-25T20:43:45.493Z · comments (45)
[link] I still think it's very unlikely we're observing alien aircraft
dynomight · 2023-06-15T13:01:27.734Z · comments (68)
LLMs Sometimes Generate Purely Negatively-Reinforced Text
Fabien Roger (Fabien) · 2023-06-16T16:31:32.848Z · comments (11)
A report about LessWrong karma volatility from a different universe
Ben Pace (Benito) · 2023-04-01T21:48:32.503Z · comments (7)
Feedbackloop-first Rationality
Raemon · 2023-08-07T17:58:56.349Z · comments (65)
A rough and incomplete review of some of John Wentworth's research
So8res · 2023-03-28T18:52:50.553Z · comments (17)
AI as a science, and three obstacles to alignment strategies
So8res · 2023-10-25T21:00:16.003Z · comments (79)
Acausal normalcy
Andrew_Critch · 2023-03-03T23:34:33.971Z · comments (30)
Architects of Our Own Demise: We Should Stop Developing AI
Roko · 2023-10-26T00:36:05.126Z · comments (74)
Decision Theory with the Magic Parts Highlighted
moridinamael · 2023-05-16T17:39:55.038Z · comments (24)
Alexander and Yudkowsky on AGI goals
Scott Alexander (Yvain) · 2023-01-24T21:09:16.938Z · comments (52)