LessWrong 2.0 Reader

Instrumental convergence is what makes general intelligence possible
tailcalled · 2022-11-11T16:38:14.390Z · comments (11)
Intuitions about solving hard problems
Richard_Ngo (ricraz) · 2022-04-25T15:29:04.253Z · comments (23)
Betting with Mandatory Post-Mortem
abramdemski · 2020-06-24T20:04:34.177Z · comments (14)
[question] ($1000 bounty) How effective are marginal vaccine doses against the covid delta variant?
jacobjacob · 2021-07-22T01:26:26.117Z · answers+comments (73)
Consider Joining the UK Foundation Model Taskforce
Zvi · 2023-07-10T13:50:05.097Z · comments (12)
A transcript of the TED talk by Eliezer Yudkowsky
Mikhail Samin (mikhail-samin) · 2023-07-12T12:12:34.399Z · comments (13)
Language models are nearly AGIs but we don't notice it because we keep shifting the bar
philosophybear · 2022-12-30T05:15:15.625Z · comments (13)
Charbel-Raphaël and Lucius discuss interpretability
Mateusz Bagiński (mateusz-baginski) · 2023-10-30T05:50:34.589Z · comments (7)
How could you possibly choose what an AI wants?
So8res · 2023-04-19T17:08:54.694Z · comments (19)
Should we publish mechanistic interpretability research?
Marius Hobbhahn (marius-hobbhahn) · 2023-04-21T16:19:40.514Z · comments (40)
[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (56)
[link] Cohabitive Games so Far
mako yass (MakoYass) · 2023-09-28T15:41:27.986Z · comments (129)
I don't think MIRI "gave up"
Raemon · 2023-02-03T00:26:07.552Z · comments (64)
Predictions for shard theory mechanistic interpretability results
TurnTrout · 2023-03-01T05:16:48.043Z · comments (10)
How to Play a Support Role in Research Conversations
johnswentworth · 2021-04-23T20:57:50.075Z · comments (4)
[link] My techno-optimism [By Vitalik Buterin]
habryka (habryka4) · 2023-11-27T23:53:35.859Z · comments (17)
[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)
[link] My emotional reaction to the current funding situation
Sam F. Brown (sam-4) · 2022-09-09T22:02:46.301Z · comments (36)
Summary of and Thoughts on the Hotz/Yudkowsky Debate
Zvi · 2023-08-16T16:50:02.808Z · comments (47)
Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell?
Karl von Wendt · 2023-06-25T16:59:49.173Z · comments (53)
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)
Call for research on evaluating alignment (funding + advice available)
Beth Barnes (beth-barnes) · 2021-08-31T23:28:49.121Z · comments (11)
[link] Priorities for the UK Foundation Models Taskforce
Andrea_Miotti (AndreaM) · 2023-07-21T15:23:34.029Z · comments (4)
Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)
What Would A Fight Between Humanity And AGI Look Like?
johnswentworth · 2022-04-05T20:03:30.232Z · comments (20)
But is it really in Rome? An investigation of the ROME model editing technique
jacquesthibs (jacques-thibodeau) · 2022-12-30T02:40:36.713Z · comments (2)
Apply for MATS Winter 2023-24!
utilistrutil · 2023-10-21T02:27:34.350Z · comments (6)
[link] Why did we wait so long for the threshing machine?
jasoncrawford · 2021-06-29T19:55:38.883Z · comments (20)
[link] Direct effects matter!
Aaron Bergman (aaronb50) · 2021-03-14T04:33:11.493Z · comments (28)
'simulator' framing and confusions about LLMs
Beth Barnes (beth-barnes) · 2022-12-31T23:38:57.474Z · comments (11)
Anthropic Observations
Zvi · 2023-07-25T12:50:03.178Z · comments (1)
Deception Chess: Game #1
Zane · 2023-11-03T21:13:55.777Z · comments (19)
Nonlinear’s Evidence: Debunking False and Misleading Claims
KatWoods (ea247) · 2023-12-12T13:16:12.008Z · comments (171)
Ukraine Post #12
Zvi · 2022-09-22T14:40:03.753Z · comments (3)
Cultivating And Destroying Agency
hath · 2022-06-30T03:59:27.239Z · comments (11)
[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)
Retrospective: Lessons from the Failed Alignment Startup AISafety.com
Søren Elverlin (soren-elverlin-1) · 2023-05-12T18:07:20.857Z · comments (9)
Making AIs less likely to be spiteful
Nicolas Macé (NicolasMace) · 2023-09-26T14:12:06.202Z · comments (4)
What can the principal-agent literature tell us about AI risk?
apc (alexis-carlier) · 2020-02-08T21:28:09.800Z · comments (29)
History's Biggest Natural Experiment
jimrandomh · 2020-03-24T02:56:30.070Z · comments (7)
A mostly critical review of infra-Bayesianism
David Matolcsi (matolcsid) · 2023-02-28T18:37:58.448Z · comments (9)
[link] Model, Care, Execution
Ricki Heicklen (bayesshammai) · 2023-06-26T04:05:50.065Z · comments (9)
Clarifying inner alignment terminology
evhub · 2020-11-09T20:40:27.043Z · comments (17)
Slightly against aligning with neo-luddites
Matthew Barnett (matthew-barnett) · 2022-12-26T22:46:42.693Z · comments (31)
Short Remark on the (subjective) mathematical 'naturalness' of the Nanda--Lieberum addition modulo 113 algorithm
carboniferous_umbraculum (Spencer Becker-Kahn) · 2023-06-01T11:31:37.796Z · comments (12)
Effective Evil
lsusr · 2021-11-02T00:26:29.910Z · comments (7)
Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (1)
[link] Sam Altman: "Planning for AGI and beyond"
LawrenceC (LawChan) · 2023-02-24T20:28:00.430Z · comments (54)
Bayeswatch 10: Spyware
lsusr · 2021-09-29T07:01:25.529Z · comments (7)
[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (8)