LessWrong 2.0 Reader

How to Play a Support Role in Research Conversations
johnswentworth · 2021-04-23T20:57:50.075Z · comments (4)
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)
[link] My emotional reaction to the current funding situation
Sam F. Brown (sam-4) · 2022-09-09T22:02:46.301Z · comments (36)
Consider Joining the UK Foundation Model Taskforce
Zvi · 2023-07-10T13:50:05.097Z · comments (12)
Summary of and Thoughts on the Hotz/Yudkowsky Debate
Zvi · 2023-08-16T16:50:02.808Z · comments (47)
A transcript of the TED talk by Eliezer Yudkowsky
Mikhail Samin (mikhail-samin) · 2023-07-12T12:12:34.399Z · comments (13)
Caution when interpreting Deepmind's In-context RL paper
Sam Marks (samuel-marks) · 2022-11-01T02:42:06.766Z · comments (8)
Instrumental convergence is what makes general intelligence possible
tailcalled · 2022-11-11T16:38:14.390Z · comments (11)
Picking Mentors For Research Programmes
Raymond D · 2023-11-10T13:01:14.197Z · comments (8)
Call for research on evaluating alignment (funding + advice available)
Beth Barnes (beth-barnes) · 2021-08-31T23:28:49.121Z · comments (11)
[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (58)
Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (15)
[link] Priorities for the UK Foundation Models Taskforce
Andrea_Miotti (AndreaM) · 2023-07-21T15:23:34.029Z · comments (4)
[link] ActAdd: Steering Language Models without Optimization
technicalities · 2023-09-06T17:21:56.214Z · comments (3)
[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)
In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)
Betting with Mandatory Post-Mortem
abramdemski · 2020-06-24T20:04:34.177Z · comments (14)
TOMORROW: the largest AI Safety protest ever!
Holly_Elmore · 2023-10-20T18:15:18.276Z · comments (26)
SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (16)
[link] Book review: WEIRDest People
PeterMcCluskey · 2020-11-30T03:33:17.510Z · comments (57)
Language models are nearly AGIs but we don't notice it because we keep shifting the bar
philosophybear · 2022-12-30T05:15:15.625Z · comments (13)
Another Way to Be Okay
Gretta Duleba (gretta-duleba) · 2023-02-19T20:49:31.895Z · comments (15)
[question] ($1000 bounty) How effective are marginal vaccine doses against the covid delta variant?
jacobjacob · 2021-07-22T01:26:26.117Z · answers+comments (73)
Want to predict/explain/control the output of GPT-4? Then learn about the world, not about transformers.
Cleo Nardo (strawberry calm) · 2023-03-16T03:08:52.618Z · comments (26)
Book review: "Feeling Great" by David Burns
Steven Byrnes (steve2152) · 2021-06-09T13:17:59.411Z · comments (12)
Predictions for shard theory mechanistic interpretability results
TurnTrout · 2023-03-01T05:16:48.043Z · comments (10)
Improving on the Karma System
Raelifin · 2021-11-14T18:01:30.049Z · comments (36)
[link] Sam Altman: "Planning for AGI and beyond"
LawrenceC (LawChan) · 2023-02-24T20:28:00.430Z · comments (54)
How likely is deceptive alignment?
evhub · 2022-08-30T19:34:25.997Z · comments (28)
Cultivating And Destroying Agency
hath · 2022-06-30T03:59:27.239Z · comments (11)
Anthropic Observations
Zvi · 2023-07-25T12:50:03.178Z · comments (1)
Retrospective: Lessons from the Failed Alignment Startup AISafety.com
Søren Elverlin (soren-elverlin-1) · 2023-05-12T18:07:20.857Z · comments (9)
History's Biggest Natural Experiment
jimrandomh · 2020-03-24T02:56:30.070Z · comments (7)
[link] Why did we wait so long for the threshing machine?
jasoncrawford · 2021-06-29T19:55:38.883Z · comments (20)
I Don’t Know How To Count That Low
Elizabeth (pktechgirl) · 2021-10-22T22:00:02.708Z · comments (10)
Ukraine Post #12
Zvi · 2022-09-22T14:40:03.753Z · comments (3)
What can the principal-agent literature tell us about AI risk?
apc (alexis-carlier) · 2020-02-08T21:28:09.800Z · comments (29)
[link] Direct effects matter!
Aaron Bergman (aaronb50) · 2021-03-14T04:33:11.493Z · comments (28)
PSA: The community is in Berkeley/Oakland, not "the Bay Area"
maia · 2023-09-11T15:59:47.132Z · comments (7)
I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (23)
[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)
But is it really in Rome? An investigation of the ROME model editing technique
jacquesthibs (jacques-thibodeau) · 2022-12-30T02:40:36.713Z · comments (2)
Bayeswatch 10: Spyware
lsusr · 2021-09-29T07:01:25.529Z · comments (7)
Apply for MATS Winter 2023-24!
utilistrutil · 2023-10-21T02:27:34.350Z · comments (6)
Nonlinear’s Evidence: Debunking False and Misleading Claims
KatWoods (ea247) · 2023-12-12T13:16:12.008Z · comments (171)
A mostly critical review of infra-Bayesianism
David Matolcsi (matolcsid) · 2023-02-28T18:37:58.448Z · comments (9)
Deception Chess: Game #1
Zane · 2023-11-03T21:13:55.777Z · comments (21)
Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (7)
[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (5)