LessWrong 2.0 Reader

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (245)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (85)

[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (29)

[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (36)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (40)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (4)

What’s the short timeline plan?
Marius Hobbhahn (marius-hobbhahn) · 2025-01-02T14:59:20.026Z · comments (42)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (19)

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (153)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (59)

[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (99)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (45)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (50)

Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (58)

[link] Understanding Shapley Values with Venn Diagrams
Carson L · 2024-12-06T21:56:43.960Z · comments (34)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (23)

Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (23)

Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (25)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (68)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (52)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (17)

Shallow review of technical AI safety, 2024
technicalities · 2024-12-29T12:01:14.724Z · comments (33)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (65)

[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (14)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (71)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)

[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)

[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)

o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (155)

[link] Arithmetic is an underrated world-modeling technology
dynomight · 2024-10-17T14:00:22.475Z · comments (33)

"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)

A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (18)

Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (23)

OpenAI #10: Reflections
Zvi · 2025-01-07T17:00:07.348Z · comments (6)

“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)

Activation space interpretability may be doomed
bilalchughtai (beelal) · 2025-01-08T12:49:38.421Z · comments (28)

Maximizing Communication, not Traffic
jefftk (jkaufman) · 2025-01-05T13:00:02.280Z · comments (7)

How will we update about scheming?
ryan_greenblatt · 2025-01-06T20:21:52.281Z · comments (15)

[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)

What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · 2025-01-06T19:57:43.398Z · comments (48)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (79)

What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (15)

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (55)

Capital Ownership Will Not Prevent Human Disempowerment
beren · 2025-01-05T06:00:23.095Z · comments (11)

Recent comments

raemon on Shutting Down the Lightcone Offices

i.e. the question "what sort of community institutions are good to build?" is a timeless question. Why should we artificially limit our ability to reflect on that sort of thing during the Review, given that we set the Review up in an open-ended way that allows us to do that on the margin?

raemon on Shutting Down the Lightcone Offices

Fwiw I disagree, I think the Review is deliberately open-ended.

Yes, there's a specific goal of finding the top 50 posts and identifying important timeless intellectual contributions. But part of the whole point of the Review (as I originally envisioned it) is also to help reflect in a more general sense on "what happened on LessWrong and what can we learn from it?".

I think rather than trying to say "no, don't reflect on particular things that don't fit the most central use case of the Review", it seems actively good to me to take advantage of the open-ended nature of it to think about less central things. We can learn timeless lessons from posts that weren't, themselves, particularly timeless.

steve2152 on Applying traditional economic thinking to AGI: a trilemma

The point I’m trying to make here is a really obvious one. Like, suppose that Bob is a really great, top-percentile employee. But suppose that Bob’s roommate Alice is an obviously better employee than Bob along every possible axis. Clearly, Bob will still be able to get a well-paying job—the existence of Alice doesn’t prevent that, because the local economy can use more than one employee.

peter-berggren on Yoda Timers 3: Speed

My most recent post on LessWrong (https://www.lesswrong.com/posts/yj2hyrcGMwpPooqfZ/a-proposal-for-iterated-interpretability-with-known), which is also my first post proposing a novel avenue for AI alignment research, took me a total of 30 minutes.

zyansheep on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles

I think I agree that Feynman was a straight talker, but I just want to caution against drawing inferences about Feynman from books about Feynman. See:

TLDW: Feynman didn't actually write any of the books that use his name, and his influence over them is pretty tenuous. (e.g. Surely You're Joking was written by a young friend of Feynman's, and the book wasn't written until like at least 10 years after the stories were originally told, allegedly)

parker-conley on No one has the ball on 1500 Russian olympiad winners who've received HPMOR

You've probably thought of this and have reasons for and against it, but maybe some hotels (bedside) and restaurants (on tables) would be willing to take copies too? Seems much less likely than libraries though.

peter-berggren on Bug Hunt 3

Probably for me, the main thing that helped was Yoda Timers. Then again, that was probably just a function of getting to practice it much more than anything else. Next up is probably TAPs.

benquo on Parkinson's Law and the Ideology of Statistics

I agree that even if the book turned out to be entirely accurate we should not assume that this case is representative of the average development project, but we could still learn from it. Many hours from highly trained and well-paid people are allocated to planning and evaluating such projects, which expenditure is ostensibly to ensure quality. Even looking at worst cases helps us see what sort of quality is or is not being ensured.

ryan_greenblatt on How will we update about scheming?

I agree that if an AI is incapable of competently scheming (i.e., alignment faking and sabotaging safety work without being caught), but is capable of massively accelerating safety work, then doing huge amounts of safety work with this AI is very promising.

(I put this aside in this post as I was trying to have a more narrow focus on how we'll update about scheming independent of how easily scheming will be handled and without talking about methods that don't currently exist.)

(The specific directions you mentioned of "fancy behavioral red teaming and interp" may not be that promising, but I think there are a moderate number of relatively empirical bets that look decently promising.)

It seems like the first AIs capable of massively accelerating safety work might also scheme pretty competently (it will depend on the architecture). However, we might be able to compensate with sufficient control measures such that the AI is forced to be very helpful (or is caught). Correspondingly, I'm excited about AI control.

(More generally, rapid takeoff might mean that we have to control AIs that are capable of competent scheming without having already obsoleted prior work.)

I'm reasonably optimistic about bootstrapping if the relevant AI company could afford several years of delay due to misalignment, was generally competent, and considered mitigating risk from scheming to be a top priority. You might be able to get away with less delay (especially if you heavily prep in advance). I don't really expect any of these to hold, at least across all the relevant AI companies and in short timelines.

evhub on Human takeover might be worse than AI takeover

I think it affects both, since alignment difficulty determines both the probability that the AI will have values that cause it to take over, as well as the expected badness of those values conditional on it taking over.