LessWrong 2.0 Reader

[link] AI 2027: What Superintelligence Looks Like
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-03T16:23:44.619Z · comments (136)

LessWrong has been acquired by EA
habryka (habryka4) · 2025-04-01T13:09:11.153Z · comments (45)

[link] Will Jesus Christ return in an election year?
Eric Neyman (UnexpectedValues) · 2025-03-24T16:50:53.019Z · comments (45)

Policy for LLM Writing on LessWrong
jimrandomh · 2025-03-24T21:41:30.965Z · comments (65)

VDT: a solution to decision theory
L Rudolf L (LRudL) · 2025-04-01T21:04:09.509Z · comments (18)

[link] Recent AI model progress feels mostly like bullshit
lc · 2025-03-24T19:28:43.450Z · comments (79)

[link] Playing in the Creek
Hastings (hastings-greer) · 2025-04-10T17:39:28.883Z · comments (6)

[link] Good Research Takes are Not Sufficient for Good Strategic Takes
Neel Nanda (neel-nanda-1) · 2025-03-22T10:13:38.257Z · comments (28)

[link] METR: Measuring AI Ability to Complete Long Tasks
Zach Stein-Perlman · 2025-03-19T16:00:54.874Z · comments (104)

[link] Tracing the Thoughts of a Large Language Model
Adam Jermyn (adam-jermyn) · 2025-03-27T17:20:02.162Z · comments (22)

[link] Thoughts on AI 2027
Max Harms (max-harms) · 2025-04-09T21:26:23.926Z · comments (47)

Why Have Sentence Lengths Decreased?
Arjun Panickssery (arjun-panickssery) · 2025-04-03T17:50:29.962Z · comments (51)

Intention to Treat
Alicorn · 2025-03-20T20:01:19.456Z · comments (4)

Short Timelines Don't Devalue Long Horizon Research
Vladimir_Nesov · 2025-04-09T00:42:07.324Z · comments (23)

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (43)

[link] Conceptual Rounding Errors
Jan_Kulveit · 2025-03-26T19:00:31.549Z · comments (15)

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
John Hughes (john-hughes) · 2025-04-08T17:32:55.315Z · comments (17)

[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (38)

OpenAI #12: Battle of the Board Redux
Zvi · 2025-03-31T15:50:02.156Z · comments (1)

The Pando Problem: Rethinking AI Individuality
Jan_Kulveit · 2025-03-28T21:03:28.374Z · comments (13)

AI-enabled coups: a small group could use AI to seize power
Tom Davidson (tom-davidson-1) · 2025-04-16T16:51:29.561Z · comments (16)

Learned pain as a leading cause of chronic pain
SoerenMind · 2025-04-09T11:57:58.523Z · comments (13)

Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt (abhatt349) · 2025-04-16T16:21:23.781Z · comments (0)

Do models say what they learn?
Andy Arditi (andy-arditi) · 2025-03-22T15:19:18.800Z · comments (12)

New Cause Area Proposal
CallumMcDougall (TheMcDouglas) · 2025-04-01T07:12:34.360Z · comments (4)

Downstream applications as validation of interpretability progress
Sam Marks (samuel-marks) · 2025-03-31T01:35:02.722Z · comments (1)

[link] Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith (lsgos) · 2025-03-26T19:07:48.710Z · comments (15)

[link] Explaining British Naval Dominance During the Age of Sail
Arjun Panickssery (arjun-panickssery) · 2025-03-28T05:47:28.561Z · comments (5)

AI 2027: Responses
Zvi · 2025-04-08T12:50:02.197Z · comments (3)

Among Us: A Sandbox for Agentic Deception
7vik (satvik-golechha) · 2025-04-05T06:24:49.000Z · comments (4)

Third-wave AI safety needs sociopolitical thinking
Richard_Ngo (ricraz) · 2025-03-27T00:55:30.548Z · comments (23)

The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (13)

How I talk to those above me
Maxwell Peterson (maxwell-peterson) · 2025-03-30T06:54:59.869Z · comments (13)

Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan (dtch1997) · 2025-04-02T08:51:02.571Z · comments (41)

Training AGI in Secret would be Unsafe and Unethical
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-18T12:27:35.795Z · comments (2)

Three Months In, Evaluating Three Rationalist Cases for Trump
Arjun Panickssery (arjun-panickssery) · 2025-04-18T08:27:27.257Z · comments (13)

How training-gamers might function (and win)
Vivek Hebbar (Vivek) · 2025-04-11T21:26:18.669Z · comments (4)

[link] Towards a scale-free theory of intelligent agency
Richard_Ngo (ricraz) · 2025-03-21T01:39:42.251Z · comments (22)

[link] Elite Coordination via the Consensus of Power
Richard_Ngo (ricraz) · 2025-03-19T06:56:44.825Z · comments (15)

How To Believe False Things
Eneasz · 2025-04-02T16:28:29.055Z · comments (10)

How I force LLMs to generate correct code
claudio · 2025-03-21T14:40:19.211Z · comments (7)

One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)

OpenAI #11: America Action Plan
Zvi · 2025-03-18T12:50:03.880Z · comments (3)

A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)

Keltham's Lectures in Project Lawful
Morpheus · 2025-04-01T10:39:47.973Z · comments (4)

[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (14)

Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-27T15:39:02.176Z · comments (4)

You will crash your car in front of my house within the next week
Richard Korzekwa (Grothor) · 2025-04-01T21:43:21.472Z · comments (6)

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
Stuart_Armstrong · 2025-03-18T14:48:54.762Z · comments (12)

Announcing ILIAD2: ODYSSEY
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-04-03T17:01:06.004Z · comments (1)

Recent comments

romeostevensit on Three Months In, Evaluating Three Rationalist Cases for Trump

I think the major impacts that matter are on war, pandemic risk, and x-risk. I rarely see anyone try to figure those out, perhaps the sign is too uncertain due to complexity.

jkaufman on Risers for Foot Percussion

I did see your comment on FB! I'm still thinking about what I want to try next. I'm worried that silicone with your method would tear, though.

hpcfung on Rationalist Should Win. Not Dying with Dignity and Funding WBE.

I'm also interested, have you made any progress since your comment?

lc on Three Months In, Evaluating Three Rationalist Cases for Trump

The doubling down is delusional but I think you're simplifying the failure of projection a bit. The inability of markets and forecasters to predict Trump's second term is quite interesting. A lot of different models of politics failed.

gjm on o3 Will Use Its Tools For You

Pedantic note: there are many instances of "syncopathy" that I am fairly sure should be "sycophancy".

(It's an understandable mistake -- "syncopathy" is composed of familiar components, which could plausibly be put together to mean something like "the disease of agreeing too much" which is, at least in the context of AI, not far off what sycophancy in fact means. Whereas if you can parse "sycophancy" at all you might work out that it means "fig-showing" which obviously has nothing to do with anything. So far as I can tell, no one actually knows how "fig-showing" came to be the term for servile flattery.)

michaeldickens on Planning for Extreme AI Risks

I think the right way to self-destruct isn't to shut down entirely. It's to spend all your remaining assets on safety (whether that be lobbying for regulations, or research, or whatever). This would greatly increase the total amount of money spent on safety efforts so it might help quite a lot.

I do believe shutting down does have a decent chance, although not a comfortingly large one, of scaring government and/or other AI companies into taking the risks seriously.

anthonyc on What Makes an AI Startup "Net Positive" for Safety?

I won't comment on your specific startup, but I wonder in general how an AI Safety startup becomes a successful business. What's the business model? Who is the target customer? Why do they buy? Unless the goal is to get acquired by one of the big labs, in which case, sure, but again, why or when do they buy, and at what price? Especially since they already don't seem to be putting much effort into solving the problem themselves despite having better tools and more money to do so than any new entrant startup.

anthonyc on Three Months In, Evaluating Three Rationalist Cases for Trump

I really, really hope at some point the Democrats will acknowledge the reason they lost is that they failed to persuade the median voter of their ideas, and/or adopt ideas that appeal to said voters. At least among those I interact with, there seems to be a denial of the idea that this is how you win elections, which is a prerequisite for governing.

saidachmiz on A Dissent on Honesty

"The hard cases are much more interesting. What about lying to my landlord about renting a room on airbnb? What about saying your class will make people millionaires for the low low price of $1,000 (hey, it could happen)? What about hiding the rats from the health inspector?"

None of these seem like hard cases to me. Lying is wrong (and pretty obviously so) in all three of these cases.

anthonyc on Why Does It Feel Like Something? An Evolutionary Path to Subjectivity

That seems very possible to me, and if and when we can show whether something like that is the case, I do think it would represent significant progress. If nothing else, it would help tell us what the thing we need to be examining actually is, in a way we don't currently have an easy way to specify.