LessWrong 2.0 Reader

AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (2)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (0)
Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals
johnswentworth · 2025-01-24T20:20:28.881Z · comments (34)
MONA: Managed Myopia with Approval Feedback
Seb Farquhar · 2025-01-23T12:24:18.108Z · comments (17)
Six Thoughts on AI Safety
boazbarak · 2025-01-24T22:20:50.768Z · comments (16)
[link] Yudkowsky on The Trajectory podcast
Seth Herd · 2025-01-24T19:52:15.104Z · comments (25)
AI #100: Meet the New Boss
Zvi · 2025-01-23T15:40:07.473Z · comments (3)
Tail SP 500 Call Options
sapphire (deluks917) · 2025-01-23T05:21:51.221Z · comments (26)
On polytopes
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-25T13:56:35.681Z · comments (0)
[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (4)
Writing experiments and the banana escape valve
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-23T13:11:24.215Z · comments (1)
Why Aligning an LLM is Hard, and How to Make it Easier
RogerDearnaley (roger-d-1) · 2025-01-23T06:44:04.048Z · comments (2)
[Cross-post] Every Bay Area "Walled Compound"
davekasten · 2025-01-23T15:05:08.629Z · comments (3)
Anomalous Tokens in DeepSeek-V3 and r1
henry (henry-bass) · 2025-01-25T22:55:41.232Z · comments (0)
[link] Counterintuitive effects of minimum prices
dynomight · 2025-01-24T23:05:26.099Z · comments (0)
Eliciting bad contexts
Geoffrey Irving · 2025-01-24T10:39:39.358Z · comments (2)
[link] You Have Two Brains
Eneasz · 2025-01-23T00:52:43.063Z · comments (5)
Agents don't have to be aligned to help us achieve an indefinite pause.
Hastings (hastings-greer) · 2025-01-25T18:51:03.523Z · comments (0)
Early Experiments in Human Auditing for AI Control
Joey Yudelson (JosephY) · 2025-01-23T01:34:31.682Z · comments (0)
[link] Insights from "The Manga Guide to Physiology"
TurnTrout · 2025-01-24T05:18:57.772Z · comments (2)
A hierarchy of disagreement
Adam Zerner (adamzerner) · 2025-01-23T03:17:59.051Z · comments (4)
[question] How useful would alien alignment research be?
Donald Hobson (donald-hobson) · 2025-01-23T10:59:22.330Z · answers+comments (5)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (1)
QFT and neural nets: the basic idea
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-24T13:54:45.099Z · comments (0)
[question] Do you consider perfect surveillance inevitable?
samuelshadrach (xpostah) · 2025-01-24T04:57:48.266Z · answers+comments (24)
[link] Is there such a thing as an impossible protein?
Abhishaike Mahajan (abhishaike-mahajan) · 2025-01-24T17:12:01.174Z · comments (3)
Monet: Mixture of Monosemantic Experts for Transformers Explained
CalebMaresca (caleb-maresca) · 2025-01-25T19:37:09.078Z · comments (1)
Why I'm Pouring Cold Water in My Left Ear, and You Should Too
Maloew (maloew-valenar) · 2025-01-24T23:13:52.340Z · comments (0)
Contra Dances Getting Shorter and Earlier
jefftk (jkaufman) · 2025-01-23T23:30:03.595Z · comments (0)
[link] Uncontrollable: A Surprisingly Good Introduction to AI Risk
PeterMcCluskey · 2025-01-24T04:30:37.499Z · comments (0)
AXRP Episode 38.6 - Joel Lehman on Positive Visions of AI
DanielFilan · 2025-01-24T23:00:07.562Z · comments (0)
Liron Shapira vs Ken Stanley on Doom Debates. A review
TheManxLoiner · 2025-01-24T18:01:56.646Z · comments (0)
[link] What are the differences between AGI, transformative AI, and superintelligence?
Vishakha (vishakha-agrawal) · 2025-01-23T10:03:31.886Z · comments (3)
[link] AISN #46: The Transition
Corin Katzke (corin-katzke) · 2025-01-23T18:09:36.858Z · comments (0)
In the future, language models will be our interface to the world
Daniel Tan (dtch1997) · 2025-01-24T23:16:49.999Z · comments (0)
[question] A Floating Cube - Rejected HLE submission
Shankar Sivarajan (shankar-sivarajan) · 2025-01-25T04:52:22.194Z · answers+comments (0)
Brainrot
Jesse Hoogland (jhoogland) · 2025-01-26T05:35:35.396Z · comments (0)
What does success look like?
Raymond D · 2025-01-23T17:48:35.618Z · comments (0)
[link] Notes on Argentina
Annapurna (jorge-velez) · 2025-01-26T03:51:15.393Z · comments (0)
Empirical Insights into Feature Geometry in Sparse Autoencoders
Jason Boxi Zhang (jason-boxi-zhang) · 2025-01-24T19:02:19.167Z · comments (0)
How are Those AI Participants Doing Anyway?
mushroomsoup · 2025-01-24T22:37:47.999Z · comments (0)
[question] Recommendations for Recent Posts/Sequences on Instrumental Rationality?
Benjamin Hendricks (benjamin-hendricks) · 2025-01-26T00:41:08.577Z · answers+comments (0)
[link] A concise definition of what it means to win
testingthewaters · 2025-01-25T06:37:37.305Z · comments (0)
[question] are there 2 types of alignment?
KvmanThinking (avery-liu) · 2025-01-23T00:08:20.885Z · answers+comments (9)
Starting Thoughts on RLHF
Michael Flood (michael-flood) · 2025-01-23T22:16:49.793Z · comments (0)
Updating and Editing Factual Knowledge in Language Models
Dhananjay Ashok (dhananjay-ashok) · 2025-01-23T19:34:37.121Z · comments (2)
Locating and Editing Knowledge in LMs
Dhananjay Ashok (dhananjay-ashok) · 2025-01-24T22:53:40.559Z · comments (0)
[link] Ideas for CoT Models: A Geometric Perspective on Latent Space Reasoning
Rohan Ganapavarapu (rohan-ganapavarapu) · 2025-01-24T19:01:47.339Z · comments (0)
[question] AI Safety in secret
Michael Flood (michael-flood) · 2025-01-25T18:16:03.181Z · answers+comments (0)