LessWrong 2.0 Reader

Export Surplusses
lsusr · 2025-02-24T05:53:23.422Z · comments (21)

[link] New Report: Multi-Agent Risks from Advanced AI
Lewis Hammond (lewis-hammond-1) · 2025-02-23T00:32:29.534Z · comments (0)

Why would AI companies use human-level AI to do alignment research?
MichaelDickens · 2025-04-25T19:12:56.202Z · comments (8)

Whether governments will control AGI is important and neglected
Seth Herd · 2025-03-14T09:48:34.062Z · comments (2)

The present perfect tense is ruining your life
PatrickDFarley · 2025-01-27T16:14:48.843Z · comments (14)

Bounded AI might be viable
Mateusz Bagiński (mateusz-baginski) · 2025-03-06T12:55:46.224Z · comments (0)

I grade every NBA basketball game I watch based on enjoyability
proshowersinger · 2025-03-12T21:46:26.791Z · comments (2)

Medical Roundup #4
Zvi · 2025-02-18T13:40:06.574Z · comments (3)

Explaining the Joke: Pausing is The Way
WillPetillo · 2025-04-04T09:04:38.847Z · comments (2)

Come join Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-15T22:10:02.166Z · comments (0)

MATS Spring 2024 Extension Retrospective
HenningB (HenningBlue) · 2025-02-12T22:43:58.193Z · comments (1)

Logical Correlation
niplav · 2025-02-10T23:29:10.518Z · comments (6)

The Last Light
Bridgett Kay (bridgett-kay) · 2025-04-14T15:41:02.745Z · comments (2)

The non-tribal tribes
PatrickDFarley · 2025-02-26T17:22:59.949Z · comments (4)

Saving Zest
jefftk (jkaufman) · 2025-03-02T12:00:41.732Z · comments (1)

How to mitigate sandbagging
Teun van der Weij (teun-van-der-weij) · 2025-03-23T17:19:07.452Z · comments (0)

Monthly Roundup #29: April 2025
Zvi · 2025-04-14T11:50:02.324Z · comments (6)

[link] Currency Collapse
prue (prue0) · 2025-04-11T03:48:01.469Z · comments (3)

Training-time schemers vs behavioral schemers
Alex Mallen (alex-mallen) · 2025-04-24T19:07:55.256Z · comments (0)

What is a circuit? [in interpretability]
Yudhister Kumar (randomwalks) · 2025-02-14T04:40:42.978Z · comments (1)

[link] Notes on the Presidential Election of 1836
Arjun Panickssery (arjun-panickssery) · 2025-02-13T23:40:23.224Z · comments (0)

AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability
DanielFilan · 2025-03-28T18:40:01.856Z · comments (0)

Two flaws in the Machiavelli Benchmark
TheManxLoiner · 2025-02-12T19:34:35.241Z · comments (0)

Interesting ACX 2024 Book Review Entries
jenn (pixx) · 2025-04-20T18:10:04.973Z · comments (1)

[question] LessWrong merch?
Brendan Long (korin43) · 2025-04-03T21:51:47.190Z · answers+comments (2)

[link] Forging A New AGI Social Contract
Deric Cheng (deric-cheng) · 2025-04-10T13:41:11.817Z · comments (3)

A model of the final phase: the current frontier AIs as de facto CEOs of their own companies
Mitchell_Porter · 2025-03-08T22:15:35.260Z · comments (2)

A Bunch of Matryoshka SAEs
chanind · 2025-04-04T14:53:56.805Z · comments (0)

Review: The Lathe of Heaven
dr_s · 2025-01-31T08:10:58.673Z · comments (0)

Prodromes and Biomarkers in Chronic Disease
sarahconstantin · 2025-04-16T21:30:02.978Z · comments (2)

Spending on Ourselves
jefftk (jkaufman) · 2025-04-20T18:40:07.988Z · comments (0)

The Internal Model Principle: A Straightforward Explanation
Alfred Harwood · 2025-04-12T10:58:51.479Z · comments (1)

Doing principle-of-charity better
Sniffnoy · 2025-03-27T05:19:52.195Z · comments (1)

[link] Published report: Pathways to short TAI timelines
Zershaaneh Qureshi (zershaaneh-qureshi) · 2025-02-20T22:10:12.276Z · comments (0)

The Leapfrogging Terminus and the Fuzzy Cut
Jim Pivarski (jim-pivarski) · 2025-03-31T04:08:24.023Z · comments (6)

Navigation by Moonlight
Jacob Falkovich (Jacobian) · 2025-04-07T15:32:17.353Z · comments (39)

Not All Beliefs Are Created Equal: Diagnosing Toxic Ideologies
Big_friendly_kiwi (marginality4life) · 2025-04-21T03:18:48.677Z · comments (7)

Ruling Out Lookup Tables
Alfred Harwood · 2025-02-04T10:39:34.899Z · comments (11)

Notes on handling non-concentrated failures with AI control: high level methods and different regimes
ryan_greenblatt · 2025-03-24T01:00:38.222Z · comments (3)

[link] The Peeperi (unfinished) - By Katja Grace
Nathan Young · 2025-02-17T19:33:29.894Z · comments (0)

Understanding Trust: Overview Presentations
abramdemski · 2025-04-16T18:08:31.064Z · comments (0)

[link] Why People Commit White Collar Fraud (Ozy linkpost)
sapphire (deluks917) · 2025-03-03T19:33:15.609Z · comments (1)

Grok3 On Kant On AI Slavery
JenniferRM · 2025-04-01T04:10:48.093Z · comments (3)

[question] Does the AI control agenda broadly rely on no FOOM being possible?
Noosphere89 (sharmake-farah) · 2025-03-29T19:38:23.971Z · answers+comments (3)

so you have a chronic health issue
agencypilled · 2025-01-26T19:00:29.972Z · comments (9)

[link] AI Tools for Existential Security
Lizka · 2025-03-14T18:38:06.110Z · comments (4)

Opportunity Space: Renormalization for AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T20:55:52.155Z · comments (0)

Seven sources of goals in LLM agents
Seth Herd · 2025-02-08T21:54:20.186Z · comments (3)

[question] Examples of self-fulfilling prophecies in AI alignment?
Chipmonk · 2025-03-03T02:45:51.619Z · answers+comments (6)

Introduction to Representing Sentences as Logical Statements
Towards_Keeperhood (Simon Skade) · 2025-04-05T20:35:31.422Z · comments (9)

Recent comments

nathan-helm-burger on faul_sname's Shortform

On the plus side, it should be pretty easy now to collect a lot of negative examples of 'code that solves the problem, but in a gross way'. Having a large dataset of such examples is the first step to using them to train models not to do this.

nathan-helm-burger on faul_sname's Shortform

Who watches the watchers? Who grades the graders? If the RL graders are upvoting slop, it seems like we need to go one level more meta and upgrade the RL graders. This seems like a straightforward engineering problem, and I suspect the negative outcomes we've been seeing recently aren't so much due to the inherent intractability of doing this well, but due to the companies racing and cutting corners on quality control.

Contrast with something like: Problem of Human Limitations: how do we get the model to do things so hard no human can do them? How do we rate the quality of their outputs when no human is qualified to judge them?

Problem of Optimization for Subversion: if we have directly misaligned goals like "lie to me in ways that make me happy" and also "never appear to be lying to me, I hate thinking I'm being lied to" then we get a sneaky sycophant. Our reward process actively selects for this problem; straightforwardly improving the reward process would make the problem worse rather than better.

bunthut on Why Have Sentence Lengths Decreased?

>Sentence lengths have declined.

Data: I looked for similar data on sentence lengths in German, and the first result I found covering a similar timeframe was Wikipedia referencing Kurt Möslein: Einige Entwicklungstendenzen in der Syntax der wissenschaftlich-technischen Literatur seit dem Ende des 18. Jahrhunderts. (1974), which does not find the same trend:

Year	Words per sentence (wps)
1770	24.50
1800	25.54
1850	32.00
1900	23.58
1920	22.72
1940	19.60
1960	19.90

This data on scientific writing starts lower than any of your English examples from that time, and increases initially, but arrives in the same place (insofar as wps are comparable across languages, which I think is fine for English and German).
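
As a rough check on the "increases initially, but arrives in the same place" reading, here is a minimal Python sketch that finds the peak and the net change in the figures from the table above. The names `wps_by_year` and `summarize` are made up for illustration and are not from the cited source.

```python
# Words-per-sentence (wps) figures copied from the table above,
# written here with decimal points.
wps_by_year = {
    1770: 24.50,
    1800: 25.54,
    1850: 32.00,
    1900: 23.58,
    1920: 22.72,
    1940: 19.60,
    1960: 19.90,
}


def summarize(series):
    """Return (peak_year, peak_value, net_change) for a year -> value mapping.

    net_change is the last value minus the first value, rounded to 2 places.
    """
    years = sorted(series)
    peak_year = max(years, key=lambda y: series[y])
    net_change = round(series[years[-1]] - series[years[0]], 2)
    return peak_year, series[peak_year], net_change


peak_year, peak_value, net_change = summarize(wps_by_year)
print(f"peak: {peak_value} wps in {peak_year}; net change 1770-1960: {net_change}")
```

With only seven data points this is a description of the table, not a trend estimate: the series peaks in 1850 and ends below where it started.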

1a3orn on "The Urgency of Interpretability" (Dario Amodei)

has no objection to taking unilateral actions that are unpopular in the x-risk community (and among the general public for that matter)

I have the courage of my convictions; you ignore the opinions of others; he takes reckless unilateral action.

michaeldickens on "The Urgency of Interpretability" (Dario Amodei)

prosaic alignment is clearly not scalable to the types of systems they are actively planning to build

Why do you believe this?

(FWIW I think it's foolish that all (?) frontier companies are all-in on prosaic alignment, but I am not convinced that it "clearly" won't work.)

michaeldickens on "The Urgency of Interpretability" (Dario Amodei)

Just my personal opinion:

My sense is that Anthropic is somewhat more safety-focused than the other frontier AI companies, in that most of the companies only care maybe 10% as much about safety as they should, and Anthropic cares 15% as much as it should.

What numbers would you give to these labs?

My median guess is that if an average company is -100 per dollar then Anthropic is -75. I believe Anthropic is making things worse on net by pushing more competition, but an Anthropic-controlled ASI is a bit less likely to kill everyone than an ASI controlled by anyone else.

But I also have significant (< 50%) probability on Anthropic being the worst company in terms of actual consequences because its larger-but-still-insufficient focus on safety may create a false sense of security that ends up preventing good regulations from being implemented.

You may also be interested in SaferAI's risk management ratings.

I used to think Anthropic was [...] quite in sync with the AI x-risk community.

I think Anthropic leadership respects the x-risk community in their words but not in their actions. Anthropic says safety is important, and invests a decent amount into safety research; but also opposes coordination, supports arms races, and has no objection to taking unilateral actions that are unpopular in the x-risk community (and among the general public for that matter).

mrcheeze on Recent AI model progress feels mostly like bullshit

But you have to be careful here, since the results heavily depend on details of the harness, as well as on how thoroughly they have memorized walkthroughs of the game.

wei-dai on Our Reality: A Simulation Run by a Paperclip Maximizer

But as you suggested in the post, the apparently vast amount of suffering isn't necessarily real? "most cosmic details and human history are probably fake, and many apparent people could be non‑conscious entities"

(However, I take the point that doing such simulations can be risky or problematic, e.g. if one's current ideas about consciousness are wrong, or if doing philosophy correctly requires having experienced real suffering.)

mis-understandings on O O's Shortform

Global GDP growth over the same period was around 3 percent.

The question is how equities outperformed GDP growth.

I think that this has to do with changes in asset prices in general.

alexander-howell on Why Should I Assume CCP AGI is Worse Than USG AGI?

Yes, my mistake. I meant Trump votes > Harris votes and forgot about third parties. On the other hand, 49.8% vs 50% + 1 feels semi-trivial when compared to, say, the UK, where Labour received 33.7% of the vote.