LessWrong 2.0 Reader

[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)
[question] What do coherence arguments actually prove about agentic behavior?
[deleted] · 2024-06-01T09:37:28.451Z · answers+comments (39)
[link] The Intelligence Curse
lukedrago · 2025-01-03T19:07:43.493Z · comments (26)
Awakening
lsusr · 2024-05-30T07:03:00.821Z · comments (79)
BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (14)
[link] Investigating the Chart of the Century: Why is food so expensive?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-16T13:21:23.596Z · comments (26)
Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth (pktechgirl) · 2024-05-16T00:00:05.257Z · comments (22)
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner (ejenner) · 2024-06-04T15:50:47.475Z · comments (14)
Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt (abhatt349) · 2025-04-16T16:21:23.781Z · comments (0)
[link] My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi · 2024-09-08T14:30:40.456Z · comments (18)
A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey (Lee_Sharkey) · 2024-07-18T14:15:50.248Z · comments (18)
[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (48)
The Big Nonprofits Post
Zvi · 2024-11-29T16:10:06.938Z · comments (10)
AI catastrophes and rogue deployments
Buck · 2024-06-03T17:04:51.206Z · comments (16)
[link] Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases
Fabien Roger (Fabien) · 2025-03-11T11:52:38.994Z · comments (23)
[link] The Dangers of Mirrored Life
Niko_McCarty (niko-2) · 2024-12-12T20:58:32.750Z · comments (9)
A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)
[link] Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded
garrison · 2024-10-23T23:40:57.180Z · comments (1)
Efficient Dictionary Learning with Switch Sparse Autoencoders
Anish Mudide (anish-mudide) · 2024-07-22T18:45:53.502Z · comments (20)
Scissors Statements for President?
AnnaSalamon · 2024-11-06T10:38:21.230Z · comments (32)
The Dream Machine
sarahconstantin · 2024-12-05T00:00:05.796Z · comments (6)
Impact, agency, and taste
benkuhn · 2025-04-19T21:10:06.960Z · comments (3)
2024 in AI predictions
jessicata (jessica.liu.taylor) · 2025-01-01T20:29:49.132Z · comments (3)
Talent Needs of Technical AI Safety Teams
yams (william-brewer) · 2024-05-24T00:36:40.486Z · comments (65)
The Plan - 2024 Update
johnswentworth · 2024-12-31T13:29:53.888Z · comments (28)
Learned pain as a leading cause of chronic pain
SoerenMind · 2025-04-09T11:57:58.523Z · comments (13)
How I've run major projects
benkuhn · 2025-03-16T18:40:04.223Z · comments (10)
The o1 System Card Is Not About o1
Zvi · 2024-12-13T20:30:08.048Z · comments (5)
[link] Research directions Open Phil wants to fund in technical AI safety
jake_mendel · 2025-02-08T01:40:00.968Z · comments (21)
AI 2027 is a Bet Against Amdahl's Law
snewman · 2025-04-21T03:09:40.751Z · comments (46)
Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)
Anthropic's Certificate of Incorporation
Zach Stein-Perlman · 2024-06-12T13:00:30.806Z · comments (7)
Do models say what they learn?
Andy Arditi (andy-arditi) · 2025-03-22T15:19:18.800Z · comments (12)
Why I funded PIBBSS
Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · comments (21)
Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
Thomas Kwa (thomas-kwa) · 2024-11-06T23:01:48.992Z · comments (35)
Please stop using mediocre AI art in your posts
Raemon · 2024-08-25T00:13:52.890Z · comments (24)
AIs Will Increasingly Attempt Shenanigans
Zvi · 2024-12-16T15:20:05.652Z · comments (2)
Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red
Julian Bradshaw · 2025-04-21T03:52:34.759Z · comments (17)
You should consider applying to PhDs (soon!)
bilalchughtai (beelal) · 2024-11-29T20:33:12.462Z · comments (19)
Why I'm Moving from Mechanistic to Prosaic Interpretability
Daniel Tan (dtch1997) · 2024-12-30T06:35:43.417Z · comments (34)
[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (5)
Ten arguments that AI is an existential risk
KatjaGrace · 2024-08-13T17:00:03.397Z · comments (42)
DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (26)
[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (19)
You should go to ML conferences
Jan_Kulveit · 2024-07-24T11:47:52.214Z · comments (13)
Downstream applications as validation of interpretability progress
Sam Marks (samuel-marks) · 2025-03-31T01:35:02.722Z · comments (3)
Sorry for the downtime, looks like we got DDosd
habryka (habryka4) · 2024-12-02T04:14:30.209Z · comments (13)
The Game Board has been Flipped: Now is a good time to rethink what you’re doing
LintzA (alex-lintz) · 2025-01-28T23:36:18.106Z · comments (30)
[link] Please support this blog (with money)
Elizabeth (pktechgirl) · 2024-08-17T15:30:05.641Z · comments (3)
Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (21)