LessWrong 2.0 Reader

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
Stuart_Armstrong · 2025-03-18T14:48:54.762Z · comments (12)

Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan (dtch1997) · 2025-04-02T08:51:02.571Z · comments (19)

[link] AI for Epistemics Hackathon
Austin Chen (austin-chen) · 2025-03-14T20:46:34.250Z · comments (10)

PauseAI and E/Acc Should Switch Sides
WillPetillo · 2025-04-01T23:25:51.265Z · comments (5)

[link] The machine has no mouth and it must scream
zef (uzpg) · 2025-03-08T16:40:46.755Z · comments (1)

[link] AI for AI safety
Joe Carlsmith (joekc) · 2025-03-14T15:00:23.491Z · comments (10)

Fun With GPT-4o Image Generation
Zvi · 2025-03-26T19:50:03.270Z · comments (3)

The principle of genomic liberty
TsviBT · 2025-03-19T14:27:57.175Z · comments (48)

100+ concrete projects and open problems in evals
Marius Hobbhahn (marius-hobbhahn) · 2025-03-22T15:21:40.970Z · comments (1)

Introducing 11 New AI Safety Organizations - Catalyze's Winter 24/25 London Incubation Program Cohort
Alexandra Bos (AlexandraB) · 2025-03-10T19:26:11.017Z · comments (0)

I'm resigning as Meetup Czar. What's next?
Screwtape · 2025-04-02T00:30:42.110Z · comments (2)

[link] Phoenix Rising
Metacelsus · 2025-03-09T11:53:52.618Z · comments (7)

Selective modularity: a research agenda
cloud · 2025-03-24T04:12:44.822Z · comments (2)

The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (8)

Going Nova
Zvi · 2025-03-19T13:30:01.293Z · comments (14)

Will compute bottlenecks prevent a software intelligence explosion?
Tom Davidson (tom-davidson-1) · 2025-04-04T17:41:37.088Z · comments (2)

Book Review: Affective Neuroscience
sarahconstantin · 2025-03-10T06:50:04.602Z · comments (8)

Feedback loops for exercise (VO2Max)
Elizabeth (pktechgirl) · 2025-03-18T00:10:06.827Z · comments (9)

Apply to MATS 8.0!
Ryan Kidd (ryankidd44) · 2025-03-20T02:17:58.018Z · comments (2)

[link] Sentinel's Global Risks Weekly Roundup #11/2025. Trump invokes Alien Enemies Act, Chinese invasion barges deployed in exercise.
NunoSempere (Radamantis) · 2025-03-17T19:34:01.850Z · comments (3)

[link] Softmax, Emmett Shear's new AI startup focused on "Organic Alignment"
Chipmonk · 2025-03-28T21:23:46.220Z · comments (1)

[link] DeepMind: An Approach to Technical AGI Safety and Security
Rohin Shah (rohinmshah) · 2025-04-05T22:00:14.803Z · comments (6)

Socially Graceful Degradation
Screwtape · 2025-03-20T04:03:41.213Z · comments (9)

AI CoT Reasoning Is Often Unfaithful
Zvi · 2025-04-04T14:50:05.538Z · comments (4)

Renormalization Roadmap
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T20:34:16.352Z · comments (3)

The Manus Marketing Madness
Zvi · 2025-03-10T20:10:07.845Z · comments (0)

Housing Roundup #11
Zvi · 2025-04-01T16:30:03.694Z · comments (1)

LLM AGI will have memory, and memory changes alignment
Seth Herd · 2025-04-04T14:59:13.070Z · comments (5)

Solving willpower seems easier than solving aging
Yair Halberstadt (yair-halberstadt) · 2025-03-23T15:25:40.861Z · comments (28)

My "infohazards small working group" Signal Chat may have encountered minor leaks
Linch · 2025-04-02T01:03:05.311Z · comments (0)

Reframing AI Safety as a Neverending Institutional Challenge
scasper · 2025-03-23T00:13:48.614Z · comments (12)

Gemini 2.5 is the New SoTA
Zvi · 2025-03-28T14:20:03.176Z · comments (1)

HPMOR Anniversary Parties: Coordination, Resources, and Discussion
Screwtape · 2025-03-11T01:30:41.177Z · comments (6)

Don't over-update on FrontierMath results
David Matolcsi (matolcsid) · 2025-03-11T20:44:04.459Z · comments (5)

On MAIM and Superintelligence Strategy
Zvi · 2025-03-14T12:30:07.451Z · comments (2)

AI #110: Of Course You Know…
Zvi · 2025-04-03T13:10:05.674Z · comments (8)

[link] How Gay is the Vatican?
rba · 2025-04-06T21:27:50.530Z · comments (25)

FrontierMath Score of o3-mini Much Lower Than Claimed
YafahEdelman (yafah-edelman-1) · 2025-03-17T22:41:06.527Z · comments (7)

Introducing BenchBench: An Industry Standard Benchmark for AI Strength
Jozdien · 2025-04-02T02:11:41.555Z · comments (0)

Against Yudkowsky's evolution analogy for AI x-risk [unfinished]
Fiora Sunshine (Fiora from Rosebloom) · 2025-03-18T01:41:06.453Z · comments (18)

The vision of Bill Thurston
TsviBT · 2025-03-28T11:45:14.297Z · comments (34)

Consider showering
bohaska (Bohaska) · 2025-04-01T23:54:26.714Z · comments (15)

Prioritizing threats for AI control
ryan_greenblatt · 2025-03-19T17:09:45.044Z · comments (2)

23andMe potentially for sale for <$50M
lemonhope (lcmgcd) · 2025-03-25T04:34:28.388Z · comments (2)

AI "Deep Research" Tools Reviewed
sarahconstantin · 2025-03-24T18:40:03.864Z · comments (5)

[link] Habermas Machine
NicholasKees (nick_kees) · 2025-03-13T18:16:50.453Z · comments (7)

We’re not prepared for an AI market crash
Remmelt (remmelt-ellen) · 2025-04-01T04:33:55.040Z · comments (11)

AI #107: The Misplaced Hype Machine
Zvi · 2025-03-13T14:40:05.318Z · comments (10)

Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle
Czynski (JacobKopczynski) · 2025-03-29T02:51:29.786Z · comments (36)

Equations Mean Things
abstractapplic · 2025-03-19T08:16:35.312Z · comments (10)

Recent comments

nathan-helm-burger on Are there any (semi-)detailed future scenarios where we win?

https://neuromorph365105.substack.com/p/the-ai-academy

arjun-panickssery on American College Admissions Doesn't Need to Be So Competitive

You're phrasing this as though it's rebutting some remark I made; if so, I'm not sure what remark that is. I know that admissions offices are admitting students according to an intentional system.

mis-understandings on American College Admissions Doesn't Need to Be So Competitive

University cohorts are basically set up to maximally benefit the people who do get admitted, not to admit the most qualified. For this purpose, universities would rather have students who make the school more rewarding for other students than the smartest possible students. This is combined with a general tendency to do prestigious, donor-wanted things. And donors want to have gone to a college that is hard to get into (even though they did not like applying). The challenge of application (and with that, admit rates over yield rates) is a signal.

mis-understandings on American College Admissions Doesn't Need to Be So Competitive

I think you might fundamentally misunderstand the purpose of admission systems. To be frank, admissions is set up to benefit the university and the university alone. If getting good test scores were the bottleneck, you would see shifts in strategic behaviour until the test became mostly meaningless. For instance, you can freely retake the SAT, so if universities selected on that alone, people would just retake it until they got a good result.

The university has strong preferences about the distribution of students in classes. They have decided that they want different things from their applicants than "just" being good at tests.

They get this exactly through race-based affirmative action, athletic recruitment, “Dean’s Interest List”-type tracking systems for children of donors or notable persons, legacy preference for children of alumni, and a bunch of ill-articulated selection actions in admissions offices and various other places.

A stable-marriage system would require a national system, which would require universities, as distinct and competing organizations (mostly for prestige), to coordinate for the benefit of students. They obviously should do things with that general description, but they tend not to.

simcha on Simcha's Shortform

I miss the days when, if someone asked me for my political opinion, I could say something interesting, maybe even something theoretical or abstract.
Nowadays my opinions are all "tariffs bad," "free trade good," "due process good."
I'm involuntarily being NPC'd.

Things that used to be obvious as fact are now 'opinions' that must therefore be Debated, leaving true opinions in the realm of farfetched theory best left to quiet personal conversations - certainly not to real political commentary.

nick_tarleton on Well-foundedness as an organizing principle of healthy minds and societies

Not sure what Richard would say, but off the cuff, I'd distinguish 'imparting information that happens to induce guilt' from 'guilting', based on intent to cooperatively inform vs. psychologically attack.

My read of the post is that some degree of "being virtuously willing to contend with guilt as a fair emergent consequence of hearing carefully considered and selected information" is required for being well-founded or part of being a well-founded system (receive criticism without generating internal conflict, etc).

ryan_greenblatt on DeepMind: An Approach to Technical AGI Safety and Security

A "no" to either would mean this work falls under milling behavior, and will not meaningfully contribute toward keeping humanity safe from DeepMind's own actions.

I think it's probably possible to greatly improve safety given a moderate budget for safety and not nearly enough buy-in for (1) and (2). (At least not enough buy-in prior to a large incident which threatens to be very costly for the organization.)

Overall, I think high quality thinking about AI safety seems quite useful even if this level of buy-in is unlikely.

(I don't think this report should update us much about having the buy-in needed for (1)/(2), but the fact that it could be published at all in its current form is still encouraging.)

rba on How Gay is the Vatican?

A sufficiently high band in the CCP could work.

zach-stein-perlman on DeepMind: An Approach to Technical AGI Safety and Security

I don't know, maybe nothing. (I just meant that on current margins, maybe the quality of the safety team's plans isn't super important.)

sohaib-imran on Among Us: A Sandbox for Agentic Deception

Cool work. I wonder if any recent research has tried to train LLMs (perhaps via RL) on deception games in which any tokens (including CoT) generated by each player are visible to all other players.

It would be useful to see whether LLMs can hide their deception from monitors over extended token sequences, and what strategies they come up with to achieve that (e.g., steganography).