LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

Fun With GPT-4o Image Generation
Zvi · 2025-03-26T19:50:03.270Z · comments (3)
What Makes an AI Startup "Net Positive" for Safety?
jacquesthibs (jacques-thibodeau) · 2025-04-18T20:33:22.682Z · comments (19)
100+ concrete projects and open problems in evals
Marius Hobbhahn (marius-hobbhahn) · 2025-03-22T15:21:40.970Z · comments (1)
[link] birds and mammals independently evolved intelligence
bhauth · 2025-04-08T20:00:05.100Z · comments (23)
I'm resigning as Meetup Czar. What's next?
Screwtape · 2025-04-02T00:30:42.110Z · comments (2)
LLM AGI will have memory, and memory changes alignment
Seth Herd · 2025-04-04T14:59:13.070Z · comments (9)
Disempowerment spirals as a likely mechanism for existential catastrophe
Raymond D · 2025-04-10T14:37:58.301Z · comments (7)
Is Gemini now better than Claude at Pokémon?
Julian Bradshaw · 2025-04-19T23:34:43.298Z · comments (7)
Will compute bottlenecks prevent a software intelligence explosion?
Tom Davidson (tom-davidson-1) · 2025-04-04T17:41:37.088Z · comments (2)
AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander
Zvi · 2025-04-07T13:40:05.944Z · comments (2)
AI CoT Reasoning Is Often Unfaithful
Zvi · 2025-04-04T14:50:05.538Z · comments (4)
Selective modularity: a research agenda
cloud · 2025-03-24T04:12:44.822Z · comments (2)
[link] Google DeepMind: An Approach to Technical AGI Safety and Security
Rohin Shah (rohinmshah) · 2025-04-05T22:00:14.803Z · comments (12)
Steelmanning heuristic arguments
Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-13T01:09:33.392Z · comments (0)
Renormalization Roadmap
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T20:34:16.352Z · comments (7)
[link] How Gay is the Vatican?
rba · 2025-04-06T21:27:50.530Z · comments (32)
Impact, agency, and taste
benkuhn · 2025-04-19T21:10:06.960Z · comments (0)
[link] Softmax, Emmett Shear's new AI startup focused on "Organic Alignment"
Chipmonk · 2025-03-28T21:23:46.220Z · comments (1)
Solving willpower seems easier than solving aging
Yair Halberstadt (yair-halberstadt) · 2025-03-23T15:25:40.861Z · comments (28)
Alignment faking CTFs: Apply to my MATS stream
joshc (joshua-clymer) · 2025-04-04T16:29:02.070Z · comments (0)
On Google’s Safety Plan
Zvi · 2025-04-11T12:51:12.112Z · comments (6)
Map of AI Safety v2
Bryce Robertson (bryceerobertson) · 2025-04-15T13:04:40.993Z · comments (4)
Housing Roundup #11
Zvi · 2025-04-01T16:30:03.694Z · comments (1)
How I switched careers from software engineer to AI policy operations
Lucie Philippon (lucie-philippon) · 2025-04-13T06:37:33.507Z · comments (1)
Consider showering
bohaska (Bohaska) · 2025-04-01T23:54:26.714Z · comments (16)
Reframing AI Safety as a Neverending Institutional Challenge
scasper · 2025-03-23T00:13:48.614Z · comments (12)
Notes on countermeasures for exploration hacking (aka sandbagging)
ryan_greenblatt · 2025-03-24T18:39:36.665Z · comments (6)
My "infohazards small working group" Signal Chat may have encountered minor leaks
Linch · 2025-04-02T01:03:05.311Z · comments (0)
Gemini 2.5 is the New SoTA
Zvi · 2025-03-28T14:20:03.176Z · comments (1)
OpenAI Responses API changes models' behavior
Jan Betley (jan-betley) · 2025-04-11T13:27:29.942Z · comments (6)
To be legible, evidence of misalignment probably has to be behavioral
ryan_greenblatt · 2025-04-15T18:14:53.022Z · comments (11)
AI #110: Of Course You Know…
Zvi · 2025-04-03T13:10:05.674Z · comments (9)
Introducing BenchBench: An Industry Standard Benchmark for AI Strength
Jozdien · 2025-04-02T02:11:41.555Z · comments (0)
AI "Deep Research" Tools Reviewed
sarahconstantin · 2025-03-24T18:40:03.864Z · comments (5)
The Bell Curve of Bad Behavior
Screwtape · 2025-04-14T19:58:10.293Z · comments (6)
We’re not prepared for an AI market crash
Remmelt (remmelt-ellen) · 2025-04-01T04:33:55.040Z · comments (12)
The vision of Bill Thurston
TsviBT · 2025-03-28T11:45:14.297Z · comments (34)
Four Types of Disagreement
silentbob · 2025-04-13T11:22:38.466Z · comments (2)
Vestigial reasoning in RL
Caleb Biddulph (caleb-biddulph) · 2025-04-13T15:40:11.954Z · comments (7)
A collection of approaches to confronting doom, and my thoughts on them
Ruby · 2025-04-06T02:11:31.271Z · comments (18)
[link] The Russell Conjugation Illuminator
TimmyM (timmym) · 2025-04-17T19:33:06.924Z · comments (14)
Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle
Czynski (JacobKopczynski) · 2025-03-29T02:51:29.786Z · comments (36)
Reactions to METR task length paper are insane
Cole Wyeth (Amyr) · 2025-04-10T17:13:36.428Z · comments (41)
23andMe potentially for sale for <$50M
lemonhope (lcmgcd) · 2025-03-25T04:34:28.388Z · comments (2)
Youth Lockout
Xavi CF (xavi-cf) · 2025-04-11T15:05:54.441Z · comments (6)
[link] College Advice For People Like Me
henryj · 2025-04-12T14:36:46.643Z · comments (5)
OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
Zvi · 2025-04-15T15:30:02.518Z · comments (3)
Try training token-level probes
StefanHex (Stefan42) · 2025-04-14T11:56:23.191Z · comments (4)
On (Not) Feeling the AGI
Zvi · 2025-03-25T14:30:02.215Z · comments (25)
[question] Why do many people who care about AI Safety not clearly endorse PauseAI?
humnrdble · 2025-03-30T18:06:32.426Z · answers+comments (41)