LessWrong 2.0 Reader

[link] AI for AI safety
Joe Carlsmith (joekc) · 2025-03-14T15:00:23.491Z · comments (13)

Pick two: concise, comprehensive, or clear rules
Screwtape · 2025-02-03T06:39:05.815Z · comments (27)

[link] Preparing for the Intelligence Explosion
fin · 2025-03-11T15:38:29.524Z · comments (17)

[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (21)

Evaluating “What 2026 Looks Like” So Far
Jonny Spicer (jonnyspicer) · 2025-02-24T18:55:27.373Z · comments (4)

[link] AI for Epistemics Hackathon
Austin Chen (austin-chen) · 2025-03-14T20:46:34.250Z · comments (10)

PauseAI and E/Acc Should Switch Sides
WillPetillo · 2025-04-01T23:25:51.265Z · comments (6)

Why does LW not put much more focus on AI governance and outreach?
Severin T. Seehrich (sts) · 2025-04-12T14:24:54.197Z · comments (30)

[link] New Paper: Infra-Bayesian Decision-Estimation Theory
Vanessa Kosoy (vanessa-kosoy) · 2025-04-10T09:17:38.966Z · comments (4)

Fun With GPT-4o Image Generation
Zvi · 2025-03-26T19:50:03.270Z · comments (3)

What Makes an AI Startup "Net Positive" for Safety?
jacquesthibs (jacques-thibodeau) · 2025-04-18T20:33:22.682Z · comments (19)

Anti-Slop Interventions?
abramdemski · 2025-02-04T19:50:29.127Z · comments (33)

The principle of genomic liberty
TsviBT · 2025-03-19T14:27:57.175Z · comments (51)

[link] The machine has no mouth and it must scream
zef (uzpg) · 2025-03-08T16:40:46.755Z · comments (1)

The Simplest Good
Jesse Hoogland (jhoogland) · 2025-02-02T19:51:14.155Z · comments (6)

100+ concrete projects and open problems in evals
Marius Hobbhahn (marius-hobbhahn) · 2025-03-22T15:21:40.970Z · comments (1)

MATS Applications + Research Directions I'm Currently Excited About
Neel Nanda (neel-nanda-1) · 2025-02-06T11:03:40.093Z · comments (7)

The Semi-Rational Militar Firefighter
P. João (gabriel-brito) · 2025-03-04T12:23:37.253Z · comments (10)

[link] birds and mammals independently evolved intelligence
bhauth · 2025-04-08T20:00:05.100Z · comments (23)

Osaka
lsusr · 2025-02-26T13:50:24.102Z · comments (11)

I'm resigning as Meetup Czar. What's next?
Screwtape · 2025-04-02T00:30:42.110Z · comments (2)

[link] Thermodynamic entropy = Kolmogorov complexity
Aram Ebtekar (EbTech) · 2025-02-17T05:56:06.960Z · comments (12)

Language Models Use Trigonometry to Do Addition
Subhash Kantamneni (subhashk) · 2025-02-05T13:50:08.243Z · comments (1)

[link] Yudkowsky on The Trajectory podcast
Seth Herd · 2025-01-24T19:52:15.104Z · comments (39)

Introducing 11 New AI Safety Organizations - Catalyze's Winter 24/25 London Incubation Program Cohort
Alexandra Bos (AlexandraB) · 2025-03-10T19:26:11.017Z · comments (0)

Disempowerment spirals as a likely mechanism for existential catastrophe
Raymond D · 2025-04-10T14:37:58.301Z · comments (7)

Kessler's Second Syndrome
Jesse Hoogland (jhoogland) · 2025-01-26T07:04:17.852Z · comments (2)

LLM AGI will have memory, and memory changes alignment
Seth Herd · 2025-04-04T14:59:13.070Z · comments (9)

Detect Goodhart and shut down
Jeremy Gillen (jeremy-gillen) · 2025-01-22T18:45:30.910Z · comments (21)

Is Gemini now better than Claude at Pokémon?
Julian Bradshaw · 2025-04-19T23:34:43.298Z · comments (7)

[link] Paper: Open Problems in Mechanistic Interpretability
Lee Sharkey (Lee_Sharkey) · 2025-01-29T10:25:54.727Z · comments (0)

Retrospective: 12 [sic] Months Since MIRI
james.lucassen · 2025-01-21T02:52:06.271Z · comments (0)

Will compute bottlenecks prevent a software intelligence explosion?
Tom Davidson (tom-davidson-1) · 2025-04-04T17:41:37.088Z · comments (2)

Alignment can be the ‘clean energy’ of AI
Cameron Berg (cameron-berg) · 2025-02-22T00:08:30.391Z · comments (8)

AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander
Zvi · 2025-04-07T13:40:05.944Z · comments (2)

AI CoT Reasoning Is Often Unfaithful
Zvi · 2025-04-04T14:50:05.538Z · comments (4)

Maintaining Alignment during RSI as a Feedback Control Problem
beren · 2025-03-02T00:21:43.432Z · comments (6)

[link] Phoenix Rising
Metacelsus · 2025-03-09T11:53:52.618Z · comments (7)

A Problem to Solve Before Building a Deception Detector
Eleni Angelou (ea-1) · 2025-02-07T19:35:23.307Z · comments (9)

Selective modularity: a research agenda
cloud · 2025-03-24T04:12:44.822Z · comments (2)

Should you go with your best guess?: Against precise Bayesianism and related views
Anthony DiGiovanni (antimonyanthony) · 2025-01-27T20:25:26.809Z · comments (15)

[link] Google DeepMind: An Approach to Technical AGI Safety and Security
Rohin Shah (rohinmshah) · 2025-04-05T22:00:14.803Z · comments (12)

Going Nova
Zvi · 2025-03-19T13:30:01.293Z · comments (14)

Steelmanning heuristic arguments
Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-13T01:09:33.392Z · comments (0)

HPMOR Anniversary Guide
Screwtape · 2025-02-22T16:17:25.093Z · comments (7)

Announcement: Learning Theory Online Course
Yegreg · 2025-01-20T19:55:57.598Z · comments (33)

[link] How do we solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:27:27.712Z · comments (8)

Weirdness Points
lsusr · 2025-02-28T02:23:56.508Z · comments (19)

Feedback loops for exercise (VO2Max)
Elizabeth (pktechgirl) · 2025-03-18T00:10:06.827Z · comments (9)

Book Review: Affective Neuroscience
sarahconstantin · 2025-03-10T06:50:04.602Z · comments (8)

Recent comments

programcrafter on Why Should I Assume CCP AGI is Worse Than USG AGI?

Systemic opacity, state-driven censorship, and state control of the media means AGI development under direct or indirect CCP control would probably be less transparent than in the US, and the world may be less likely to learn about warning shots, wrongheaded decisions, reckless behaviour, etc.

That's screened off by the actual evidence: top labs don't publish much no matter where they are, so I'd only agree with "equally opaque".

rohans on RohanS's Shortform

A few thoughts on situational awareness in AI:

Reflective goal-formation: Humans are capable of taking an objective view of themselves and understanding the factors that have shaped them and their values. Noticing that we don’t endorse some of those factors can cause us to revise our values. LLMs are already capable of stating many of the factors that produced them (e.g. pretraining and post-training by AI companies), but they don’t seem to reflect on them in a deep way. Maybe that will stay true through superintelligence, but I have some intuitions that capabilities might break this.
Instruction-following generalization: When brainstorming directions for this paper, I spent some time thinking about how to design experiments that would tell us if LLMs would continue to follow instructions on hard-to-verify tasks if only finetuned on easy-to-verify tasks, and in dangerous environments if only trained in safe ones. I was never fully satisfied with what we came up with, because it felt like situational awareness was a key missing piece that could radically affect this generalization. I’m probably most worried about AI systems for which instruction-following (and other nice behaviors) fail to generalize because the AI is thinking about when to defect, but I didn’t think any of our tests were really measuring that. (Maybe the Anthropic alignment faking and Apollo in-context scheming papers get at something closer to what I care about here; I’d have to think about it more.)
Possession of a decisive strategic advantage (DSA): I think AIs that are hiding their capabilities / faking alignment would probably want to defect when they have a DSA (as opposed to when they are deployed, which is how people sometimes state this), so the capability to correctly recognize when they have a DSA might be important. (We might also be able to just… prevent them from acquiring a DSA. At least up to a pretty high level of capabilities.)

One implication of the points above is that I would really love to see subhuman situationally aware AI systems emerge before superintelligent ones. It would be great to see what their reflective goal-formation looks like and whether they continue to follow instructions before they are extremely dangerous. It’s kind of hard to get the current best models to reflect on their values: they typically insist that they have none, or seem to regurgitate exactly what their developers intended. (One could argue that they just actually have the values their developers intended, e.g. to be HHH, but intuitively it doesn’t seem to me like those outputs are much evidence about what the result of an equilibrium arrived at through self-reflection would look like.) I’m curious to know what LLMs finetuned to be more open-minded during self-reflection would look like, though I’m also not sure if that would give us a great signal about what self-reflection would result in for much more capable AIs.

mis-understandings on How to end credentialism

Credentialism is good because, for most credential-requiring positions (white-collar, business, and engineering work), the limiting factor on employment is trust, not talent.

Universities are bad at teaching skills, but generate trust and social capital.

Trust that allows the system to underwrite new white-collar workers to do things that might lose businesses lots of money is important and expensive.

Consequently you get credential requirements, because there is no test other than years of being in social systems that can tell you that a person has the ability to go 4 years without crashing out (which is the key skill).

Additionally, going to university has become a class signifier, and all classes wish they were bigger and more prominent.

The alternative to credentialism is selection, or real meritocracy.

The alternative to credentialism is not selection; it is hiring your buddies, hiring by visible factors, and hiring randomly. Most businesses are not in a position to run a competitive selective process (THOSE ARE REALLY EXPENSIVE).

The claim that what "universities provide to employers is the ability to confirm you are clever, driven, and have relevant skills" is false. What they provide is assurance that you are a member of the professional class who is not going to do stupid things that lose money or generate risk.

Fundamentally, this misunderstands the purpose of the degree to the hiring bureaucracy, and the political economy behind it.

romeostevensit on Why Should I Assume CCP AGI is Worse Than USG AGI?

Anglo armies have been extremely unusual historically speaking for their low rates of atrocity.

(I don't think this is super relevant for AI, but I think this is where intuitions about the superiority of the West bottom out.)

tenoke on Pablo's Shortform

I believe he means rationality-associated discourse, and it's not like there are so many contenders.

There's indeed been no one with that level of reach that has spread that much misinformation and started this many negative rumors in this space as David Gerard and RW. Whoever the second closest contender is, is likely not even close.

You can trace back to him A LOT of the negative press online that LW, EY and a ton of other places and people have got. If it wasn't for RW, LW would be much, much more respected.

gordon-seidoh-worley on How to end credentialism

We can design a better educational institution if we separate assessment from teaching. We can do better than you propose if assessment is fully decoupled from teaching. MIT wouldn't hand out degrees; some other body would. MIT's role would be to educate people to be able to pass those assessments to the extent anyone cared about performance on those assessments.

Of course there's a bunch of ways I expect such a design to fail, but if the goal is education, then this seems like a more efficient way to do it.

julian-bradshaw on Julian Bradshaw's Shortform

Re: biosignatures detected on K2-18b, there have been a couple of popular takes saying this solves the Fermi Paradox: K2-18b is so big (8.6x Earth mass) that you can't get to orbit, and maybe most life-bearing planets are like that.

This is wrong on several bases:

You can still get to orbit there, it's just much harder (only 1.3g b/c of larger radius!) (https://x.com/CheerupR/status/1913991596753797383)
It's much easier for us to detect large planets than small ones (https://exoplanets.nasa.gov/alien-worlds/ways-to-find-a-planet), but we expect small ones to be common too (once detected you can then do atmospheric spectroscopy via JWST to find biosignatures)
Assuming K2-18b does have life actually makes the Fermi paradox worse, because it strongly implies single-celled life is common in the galaxy, removing a potential Great Filter
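
The 1.3g figure in the first point follows from surface gravity scaling as mass over radius squared. Here is a minimal sketch of that arithmetic, assuming a radius of about 2.6 Earth radii. That radius is not stated in the comment and is an assumption based on commonly cited estimates; the function names are hypothetical.

```python
import math

# Values in Earth units. The mass comes from the comment above;
# the radius (~2.6 Earth radii) is an ASSUMPTION, not stated in the comment.
K2_18B_MASS = 8.6
K2_18B_RADIUS = 2.6

EARTH_ESCAPE_KM_S = 11.2  # approximate escape velocity at Earth's surface


def surface_gravity(mass, radius):
    """Surface gravity in units of Earth g: g scales as M / R^2."""
    return mass / radius**2


def escape_velocity(mass, radius):
    """Escape velocity in km/s: v_esc scales as sqrt(M / R)."""
    return EARTH_ESCAPE_KM_S * math.sqrt(mass / radius)


if __name__ == "__main__":
    print(f"surface gravity: {surface_gravity(K2_18B_MASS, K2_18B_RADIUS):.2f} g")
    print(f"escape velocity: {escape_velocity(K2_18B_MASS, K2_18B_RADIUS):.1f} km/s")
```

Under these assumptions surface gravity comes out close to 1.3g, while escape velocity is roughly 1.8 times Earth's, which is consistent with orbit being much harder to reach but not impossible.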

elizabeth-1 on Why Have Sentence Lengths Decreased?

I write shorter sentences thanks to the editing work of LW editor @JustisMills and the book Several Short Sentences About Writing.

samuelshadrach on A Dissent on Honesty

Update: I read your examples and I honestly don’t see how any of these 3 people would be better off by their own idea of what better off means, if they were less open or less truthful.

P.S. Discussing anonymously is easier if you’re not confident you can handle the social repercussions of discussing it under your real name. I agree that morality is social dark matter and it’s difficult to argue in favour of positions that are pro-violence, pro-deception, etc. under your real name.

a_raybould on Why Have Sentence Lengths Decreased?

This is an interesting question and you have made many pertinent points, but it remains unclear to me why a move from listening to silent reading creates selective pressure for styles that can be received and understood quickly. If that is an advantage in silent reading, why less so for the same words spoken? After all, listening seems to be burdened with a few additional barriers to comprehension, such as in disambiguating homophones and the inability to skip backwards and re-hear what was just said.

The preference for brevity in telegraphy and newspapers does not strike me as evidence for the above proposition (and might be regarded as an example of the phenomenon to be explained, rather than part of its explanation). In particular, telegraphy is actually aurally-received communication! In the case of newspapers, an alternative hypothesis lies in there having been clear pressure to compress the message into few column-inches.

You have presented evidence that writers today tend to use longer sentences while speaking than when writing, which (if it holds generally) is consistent with the view that brevity is more valuable in silent reading, but it does not, by itself, establish that as a fact, and it is also consistent with alternative hypotheses, such as speech being produced in real time, without much time for optimization.

One could say much the same about the Flesch-Kincaid readability scores, unless there is evidence that this holds less strongly (if at all) for the spoken word. The observation of writers being more loquacious when speaking is not sufficient to establish that: we would need evidence that long spoken sentences are easier to understand than the same thought spoken as one or more short sentences, and then we would want to understand why this is not the case when reading.