LessWrong 2.0 Reader

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
Stuart_Armstrong · 2025-03-18T14:48:54.762Z · comments (12)
Announcing ILIAD2: ODYSSEY
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-04-03T17:01:06.004Z · comments (1)
[link] Eukaryote Skips Town - Why I'm leaving DC
eukaryote · 2025-03-26T17:16:29.663Z · comments (1)
[link] New Paper: Infra-Bayesian Decision-Estimation Theory
Vanessa Kosoy (vanessa-kosoy) · 2025-04-10T09:17:38.966Z · comments (4)
Why does LW not put much more focus on AI governance and outreach?
Severin T. Seehrich (sts) · 2025-04-12T14:24:54.197Z · comments (29)
PauseAI and E/Acc Should Switch Sides
WillPetillo · 2025-04-01T23:25:51.265Z · comments (6)
The principle of genomic liberty
TsviBT · 2025-03-19T14:27:57.175Z · comments (51)
Fun With GPT-4o Image Generation
Zvi · 2025-03-26T19:50:03.270Z · comments (3)
What Makes an AI Startup "Net Positive" for Safety?
jacquesthibs (jacques-thibodeau) · 2025-04-18T20:33:22.682Z · comments (14)
100+ concrete projects and open problems in evals
Marius Hobbhahn (marius-hobbhahn) · 2025-03-22T15:21:40.970Z · comments (1)
[link] birds and mammals independently evolved intelligence
bhauth · 2025-04-08T20:00:05.100Z · comments (23)
I'm resigning as Meetup Czar. What's next?
Screwtape · 2025-04-02T00:30:42.110Z · comments (2)
Disempowerment spirals as a likely mechanism for existential catastrophe
Raymond D · 2025-04-10T14:37:58.301Z · comments (7)
AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander
Zvi · 2025-04-07T13:40:05.944Z · comments (2)
Will compute bottlenecks prevent a software intelligence explosion?
Tom Davidson (tom-davidson-1) · 2025-04-04T17:41:37.088Z · comments (2)
AI CoT Reasoning Is Often Unfaithful
Zvi · 2025-04-04T14:50:05.538Z · comments (4)
Selective modularity: a research agenda
cloud · 2025-03-24T04:12:44.822Z · comments (2)
Going Nova
Zvi · 2025-03-19T13:30:01.293Z · comments (14)
[link] Google DeepMind: An Approach to Technical AGI Safety and Security
Rohin Shah (rohinmshah) · 2025-04-05T22:00:14.803Z · comments (12)
LLM AGI will have memory, and memory changes alignment
Seth Herd · 2025-04-04T14:59:13.070Z · comments (9)
Steelmanning heuristic arguments
Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-13T01:09:33.392Z · comments (0)
Apply to MATS 8.0!
Ryan Kidd (ryankidd44) · 2025-03-20T02:17:58.018Z · comments (4)
Renormalization Roadmap
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T20:34:16.352Z · comments (7)
Feedback loops for exercise (VO2Max)
Elizabeth (pktechgirl) · 2025-03-18T00:10:06.827Z · comments (9)
FrontierMath Score of o3-mini Much Lower Than Claimed
YafahEdelman (yafah-edelman-1) · 2025-03-17T22:41:06.527Z · comments (7)
[link] How Gay is the Vatican?
rba · 2025-04-06T21:27:50.530Z · comments (32)
[link] Softmax, Emmett Shear's new AI startup focused on "Organic Alignment"
Chipmonk · 2025-03-28T21:23:46.220Z · comments (1)
[link] Sentinel's Global Risks Weekly Roundup #11/2025. Trump invokes Alien Enemies Act, Chinese invasion barges deployed in exercise.
NunoSempere (Radamantis) · 2025-03-17T19:34:01.850Z · comments (3)
Alignment faking CTFs: Apply to my MATS stream
joshc (joshua-clymer) · 2025-04-04T16:29:02.070Z · comments (0)
Solving willpower seems easier than solving aging
Yair Halberstadt (yair-halberstadt) · 2025-03-23T15:25:40.861Z · comments (28)
Socially Graceful Degradation
Screwtape · 2025-03-20T04:03:41.213Z · comments (9)
On Google’s Safety Plan
Zvi · 2025-04-11T12:51:12.112Z · comments (6)
Housing Roundup #11
Zvi · 2025-04-01T16:30:03.694Z · comments (1)
How I switched careers from software engineer to AI policy operations
Lucie Philippon (lucie-philippon) · 2025-04-13T06:37:33.507Z · comments (1)
Consider showering
bohaska (Bohaska) · 2025-04-01T23:54:26.714Z · comments (15)
My "infohazards small working group" Signal Chat may have encountered minor leaks
Linch · 2025-04-02T01:03:05.311Z · comments (0)
OpenAI Responses API changes models' behavior
Jan Betley (jan-betley) · 2025-04-11T13:27:29.942Z · comments (6)
Notes on countermeasures for exploration hacking (aka sandbagging)
ryan_greenblatt · 2025-03-24T18:39:36.665Z · comments (6)
Map of AI Safety v2
Bryce Robertson (bryceerobertson) · 2025-04-15T13:04:40.993Z · comments (4)
Reframing AI Safety as a Neverending Institutional Challenge
scasper · 2025-03-23T00:13:48.614Z · comments (12)
Gemini 2.5 is the New SoTA
Zvi · 2025-03-28T14:20:03.176Z · comments (1)
To be legible, evidence of misalignment probably has to be behavioral
ryan_greenblatt · 2025-04-15T18:14:53.022Z · comments (11)
AI #110: Of Course You Know…
Zvi · 2025-04-03T13:10:05.674Z · comments (9)
We’re not prepared for an AI market crash
Remmelt (remmelt-ellen) · 2025-04-01T04:33:55.040Z · comments (12)
The vision of Bill Thurston
TsviBT · 2025-03-28T11:45:14.297Z · comments (34)
Against Yudkowsky's evolution analogy for AI x-risk [unfinished]
Fiora Sunshine (Fiora from Rosebloom) · 2025-03-18T01:41:06.453Z · comments (18)
The Bell Curve of Bad Behavior
Screwtape · 2025-04-14T19:58:10.293Z · comments (6)
AI "Deep Research" Tools Reviewed
sarahconstantin · 2025-03-24T18:40:03.864Z · comments (5)
Introducing BenchBench: An Industry Standard Benchmark for AI Strength
Jozdien · 2025-04-02T02:11:41.555Z · comments (0)
Four Types of Disagreement
silentbob · 2025-04-13T11:22:38.466Z · comments (2)