LessWrong 2.0 Reader


Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)
5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)
Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (13)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (1)
Scalable oversight as a quantitative rather than qualitative problem
Buck · 2024-07-06T17:42:41.325Z · comments (11)
[link] [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij (teun-van-der-weij) · 2024-06-13T10:04:49.556Z · comments (10)
If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (129)
A simple case for extreme inner misalignment
Richard_Ngo (ricraz) · 2024-07-13T15:40:37.518Z · comments (41)
Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)
JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (55)
[link] Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims
garrison · 2024-11-13T17:00:01.005Z · comments (14)
Actually, Power Plants May Be an AI Training Bottleneck.
Lao Mein (derpherpize) · 2024-06-20T04:41:33.567Z · comments (13)
Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (32)
$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?
johnswentworth · 2025-04-21T20:19:30.808Z · comments (12)
OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (25)
OpenAI #11: America Action Plan
Zvi · 2025-03-18T12:50:03.880Z · comments (3)
Some arguments against a land value tax
Matthew Barnett (matthew-barnett) · 2024-12-29T15:17:00.740Z · comments (40)
Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu (wilson-wu) · 2025-02-13T18:24:36.160Z · comments (6)
We should try to automate AI safety work asap
Marius Hobbhahn (marius-hobbhahn) · 2025-04-26T16:35:43.770Z · comments (9)
A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)
How might we safely pass the buck to AI?
joshc (joshua-clymer) · 2025-02-19T17:48:32.249Z · comments (58)
Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)
[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (60)
Release: Optimal Weave (P1): A Prototype Cohabitive Game
mako yass (MakoYass) · 2024-08-17T14:08:18.947Z · comments (21)
[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)
AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (3)
[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (7)
AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (20)
Effective Evil's AI Misalignment Plan
lsusr · 2024-12-15T07:39:34.046Z · comments (9)
[link] SAEBench: A Comprehensive Benchmark for Sparse Autoencoders
Can (Can Rager) · 2024-12-11T06:30:37.076Z · comments (6)
[question] What are the good rationality films?
Ben Pace (Benito) · 2024-11-20T06:04:56.757Z · answers+comments (54)
AI #92: Behind the Curve
Zvi · 2024-11-28T14:40:05.448Z · comments (7)
[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)
The Mask Comes Off: A Trio of Tales
Zvi · 2025-02-14T15:30:15.372Z · comments (1)
o3 Is a Lying Liar
Zvi · 2025-04-23T20:00:05.429Z · comments (19)
Keltham's Lectures in Project Lawful
Morpheus · 2025-04-01T10:39:47.973Z · comments (5)
3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)
On the OpenAI Economic Blueprint
Zvi · 2025-01-15T14:30:06.773Z · comments (2)
[link] New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 2024-05-21T11:00:41.794Z · comments (17)
Testing which LLM architectures can do hidden serial reasoning
Filip Sondej · 2024-12-16T13:48:34.204Z · comments (9)
Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-27T15:39:02.176Z · comments (4)
[link] AI takeoff and nuclear war
owencb · 2024-06-11T19:36:24.710Z · comments (6)
Microplastics: Much Less Than You Wanted To Know
jenn (pixx) · 2025-02-15T19:08:14.561Z · comments (8)
You will crash your car in front of my house within the next week
Richard Korzekwa (Grothor) · 2025-04-01T21:43:21.472Z · comments (6)
Open problems in emergent misalignment
Jan Betley (jan-betley) · 2025-03-01T09:47:58.889Z · comments (13)
I'm offering free math consultations!
Gurkenglas · 2025-01-14T16:30:40.115Z · comments (7)
MONA: Managed Myopia with Approval Feedback
Seb Farquhar · 2025-01-23T12:24:18.108Z · comments (29)
No one has the ball on 1500 Russian olympiad winners who've received HPMOR
Mikhail Samin (mikhail-samin) · 2025-01-12T11:43:36.560Z · comments (21)
[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (12)
What Makes an AI Startup "Net Positive" for Safety?
jacquesthibs (jacques-thibodeau) · 2025-04-18T20:33:22.682Z · comments (23)