LessWrong 2.0 Reader

[link] AI 2027: What Superintelligence Looks Like
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-03T16:23:44.619Z · comments (209)

How to Make Superbabies
GeneSmith · 2025-02-19T20:39:38.971Z · comments (333)

[link] How AI Takeover Might Happen in 2 Years
joshc (joshua-clymer) · 2025-02-07T17:10:10.530Z · comments (137)

A Bear Case: My Predictions Regarding AI Progress
Thane Ruthenis · 2025-03-05T16:41:37.639Z · comments (155)

LessWrong has been acquired by EA
habryka (habryka4) · 2025-04-01T13:09:11.153Z · comments (47)

VDT: a solution to decision theory
L Rudolf L (LRudL) · 2025-04-01T21:04:09.509Z · comments (26)

[link] Will Jesus Christ return in an election year?
Eric Neyman (UnexpectedValues) · 2025-03-24T16:50:53.019Z · comments (45)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley (jan-betley) · 2025-02-25T17:39:31.059Z · comments (91)

Policy for LLM Writing on LessWrong
jimrandomh · 2025-03-24T21:41:30.965Z · comments (65)

[link] Recent AI model progress feels mostly like bullshit
lc · 2025-03-24T19:28:43.450Z · comments (79)

[link] Playing in the Creek
Hastings (hastings-greer) · 2025-04-10T17:39:28.883Z · comments (6)

Murder plots are infohazards
Chris Monteiro (chris-topher) · 2025-02-13T19:15:09.749Z · comments (44)

[link] Good Research Takes are Not Sufficient for Good Strategic Takes
Neel Nanda (neel-nanda-1) · 2025-03-22T10:13:38.257Z · comments (28)

So You Want To Make Marginal Progress...
johnswentworth · 2025-02-07T23:22:19.825Z · comments (42)

Arbital has been imported to LessWrong
RobertM (T3t) · 2025-02-20T00:47:33.983Z · comments (30)

Accountability Sinks
Martin Sustrik (sustrik) · 2025-04-22T05:00:02.617Z · comments (35)

[link] Why Have Sentence Lengths Decreased?
Arjun Panickssery (arjun-panickssery) · 2025-04-03T17:50:29.962Z · comments (82)

[link] Trojan Sky
Richard_Ngo (ricraz) · 2025-03-11T03:14:00.681Z · comments (39)

[link] METR: Measuring AI Ability to Complete Long Tasks
Zach Stein-Perlman · 2025-03-19T16:00:54.874Z · comments (104)

[link] Tracing the Thoughts of a Large Language Model
Adam Jermyn (adam-jermyn) · 2025-03-27T17:20:02.162Z · comments (22)

[link] A History of the Future, 2025-2040
L Rudolf L (LRudL) · 2025-02-17T12:03:58.355Z · comments (41)

Why Should I Assume CCP AGI is Worse Than USG AGI?
Tomás B. (Bjartur Tómas) · 2025-04-19T14:47:52.167Z · comments (73)

[link] Thoughts on AI 2027
Max Harms (max-harms) · 2025-04-09T21:26:23.926Z · comments (49)

[link] Power Lies Trembling: a three-book review
Richard_Ngo (ricraz) · 2025-02-22T22:57:59.720Z · comments (27)

[link] Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
garrison · 2025-02-11T00:20:41.421Z · comments (8)

Eliezer's Lost Alignment Articles / The Arbital Sequence
Ruby · 2025-02-20T00:48:10.338Z · comments (9)

“Sharp Left Turn” discourse: An opinionated review
Steven Byrnes (steve2152) · 2025-01-28T18:47:04.395Z · comments (26)

[link] To Understand History, Keep Former Population Distributions In Mind
Arjun Panickssery (arjun-panickssery) · 2025-04-23T04:51:26.936Z · comments (12)

Why White-Box Redteaming Makes Me Feel Weird
Zygi Straznickas (nonagon) · 2025-03-16T18:54:48.078Z · comments (34)

Will alignment-faking Claude accept a deal to reveal its misalignment?
ryan_greenblatt · 2025-01-31T16:49:47.316Z · comments (28)

Intention to Treat
Alicorn · 2025-03-20T20:01:19.456Z · comments (4)

Catastrophe through Chaos
Marius Hobbhahn (marius-hobbhahn) · 2025-01-31T14:19:08.399Z · comments (17)

[link] OpenAI: Detecting misbehavior in frontier reasoning models
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-11T02:17:21.026Z · comments (25)

[link] Jaan Tallinn's 2024 Philanthropy Overview
jaan · 2025-04-23T11:06:11.779Z · comments (6)

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-03-17T19:11:00.813Z · comments (7)

So how well is Claude playing Pokémon?
Julian Bradshaw · 2025-03-07T05:54:45.357Z · comments (74)

[link] On the Rationality of Deterring ASI
Dan H (dan-hendrycks) · 2025-03-05T16:11:37.855Z · comments (34)

Impact, agency, and taste
benkuhn · 2025-04-19T21:10:06.960Z · comments (9)

Short Timelines Don't Devalue Long Horizon Research
Vladimir_Nesov · 2025-04-09T00:42:07.324Z · comments (23)

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (48)

[link] Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
Jan_Kulveit · 2025-01-30T17:03:45.545Z · comments (52)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?
shrimpy · 2025-03-16T16:52:42.177Z · comments (25)

Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-13T19:09:43.620Z · comments (40)

[question] Have LLMs Generated Novel Insights?
abramdemski · 2025-02-23T18:22:12.763Z · answers+comments (36)

[link] Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout · 2025-03-02T19:51:14.775Z · comments (27)

It's been ten years. I propose HPMOR Anniversary Parties.
Screwtape · 2025-02-16T01:43:14.586Z · comments (3)

Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard (jan-christian-refsgaard) · 2025-03-02T20:26:22.103Z · comments (26)

[link] Conceptual Rounding Errors
Jan_Kulveit · 2025-03-26T19:00:31.549Z · comments (15)

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better
Thane Ruthenis · 2025-02-21T20:15:11.545Z · comments (51)

Levels of Friction
Zvi · 2025-02-10T13:10:07.224Z · comments (8)

Recent comments

sanyer on lillybaeum's Shortform

What exactly is worrying about AI developing a comprehensive picture of your life? (I can think of at least a couple problems, e.g. privacy, but I'm curious how you think about it)

luan-fletcher on CstineSublime's Shortform

I think when explaining it to non-technical people, just saying something like “it’s a big next word predictor” is close enough to the truth to work.

tenoke on Tenoke's Shortform

One of the reasons why it's plausible that today's or tomorrow's LLMs can result in brief simulations of consciousness or even qualia is that it happens with dreams in humans. Dreams are likely some sort of processing of information/compression/garbage collection, yet they still result in (badly) simulated experiences as a clear side-effect of trying to work with human experience data.

luan-fletcher on sam's Shortform

To be clear, I think it’s very unlikely they are conscious etc.; this is a comment on a reflexive process going on in my head.

luan-fletcher on sam's Shortform

I find that the new personalities of 4o trigger my “person” detectors too much, and I feel uncomfortable extracting work from them.

faul_sname on Our Reality: A Simulation Run by a Paperclip Maximizer

I don't know about "by a paperclip maximizer", but one thing that stands out to me:

If we're in a simulation, we could be in a simulation where the simulator did 1e100 rollouts from the big bang forward, and then collected statistics from all those runs.

But we could also be in a simulation where the simulator is doing importance sampling - that is, doing fewer rollouts from states that tend to have very similar trajectories given mild perturbations, and doing more rollouts from states that tend to have very different trajectories given mild perturbations.

If that's the case, we should find ourselves living in a world where events seem to be driven by coincidences and particularly by things which are downstream of chaotic dynamics and which had around a 50/50 chance of happening vs not. We should find more such coincidences for important things than for unimportant things.

Hey, maybe it's not our fault we live in the clown world. Maybe the clown world is a statistical inevitability.
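
The rollout-allocation idea in the comment above can be made concrete with a toy calculation. This is only a sketch under stated assumptions: the state names, the divergence scores, and the rule "rollouts proportional to divergence" are all hypothetical illustrations, not anything specified in the comment, and the code is a budget allocation rather than a full importance-sampling estimator.

```python
def allocate_rollouts(divergence, budget):
    """Split a rollout budget across states in proportion to how much
    their trajectories diverge under mild perturbations (assumed rule)."""
    total = sum(divergence.values())
    return {state: round(budget * d / total) for state, d in divergence.items()}


# Hypothetical divergence scores: a "stable" state whose futures all look
# alike, and a "coin_flip" state whose futures differ a lot.
divergence = {"stable": 1.0, "coin_flip": 9.0}

alloc = allocate_rollouts(divergence, budget=1000)

# Share of all rollouts that pass through each kind of state.
share = {state: n / sum(alloc.values()) for state, n in alloc.items()}
```

Under these made-up numbers, most rollouts run through the chaotic state, which is the sense in which an observer picked at random from the rollouts would tend to find itself downstream of near-50/50 events.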

d0themath on Our Reality: A Simulation Run by a Paperclip Maximizer

Yet the universe runs on strikingly simple math (relativity, quantum mechanics); such elegance is exactly what an efficient simulation would use. Physics is unreasonably effective, reducing the computational cost of the simulation. This cuts against the last point.

This does not seem so consistent, and is the primary piece of evidence for me against such simulation arguments. I would imagine simulations targeting, e.g., a particular purpose would have their physics tailored to that purpose much more than ours seems to be (for any purpose, given the vast computational complexity of our physics, and the vast number of objects such a physics engine needs to keep track of). For example, I'd expect most simulations' physics to look more like that of Greg Egan's Crystal Nights (incidentally, this story is what first convinced me the simulation hypothesis was false).

One may argue it's all there just to convince us we're not in a simulation. Perhaps, but two points:

1. Given the discourse on the simulation hypothesis, most seem to take our physics as evidence in favor of it, as you do here. So I don't think most think clearly enough about this for our civilizational decisions to be so dependent on this.
2. The simulators will have trade-offs and resource constraints too. Perhaps they simulate a few highly detailed simulations, and many highly simplified simulations. If this is exponential, in the sense that as the detail decreases the number of simulations exponentially increases, we should expect to be in the least detailed world consistent with the existence of sentiences and for which it's not blatantly obvious we're in a simulation.

Of course, this argument would break given sufficiently different physics from ours, perhaps enabling our world to be simulated very cheaply in as much depth as it is. But that seems, intuitively at least, a very unlikely and complex hypothesis.
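
The exponential trade-off in the second point above can be worked through with a small calculation. This is a sketch under assumptions that are not in the comment: the detail levels 1 to 5, a doubling of the simulation count for each step down in detail, and an equal number of observers per simulation are all hypothetical choices made for illustration.

```python
def detail_distribution(min_detail, max_detail, growth=2.0):
    """Probability of being in a simulation at each detail level, assuming
    the number of simulations is multiplied by `growth` for every step down
    in detail and every simulation holds equally many observers."""
    levels = range(min_detail, max_detail + 1)
    counts = {d: growth ** (max_detail - d) for d in levels}
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


# Hypothetical range: level 1 is the least detailed world that still
# supports sentience without being obviously simulated; level 5 the most.
dist = detail_distribution(1, 5)
most_likely = max(dist, key=dist.get)
```

With these numbers the simulation counts are 16, 8, 4, 2 and 1, so the least detailed level holds 16/31 of the probability mass, a little over half. A larger growth factor concentrates the mass at the lowest level even more strongly.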

7vik on Among Us: A Sandbox for Agentic Deception

Thanks! Yep, makes sense - that's one of the things we'll be working on and hope to share some results soon!

purple-fire on Kodo and Din

I mean, this is a summary of a talk I didn't see, so I want to reserve some judgement. But at the same time, I just can't imagine being in a situation where I don't want to use the word "signal" because the other person might think I'm talking about cell signal, so instead I bust out "kodo".

In general, I think we should be pretty wary of taking basic ideas and dressing them up in fancy words. It serves no purpose other than in-group signaling.

johannes_treutlein on Modifying LLM Beliefs with Synthetic Document Finetuning

I think there is a difference between finetuning and prompting in that in the prompting case, the LLM is aware that it's taking part in a role playing scenario. With finetuning on synthetic documents, it is possible to make the LLM more deeply believe something. Maybe one could make the finetuning more sample efficient by instead distilling a prompted model. Another option could be using steering vectors, though I'm not sure that would work better than prompting.