LessWrong 2.0 Reader

[link] Discursive Warfare and Faction Formation
Benquo · 2025-01-09T16:47:31.824Z · comments (3)
DeepSeek Panic at the App Store
Zvi · 2025-01-28T19:30:07.555Z · comments (14)
[link] Preference Inversion
Benquo · 2025-01-02T18:15:52.938Z · comments (46)
[link] Just one more exposure bro
Chipmonk · 2024-12-12T21:37:07.069Z · comments (6)
I Finally Worked Through Bayes' Theorem (Personal Achievement)
keltan · 2024-12-05T02:04:16.547Z · comments (6)
Role embeddings: making authorship more salient to LLMs
Nina Panickssery (NinaR) · 2025-01-07T20:13:16.677Z · comments (0)
[link] A toy evaluation of inference code tampering
Fabien Roger (Fabien) · 2024-12-09T17:43:40.910Z · comments (0)
DeepSeek v3: The Six Million Dollar Model
Zvi · 2024-12-31T15:10:06.924Z · comments (6)
[link] Dario Amodei: On DeepSeek and Export Controls
Zach Stein-Perlman · 2025-01-29T17:15:18.986Z · comments (2)
Introducing the WeirdML Benchmark
Håvard Tveit Ihle (havard-tveit-ihle) · 2025-01-16T11:38:17.056Z · comments (13)
AI #100: Meet the New Boss
Zvi · 2025-01-23T15:40:07.473Z · comments (4)
A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (29)
[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)
A sketch of an AI control safety case
Tomek Korbak (tomek-korbak) · 2025-01-30T17:28:47.992Z · comments (0)
Logits, log-odds, and loss for parallel circuits
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-20T09:56:26.031Z · comments (3)
AI #94: Not Now, Google
Zvi · 2024-12-12T15:40:06.336Z · comments (3)
[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)
D&D.Sci Dungeonbuilding: the Dungeon Tournament
aphyer · 2024-12-14T04:30:55.656Z · comments (16)
Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)
Analysis of Global AI Governance Strategies
Sammy Martin (SDM) · 2024-12-04T10:45:25.311Z · comments (10)
[link] Two interviews with the founder of DeepSeek
Cosmia_Nebula · 2024-11-29T03:18:47.246Z · comments (4)
[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)
Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)
Finding Features Causally Upstream of Refusal
Daniel Lee (daniel-lee) · 2025-01-14T02:30:04.321Z · comments (5)
Considerations on orca intelligence
Towards_Keeperhood (Simon Skade) · 2024-12-29T14:35:16.445Z · comments (5)
Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (15)
[link] How quickly could robots scale up?
Benjamin_Todd · 2025-01-12T17:01:04.927Z · comments (22)
[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (5)
Meta Pivots on Content Moderation
Zvi · 2025-01-17T14:20:06.727Z · comments (3)
Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)
AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (11)
[link] Oppression and production are competing explanations for wealth inequality.
Benquo · 2025-01-05T14:13:15.398Z · comments (15)
AI #97: 4
Zvi · 2025-01-02T14:10:06.505Z · comments (4)
Detection of Asymptomatically Spreading Pathogens
jefftk (jkaufman) · 2024-12-05T18:20:02.473Z · comments (7)
Implications of the AI Security Gap
Dan Braun (dan-braun-1) · 2025-01-08T08:31:36.789Z · comments (0)
Against blanket arguments against interpretability
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-22T09:46:23.486Z · comments (4)
[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)
[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)
On Dwarkesh Patel’s 4th Podcast With Tyler Cowen
Zvi · 2025-01-10T13:50:05.563Z · comments (7)
[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)
[link] Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI
Connor Leahy (NPCollapse) · 2024-12-02T13:28:57.977Z · comments (10)
Things I have been using LLMs for
Kaj_Sotala · 2025-01-20T14:20:02.600Z · comments (5)
Preppers Are Too Negative on Objects
jefftk (jkaufman) · 2024-12-18T02:30:01.854Z · comments (2)
[link] Began a pay-on-results coaching experiment, made $40,300 since July
Chipmonk · 2024-12-29T21:12:02.574Z · comments (15)
[link] Alignment Is Not All You Need
Adam Jones (domdomegg) · 2025-01-02T17:50:00.486Z · comments (10)
Zvi’s 2024 In Movies
Zvi · 2025-01-13T13:40:05.488Z · comments (4)
Practicing Bayesian Epistemology with "Two Boys" Probability Puzzles
Liron · 2025-01-02T04:42:20.362Z · comments (14)
ARENA 4.0 Impact Report
Chloe Li (chloe-li-1) · 2024-11-27T20:51:54.844Z · comments (3)
Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)
Dmitry's Koan
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-10T04:27:30.346Z · comments (8)