LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Option control
Joe Carlsmith (joekc) · 2024-11-04T17:54:03.073Z · comments (0)

Last week of the Discussion Phase
Raemon · 2025-01-09T19:26:59.136Z · comments (0)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)

Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces
Matthew A. Clarke (Antigone) · 2024-12-20T15:16:51.857Z · comments (0)

First Solo Bus Ride
jefftk (jkaufman) · 2024-12-03T12:20:02.344Z · comments (1)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

The Foraging (Ex-)Bandit [Ruleset & Reflections]
abstractapplic · 2024-11-14T20:16:21.535Z · comments (3)

Inferential Game: The Foraging (Ex-)Bandit
abstractapplic · 2024-11-11T16:59:42.058Z · comments (4)

Bay Winter Solstice 2024: song leading auditions
tcheasdfjkl · 2024-11-10T23:59:08.199Z · comments (0)

The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
Sahil · 2024-11-07T05:27:20.276Z · comments (1)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

Book Summary: Zero to One
bilalchughtai (beelal) · 2024-12-29T16:13:52.922Z · comments (1)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde (kola-ayonrinde) · 2024-10-30T22:50:45.642Z · comments (0)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (3)

Will bird flu be the next Covid? "Little chance" says my dashboard.
Nathan Young · 2025-01-07T20:10:50.080Z · comments (0)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (2)

SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann (Stuckwork) · 2024-12-19T15:59:00.036Z · comments (4)

[link] overengineered air filter shelving
bhauth · 2024-11-08T22:04:39.987Z · comments (2)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

Gratitudes: Rational Thanks Giving
Seth Herd · 2024-11-29T03:09:47.410Z · comments (2)

Trying Bluesky
jefftk (jkaufman) · 2024-11-17T02:50:04.093Z · comments (17)

[link] Introducing the Anthropic Fellows Program
Miranda Zhang (miranda-zhang) · 2024-11-30T23:47:29.259Z · comments (0)

AI #93: Happy Tuesday
Zvi · 2024-12-04T00:30:06.891Z · comments (2)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)

Mask and Respirator Intelligibility Comparison
jefftk (jkaufman) · 2024-12-07T03:20:01.585Z · comments (5)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

Preface
Allison Duettmann (allison-duettmann) · 2025-01-02T18:59:46.290Z · comments (1)

Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
claudia.biancotti · 2024-11-18T09:38:35.723Z · comments (4)

Action derivatives: You’re not doing what you think you’re doing
PatrickDFarley · 2024-11-21T16:24:04.044Z · comments (0)

[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)

Intranasal mRNA Vaccines?
J Bostock (Jemist) · 2025-01-01T23:46:40.524Z · comments (2)

[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)

[link] Teaching My Younger Self to Program: A case study of how I'd pass on my skill at self-learning
Shoshannah Tekofsky (DarkSym) · 2024-12-01T21:05:15.602Z · comments (1)

[link] Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2024-11-11T16:13:26.504Z · comments (6)

No Electricity in Manchuria
winstonBosan · 2024-11-19T01:11:58.661Z · comments (0)

[link] The Roots of Progress 2024 in review
jasoncrawford · 2025-01-01T00:02:06.441Z · comments (0)

[link] Social events with plausible deniability
Chipmonk · 2024-11-18T18:25:17.339Z · comments (24)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

lunatic_at_large on johnswentworth's Shortform

My initial reaction is that at least some of these points would be covered by the Guaranteed Safe AI agenda if that works out, right? Though the "AGIs act much like a colonizing civilization" situation does scare me because it's the kind of thing which locally looks harmless but collectively is highly dangerous. It would require no misalignment on the part of any individual AI.

lorec on Don't fall for ontology pyramid schemes

Relevant to whether the hot/cold/wet/dry system is a good or a bad idea, from our perspective, is that doctors don't currently use people's star signs for diagnosis. Bogus ontologies can be identified by how they promise to usefully replace more detailed levels of description - i.e., provide a useful abstraction that carves reality at the joints - and yet don't actually do this, from the standpoint of the cultures they're being sold to.

anthonyc on On Eating the Sun

I think that's very probably true, yes. I'm not certain that will continue to be true indefinitely, or that it will or should continue to be the deciding factor for future decision making. I'm just pointing out that we're actually discussing a small subset of a very large space of options, that there are ways of "eating the sun" that allow life to continue unaltered on Earth, and so on. TBH even if we don't do anything like this, I wouldn't be terribly surprised if future humans end up someday building a full or partial shell around Earth anyway, long before there's anything like large-scale starlifting under discussion. Living space, power generation, asteroid deflection, off-world industry, just to name a few reasons we might do something like that. It could end up being easier and cheaper to increase available surface area and energy by orders of magnitude doing something like this than by colonizing other planets and moons.

jmh on Subskills of "Listening to Wisdom"

I like the point about the need for some type of external competitive measure but as you say, they might not be a MMA gym where you need one.

Shifting the metaphor, I think your observation of the sucker punch fits well with the insight that for those with only a hammer, all problems look like nails. The gym would be someone with a screwdriver or riveter as well as the hammer. But even lacking the external check, we should always ask ourselves "Is this really a nail?" I might only have a hammer but if this isn't a nail while the results might be better than nothing (maybe?) I'm not likely to achieve the best that could be done some other way. And,of course, watching just what happens went the other hammer-only people solve the problem and compare results to those when we know the problem is a nail we might learn something from their mistakes.

johnswentworth on Fabien's Shortform

Yeah, I'm aware of that model. I personally generally expect the "science on model organisms"-style path to contribute basically zero value to aligning advanced AI, because (a) the "model organisms" in question are terrible models, in the sense that findings on them will predictably not generalize to even moderately different/stronger systems (like e.g. this story [LW(p) · GW(p)]), and (b) in practice IIUC that sort of work is almost exclusively focused on the prototypical failure story of strategic deception and scheming, which is a very narrow slice [LW(p) · GW(p)] of the AI extinction probability mass.

johnswentworth on johnswentworth's Shortform

Also (separate comment because I expect this one to be more divisive): I think the scheming story has been disproportionately memetically successful largely because it's relatively easy to imagine hacky ways of preventing an AI from intentionally scheming. And that's mostly a bad thing; it's a form of streetlighting.

johnswentworth on johnswentworth's Shortform

I think a very common problem in alignment research today is that people focus almost exclusively on a specific story about strategic deception/scheming, and that story is a very narrow slice of the AI extinction probability mass. At some point I should probably write a proper post on this, but for now here are few off-the-cuff example AI extinction stories which don't look like the prototypical scheming story. (These are copied from a Facebook thread.)

Perhaps the path to superintelligence looks like applying lots of search/optimization over shallow heuristics. Then we potentially die to things which aren't smart enough to be intentionally deceptive, but nonetheless have been selected-upon to have a lot of deceptive behaviors (via e.g. lots of RL on human feedback).
The "Getting What We Measure" scenario from Paul's old "What Failure Looks Like [LW · GW]" post.
The "fusion power generator scenario [LW · GW]".
Perhaps someone trains a STEM-AGI, which can't think about humans much at all. In the course of its work, that AGI reasons that an oxygen-rich atmosphere is very inconvenient for manufacturing, and aims to get rid of it. It doesn't think about humans at all, but the human operators can't understand most of the AI's plans anyway, so the plan goes through. As an added bonus, nobody can figure out why the atmosphere is losing oxygen until it's far too late, because the world is complicated and becomes more so with a bunch of AIs running around and no one AI has a big-picture understanding of anything either (much like today's humans have no big-picture understanding of the whole human economy/society).
People try to do the whole "outsource alignment research to early AGI" thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they're already on the more-powerful next gen, so it's too late.
(At least some) AGIs act much like a colonizing civilization. Plenty of humans ally with it, trade with it, try to get it to fight their outgroup, etc, and the AGIs locally respect the agreements with the humans and cooperate with their allies, but the end result is humanity gradually losing all control and eventually dying out.
Perhaps early AGI involves lots of moderately-intelligent subagents. The AI as a whole mostly seems pretty aligned most of the time, but at some point a particular subagent starts self-improving, goes supercritical, and takes over the rest of the system overnight. (Think cancer, but more agentic.)
Perhaps the path to superintelligence looks like scaling up o1-style runtime reasoning to the point where we're using an LLM to simulate a whole society. But the effects of a whole society (or parts of a society) on the world are relatively decoupled from the things-individual-people-say-taken-at-face-value. For instance, lots of people talk a lot about reducing poverty, yet have basically-no effect on poverty. So developers attempt to rely on chain-of-thought transparency, and shoot themselves in the foot.

zac-hatfield-dodds on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible

I remain both skeptical some core claims in this post, and convinced of its importance. GeneSmith is one of few people with such a big-picture, fresh, wildly ambitious angle on beneficial biotechnology, and I'd love to see more of this genre.

One one hand on the object level, I basically don't buy the argument that in-vivo editing could lead to substantial cognitive enhancement in adults. Brain development is incredibly important for adult cognition, and in the maybe 1%--20% residual you're going well off-distribution for any predictors trained on unedited individuals. I too would prefer bets that pay off before my median AI timelines, but biology does not always allow us to have nice things.

On the other, gene therapy does indeed work in adults for some (much simpler) issues, and there might be valuable interventions which are narrower but still valuable. Plus, of course, there's the nineteen-ish year pathway [LW · GW] to adults, building on current practice [LW · GW]. There's no shortage of practical difficulties, but the strong or general objections I've seen seem ill-founded, and that makes me more optimistic about eventual feasibility of something drawing on this tech tree.

I've been paying closer attention to the space thanks to Gene's posts, to the point of making some related investments, and look forward to watching how these ideas fare on contact with biological and engineering reality over the next few years.

jeremy-gillen on Please don't throw your mind away

Tsvi has many underrated posts. This one was rated correctly.

I didn't previously have a crisp conceptual handle for the category that Tsvi calls Playful Thinking. Initially it seemed a slightly unnatural category. Now it's such a natural category that perhaps it should be called "Thinking", and other kinds should be the ones with a modifier (e.g. maybe Directed Thinking?).

Tsvi gives many theoretical justifications for engaging in Playful Thinking. I want to talk about one because it was only briefly mentioned in the post:

Your sense of fun decorrelates you from brain worms / egregores / systems of deference, avoiding the dangers of those.

For me, engaging in intellectual play is an antidote to political mindkilledness. It's not perfect. It doesn't work for very long. But it does help.

When I switch from intellectual play to a politically charged topic, there's a brief period where I'm just.. better at thinking about it. Perhaps it increases open-mindedness. But that's not it. It's more like increased ability to run down object-level thoughts without higher-level interference. A very valuable state of mind.

But this isn't why I play. I play because it's fun. And because it's natural? It's in our nature.

It's easy to throw this away under pressure, and I've sometimes done so. This post is a good reminder of why I shouldn't.

lunatic_at_large on How can humanity survive a multipolar AGI scenario?

Totalitarian dictatorship

I'm unclear why this risk is specific to multipolar scenarios? Even if you have a single AGI/ASI you could end up with a totalitarian dictatorship, no? In fact I would imagine that having multiple AGI/ASI's would mitigate this risk as, optimistically, every domestic actor in possession of an AGI/ASI should be counterbalanced by another domestic actor with divergent interests also in possession of an AGI/ASI.