LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

The Obliqueness Thesis
jessicata (jessica.liu.taylor) · 2024-09-19T00:26:30.677Z · comments (17)

[question] What could a policy banning AGI look like?
TsviBT · 2024-03-13T14:19:07.783Z · answers+comments (23)

Grief is a fire sale
Nathan Young · 2024-03-04T01:11:06.882Z · comments (1)

Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (9)

My 10-year retrospective on trying SSRIs
Kaj_Sotala · 2024-09-22T20:30:02.483Z · comments (9)

[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)

The Packaging and the Payload
Screwtape · 2024-11-12T03:07:37.209Z · comments (1)

Brief analysis of OP Technical AI Safety Funding
22tom (thomas-barnes) · 2024-10-25T19:37:41.674Z · comments (5)

Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (15)

On Claude 3.0
Zvi · 2024-03-06T18:50:04.766Z · comments (5)

Universal Love Integration Test: Hitler
Raemon · 2024-01-10T23:55:35.526Z · comments (65)

Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (21)

2025 Prediction Thread
habryka (habryka4) · 2024-12-30T01:50:14.216Z · comments (18)

Value fragility and AI takeover
Joe Carlsmith (joekc) · 2024-08-05T21:28:07.306Z · comments (5)

[link] Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds"
mattmacdermott · 2024-02-29T13:59:34.959Z · comments (19)

What is malevolence? On the nature, measurement, and distribution of dark traits
David Althaus (wallowinmaya) · 2024-10-23T08:41:33.197Z · comments (15)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (4)

My guess at Conjecture's vision: triggering a narrative bifurcation
Alexandre Variengien (alexandre-variengien) · 2024-02-06T19:10:42.690Z · comments (12)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (7)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (13)

🇫🇷 Announcing CeSIA: The French Center for AI Safety
Charbel-Raphaël (charbel-raphael-segerie) · 2024-12-20T14:17:13.104Z · comments (0)

Counting AGIs
cash (cshunter) · 2024-11-26T00:06:17.845Z · comments (19)

[link] Video lectures on the learning-theoretic agenda
Vanessa Kosoy (vanessa-kosoy) · 2024-10-27T12:01:32.777Z · comments (0)

Vote on Anthropic Topics to Discuss
Ben Pace (Benito) · 2024-03-06T19:43:47.194Z · comments (55)

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth (pktechgirl) · 2024-10-22T18:20:01.194Z · comments (79)

On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)

[link] Claude 3.5 Sonnet
Zach Stein-Perlman · 2024-06-20T18:00:35.443Z · comments (41)

Analogies between scaling labs and misaligned superintelligent AI
scasper · 2024-02-21T19:29:39.033Z · comments (5)

“Artificial General Intelligence”: an extremely brief FAQ
Steven Byrnes (steve2152) · 2024-03-11T17:49:02.496Z · comments (6)

(Not) Derailing the LessOnline Puzzle Hunt
Error · 2024-06-04T01:28:31.688Z · comments (2)

[link] MIRI's June 2024 Newsletter
Harlan · 2024-06-14T23:02:23.721Z · comments (20)

A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (22)

Mistakes people make when thinking about units
Isaac King (KingSupernova) · 2024-06-25T03:39:20.138Z · comments (14)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (8)

[link] Cost, Not Sacrifice
Joe Rogero · 2024-11-20T21:32:26.281Z · comments (13)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

MATS AI Safety Strategy Curriculum
Ronny Fernandez (ronny-fernandez) · 2024-03-07T19:59:37.434Z · comments (2)

Introducing Transluce — A Letter from the Founders
jsteinhardt · 2024-10-23T18:10:02.526Z · comments (2)

Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (20)

[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (12)

On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 2024-05-21T17:30:04.332Z · comments (4)

AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)

Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)

The One and a Half Gemini
Zvi · 2024-02-22T13:10:04.725Z · comments (4)

The World in 2029
Nathan Young · 2024-03-02T18:03:29.368Z · comments (37)

Companies' safety plans neglect risks from scheming AI
Zach Stein-Perlman · 2024-06-03T15:00:20.236Z · comments (4)

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)

Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)

Recent comments

simon on Rebuttals for ~all criticisms of AIXI

The biggest problem about AIXI in my view is the reward system - it cares about the future directly, whereas to have any reasonable hope of alignment an AI in my view needs to care about the future only via what humans would want about the future (so that any reference to the future is encapsulated in the "what do humans want?" aspect).

I.e. the question it needs to be answering is something like "all things considered (including the consequences of my current action on the future, as well as taking into account my possible future actions) what would humans, as they exist now, want me to do at the present moment?"

Now maybe you can take that question and try to slice it up into rewards at particular timesteps, which change over time as what is known about what humans want changes, without introducing corrigibility issues, but the AIXI reward framework isn't really buying you anything imo even if that works, relative to directly trying to get an AI to solve the question.

On the other hand, approximating Solomonoff induction might afaik be a fruitful approach, though the approximations are going to have to be very aggressive for practical performance. I do agree embedding/self-reference can probably be patched in (and it needs to be, to answer the right question, so better get it right).

niplav on The Power to Teach Concepts Better

A superintelligent mind with a reasonable amount of working memory could process generic statements all day long and never whine about dangling concepts. (I feel like the really smart people on LessWrong and Math Overflow also exhibit this behavior to some degree.) But as humans with tragically limited short-term memories, we need all the help we can get. We need our authors and teachers to give us mind-hangers.

I think we can do substantially better than four items in working memory, but not have a working memory with thousands of slots. That is because working memory is the state in which all things in memory are related to each other in one (or two) particular ways, e.g. social relationships between different people or variables in a mathematical system, and that setup creates a combinatorial explosion. Sometimes such variables can be partitioned so that the combinatorial explosion isn't as big of a problem, because partitions can be dealt with independently (cf. companies as such Coasean partitions vs. the global optimization in a planned economy), but then it's several different working memories (or a hierarchy of those). Suppose a main sequence star makes available (at the Landauer limit) a fixed budget of bit erasures per second. If we want to track all pairwise relationships between WM elements once per second, that gives us $\approx 10^{23}$ elements available. If we want to track all subsets/their relations, the level is far lower, at ~157 elements.

(I'm playing a bit fast and loose here with the exact numbers, but in the spirit of speed…)
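The back-of-envelope figures above can be roughly reproduced with a short calculation. This is only a sketch under assumptions the comment does not state: a Sun-like star with the nominal solar luminosity of 3.828e26 W, bit erasure at 300 K, one bit erasure per tracked relationship, and illustrative function names.

```python
import math

# Assumptions (not given in the comment): a Sun-like star and erasure at 300 K.
BOLTZMANN_J_PER_K = 1.380649e-23   # Boltzmann constant (exact SI value)
SOLAR_LUMINOSITY_W = 3.828e26      # nominal solar luminosity
TEMPERATURE_K = 300.0              # assumed operating temperature


def landauer_erasures_per_second(power_w: float, temperature_k: float) -> float:
    """Bit erasures per second at the Landauer limit of k*T*ln(2) joules per bit."""
    return power_w / (BOLTZMANN_J_PER_K * temperature_k * math.log(2))


def elements_for_pairwise(budget: float) -> float:
    """Largest n with about n^2 / 2 pairwise relations within the budget."""
    return math.sqrt(2.0 * budget)


def elements_for_subsets(budget: float) -> float:
    """Largest n with 2^n subsets within the budget."""
    return math.log2(budget)


budget = landauer_erasures_per_second(SOLAR_LUMINOSITY_W, TEMPERATURE_K)
pairwise_n = elements_for_pairwise(budget)
subsets_n = elements_for_subsets(budget)

print(f"erasures per second: {budget:.2e}")
print(f"elements (pairwise): {pairwise_n:.2e}")
print(f"elements (subsets):  {subsets_n:.1f}")
```

Under these assumptions the pairwise case lands on the order of 10^23 elements and the subset case at roughly 156 to 157 elements, in line with the figures quoted above. A different assumed temperature or luminosity shifts the subset figure by only a few elements, since it depends on the logarithm of the budget.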

I think that superintelligences are probably going to come up with very clever ways of partitioning elements in reasoning tasks, but for very gnarly and inter-related problems working memory slots might become scarce.

buck on johnswentworth's Shortform

Most of the problems you discussed here more easily permit hacky solutions than scheming does.

buck on johnswentworth's Shortform

IMO the main argument for focusing on scheming risk is that scheming is the main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful (as I discuss here [AF · GW]). These other problems all seem like they require the models to be way smarter in order for them to be a big problem. Though as I said here [LW(p) · GW(p)], I'm excited for work on some non-scheming misalignment risks.

raemon on On Eating the Sun

It sounds like there's actually like 3-5 different object level places where we're talking about slightly different things. I also updated on the practical aspect from Ryan's comment. So, idk here's a bunch of distinct points.

1.

Ryan Greenblatt's comment [LW · GW] updated me that the energy requirements here are minimal enough that "eating the sun" isn't really going to come up as a consideration for astronomical waste. (Eating the Earth or most of the solar system seems like it still might be. But, I agree we shouldn't Eat the Earth)

2.

I'd interpreted most past comments for nearterm (i.e. measured in decades) crazy shit to be about building Dyson spheres, not Star Lifting. (i.e. I expected the '20 years from now in some big ol' computer' in the solstice song to be about dyson spheres and voluntary uploads). I think many people will still freak out about Dyson Sphering the sun (not sure if you would). I would personally argue "it's just pretty damn important to Dyson Sphere the sun even if it makes people uncomfortable (while designing it such that Earth still gets enough light)."

3.

I agree in 1000 years it won't much matter whether you Starlift, for astronomical waste reasons. But I do expect in 1000 years, even assuming a maximally consent-oriented / conservative-with-regards-to-bio-human-values, and all around "good" outcome, most people will have shifted to running on computronium and experienced much more than 1000 years of subjective time and their intuitions about what's good will just be real different. There may be small groups of people who continue living in bio-world but most of them will still probably be pretty alien by our lights.

I think I do personally hope they preserve the Earth as sanctuary and/or historical relic. But I think there's a lot of compromises like "starlift a lot of material out of the sun, but move the Earth closer to the sun to compensate" (I haven't looked into the physics here, the details are obviously cruxy).

When I imagine any kind of actual realistic future that isn't maximally conservative (i.e. the bio humans are < .1% of the solar system's population and just don't have that much bargaining power), it seems even more likely that they'll at least work on compromise solutions that preserve a reasonable Earth experience but eat a bunch of the sun, if there turn out to be serious tradeoffs there. (Again I don't actually know enough physics here and I'm recently humbled by remembering the Eternity in Six Hours paper, maybe there's literally no tradeoffs here, but, I'd still doubt it)

4.

It sounds like it's not particularly cruxy anymore, but I think the "0.00000004% of the Earth's current population" analogy is just quite different. 80 trillion suns involves more value than has ever been had before; 3 lives is (relatively) insignificant compared to many political compromises we've made, even going back thousands of years. Maybe whatever descendants get to reap that value are so alien that they just don't count as valuable by today's lights, and it's reasonable to have some extreme time discounting here, but if any values-you-care-about survived it would be huge.

I agree both morally and practically with "it's way more important to make sure we have good global coordination systems that don't predictably either descend into a horrible totalitarianism, or trigger a race for power that causes horrible wars or other bad things, than to get those 80 trillion suns." But, like, the 80 trillion suns are still a big deal.

5.

I'll note it's also not a boolean whether we "bulldoze the earth" or "bulldoze the rest of the solar system" for rushing to build a dyson sphere. You can start the process with a bunch of mining in some remote mountain regions or whatever without eating the whole earth. (But I think it might be bad to do this because "don't harvest Earth" is just a nice simple Schelling rule and once you start haggling over the details I do get a lot more worried)

6.

I recall reading it's actually maybe cheaper to use asteroids than Mercury to make a dyson sphere because you don't need to expensively lift things out of the gravity well. It is appealing to me if there are no tradeoffs involved with deconstructing any of the charismatic astronomical objects until we've had more time to think/orient/grow-as-a-people.

7.

Part of my outlook here is that I spent the last 14 years being personally uninterested in and scared by the sorts of rapid/crazy/exponential change you're wary of. In the past few years, I've adjusted to be more personally into it. I don't think I would have wanted to rush that grieving/orienting process for Past Me even though it cost me a lot of important time and resources (I'm referring here more to stuff like that in The God of Humanity, and the God of the Robot Utilitarians [LW · GW])

But I do wish I had somehow sped along the non-soulfully-traumatic parts of the process (i.e. some of the updates were more simple/straightforward and if someone had said the right words to me, I think I'd have gotten a strictly better outcome by my original lights).

I expect most humans, given opportunity to experiment on their own terms, will gradually have some kind of perspective shift here (maybe on a longer timescale than Me Among the Rationalists, but, like, <500 years). I don't want people to feel rushed about it, but I think there will be some societal structures that will lend themselves to dallying more and accumulating serious tradeoffs, or less.

benito on Ought We to Be Doing More Than We Are?

I was scrolling for a while, assuming I'd neared the end, only to look at the position of the scrollbar and find I was barely 5% through! This must have taken a fair bit of effort. I really like the helpful page and I'm glad I know about it, I encourage you to make a linkpost for it sometime if you haven't already.

programcrafter on Testing for Scheming with Model Deletion

Doesn't the "threat" to delete the model have to be DT-credible instead of "credible conditioned on being human-made", given that LW with all its discussion about threat resistance and ignoring is in training sets?

(If I remember correctly, a decision theory must ignore "you're threatened to not do X, and the other agent is claiming to respond in such a way that even they lose in expectation" and "another agent [self-]modifies/instantiates an agent making them prefer that you don't do X".)

nim on nim's Shortform

One lens to view AI is as a prediction engine -- predict what color to make each pixel, predict what word to put next.

Whoever is first to apply this predictive skill to stock markets will probably make immense amounts of money. Then again, people are probably already trying to do this, which creates a situation unlike the one from which we derive the historical data to train on, which might render it impossible?

On the gripping hand, large slow and powerful institutions want to make the numbers go up and to the right.

maxwell-peterson on Drake Thomas's Shortform

Thanks for putting this together!

I have a vague memory of a post saying that taking zinc early, while the virus was replicating in the upper respiratory tract, was much more important than taking it later, because by then it would have spread all over the body and the zinc couldn't get to it, or something like this. So I tend to take a couple early on, then stop. But it sounds like you don’t consider that difference important.

Is it your current (Not asking you to do more research!) impression that it’s useful to take zinc throughout the illness?

daniel-tan on Daniel Tan's Shortform

In this context, the “resample ablation” used in AI control is like adding more noise into the communication channel