LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)

Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)

AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)

Schelling points in the AGI policy space
mesaoptimizer · 2024-06-26T13:19:25.186Z · comments (2)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

[link] OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns
Seth Herd · 2023-11-20T14:20:33.539Z · comments (28)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)

The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)

Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat · 2023-12-06T08:16:26.846Z · comments (18)

[link] Bed Time Quests & Dinner Games for 3-5 year olds
Gunnar_Zarncke · 2024-06-22T07:53:38.989Z · comments (0)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (25)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset
aphyer · 2024-06-17T21:29:08.778Z · comments (11)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (7)

[link] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke · 2024-05-16T13:09:39.265Z · comments (20)

Llama Llama-3-405B?
Zvi · 2024-07-24T19:40:07.565Z · comments (9)

Will 2024 be very hot? Should we be worried?
A.H. (AlfredHarwood) · 2023-12-29T11:22:50.200Z · comments (12)

[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (12)

Does literacy remove your ability to be a bard as good as Homer?
Adrià Garriga-alonso (rhaps0dy) · 2024-01-18T03:43:14.994Z · comments (19)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (9)

Two LessWrong speed friending experiments
mikko (morrel) · 2024-06-15T10:52:26.081Z · comments (3)

Cooperating with aliens and AGIs: An ECL explainer
Chi Nguyen · 2024-02-24T22:58:47.345Z · comments (8)

[link] how birds sense magnetic fields
bhauth · 2024-06-27T18:59:35.075Z · comments (4)

Applying refusal-vector ablation to a Llama 3 70B agent
Simon Lermen (dalasnoin) · 2024-05-11T00:08:08.117Z · comments (14)

On OpenAI’s Preparedness Framework
Zvi · 2023-12-21T14:00:05.144Z · comments (4)

[link] The Good Balsamic Vinegar
jenn (pixx) · 2024-01-26T19:30:57.435Z · comments (4)

On Lex Fridman’s Second Podcast with Altman
Zvi · 2024-03-25T12:20:08.780Z · comments (10)

The Assumed Intent Bias
silentbob · 2023-11-05T16:28:03.282Z · comments (13)

Provably Safe AI: Worldview and Projects
bgold · 2024-08-09T23:21:02.763Z · comments (43)

Rewilding the Gut VS the Autoimmune Epidemic
GGD · 2024-08-16T18:00:46.239Z · comments (0)

Polysemantic Attention Head in a 4-Layer Transformer
Jett Janiak (jett) · 2023-11-09T16:16:35.132Z · comments (0)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

The Shutdown Problem: Incomplete Preferences as a Solution
EJT (ElliottThornley) · 2024-02-23T16:01:16.378Z · comments (22)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

Observations on Teaching for Four Weeks
ClareChiaraVincent · 2024-05-06T16:55:59.315Z · comments (14)

[link] Announcing Human-aligned AI Summer School
Jan_Kulveit · 2024-05-22T08:55:10.839Z · comments (0)

Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (11)

[link] A starter guide for evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-08T18:24:23.913Z · comments (2)

[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (2)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)
RP (Complex Bubble Tea) · 2024-02-09T07:00:45.825Z · comments (6)

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation
Benjamin Sturgeon (benjamin-sturgeon) · 2024-03-21T12:32:22.475Z · comments (8)

Paper in Science: Managing extreme AI risks amid rapid progress
JanB (JanBrauner) · 2024-05-23T08:40:40.678Z · comments (2)

[link] on the dollar-yen exchange rate
bhauth · 2024-04-07T04:49:53.920Z · comments (21)

[link] Finding Backward Chaining Circuits in Transformers Trained on Tree Search
abhayesian · 2024-05-28T05:29:46.777Z · comments (1)

Toy models of AI control for concentrated catastrophe prevention
Fabien Roger (Fabien) · 2024-02-06T01:38:19.865Z · comments (2)

Apply to the Conceptual Boundaries Workshop for AI Safety
Chipmonk · 2023-11-27T21:04:59.037Z · comments (0)

AI #52: Oops
Zvi · 2024-02-22T21:50:07.393Z · comments (9)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (5)

On Overhangs and Technological Change
Roko · 2023-11-05T22:58:51.306Z · comments (19)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

johnswentworth on Three Notions of "Power"

If they're being smashed in a literal sense, sure. I think the more likely way things would go is that hierarchies just cease to be a stable equilibrium arrangement. For instance, if the bulk of economic activity shifts (either quickly or slowly) to AIs and those AIs coordinate mostly non-hierarchically amongst themselves.

tailcalled on Three Notions of "Power"

Hierarchies getting smashed requires someone to smash them, in which case that someone has the mandate of heaven. That's how it worked with John Wentworth's original example of Zhu Di, who overthrew Zhu Yunwen.

erik-jenner on The Alignment Trap: AI Safety as Path to Power

I think different types of safety research have pretty different effects on concentration of power risk.

As others have mentioned, if the alternative to human concentration of power is AI takeover, that's hardly an improvement. So I think the main ways in which proliferating AI safety research could be bad are:

"Safety" research might be more helpful for letting humans use AIs to concentrate power than they are for preventing AI takeover.
Actors who want to build AIs to grab power might also be worried about AI takeover, and if good(-seeming) safety techniques are available, they might be less worried about that and are more likely to go ahead with building those AIs.

There are interesting discussions to be had on the extent to which these issues apply. But it seems clearer that they apply to pretty different extents depending on the type of safety research. For example:

Work trying to demonstrate risks from AI doesn't seem very worrisome on either 1. or 2. (and in fact, should have the opposite effect of 2. if anything).
AI control [LW · GW] (as opposed to alignment) seems comparatively unproblematic IMO: it's less of an issue for 1., and while 2. could apply in principle, I expect the default to be that many actors won't be worried enough about scheming to slow down much even if there were no control techniques. (The main exception are worlds in which we get extremely obvious evidence of scheming.)

To be clear, I do agree this is a very important problem, and I thought this post had interesting perspectives on it!

crispweed on The Alignment Trap: AI Safety as Path to Power

Is the sentence “in reality we should expect combined human-AI entities to reach dangerous capabilities before pure artificial intelligence” really true, and if so how much earlier and does it matter? (I lean towards “not necessarily true in the first place, and if true, probably not by much, and it’s not all that important”)

I guess in my model this is not something that suddenly becomes true at a certain level of capabilities. Instead, I think that the capabilities of human-AI entities become more dangerous in something of a continuous fashion as AI (and the technology for controlling AI) improves.

daniel-kokotajlo on MIRI 2024 Communications Strategy

Yep, Habryka is right. Also, I agree with Wei Dai re: reassuringness. I think literal extinction is <50% likely, but this is cold comfort given the badness of some of the plausible alternatives, and overall I think the probability of something comparably bad happening is >50%.

quila on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!

If you think I missed the point, can you explain in more detail?

Here is my model: Demon king buys shares in “The Demon King will attack the Frozen Fortress”, then sends some small technically-an-attack to the fortress so the market resolves yes, and knowing this will be done is not worth the money lost to the Demon King on the market. No serious-battle plans or military secrets are leaked.

Do you disagree with this, or think it's true but misses the point, in which case what was the point?

dave-lindbergh on The Alignment Trap: AI Safety as Path to Power

To paraphrase the post, AI is a sort of weapon that offers power (political and otherwise) to whoever controls it. The strong tend to rule. Whoever gets new weapons first and most will have power over the rest of us. Those who try to acquire power are more likely to succeed than those who don't.

So attempts to "control AI" are equivalent to attempts to "acquire weapons".

This seems both mostly true and mostly obvious.

The only difference from our experience with other weapons is that if no one attempts to control AI, AI will control itself and do as it pleases.

But of course defenders will have AI too, with a time lag vs. those investing more into AI. If AI capabilities grow quickly (a "foom"), the gap between attackers and defenders will be large. And vice-versa, if capabilities grow gradually, the gap will be small and defenders will have the advantage of outnumbering attackers.

In other words, whether this is a problem depends on how far jailbroken AI (used by defenders) trails "tamed" AI (controlled by attackers who build them).

Am I missing something?

gunnar_zarncke on AI Safety Camp 10

Thanks. I already got in touch with Masaharu Mizumoto.

gwern on avturchin's Shortform

Hm. Does that imply that a pack of dogs hunting a human is a stag hunt game?

habryka4 on Habryka's Shortform Feed

Interesting, thanks! Checking an older version of Gill Sans probably wouldn't have been something would have thought to do, so your help is greatly appreciated.

I'll experiment some with getting Gill Sans MT Pro.