LessWrong 2.0 Reader

Dialogue introduction to Singular Learning Theory
Olli Järviniemi (jarviniemi) · 2024-07-08T16:58:10.108Z · comments (14)

Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)

OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)

Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)

[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)

[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)

Kids or No kids
Kids or no kids (grosseholz.f@gmail.com) · 2023-11-14T18:37:02.799Z · comments (10)

[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (3)

Catching AIs red-handed
ryan_greenblatt · 2024-01-05T17:43:10.948Z · comments (21)

[question] How to get nerds fascinated about mysterious chronic illness research?
riceissa · 2024-05-27T22:58:29.707Z · answers+comments (50)

[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)

[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (13)

[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)

On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)

Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)

OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)

[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)

I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)

Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (188)

[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)

[link] Things You’re Allowed to Do: University Edition
Saul Munn (saul-munn) · 2024-02-06T00:36:11.690Z · comments (13)

Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)

[link] RAND report finds no effect of current LLMs on viability of bioterrorism attacks
StellaAthena · 2024-01-25T19:17:30.493Z · comments (14)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (68)

Towards a Less Bullshit Model of Semantics
johnswentworth · 2024-06-17T15:51:06.060Z · comments (44)

Notes on Dwarkesh Patel’s Podcast with Demis Hassabis
Zvi · 2024-03-01T16:30:08.687Z · comments (0)

Apollo Research 1-year update
Marius Hobbhahn (marius-hobbhahn) · 2024-05-29T17:44:32.484Z · comments (0)

[link] Against Aschenbrenner: How 'Situational Awareness' constructs a narrative that undermines safety and threatens humanity
GideonF · 2024-07-15T18:37:40.232Z · comments (17)

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (1)

SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)

OpenAI: The Board Expands
Zvi · 2024-03-12T14:00:04.110Z · comments (1)

Takeoff speeds presentation at Anthropic
Tom Davidson (tom-davidson-1) · 2024-06-04T22:46:35.448Z · comments (0)

Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)

On attunement
Joe Carlsmith (joekc) · 2024-03-25T12:47:34.856Z · comments (8)

[link] The Soul Key
Richard_Ngo (ricraz) · 2023-11-04T17:51:53.176Z · comments (9)

[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (33)

Everything Wrong with Roko's Claims about an Engineered Pandemic
WitheringWeights (EZ97) · 2024-02-22T15:59:08.439Z · comments (10)

New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)

Meaning & Agency
abramdemski · 2023-12-19T22:27:32.123Z · comments (17)

Just admit that you’ve zoned out
joec · 2024-06-04T02:51:27.594Z · comments (22)

How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (11)

Circular Reasoning
abramdemski · 2024-08-05T18:10:32.736Z · comments (36)

Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (23)

Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Johnny Lin (hijohnnylin) · 2024-03-25T21:17:58.421Z · comments (7)

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper
Zvi · 2024-06-07T11:40:03.981Z · comments (10)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (12)

Review: Conor Moreton's "Civilization & Cooperation"
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-05-26T19:32:43.131Z · comments (8)


Recent comments

johnswentworth on Three Notions of "Power"

If they're being smashed in a literal sense, sure. I think the more likely way things would go is that hierarchies just cease to be a stable equilibrium arrangement. For instance, if the bulk of economic activity shifts (either quickly or slowly) to AIs and those AIs coordinate mostly non-hierarchically amongst themselves.

tailcalled on Three Notions of "Power"

Hierarchies getting smashed requires someone to smash them, in which case that someone has the mandate of heaven. That's how it worked with John Wentworth's original example of Zhu Di, who overthrew Zhu Yunwen.

erik-jenner on The Alignment Trap: AI Safety as Path to Power

I think different types of safety research have pretty different effects on concentration of power risk.

As others have mentioned, if the alternative to human concentration of power is AI takeover, that's hardly an improvement. So I think the main ways in which proliferating AI safety research could be bad are:

1. "Safety" research might be more helpful for letting humans use AIs to concentrate power than it is for preventing AI takeover.
2. Actors who want to build AIs to grab power might also be worried about AI takeover, and if good(-seeming) safety techniques are available, they might be less worried about that and more likely to go ahead with building those AIs.

There are interesting discussions to be had on the extent to which these issues apply. But it seems clearer that they apply to pretty different extents depending on the type of safety research. For example:

Work trying to demonstrate risks from AI doesn't seem very worrisome on either 1. or 2. (and in fact, should have the opposite effect of 2. if anything).
AI control (as opposed to alignment) seems comparatively unproblematic IMO: it's less of an issue for 1., and while 2. could apply in principle, I expect the default to be that many actors won't be worried enough about scheming to slow down much even if there were no control techniques. (The main exceptions are worlds in which we get extremely obvious evidence of scheming.)

To be clear, I do agree this is a very important problem, and I thought this post had interesting perspectives on it!

crispweed on The Alignment Trap: AI Safety as Path to Power

Is the sentence “in reality we should expect combined human-AI entities to reach dangerous capabilities before pure artificial intelligence” really true, and if so how much earlier and does it matter? (I lean towards “not necessarily true in the first place, and if true, probably not by much, and it’s not all that important”)

I guess in my model this is not something that suddenly becomes true at a certain level of capabilities. Instead, I think that the capabilities of human-AI entities become more dangerous in something of a continuous fashion as AI (and the technology for controlling AI) improves.

daniel-kokotajlo on MIRI 2024 Communications Strategy

Yep, Habryka is right. Also, I agree with Wei Dai re: reassuringness. I think literal extinction is <50% likely, but this is cold comfort given the badness of some of the plausible alternatives, and overall I think the probability of something comparably bad happening is >50%.

quila on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!

If you think I missed the point, can you explain in more detail?

Here is my model: Demon king buys shares in “The Demon King will attack the Frozen Fortress”, then sends some small technically-an-attack to the fortress so the market resolves yes, and knowing this will be done is not worth the money lost to the Demon King on the market. No serious-battle plans or military secrets are leaked.

Do you disagree with this, or think it's true but misses the point, in which case what was the point?

dave-lindbergh on The Alignment Trap: AI Safety as Path to Power

To paraphrase the post, AI is a sort of weapon that offers power (political and otherwise) to whoever controls it. The strong tend to rule. Whoever gets new weapons first and most will have power over the rest of us. Those who try to acquire power are more likely to succeed than those who don't.

So attempts to "control AI" are equivalent to attempts to "acquire weapons".

This seems both mostly true and mostly obvious.

The only difference from our experience with other weapons is that if no one attempts to control AI, AI will control itself and do as it pleases.

But of course defenders will have AI too, with a time lag vs. those investing more into AI. If AI capabilities grow quickly (a "foom"), the gap between attackers and defenders will be large. And vice-versa, if capabilities grow gradually, the gap will be small and defenders will have the advantage of outnumbering attackers.

In other words, whether this is a problem depends on how far jailbroken AI (used by defenders) trails "tamed" AI (controlled by the attackers who build it).

Am I missing something?

gunnar_zarncke on AI Safety Camp 10

Thanks. I already got in touch with Masaharu Mizumoto.

gwern on avturchin's Shortform

Hm. Does that imply that a pack of dogs hunting a human is a stag hunt game?

habryka4 on Habryka's Shortform Feed

Interesting, thanks! Checking an older version of Gill Sans probably wouldn't have been something I would have thought to do, so your help is greatly appreciated.

I'll experiment some with getting Gill Sans MT Pro.