LessWrong 2.0 Reader

Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (42)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)

A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (15)

[link] Why I’m not a Bayesian
Richard_Ngo (ricraz) · 2024-10-06T15:22:45.644Z · comments (92)

[link] Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen (bayesshammai) · 2024-02-22T23:56:02.318Z · comments (5)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (17)

Skills from a year of Purposeful Rationality Practice
Raemon · 2024-09-18T02:05:58.726Z · comments (18)

[link] Daniel Kahneman has died
DanielFilan · 2024-03-27T15:59:14.517Z · comments (11)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (21)

Humming is not a free $100 bill
Elizabeth (pktechgirl) · 2024-06-06T20:10:02.457Z · comments (6)

Introducing Alignment Stress-Testing at Anthropic
evhub · 2024-01-12T23:51:25.875Z · comments (23)

Safety consultations for AI lab employees
Zach Stein-Perlman · 2024-07-27T15:00:27.276Z · comments (4)

Contra papers claiming superhuman AI forecasting
nikos (followtheargument) · 2024-09-12T18:10:50.582Z · comments (16)

Every "Every Bay Area House Party" Bay Area House Party
Richard_Ngo (ricraz) · 2024-02-16T18:53:28.567Z · comments (6)

[question] Why is o1 so deceptive?
abramdemski · 2024-09-27T17:27:35.439Z · answers+comments (24)

[link] Toward a Broader Conception of Adverse Selection
Ricki Heicklen (bayesshammai) · 2024-03-14T22:40:57.920Z · comments (61)

Struggling like a Shadowmoth
Raemon · 2024-09-24T00:47:05.030Z · comments (38)

[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (22)

This is already your second chance
Malmesbury (Elmer of Malmesbury) · 2024-07-28T17:13:57.680Z · comments (13)

WTH is Cerebrolysin, actually?
gsfitzgerald (neuroplume) · 2024-08-06T20:40:53.378Z · comments (23)

Timaeus's First Four Months
Jesse Hoogland (jhoogland) · 2024-02-28T17:01:53.437Z · comments (6)

'Empiricism!' as Anti-Epistemology
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-03-14T02:02:59.723Z · comments (90)

Did Christopher Hitchens change his mind about waterboarding?
Isaac King (KingSupernova) · 2024-09-15T08:28:09.451Z · comments (22)

Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)

My motivation and theory of change for working in AI healthtech
Andrew_Critch · 2024-10-12T00:36:30.925Z · comments (37)

[link] Recommendation: reports on the search for missing hiker Bill Ewasko
eukaryote · 2024-07-31T22:15:03.174Z · comments (28)

Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (43)

Many arguments for AI x-risk are wrong
TurnTrout · 2024-03-05T02:31:00.990Z · comments (86)

[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (14)

Announcing ILIAD — Theoretical AI Alignment Conference
Nora_Ammann · 2024-06-05T09:37:39.546Z · comments (18)

[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (71)

You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)

[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)

[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (21)

[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)

And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)

DeepMind's "Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)

[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (64)

Deep Honesty
Aletheophile (aletheo) · 2024-05-07T20:31:48.734Z · comments (25)

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper · 2024-05-21T20:15:36.502Z · comments (16)

What’s up with LLMs representing XORs of arbitrary features?
Sam Marks (samuel-marks) · 2024-01-03T19:44:33.162Z · comments (61)

Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (55)

Recent comments

karl-krueger on If all trade is voluntary, then what is "exploitation?"

I use capitalism in a manner mutually exclusive with slave labor because it requires self-ownership.

This seems like a sort of definitional gimbal lock; it makes it harder to describe the world because two potentially-separate degrees of freedom are collapsed into one. While I'm reluctant to argue definitions, I think it's worth using terms in ways that allow us to describe the world in more detail than ones that collapse distinctions.

I expect to see this usage of "capitalism" not in history or economics, but in the sort of political doctrine where it's intended to lock those concepts together; to imply that capital markets and individual freedom are either the same thing, or closely related — more closely, I think, than history and contemporary events really support.

It would seem weird to me, for instance, to claim that a publicly-traded company that is discovered to have done something to violate individual freedom is thereby no longer a participant in a capitalist economy. The New York Stock Exchange doesn't ask "does this company infringe individual freedoms anywhere in the world?" before letting a company be listed. (To be clear, I'm not proposing that it should; I'm saying that it's useful to talk about "participation in a capital market economy" and "fully respecting some set of individual freedoms" as distinct axes.)

(For what it's worth, I think "self-ownership" is a pretty odd expression, because one of the central traits of ownership is that it can be transferred, and one of the central traits of selfhood is that it cannot. Your relation to yourself is distinct from property ownership in that you can sell any piece of your property, but you cannot sell your self; no matter what obligations you may have signed up for, you always retain possession of your self.)

vladimir_nesov on o3

Test-time compute is applied in-context, so it's very worthwhile to scale, getting better and better at solving a particular problem, to the extent that no amount of pretraining would be able to match with only modest test-time compute.

sodium on Shallow review of technical AI safety, 2024

Pr(Ai)2R is at least partially funded by Good Ventures/OpenPhil

moridinamael on Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds

This post resonated with me when it came out, and I think its thesis only seems more credible with time. Anthropic's seminal "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet" (the Golden Gate Claude paper) seems right in line with these ideas. We can make scrutable the inscrutable as long as the inscrutable takes the form of something organized and regular and repeatable.

This article gets bonus points from me for being succinct while still making its argument clearly.

qvalq on You Provably Can't Trust Yourself

How/does this square with https://arxiv.org/abs/1902.07404?
IIUC, Gödel's Second Incompleteness Theorem was overinterpreted, and a different operationalization of consistency is provable.

I talked to Mihály Bárász about that, and he didn't think it was crazy.

mateusz-baginski on Alexander Gietelink Oldenziel's Shortform

Insufficiently catchy

akash-wasil on evhub's Shortform

I'm glad you're doing this, and I support many of the ideas already suggested. Some additional ideas:

Interview program. Work with USAISI or UKAISI (or DHS/NSA) to pilot an interview program in which officials can ask questions about AI capabilities, safety and security threats, and national security concerns. (If it's not feasible to do this with a government entity yet, start a pilot with a non-government group, perhaps METR, Apollo, Palisade, or the new AI Futures Project.)
Clear communication about RSP capability thresholds. I think the RSP could do a better job at outlining the kinds of capabilities that Anthropic is worried about and what sorts of thresholds would trigger a reaction. I think the OpenAI preparedness framework tables are a good example of this kind of clear/concise communication. It's easy for a naive reader to quickly get a sense of "oh, this is the kind of capability that OpenAI is worried about." (Clarification: I'm not suggesting that Anthropic should abandon the ASL approach or that OpenAI has necessarily identified the right capability thresholds. I'm saying that the tables are a good example of the kind of clarity I'm looking for: someone could skim this and easily get a sense of what thresholds OpenAI is tracking, and I think OpenAI's PF currently achieves this much more than the Anthropic RSP.)
Emergency protocols. Publishing an emergency protocol that specifies how Anthropic would react if it needed to quickly shut down a dangerous AI system. (See some specific prompts in the "AI developer emergency response protocol" section here). Some information can be redacted from a public version (I think it's important to have a public version, though, partly to help government stakeholders understand how to handle emergency scenarios, partly to raise the standard for other labs, and partly to acquire feedback from external groups.)
RSP surveys. Evaluate the extent to which Anthropic employees understand the RSP, their attitudes toward the RSP, and how the RSP affects their work. More on this here.
More communication about Anthropic's views about AI risks and AI policy. Some specific examples of hypothetical posts I'd love to see:
- "How Anthropic thinks about misalignment risks"
- "What the world should do if the alignment problem ends up being hard"
- "How we plan to achieve state-proof security before AGI"
- Encouraging more employees to share their views on various topics, e.g., Sam Bowman's post.
AI dialogues/debates. It would be interesting to see Anthropic employees have discussions/debates with other folks thinking about advanced AI. Hypothetical examples:
- "What are the best things the US government should be doing to prepare for advanced AI" with Jack Clark and Daniel Kokotajlo.
- "Should we have a CERN for AI?" with [someone from Anthropic] and Miles Brundage.
- "How difficult should we expect alignment to be" with [someone from Anthropic] and [someone who expects alignment to be harder; perhaps Jeffrey Ladish or Malo Bourgon].

More ambitiously, I feel like I don't really understand Anthropic's plan for how to manage race dynamics in worlds where alignment ends up being "hard enough to require a lot more than RSPs and voluntary commitments."

From a policy standpoint, several of the most interesting open questions seem to be along the lines of "under what circumstances should the USG get considerably more involved in overseeing certain kinds of AI development" and "conditional on the USG wanting to get way more involved, what are the best things for it to do?" It's plausible that Anthropic is limited in how much work it could do on these kinds of questions (particularly in a public way). Nonetheless, it could be interesting to see Anthropic engage more with questions like the ones Miles raises here.

mateusz-baginski on Daniel Tan's Shortform

Something like "We have mapped out the possible human-understandable or algorithmically neat descriptions of the network's behavior sufficiently comprehensively and sampled from this space sufficiently comprehensively to know that the probability that there's a description of its behavior that is meaningfully shorter than the shortest one of the ones that we've found is at most .".

nathan-helm-burger on Shortform

As a grad student in neuroscience I got the opportunity to sit in on some forensic histology, and it was really fascinating. Occasionally you can figure out quite insightful things about cause of death from looking at brain samples under a microscope. Other times you get a simple "yep, looks like this sample approximately agrees with the estimated time of death, nothing unusual here."

niknoble on By default, capital will matter more than ever after AGI

Even if saving money through AGI converts 1:1 into money after the singularity, it will probably be worth less in utility to you:

You'll probably be able to buy planets post-AGI for the price of houses today. More generally your selfish and/or local and/or personal preferences will be fairly easily satisfiable even with small amounts of money, or to put it in other words, there are massive diminishing returns.

No one will be buying planets for the novelty or as an exotic vacation destination. The reason you buy a planet is to convert it into computing power, which you then attach to your own mind. If people aren't explicitly prevented from using planets for that purpose, then planets are going to be in very high demand, and very useful for people on a personal level.