LessWrong 2.0 Reader

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (216)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (84)

[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (29)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (4)

[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (32)

Overview of strong human intelligence amplification methods
TsviBT · 2024-10-08T08:37:18.896Z · comments (141)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (15)

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (123)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (30)

the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (40)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (50)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (24)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (37)

Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (58)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (23)

Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (24)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (68)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

[link] Understanding Shapley Values with Venn Diagrams
Carson L · 2024-12-06T21:56:43.960Z · comments (33)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (52)

[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (60)

Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (24)

Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (42)

[link] Why I’m not a Bayesian
Richard_Ngo (ricraz) · 2024-10-06T15:22:45.644Z · comments (92)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (17)

Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)

My motivation and theory of change for working in AI healthtech
Andrew_Critch · 2024-10-12T00:36:30.925Z · comments (37)

[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (14)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (71)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (64)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)

o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (150)

[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)

[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)

"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)

A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)

[link] Arithmetic is an underrated world-modeling technology
dynomight · 2024-10-17T14:00:22.475Z · comments (32)

Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (23)

Momentum of Light in Glass
Ben (ben-lang) · 2024-10-09T20:19:42.088Z · comments (44)

“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)

[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (16)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (79)

Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (8)

Shallow review of technical AI safety, 2024
technicalities · 2024-12-29T12:01:14.724Z · comments (22)

Recent comments

karl-krueger on If all trade is voluntary, then what is "exploitation?"

I use capitalism in a manner mutually exclusive with slave labor because it requires self-ownership.

This seems like a sort of definitional gimbal lock; it makes it harder to describe the world because two potentially-separate degrees of freedom are collapsed into one. While I'm reluctant to argue definitions, I think it's worth using terms in ways that allow us to describe the world in more detail than ones that collapse distinctions.

I expect to see this usage of "capitalism" not in history or economics, but in the sort of political doctrine where it's intended to lock those concepts together; to imply that capital markets and individual freedom are either the same thing, or closely related — more closely, I think, than history and contemporary events really support.

It would seem weird to me, for instance, to claim that a publicly-traded company that is discovered to have done something to violate individual freedom is thereby no longer a participant in a capitalist economy. The New York Stock Exchange doesn't ask "does this company infringe individual freedoms anywhere in the world?" before letting a company be listed. (To be clear, I'm not proposing that it should; I'm saying that it's useful to talk about "participation in a capital market economy" and "fully respecting some set of individual freedoms" as distinct axes.)

(For what it's worth, I think "self-ownership" is a pretty odd expression, because one of the central traits of ownership is that it can be transferred, and one of the central traits of selfhood is that it cannot. Your relation to yourself is distinct from property ownership in that you can sell any piece of your property, but you cannot sell your self; no matter what obligations you may have signed up for, you always retain possession of your self.)

vladimir_nesov on o3

Test-time compute is applied in-context, so it's very worthwhile to scale: the model gets better and better at solving a particular problem, to the extent that no amount of pretraining would be able to match it with only modest test-time compute.

sodium on Shallow review of technical AI safety, 2024

Pr(Ai)²R is at least partially funded by Good Ventures/OpenPhil.

moridinamael on Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds

This post resonated with me when it came out, and I think its thesis only seems more credible with time. Anthropic's seminal "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet" (the Golden Gate Claude paper) seems right in line with these ideas. We can make scrutable the inscrutable as long as the inscrutable takes the form of something organized and regular and repeatable.

This article gets bonus points from me for being succinct while still making its argument clearly.

qvalq on You Provably Can't Trust Yourself

How/does this square with https://arxiv.org/abs/1902.07404?
IIUC, Gödel's Second Incompleteness Theorem was overinterpreted, and a different operationalization of consistency is provable.

I talked to Mihály Bárász about that, and he didn't think it was crazy.

mateusz-baginski on Alexander Gietelink Oldenziel's Shortform

Insufficiently catchy

akash-wasil on evhub's Shortform

I'm glad you're doing this, and I support many of the ideas already suggested. Some additional ideas:

Interview program. Work with USAISI or UKAISI (or DHS/NSA) to pilot an interview program in which officials can ask questions about AI capabilities, safety and security threats, and national security concerns. (If it's not feasible to do this with a government entity yet, start a pilot with a non-government group, perhaps METR, Apollo, Palisade, or the new AI Futures Project.)
Clear communication about RSP capability thresholds. I think the RSP could do a better job of outlining the kinds of capabilities that Anthropic is worried about and what sorts of thresholds would trigger a reaction. I think the OpenAI preparedness framework tables are a good example of this kind of clear/concise communication. It's easy for a naive reader to quickly get a sense of "oh, this is the kind of capability that OpenAI is worried about." (Clarification: I'm not suggesting that Anthropic should abandon the ASL approach or that OpenAI has necessarily identified the right capability thresholds. I'm saying that the tables are a good example of the kind of clarity I'm looking for: someone could skim them and easily get a sense of what thresholds OpenAI is tracking, and I think OpenAI's PF currently achieves this much more than the Anthropic RSP.)
Emergency protocols. Publishing an emergency protocol that specifies how Anthropic would react if it needed to quickly shut down a dangerous AI system. (See some specific prompts in the "AI developer emergency response protocol" section here). Some information can be redacted from a public version (I think it's important to have a public version, though, partly to help government stakeholders understand how to handle emergency scenarios, partly to raise the standard for other labs, and partly to acquire feedback from external groups.)
RSP surveys. Evaluate the extent to which Anthropic employees understand the RSP, their attitudes toward the RSP, and how the RSP affects their work. More on this here.
More communication about Anthropic's views about AI risks and AI policy. Some specific examples of hypothetical posts I'd love to see:
- "How Anthropic thinks about misalignment risks"
- "What the world should do if the alignment problem ends up being hard"
- "How we plan to achieve state-proof security before AGI"
- Encouraging more employees to share their views on various topics, e.g., Sam Bowman's post.
AI dialogues/debates. It would be interesting to see Anthropic employees have discussions/debates with other folks thinking about advanced AI. Hypothetical examples:
- "What are the best things the US government should be doing to prepare for advanced AI?" with Jack Clark and Daniel Kokotajlo.
- "Should we have a CERN for AI?" with [someone from Anthropic] and Miles Brundage.
- "How difficult should we expect alignment to be?" with [someone from Anthropic] and [someone who expects alignment to be harder; perhaps Jeffrey Ladish or Malo Bourgon].

More ambitiously, I feel like I don't really understand Anthropic's plan for how to manage race dynamics in worlds where alignment ends up being "hard enough to require a lot more than RSPs and voluntary commitments."

From a policy standpoint, several of the most interesting open questions seem to be along the lines of "under what circumstances should the USG get considerably more involved in overseeing certain kinds of AI development" and "conditional on the USG wanting to get way more involved, what are the best things for it to do?" It's plausible that Anthropic is limited in how much work it could do on these kinds of questions (particularly in a public way). Nonetheless, it could be interesting to see Anthropic engage more with questions like the ones Miles raises here.

mateusz-baginski on Daniel Tan's Shortform

Something like "We have mapped out the possible human-understandable or algorithmically neat descriptions of the network's behavior sufficiently comprehensively and sampled from this space sufficiently comprehensively to know that the probability that there's a description of its behavior that is meaningfully shorter than the shortest one of the ones that we've found is at most .".

nathan-helm-burger on Shortform

As a grad student in neuroscience I got the opportunity to sit in on some forensic histology, and it was really fascinating. Occasionally you can figure out quite insightful things about cause of death from looking at brain samples under a microscope. Other times you get a simple "yep, looks like this sample approximately agrees with the estimated time of death, nothing unusual here."

niknoble on By default, capital will matter more than ever after AGI

Even if saving money through AGI converts 1:1 into money after the singularity, it will probably be worth less in utility to you:

You'll probably be able to buy planets post-AGI for the price of houses today. More generally your selfish and/or local and/or personal preferences will be fairly easily satisfiable even with small amounts of money, or to put it in other words, there are massive diminishing returns.

No one will be buying planets for the novelty or as an exotic vacation destination. The reason you buy a planet is to convert it into computing power, which you then attach to your own mind. If people aren't explicitly prevented from using planets for that purpose, then planets are going to be in very high demand, and very useful for people on a personal level.