LessWrong 2.0 Reader

AI Safety Strategies Landscape
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-09T17:33:45.853Z · comments (1)

Is suffering like shit?
KatjaGrace · 2024-05-31T01:20:03.855Z · comments (5)

How I build and run behavioral interviews
benkuhn · 2024-02-26T05:50:05.328Z · comments (6)

Being against involuntary death and being open to change are compatible
Andy_McKenzie · 2024-05-27T06:37:27.644Z · comments (5)

[link] New Tool: the Residual Stream Viewer
AdamYedidia (babybeluga) · 2023-10-01T00:49:51.965Z · comments (7)

5 Reasons Why Governments/Militaries Already Want AI for Information Warfare
trevor (TrevorWiesinger) · 2023-10-30T16:30:38.020Z · comments (0)

Some of my predictable updates on AI
Aaron_Scher · 2023-10-23T17:24:34.720Z · comments (8)

[question] How unusual is the fact that there is no AI monopoly?
Viliam · 2024-08-16T20:21:51.012Z · answers+comments (15)

[link] A computational complexity argument for many worlds
jessicata (jessica.liu.taylor) · 2024-08-13T19:35:10.116Z · comments (15)

[LDSL#1] Performance optimization as a metaphor for life
tailcalled · 2024-08-08T16:16:27.349Z · comments (4)

Music in the AI World
Martin Sustrik (sustrik) · 2024-08-16T04:20:01.706Z · comments (8)

Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

Book Review: On the Edge: The Gamblers
Zvi · 2024-09-24T11:50:06.065Z · comments (1)

Extracting SAE task features for in-context learning
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-12T20:34:13.747Z · comments (1)

RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (10)

[LDSL#6] When is quantification needed, and when is it hard?
tailcalled · 2024-08-13T20:39:45.481Z · comments (0)

Some Things That Increase Blood Flow to the Brain
romeostevensit · 2024-03-27T21:48:46.244Z · comments (14)

[link] introduction to thermal conductivity and noise management
bhauth · 2024-03-06T23:14:02.288Z · comments (1)

Good Bings copy, great Bings steal
dr_s · 2024-04-21T09:52:46.658Z · comments (6)

Features and Adversaries in MemoryDT
Joseph Bloom (Jbloom) · 2023-10-20T07:32:21.091Z · comments (6)

UDT1.01: Plannable and Unplanned Observations (3/10)
Diffractor · 2024-04-12T05:24:34.435Z · comments (0)

[question] When did Eliezer Yudkowsky change his mind about neural networks?
[deactivated] (Yarrow Bouchard) · 2023-11-14T21:24:00.000Z · answers+comments (15)

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”
Miles Turpin (miles) · 2023-10-03T02:22:00.199Z · comments (0)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]
abstractapplic · 2024-05-20T09:38:55.228Z · comments (2)

Different views of alignment have different consequences for imperfect methods
Stuart_Armstrong · 2023-09-28T16:31:20.239Z · comments (0)

Game Theory without Argmax [Part 2]
Cleo Nardo (strawberry calm) · 2023-11-11T16:02:41.836Z · comments (14)

[link] Anthropic, Google, Microsoft & OpenAI announce Executive Director of the Frontier Model Forum & over $10 million for a new AI Safety Fund
Zach Stein-Perlman · 2023-10-25T15:20:52.765Z · comments (8)

Late-talking kid part 3: gestalt language learning
Steven Byrnes (steve2152) · 2023-10-17T02:00:05.182Z · comments (5)

Superforecasting the premises in “Is power-seeking AI an existential risk?”
Joe Carlsmith (joekc) · 2023-10-18T20:23:51.723Z · comments (3)

Retrospective: PIBBSS Fellowship 2023
DusanDNesic · 2024-02-16T17:48:32.151Z · comments (1)

AI's impact on biology research: Part I, today
octopocta · 2023-12-23T16:29:18.056Z · comments (6)

Mentorship in AGI Safety (MAGIS) call for mentors
Valentin2026 (Just Learning) · 2024-05-23T18:28:03.173Z · comments (3)

Attention Output SAEs Improve Circuit Analysis
Connor Kissane (ckkissane) · 2024-06-21T12:56:07.969Z · comments (0)

Falling fertility explanations and Israel
Yair Halberstadt (yair-halberstadt) · 2024-04-03T03:27:38.564Z · comments (4)

On Not Requiring Vaccination
jefftk (jkaufman) · 2024-02-01T19:20:12.657Z · comments (21)

Video and transcript of presentation on Scheming AIs
Joe Carlsmith (joekc) · 2024-03-22T15:52:03.311Z · comments (1)

The Byronic Hero Always Loses
Cole Wyeth (Amyr) · 2024-02-22T01:31:59.652Z · comments (4)

I was raised by devout Mormons, AMA [&|] Soliciting Advice
ErioirE (erioire) · 2024-03-13T16:52:19.130Z · comments (41)

[link] Aaron Silverbook on anti-cavity bacteria
DanielFilan · 2023-11-20T03:06:19.524Z · comments (3)

[link] A Narrative History of Environmentalism's Partisanship
Jeffrey Heninger (jeffrey-heninger) · 2024-05-14T16:51:01.029Z · comments (3)

Why wasn't preservation with the goal of potential future revival started earlier in history?
Andy_McKenzie · 2024-01-16T16:15:08.550Z · comments (1)

Comparing Quantized Performance in Llama Models
NickyP (Nicky) · 2024-07-15T16:01:24.960Z · comments (2)

[link] [Linkpost] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, that was shockingly similar to her own voice.
Linch · 2024-05-20T23:50:28.138Z · comments (8)

How Would an Utopia-Maximizer Look Like?
Thane Ruthenis · 2023-12-20T20:01:18.079Z · comments (23)

Games for AI Control
charlie_griffin (cjgriffin) · 2024-07-11T18:40:50.607Z · comments (0)

[link] Self-Resolving Prediction Markets
PeterMcCluskey · 2024-03-03T02:39:42.212Z · comments (0)

Mapping the semantic void II: Above, below and between token embeddings
mwatkins · 2024-02-15T23:00:09.010Z · comments (4)

A more systematic case for inner misalignment
Richard_Ngo (ricraz) · 2024-07-20T05:03:03.500Z · comments (4)

[link] New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking
Harlan · 2024-04-04T23:41:26.439Z · comments (5)


Recent comments

thomas-kwa on 2024 Petrov Day Retrospective

As Andropov, the game ceased to be interesting for me around 2:30pm, but I was still in a tense mood, which I leveraged into writing a grim "Petrov Day carol" [LW(p) · GW(p)] about the nuclear winter we might have seen. I cried for the first time in a while. There's a big difference between being likely to win the game and being emotionally not stressed about it, especially when the theme is nuclear war.

cubefox on An Interactive Shapley Value Explainer

Thanks, this looks good. I agree on the reason for the factorial method of weighting being very unclear. Unfortunately most sources, like Wikipedia, don't really try to explain the motivation behind it.

But it seems to me there is a fundamental problem with Shapley values: They ignore probabilities. For example, they assume all necessary conditions are equally important. But this doesn't seem right.

Say a successful surgery needs both a surgeon and a nurse, otherwise the patient dies. Shapley values (or really any purely counterfactual account of causal contribution) will then assume that both have contributed equally to the successful surgery. Which arguably isn't true, because surgeons are much less available (less replaceable) than nurses. If the nurse becomes ill, a different nurse could probably do the task, while if the surgeon becomes ill, the surgery may well have to be postponed. So the surgeon is more important for (contributes more to) a successful surgery.

This apparently comes down to probability. The task of the surgeon was, from a prior perspective, less likely to be fulfilled because it is harder to get hold of a surgeon.

Similarly: What was the main cause of the match getting lit? The match being struck? Or the atmosphere containing oxygen? Both are necessary in the sense that if either hadn't occurred the match wouldn't be lit. So their Shapley values would be equal. But it seems clear that the main cause was the match being struck. Why? Because matches being struck is relatively rare, and hence unlikely, while the atmosphere contains oxygen pretty much always.

So I think the contributions calculated for Shapley values should be weighted by something like the inverse of their prior probabilities.
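To make the surgeon/nurse example concrete, here is a minimal Python sketch. It is an illustration, not part of the explainer under discussion. It computes plain Shapley values for a two-player game in which the surgery succeeds only if both are present, then rescales them by inverse prior probability as one possible reading of the proposal above. The function names and the availability priors (0.5 for the surgeon, 0.9 for the nurse) are hypothetical numbers chosen for the example.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}


def surgery_value(coalition):
    # Assumed characteristic function: success (1) only with both present.
    return 1 if {"surgeon", "nurse"} <= coalition else 0


values = shapley_values(["surgeon", "nurse"], surgery_value)

# Hypothetical prior probabilities that each role can be filled.
priors = {"surgeon": Fraction(1, 2), "nurse": Fraction(9, 10)}

# One possible weighting: divide by the prior, then renormalize to sum to 1.
weighted = {p: values[p] / priors[p] for p in values}
total = sum(weighted.values())
shares = {p: w / total for p, w in weighted.items()}

print(values)  # both players get 1/2
print(shares)  # surgeon 9/14, nurse 5/14
```

Under these assumed priors the plain Shapley split is 1/2 each, while the reweighted split gives the surgeon 9/14 and the nurse 5/14. Other weighting schemes are possible; this one only shows the direction of the effect.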

Other notes:

There is another Shapley value calculator. The website isn't designed as well as the one above, but it contains several examples and it allows assigning a value to the empty coalition, instead of always assigning it the value 0.
Several years ago, the EA Forum user George Bridgewater actually pointed out an issue with Shapley values (point 1 here [EA(p) · GW(p)]) which is very similar to the one I made above, though he didn't identify missing probabilities as the problem.

error on Cryonics is free

This trips my too-good-to-be-true alarms, but has my provisional attention anyway. The main reasons I'm not signed up for cryonics are cost, inconvenience, and s-risks. Eliminating cost (and cost-related inconveniences) could move me...but I want to know how this institution differs such that they can offer such storage at low or no cost, where others don't or can't.

sharmake-farah on 'Empiricism!' as Anti-Epistemology

I basically endorse argument 1, and one other update you haven't mentioned but which is important is that the values of a human turn out to be less complicated and fragile, and more generalizable than people thought (this is because human values data is likely a small part of GPT-4, and yet it can correctly answer a lot of morality questions, and I think LLMs are genuinely learning new regularities here, so they can generalize from their training data).

Implications for AI risk of course abound.

dw2 on Cryonics is free

In this recent podcast interview, Jordan Sparks, the founder and executive director of Oregon Brain Preservation (OBP), gives more information about the low-cost services OBP provides: https://londonfuturists.buzzsprout.com/2028982/episodes/15517037-the-low-cost-future-of-preserving-brains-with-jordan-sparks

tailcalled on tailcalled's Shortform

Thesis: The motion of the planets is the strongest governing factor for life on Earth.

Reasoning: Time-series data often shows strong changes with the day and night cycle, and sometimes also with the seasons. The daily cycle and the seasonal cycle are governed by the relationship between the Earth and the sun. The Earth is a planet, and so its movement is part of the motion of the planets.

ryan_greenblatt on COT Scaling implies slower takeoff speeds

I don't think "continuous" is self-evident or consistently used to refer to "a longer gap from human-expert level AI to very superhuman AI". For instance, in the very essay you link, Tom argues that "continuous" (and fairly predictable) doesn't imply that this gap is long!

ryan_greenblatt on 'Empiricism!' as Anti-Epistemology

What are the close-by arguments that are actually reasonable? Here is a list of close-by arguments (not necessarily endorsed by me!):

On empirical updates from current systems: If current AI systems are broadly pretty easy to steer and there is good generalization of this steering, that should serve as some evidence that future more powerful AI systems will also be relatively easier to steer. This will help prevent concerns like scheming from arising in the first place or make these issues easier to remove.
- This argument holds to some extent regardless of whether current AIs are smart enough to think through and successfully execute scheming strategies. For instance, imagine we were in a world where steering current AIs was clearly extremely hard: AIs would quickly overfit and goodhart training processes, RLHF was finicky and had terrible sample efficiency, and AIs were much worse at sample efficiently updating on questions about human deontological constraints relative to questions about how to successfully accomplish other tasks. In such a world, I think we should justifiably be more worried about future systems.
- And in fact, people do argue about how hard it is to steer current systems and what this implies. For an example of a version of an argument like this, see here [LW(p) · GW(p)], though note that I disagree with various things.
- It's pretty unclear what predictions Eliezer made about the steerability of future AI systems and he should lose some credit for not making clear predictions. Further, my sense is his implied predictions don't look great. (Paul's predictions as of about 2016 seem pretty good from my understanding, though not that clearly laid out, and are consistent with his threat models.)
On unfalsifiability: It should be possible to empirically produce evidence for or against scheming prior to it being too late. The fact that MIRI-style doom views often don't discuss predictions about experimental evidence and also don't make reasonably convincing arguments that it will be very hard to produce evidence in test beds is concerning. It's a bad sign if advocates for a view don't try hard to make it falsifiable prior to that view implying aggressive action.
- My view is that we'll probably get a moderate amount of evidence on scheming prior to catastrophe (perhaps 3x update either way), with some chance that scheming will basically be confirmed in an earlier model. And, it is in principle possible to obtain certainty either way about scheming using experiments, though this might be very tricky and a huge amount of work for various reasons.
On empiricism: There isn't good empirical evidence for scheming and instead the case for scheming depends on dubious arguments. Conceptual arguments have a bad track record, so to estimate the probability of scheming we should mostly guess based on the most basic and simple conceptual arguments and weight more complex arguments very little. If you do this, scheming looks unlikely.
- I roughly half agree with this argument, but I'd note that you also have to discount conceptual arguments against scheming in the same way and that the basic and simple conceptual arguments seem to indicate that scheming isn't that unlikely. (I'd say around 25%.)
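
As a rough illustration of what a "3x update either way" could mean numerically, the sketch below applies a likelihood ratio of 3 (or 1/3) to prior odds. Reading "3x" as a likelihood ratio on odds, and using the 25% figure above as the prior, are both assumptions made for the example. The comment itself does not combine these numbers, and the function name is hypothetical.

```python
from fractions import Fraction


def update(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


prior = Fraction(1, 4)  # assumed prior, taken from the ~25% figure

evidence_for = update(prior, 3)                   # 3x toward scheming
evidence_against = update(prior, Fraction(1, 3))  # 3x away from scheming

print(evidence_for)      # 1/2
print(evidence_against)  # 1/10
```

On this reading, a 25% prior would move to 50% after a 3x update toward scheming and to 10% after a 3x update against it.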

ryan_greenblatt on 'Empiricism!' as Anti-Epistemology

I find this essay interesting as a case study in discourse and argumentation norms. Particularly as a case study of issues with discourse around AI risk.

When I first skimmed this essay when it came out, I thought it was ok, but mostly uninteresting or obvious. Then, on reading the comments and looking back at the body, I thought it did some pretty bad strawmanning.

I reread the essay yesterday and now I feel quite differently. Parts (i), (ii), and (iv) which don't directly talk about AI are actually great and many of the more subtle points are pretty well executed. The connection to AI risk in part (iii) is quite bad and notably degrades the essay as a whole. I think a well-executed connection to AI risk would have been good. Part (iii) seems likely to contribute to AI risk being problematically politicized and negatively polarized (e.g. low quality dunks and animosity). Further, I think this is characteristic of problems I have with the current AI risk discourse.

In parts (i), (ii), and (iv), it is mostly clear that the Spokesperson is an exaggerated straw person who doesn't correspond to any particular side of an issue. This seems like a reasonable rhetorical move to better explain a point. However, part (iii) has big issues in how it connects the argument to AI risk. Eliezer ends up defeating a specific and weak argument against AI risk. This is an argument that actually does get made, but unfortunately, he both associates this argument with the entire view of AI risk skepticism (in the essay, the "AI-permitting faction") and he fails to explain that the debunking doesn't apply to many common arguments which sound similar but are actually reasonable. Correspondingly, the section suffers from an ethnic tension style issue: in practice, it attacks an entire view by associating it with a bad argument for that view (but of course, reversed stupidity is not intelligence [LW · GW]). This issue is made worse because the argument Eliezer attacks is also very similar to various more reasonable arguments that can't be debunked in the same way and Eliezer doesn't clearly call this out. Thus, these more reasonable arguments are attacked in association. It seems reasonable for Eliezer to connect the discussion to AI risk directly, but I think the execution was poor.

I think my concerns are notably similar to the issues people had with The Sun is big, but superintelligences will not spare Earth a little sunlight [LW(p) · GW(p)] and I've encountered similar issues in many things written by Eliezer and Nate.

How could Eliezer avoid these issues? I think when debunking or arguing against bad arguments, you should explain that you're attacking bad arguments and that there exist other commonly made arguments which are better or at least harder to debunk. It also helps to disassociate the specific bad arguments from a general cause or view as much as possible. This essay seems to associate the bad "Empiricism!" argument with AI risk skepticism nearly as much as possible. Whenever you push back against an argument which is similar or similar-sounding to other arguments, but the push back doesn't apply in the same way to those other arguments, it's useful to explicitly spell out the limitations of the push back.^[1] One possible bar to try to reach is that people who disagree strongly on the topic due to other more reasonable arguments should feel happy to endorse the push back against those specific bad arguments.

There is perhaps a bit of a motte-and-bailey with this essay where Eliezer can strongly defend debunking a specific bad argument (the motte), but there is an implication that the argument also pushes against more complex and less clearly bad arguments (the bailey). (I'm not saying that Eliezer actively engages in motte-and-bailey in the essay, just that this essay probably has this property to some extent in practice.) That said, there is also perhaps a motte-and-bailey for many of the arguments that Eliezer argues against, where the motte is a more complex and narrow argument and the bailey is "Empiricism! is why AI risk is fake".

A thoughtful reader can recognize and avoid their own cognitive biases and correctly think through the exact implications of the arguments made here. But I wish Eliezer did this work for the reader to reduce negative polarization.

When debunking bad arguments, it's useful to be clear about what you aren't covering or implying.

[1] Part (iv) helps to explain the scope of the broader point, but doesn't explain the limitations specifically in the AI case.

raemon on Chapter 7: Reciprocation

This one.