Posts

A Narrow Path: a plan to deal with AI extinction risk 2024-10-07T13:02:15.229Z
[Cross-post] Book Review: Bureaucracy, by James Q Wilson 2024-08-19T13:57:10.872Z
davekasten's Shortform 2024-05-01T22:07:24.997Z

Comments

Comment by davekasten on Akash's Shortform · 2024-11-20T22:13:13.935Z · LW · GW

As you know, I have huge respect for USG natsec folks.  But there are (at least!) two flavors of them: 1) the cautious, measure-twice-cut-once sort that have carefully managed deterrence for decades, and 2) the "fuck you, I'm doing Iran-Contra" folks.  Which do you expect will end up in control of such a program?  It's not immediately clear to me which ones would.

Comment by davekasten on Akash's Shortform · 2024-11-20T19:57:42.399Z · LW · GW

I think this is a (c) leaning (b), especially given that we're doing it in public.  Remember, the Manhattan Project was a highly-classified effort and we know it by an innocuous name given to it to avoid attention.  

Saying publicly, "yo, China, we view this as an all-costs priority, hbu" is a great way to trigger a race with China...

But if it turned out that we knew from ironclad intel with perfect sourcing that China was already racing (I don't expect this to be the case), then I would lean back more towards (c).  

Comment by davekasten on davekasten's Shortform · 2024-11-20T03:54:59.477Z · LW · GW

I'll be in Berkeley Weds evening through next Monday, would love to chat with, well, basically anyone who wants to chat. (I'll be at The Curve Fri-Sun, so if you're already gonna be there, come find me there between the raindrops!)

Comment by davekasten on Proposing the Conditional AI Safety Treaty (linkpost TIME) · 2024-11-18T21:35:49.234Z · LW · GW

Thanks, looking forward to it!  Please do let us folks who worked on A Narrow Path (especially me, @Tolga , and @Andrea_Miotti ) know if we can be helpful in bouncing around ideas as you work on the treaty proposal!

Comment by davekasten on Proposing the Conditional AI Safety Treaty (linkpost TIME) · 2024-11-18T13:38:20.925Z · LW · GW

Is there a longer-form version with draft treaty language (even an outline)? I'd be curious to read it.

Comment by davekasten on evhub's Shortform · 2024-11-09T04:06:25.048Z · LW · GW

I think people opposing this have a belief that the counterfactual is "USG doesn't have LLMs" instead of "USG spins up its own LLM development effort using the NSA's no-doubt-substantial GPU clusters". 

Needless to say, I think the latter is far more likely.
 

Comment by davekasten on Daniel Kokotajlo's Shortform · 2024-11-06T15:08:25.596Z · LW · GW

I think the thing that you're not considering is that when tunnels are more prevalent and more densely packed, the incentives to use the defensive strategy of "dig a tunnel, then set off a very big bomb in it that collapses many tunnels" get far higher.  It wouldn't always be infantry combat; it would often be a subterranean equivalent of indirect fires.

Comment by davekasten on davekasten's Shortform · 2024-10-31T20:57:49.379Z · LW · GW

Ok, so Anthropic's new policy post (explicitly NOT linkposting it properly since I assume @Zac Hatfield-Dodds or @Evan Hubinger or someone else from Anthropic will, and figure the main convo should happen there, and don't want to incentivize fragmenting of conversation) seems to have a very obvious implication.

Unrelated, I just slammed a big AGI-by-2028 order on Manifold Markets.
 

Comment by davekasten on davekasten's Shortform · 2024-10-29T04:31:01.005Z · LW · GW

Yup.  The fact that the profession that writes the news sees "I should resign in protest" as their own responsibility in this circumstance really reveals something. 

Comment by davekasten on davekasten's Shortform · 2024-10-28T16:37:21.299Z · LW · GW

At LessOnline, there was a big discussion one night around the picnic tables with @Eliezer_Yudkowsky , @habryka , and some interlocutors from the frontier labs (you'll momentarily see why I'm being vague on the latter names). 

One question was: "does DC actually listen to whistleblowers?" and I contributed that, in fact, DC does indeed have a script for this, and resigning in protest is a key part of it, especially ever since the Nixon years.

Here is a usefully publicly-shareable anecdote on how strongly this norm is embedded in national security decision-making, from the New Yorker article "The U.S. Spies Who Sound the Alarm About Election Interference" by David Kirkpatrick, Oct 21, 2024:
(https://archive.ph/8Nkx5)

The experts’ chair insisted that in this cycle the intelligence agencies had not withheld information “that met all five of the criteria”—and did not risk exposing sources and methods. Nor had the leaders’ group ever overruled a recommendation by the career experts. And if they did? It would be the job of the chair of the experts’ group to stand up or speak out, she told me: “That is why we pick a career civil servant who is retirement-eligible.” In other words, she can resign in protest.

Comment by davekasten on davekasten's Shortform · 2024-10-28T02:02:43.136Z · LW · GW

Does "highest status" here mean highest expertise in a domain generally agreed by people in that domain, and/or education level, and/or privileged schools, and/or from more economically powerful countries etc?

I mean, functionally all of those things.  (Well, minus the country dynamic.  Everyone at this event I talked to was US, UK, or Canadian, which is all sorta one team for purposes of status dynamics at that event)

Comment by davekasten on davekasten's Shortform · 2024-10-27T21:20:30.162Z · LW · GW

I was being intentionally broad, here.  I am probably less interested for purposes of this particular post only in the question of "who controls the future" swerves and more about "what else would interested, agentic actors do" questions. 

It is not at all clear to me that OpenPhil is the only org who feels this way -- I can think of several non-EA-ish charities that if they genuinely 100% believed "none of the people you care for will die of the evils you fight if you can just keep them alive for the next 90 days" would plausibly do some interestingly agentic stuff.  

Comment by davekasten on davekasten's Shortform · 2024-10-27T18:01:28.272Z · LW · GW

Oh, to be clear, I'm not sure this is at all actually likely, but I was curious if anyone had explored the possibility conditional on it being likely.

Comment by davekasten on davekasten's Shortform · 2024-10-27T14:13:52.511Z · LW · GW

Basic Q: has anyone written much down about what sorts of endgame strategies you'd see just-before-ASI from the perspective of "it's about to go well, and we want to maximize the benefits of it"?

For example: if we saw OpenPhil suddenly make a massive push to just mitigate mortality at the cost of literally every other development goal they have, I might suspect that they suspect that we're about to all be immortal under ASI, and they're trying to get as many people possible to that future... 

Comment by davekasten on localdeity's Shortform · 2024-10-27T14:11:35.681Z · LW · GW

yup, as @sanxiyn says, this already exists.  Their example is, AIUI, a high-end research one; an actually-on-your-laptop-right-now (but admittedly narrower) example is address space layout randomization.
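For readers unfamiliar with it, here's a minimal sketch of what ASLR looks like in practice -- my own illustration, not from the linked thread, and it assumes CPython on a Unix-like system with ASLR enabled:

```python
# Minimal ASLR illustration (assumes CPython on a Unix-like OS with ASLR enabled).
# Run this script twice: the printed addresses shift between runs because the OS
# randomizes where the process's libraries and heap get mapped.
import ctypes

libc = ctypes.CDLL(None)  # symbols already loaded into this process, including libc
printf_addr = ctypes.cast(libc.printf, ctypes.c_void_p).value
buf = ctypes.create_string_buffer(16)

print(f"libc printf at: {printf_addr:#x}")
print(f"heap buffer at: {ctypes.addressof(buf):#x}")
```

The point is just that an attacker who hard-codes an address into an exploit finds it has moved on the next run -- randomization as a cheap, already-deployed defense.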

Comment by davekasten on Akash's Shortform · 2024-10-23T16:42:31.664Z · LW · GW

Wild speculation: they also have a sort of we're-watching-but-unsure provision about cyber operations capability in their most recent RSP update.  In it, they say in part that "it is also possible that by the time these capabilities are reached, there will be evidence that such a standard is not necessary (for example, because of the potential use of similar capabilities for defensive purposes)."  Perhaps they're thinking that automated vulnerability discovery is at least plausibly on-net-defensive-balance-favorable*, and so they aren't sure it should be regulated as closely, even if it is still "dual use" in some informal sense?

Again, WILD speculation here.  

*A claim that is clearly seen as plausible by, e.g., the DARPA AI Grand Challenge effort.

Comment by davekasten on davekasten's Shortform · 2024-10-22T20:47:28.045Z · LW · GW

It seems like the current meta is to write a big essay outlining your opinions about AI (see, e.g., Gladstone Report, Situational Awareness, various essays recently by Sam Altman and Dario Amodei, even the A Narrow Path report I co-authored).  

Why do we think this is the case?
I can imagine at least 3 hypotheses:
1.  Just path-dependence; someone did it, it went well, others imitated

2. Essays are High Status Serious Writing, and people want to obtain that trophy for their ideas

3. This is a return to the true original meaning of an essay, under Montaigne, that it's an attempt to write thinking down when it's still inchoate, in an effort to make it more comprehensible not only to others but also to oneself.  And AGI/ASI is deeply uncertain, so the essay format is particularly suited for this.

What do you think?

Comment by davekasten on davekasten's Shortform · 2024-10-17T01:04:54.282Z · LW · GW

Okay, I spent much more time with the Anthropic RSP revisions today.  Overall, I think it has two big thematic shifts for me: 

1.  It's way more "professionally paranoid," but needs to be even more so on non-cyber risks.  A good start, but it needs more on being able to stop human intelligence (i.e., good old-fashioned spies)

2.  It really has an aggressively strong vibe of "we are actually using this policy, and We Have Many Line Edits As A Result."  You may not think that RSPs are sufficient -- I'm not sure I do, necessarily -- but I am heartened slightly that they genuinely seem to take the RSP seriously, to the point of having mildly-frustrated-about-process-hiccup footnotes about it.  (Free advice to Anthropic PR: interview a bunch of staff about this on camera, cut it together, and post it; it will be lovely and humanizing and great recruitment material, I bet). 

Comment by davekasten on davekasten's Shortform · 2024-10-15T16:11:38.525Z · LW · GW

It's a small but positive sign that Anthropic sees taking 3 days beyond their RSP's specified timeframe to conduct a process without a formal exception as an issue.  Signals that at least some members of the team there are extremely attuned to normalization of deviance concerns.

Comment by davekasten on sarahconstantin's Shortform · 2024-10-07T21:57:49.429Z · LW · GW

I once saw a video on Instagram of a psychiatrist recommending to other psychiatrists that they purchase ear scopes to check out their patients' ears, because:
1.  Apparently it is very common for folks with severe mental health issues to imagine that there is something in their ear (e.g., a bug, a listening device)
2.  Doctors usually just say "you are wrong, there's nothing in your ear" without looking
3.  This destroys trust, so he started doing cursory checks with an ear scope
4.  Far more often than he expected (I forget exactly, but something like 10-20%ish), there actually was something in the person's ear -- usually just earwax buildup, but occasionally something else like a dead insect -- that was indeed causing the sensation, and he gained a clinical pathway to addressing his patients' discomfort that he had previously lacked

Comment by davekasten on A Narrow Path: a plan to deal with AI extinction risk · 2024-10-07T14:20:13.874Z · LW · GW

Looking forward to it!  (Should rules permit, we're also happy to discuss privately at an earlier date)

Comment by davekasten on davekasten's Shortform · 2024-10-07T03:46:41.176Z · LW · GW

Has anyone thought about the things that governments are uniquely good at when it comes to evaluating models? 

Here are at least 3 things I think they have as benefits:
1.  Just an independent 3rd-party perspective generally

2. The ability to draw insights across multiple labs' efforts, and identify patterns that others might not be able to 

3. The ability to draw on classified threat intelligence to inform its research (e.g., Country X is using model Y for bad behavior Z) and to test the model for classified capabilities (bright line example: "can you design an accurate classified nuclear explosive lensing arrangement").

Are there others that come to mind? 

Comment by davekasten on Ruby's Quick Takes · 2024-09-29T16:01:25.341Z · LW · GW

I think this can be true, but I don't think it needs to be true:

"I expect that a lot of regulation about what you can and can’t do stops being enforceable once the development is happening in the context of the government performing it."

I suspect that if the government is running the at-all-costs-top-national-priority Project, you will see some regulations stop being enforceable.  However, we also live in a world where you can easily find many instances of government officials complaining in their memoirs that laws and regulations prevented them from being able to go as fast or as freely as they'd want on top-priority national security issues.  (For example, DoD officials even after 9-11 famously complained that "the lawyers" restricted them too much on top-priority counterterrorism stuff.)
 

Comment by davekasten on [Completed] The 2024 Petrov Day Scenario · 2024-09-26T18:32:03.706Z · LW · GW

Gentlemen, it's been a pleasure playing with you tonight

Comment by davekasten on davekasten's Shortform · 2024-09-23T19:14:56.376Z · LW · GW

I suspect this won't get published until November at the earliest, but I am already delightfully pleased with this bit:


Canada geese fly overhead, honking. Your inner northeast Ohioan notices that you are confused; it’s the wrong season for them to migrate this far south, and they’re flying westwards, anyways.

A quick Google discovers that some Canada geese have now established themselves non-migratorily in the Bay Area:

"The Migratory Bird Treaty Act of 1918 banned hunting or the taking of eggs without a permit. These protections, combined with an increase in desirable real estate—parks, golf course and the like—spurred a dramatic turnaround for the species. Canada geese began breeding in the Bay Area—the southern end of their range – in the late 1950s."

You nod, approvingly; this clearly is another part of the East Bay’s well-known, long-term philanthropic commitment to mitigating Acher-Risks.

Comment by davekasten on davekasten's Shortform · 2024-09-23T05:00:46.912Z · LW · GW

Preregistering intent to write "Every Bay Area Walled Compound" (hat tip: Emma Liddell)

Unrelatedly, I am in Berkeley through Wednesday afternoon, let me know if you're around and would like to chat

Comment by davekasten on Akash's Shortform · 2024-09-17T23:18:28.311Z · LW · GW

Generally, it is difficult to overstate how completely the PRC is seen as a bad-faith actor in DC these days. Many folks saw them engage in mass economic espionage for a decade while repeatedly promising to stop; those folks are now more senior in their careers than they were at those formative moments.  Then COVID happened, and while not everyone believes in the lab leak hypothesis, basically everyone believes that the PRC sure as heck reflexively covered up, whether or not they were actually culpable. 

(Edit: to be clear, reporting, not endorsing, these claims)

Comment by davekasten on TurnTrout's shortform feed · 2024-09-13T02:57:00.712Z · LW · GW

Basic question because I haven't thought about this deeply: in national security stuff, we often intentionally elide the difference between capabilities and intentions.  The logic is: you can't assume a capability won't be used, so you should plan as-if it is intended to be used.

Should we adopt such a rule for AGI with regards to policy decision-making?   (My guess is...probably not for threat assessment but probably yes for contingency planning?)

Comment by davekasten on Zach Stein-Perlman's Shortform · 2024-09-09T01:02:59.588Z · LW · GW

I think, having been raised in a series of very debate- and seminar-centric discussion cultures, that a quick-hit question like that is indeed contributing something of substance.  I think it's fair that folks disagree, and I think it's also fair that people signal (e.g., with karma) that they think "hey man, let's go a little less Socratic in our inquiry mode here."  

But, put in more rationalist-centric terms, sometimes the most useful Bayesian update you can offer someone else is, "I do not think everyone is having the same reaction to your argument that you expected." (Also true for others doing that to me!)

(Edit to add two words to avoid ambiguity in meaning of my last sentence)

Comment by davekasten on Zach Stein-Perlman's Shortform · 2024-09-08T23:11:08.823Z · LW · GW

Yes, I would agree that if I expected a short take to have this degree of attention, I would probably have written a longer comment.

Well, no, I take that back.  I probably wouldn't have written anything at all.  To some, that might be a feature; to me, that's a bug. 
 

Comment by davekasten on Zach Stein-Perlman's Shortform · 2024-09-08T22:04:11.960Z · LW · GW

It is genuinely a sign that we are all very bad at predicting others' minds that it didn't occur to me that saying, effectively, "OP asked for 'takes'; here's a take on why I think this is pragmatically a bad idea" would also be read as saying "and therefore there is no other good question here."  That's, as the meme goes, a whole different sentence.  

Comment by davekasten on Zach Stein-Perlman's Shortform · 2024-09-08T17:06:26.732Z · LW · GW

I think it's bad for discourse for us to pretend that discourse doesn't have impacts on others in a democratic society.  And I think the meta-censoring of discourse by claiming that certain questions might have implicit censorship impacts is one of the most anti-rationality trends in the rationalist sphere.

I recognize most users of this platform will likely disagree, and predict negative agreement-karma on this post.  

Comment by davekasten on Zach Stein-Perlman's Shortform · 2024-09-08T17:01:56.790Z · LW · GW

Ok, then to ask it again in your preferred question format: is this where we think our getting-potential-employees-of-Anthropic-to-consider-the-value-of-working-on-safety-at-Anthropic points are best spent? 

Comment by davekasten on Zach Stein-Perlman's Shortform · 2024-09-06T22:18:54.902Z · LW · GW

Is this where we think our pressuring-Anthropic points are best spent? 

Comment by davekasten on Habryka's Shortform Feed · 2024-09-04T02:52:58.520Z · LW · GW

I personally endorse this as an example of us being a community that Has The Will To Try To Build Nice Things.

Comment by davekasten on The Checklist: What Succeeding at AI Safety Will Involve · 2024-09-04T02:52:22.220Z · LW · GW

To say the obvious thing: I think if Anthropic isn't able to make at least somewhat-roughly-meaningful predictions about AI welfare, then their core current public research agendas have failed?

Comment by davekasten on Buck's Shortform · 2024-09-02T18:01:07.203Z · LW · GW

Fair enough! 

Comment by davekasten on Buck's Shortform · 2024-09-02T15:23:52.982Z · LW · GW

Possibly misguided question given the context -- I see you incorporating imperfect information in "the attack fails silently"; why not also draw a distinction between "the attack succeeds noisily, the AI wins and we know it won" and "the attack succeeds silently, the AI wins and we don't know it won"? 

Comment by davekasten on Verification methods for international AI agreements · 2024-08-31T18:43:02.621Z · LW · GW

I would suggest that the set of means available to nation-states to unilaterally surveil another nation state is far more expansive than the list you have.  For example, the good-old-fashioned "Paying two hundred and eighty-two thousand dollars in a Grand Cayman banking account to a Chinese bureaucrat"* appears nowhere in your list.  


*If you get that this is a reference to the movie Spy Game, you are cool.  If you don't, go watch Spy Game.  It has a worldview on power that is extremely relevant to rationalists. 

Comment by davekasten on "Deception Genre" What Books are like Project Lawful? · 2024-08-29T16:16:14.851Z · LW · GW

I think you could argue plausibly that the climax of Vernor Vinge's A Deepness In the Sky has aspects of this, though it's subverted in multiple interesting spoilery ways.

In fact, I think you could argue that a lot of Vinge's writing tends to have major climaxes dependent on Xanatos Gambit pileups based on deception themes. 

Comment by davekasten on Why Large Bureaucratic Organizations? · 2024-08-27T21:16:49.552Z · LW · GW

This feels like a great theory for one motivation, but it isn't at all complete. 

For example: this theory doesn't really predict why anyone is ever hired above the bottom level of an organization at the margin.  

Comment by davekasten on Would catching your AIs trying to escape convince AI developers to slow down or undeploy? · 2024-08-27T02:59:33.026Z · LW · GW

That's a fair criticism!  Season 1 is definitely slower on that front compared to the others.  I think season 1 is the most normal "crime of the week" season by far, which is why I view it as a good on-ramp for folks less familiar.  Arguably, for someone situated as you are, you should just watch the pilot, read a quick wiki summary of every other episode in season 1 except for the last 2, watch those last 2, and move into season 2 when things get moving a little faster.  (Finch needs a character that doesn't appear until season 2 to do a lot of useful exposition on how he thinks about the Machine's alignment). 

Comment by davekasten on Would catching your AIs trying to escape convince AI developers to slow down or undeploy? · 2024-08-27T02:36:52.421Z · LW · GW

I will continue to pitch on the idea that Person of Interest is a TV show chock full of extremely popular TV people, including one lead beloved by Republicans, and we inexplicably fail to push people towards its Actually Did The Research presentation of loss-of-control stuff.

We should do that.  You all should, unironically, recommend it, streaming now on Amazon Prime for free, to your normie parents and aunts and uncles if they're curious about what you do at work. 

Comment by davekasten on davekasten's Shortform · 2024-08-23T22:21:14.342Z · LW · GW

Why are we so much more worried about LLMs having CBRN risk than super-radicalization risk, precisely?

(Or is this just an expected-harm metric rather than a probability metric?) 

Comment by davekasten on Zach Stein-Perlman's Shortform · 2024-08-23T14:39:24.690Z · LW · GW

I think you're eliding the difference between "powerful capabilities" being developed, the window of risk, and the best solution.  

For example, if Anthropic believes "_we_ will have it internally in 1-3 years, but no small labs will, and we can contain it internally" then they might conclude that the warrant for a state-level FMD is low.  Alternatively, you might conclude, "we will have it internally in 1-3 years, other small labs will be close behind, and an American state's capabilities won't be sufficient, we need DoD, FBI, and IC authorities to go stompy on this threat", and thus think a state-level FMD is low-value-add.  

Very unsure I agree with either of these hypos to be clear!  Just trying to explore the possibility space and point out this is complex. 

Comment by davekasten on davekasten's Shortform · 2024-08-22T22:39:11.918Z · LW · GW

I am (speaking personally) pleasantly surprised by Anthropic's letter.  https://cdn.sanity.io/files/4zrzovbb/website/6a3b14a98a781a6b69b9a3c5b65da26a44ecddc6.pdf

Comment by davekasten on davekasten's Shortform · 2024-08-22T21:51:04.419Z · LW · GW

I think at the meta level I very much doubt that I am responsible enough to create and curate a list of human beings for the most dangerous hazards.  For example, I am very confident that I could not 100% successfully detect a foreign government spy inside my friend group, because even the US intelligence community can't do that...  you need other mitigating controls, instead.

Comment by davekasten on davekasten's Shortform · 2024-08-22T21:49:36.656Z · LW · GW

Yeah, that's a useful taxonomy to be reminded of.  I think it's interesting how the "development hazard", item 8, with maybe a smidge of "adversary hazard", is the driver of people's thinking on AI.  I'm pretty unconvinced that good infohazard doctrine, even for AI, can be written based on thinking mainly about that!

Comment by davekasten on davekasten's Shortform · 2024-08-22T21:42:22.982Z · LW · GW

I think a lot of my underlying instinctive opposition to this concept boils down to thinking that we can and do coordinate on this stuff quite a lot.  Arguably, AI is the weird counterexample of a thought that wants to be thunk -- I think modern Western society is very nearly tailor-made to seek a thing that is abstract, maximizing, systematizing of knowledge, and useful, especially if it fills a hole left by the collapse of organized religion.  

I think for most other infohazards, the proper approach requires setting up an (often-government) team that handles them, which requires those employees to expose themselves to the infohazard to manage it.  And, yeah, sometimes they suffer real damage from it.  There's no way to analyze ISIS beheading videos to stop their perpetrators without seeing some beheading videos; I think that's the more-common varietal of infohazard I'm thinking of.