davekasten's Shortform

davekasten

davekasten's Shortform

post by davekasten · 2024-05-01T22:07:24.997Z · LW · GW · 89 comments

91 comments

89 comments

Comments sorted by top scores.

comment by davekasten · 2025-03-11T03:10:22.672Z · LW(p) · GW(p)

The Trump administration (or, more specifically, the White House Office of Science and Technology Policy, but they are in the lead on most AI policy, it seems), are asking for comment on what their AI Action Plan should include. Literally anyone can comment on it. You should consider commenting on it, comments are due Saturday at 8:59pm PT/11:59pm ET via an email address. These comments will actually be read, and a large number of comments on an issue usually does influence any White House's policy. I encourage you to submit comments!

regulations.gov/document/NSF_FRDOC_0001-3479… (Note that all submissions are public and will be published)

(Disclosure: I am working on a submission for this for my dayjob but this particular post is in my personal capacity)

(Edit note: I originally said this was due Friday; I cannot read a calendar, it is in fact due 24 hours later. Consider this a refund that we have all received for being so good at remembering the planning fallacy [LW · GW] all these years.)

Replies from: peterslattery, Seth Herd

↑ comment by peterslattery · 2025-03-12T18:38:10.670Z · LW(p) · GW(p)

In the future, there should be some organization or some group of individuals in the LW community who raise awareness about these sorts of opportunities and offer content and support to ensure submissions from the most knowledgeable and relevant actors. This seems like a very low-hanging fruit and is something several groups I know are doing.

↑ comment by Seth Herd · 2025-03-11T07:27:48.019Z · LW(p) · GW(p)

Edit: I finished that post on this topic: Whether governments will control AGI is important and neglected [LW · GW].

I'm hoping for discussion on that post, and quite ready to change my draft comment, or not submit one, based on those arguments. After putting a bunch of thought into it, my planned comment will recommend forming a committee that can work in private to investigate the opportunities and risks of AI development, to inform future policy. I will note that this was Roosevelt's response to Einstein's letter on the potential of nuclear weaponry.

I hope that such a committee will conclude that yeah, there are some big dangers on expectation. I will emphasize the disagreement among experts, and suggest that the sane thing to do is put real effort into sorting out the many conflicting claims and possibilities, while also pursuing our current best guesses. I think any request for a slowdown is wasted, given the request's note about reducing regulatory barriers. But I will note that there are dangers to both our economy from potential rapid job loss, and large security risks from adversaries stealing or copying our AI, such that we may be currently building tools and weapons that will be used against us. I think I will not emphasize x-risk, and may not even include it. But I will probably mention that predictions of reaching human-level autonomous operation are very mixed, so we're not sure how far we are from creating what's effectively a new intelligent species. I'm hoping that triggers the right intuitions of danger.

Again, I'm highly uncertain and very open to changing my mind on what to say.

Original comment:

This raises the question: what should we say?

Fortunately, I've almost finished a post about this. It analyzes many aspects of the question "do we want governments to recognize the potential of AGI?".

Unfortunately, it doesn't answer the question. There are strong points on both sides, and it needs more careful thought.

Nonetheless, I'll probably get it out tomorrow since it's almost finished anyway, and this request for public comments might make a good forcing function for all of us to put some thought into it.

Replies from: davekasten

↑ comment by davekasten · 2025-03-11T14:26:32.653Z · LW(p) · GW(p)

I think on net, there are relatively fewer risks related to getting governments more AGI-pilled vs. them continuing on their current course; governments are broadly AI-pilled even if not AGI/ASI-pilled and are doing most of the accelerating actions an AGI-accelerator would want.

Replies from: Seth Herd

↑ comment by Seth Herd · 2025-03-11T19:50:15.650Z · LW(p) · GW(p)

I wasn't able to finish that post in the few minutes I've got so far today, so here's the super short version. I remain highly uncertain whether my comments will include any mention of AGI.

(Edit: I finally finished it: Whether governments will control AGI is important and neglected [LW · GW])

I think whether AGI-pilling governments is a good idea is quite complex. Pushing the government to become aware of AGI x-risks will probably decelerate progress, but it could even accelerate it if the conclusion is "build it first, don't worry we'll be super careful when we get close".

Even if it does help with alignment, it's not necessarily net good. If governments take control early enough to prevent proliferation of AGI, that helps a lot with the risks of misalignment and catastrophic misuse. The US could even cooperate with China to prevent proliferation to other countries and to nongovermental groups, just as the US cooperated with Russia on nuclear nonproliferation.

But government control also raises the risks of power concentration. Intent-aligned AGI in untrustworthy hands could create a permanent dictatorship and unbreakable police state. The current governments of both the US and China don't seem like the best types to control the future.

So it's a matter of balancing Fear of centralized power vs. fear of misaligned AGI [LW · GW].

This also needs to be balanced agains the possibility of misuse of intent-aligned AGI if it does proliferate broadly; see If we solve alignment, do we die anyway? [LW · GW]

If I had a firm estimate of how hard technical alignment is, I'd have a better answer. But I don't, and I think the best objective conclusion, taking in all of the arguments made to date and the very wide variance in opinion even among those who've thought deeply about it, is that nobody has a very good estimate. (Edit: I mean estimates between very very hard and modestly tricky. I don't know of anyone who's addressed the hard parts and concluded that it happens by default.)

Neither do we have a good estimate of how likely individuals in power would be to use AGI well or poorly, in various circumstances (unchallenged hegemony vs. close race dynamics).

comment by davekasten · 2024-05-15T09:46:13.454Z · LW(p) · GW(p)

Epistemic status: not a lawyer, but I've worked with a lot of them.

As I understand it, an NDA isn't enforceable against a subpoena (though the former employer can seek a protective order for the testimony). Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...

Replies from: metachirality

↑ comment by metachirality · 2024-05-16T07:34:43.791Z · LW(p) · GW(p)

A subpoena for what?

comment by davekasten · 2024-10-17T01:04:54.282Z · LW(p) · GW(p)

Okay, I spent much more time with the Anthropic RSP revisions today. Overall, I think it has two big thematic shifts for me:

1. It's way more "professionally paranoid," but needs even more so on non-cyber risks. A good start, but needs more on being able to stop human intelligence (i.e., good old fashioned spies)

2. It really has an aggressively strong vibe of "we are actually using this policy, and We Have Many Line Edits As A Result." You may not think that RSPs are sufficient -- I'm not sure I do, necessarily -- but I am heartened slightly that they genuinely seem to take the RSP seriously to the point of having mildly-frustrated-about-process-hiccup footnoes about it. (Free advice to Anthropic PR: interview a bunch of staff about this on camera, cut it together, and post it, it will be lovely and humanizing and great recruitment material, I bet).

comment by davekasten · 2024-06-04T23:58:40.664Z · LW(p) · GW(p)

I think one thing that is poorly-understood by many folks outside of DC is just how baseline the assumption is that China is a by-default faithless negotiating partner and that by-default China will want to pick a war with America in 2027 or later.

(I am reporting, not endorsing. For example, it is deeply unclear to me why we should take another country's statements about the year they're gonna do a war at surface level)

Replies from: thomas-kwa, william-riker, D0TheMath

↑ comment by Thomas Kwa (thomas-kwa) · 2024-06-05T21:35:27.040Z · LW(p) · GW(p)

"want to pick a war with America" is really strange wording because China's strategic goals are not "win a war against nuclear-armed America", but things like "be able to control its claims in the South China Sea including invading Taiwan without American interference". Likewise Russia doesn't want to "pick a war with the EU" but rather annex Ukraine; if they were stupid enough to want the former they would have just bombed Paris. I don't know whether national security people relate to the phrasing the same way but they do understand this.

Replies from: davekasten

↑ comment by davekasten · 2024-06-07T20:39:02.974Z · LW(p) · GW(p)

I totally understand your point, agree that many folks would use your phrasing, and nonetheless think there is something uniquely descriptively true about the phrasing I chose and I stand by it.

↑ comment by William Riker (william-riker) · 2024-06-05T17:37:20.551Z · LW(p) · GW(p)

Has China has made a statment about starting a war in 2027 or later? Who exactly is the belief that "by-default China will want to pick a war with America in 2027 or later" held by and how confident are you that they hold it?

Replies from: D0TheMath

↑ comment by Garrett Baker (D0TheMath) · 2024-06-05T17:47:00.938Z · LW(p) · GW(p)

It is supposedly their goal for when they will have modernized their military.

Replies from: william-riker

↑ comment by William Riker (william-riker) · 2024-06-05T18:10:27.572Z · LW(p) · GW(p)

Thanks for the link! The one mention of starting war was a quote from this 2006 white paper:

"by the middle of the twenty-first century, the strategic goal of building an informatized army and winning informatized wars will be basically achieved"

Is this what you're referring to or did I miss something?

Replies from: davekasten, D0TheMath

↑ comment by davekasten · 2024-06-05T18:17:46.996Z · LW(p) · GW(p)

The general belief in Washington is that Xi Jinping has ordered his military to be ready to invade Taiwan by then. (See, e.g., https://www.reuters.com/world/china/logistics-war-how-washington-is-preparing-chinese-invasion-taiwan-2024-01-31/ )

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-06-06T10:10:11.633Z · LW(p) · GW(p)

Sufficient AI superiority will mean overwhelming military superiority. If we remain ahead in AI it won't matter what other countries do. I expect this effect will dominate the strategic landscape by 2027.

Replies from: davekasten

↑ comment by davekasten · 2024-06-07T20:35:18.410Z · LW(p) · GW(p)

Say more ?

↑ comment by Garrett Baker (D0TheMath) · 2024-06-05T18:15:44.723Z · LW(p) · GW(p)

No, the belief is that China isn’t going to start a war before it has a modernized military, and they plan to have a modernized military by 2027. Therefore they won’t start a war before 2027.

China has also been drooling over Taiwan for the past 100 years. Thus, if you don’t think diplomatic or economic ties mean much to them, and they’ll contend with the US’s military might before 2027, and neither party will use nukes in such a conflict, then you expect a war after 2027.

Replies from: william-riker

↑ comment by William Riker (william-riker) · 2024-06-05T18:19:11.911Z · LW(p) · GW(p)

Ah, I misread your comment. Thanks for clarifying!

↑ comment by Garrett Baker (D0TheMath) · 2024-06-05T18:38:10.899Z · LW(p) · GW(p)

I don't think they have stated they'll to to war after 2027. 2027 is the year of their "military modernization" target.

comment by davekasten · 2024-10-15T16:11:38.525Z · LW(p) · GW(p)

It's a small but positive sign that Anthropic sees taking 3 days beyond their RSP's specified timeframe to conduct a process without a formal exception as an issue. Signals that at least some members of the team there are extremely attuned to normalization of deviance [LW · GW] concerns.

comment by davekasten · 2025-02-11T21:49:29.821Z · LW(p) · GW(p)

Ok, so it seems clear that we are, for better or worse, likely going to try to get AGI to do our alignment homework.

Who has thought through all the other homework we might give AGI that is as good of an idea, assuming a model that isn't an instant-game-over for us? E.G., I remember @Buck rattling off a list of other ideas that he had in his The Curve talk, but I feel like I haven't seen the list of, e.g., "here are all the ways I would like to run an automated counterintelligence sweep of my organization" ideas.

(Yes, obviously, if the AI is sneakily misaligned, you're just dead because it will trick you into firing all your researchers, etc.; this is written in a "playing to your outs" mentality, not an "I endorse this as a good plan" mentality.)

Replies from: Buck, TrevorWiesinger, Thane Ruthenis, valley9, quinn-dougherty, william-brewer

↑ comment by Buck · 2025-02-11T22:03:11.971Z · LW(p) · GW(p)

@ryan_greenblatt [LW · GW] is working on a list of alignment research applications. For control applications, you might enjoy the long list of control techniques [LW · GW] in our original post.

↑ comment by trevor (TrevorWiesinger) · 2025-02-12T05:27:53.455Z · LW(p) · GW(p)

How to build a lie detector app/program to release to the public (preferably packaged with advice/ideas on ways to use and strategies for marketing the app, e.g. packaging it with an animal body-language to english translator).

↑ comment by Thane Ruthenis · 2025-02-12T07:56:56.448Z · LW(p) · GW(p)

Technology for efficient human uploading. Ideally backed by theory we can independently verify as correct and doing what it's intended to do (rather than e. g. replacing the human upload with a copy of the AGI who developed this technology).

↑ comment by Ebenezer Dukakis (valley9) · 2025-02-12T11:40:12.321Z · LW(p) · GW(p)

I think unlearning [? · GW] could be a good fit for automated alignment research.

Unlearning could be a very general tool to address a lot of AI threat models. It might be possible to unlearn deception, scheming, manipulation of humans, cybersecurity, etc. I challenge you to come up with an AI safety failure story that can't, in principle, be countered through targeted unlearning in some way, shape, or form.

Relative to some other kinds of alignment research, unlearning seems easy to automate, since you can optimize metrics for how well things have been unlearned.

I like this post [LW · GW].

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2025-02-12T22:29:26.219Z · LW(p) · GW(p)

Unlearning could be a very general tool to address a lot of AI threat models. It might be possible to unlearn deception, scheming, manipulation of humans, cybersecurity, etc. I challenge you to come up with an AI safety failure story that can't, in principle, be countered through targeted unlearning in some way, shape, or form.

The big one probably has to do with being able to corrupt the metrics so totally that whatever you think you made them unlearn actually didn't happen, or just being able to relearn the knowledge so fast that unlearning doesn't matter, but yes unlearning is a very underrated direction for AI automation, because it targets so many threat models.

It also satisfies the property of addressing a bottleneck (in this case, capabilities being so dangerous as to threaten any test), and while I wouldn't call it the best, it's still quite underrated how much unlearning will be useful.

Similarly, domain-limiting AIs would be quite useful for control of AI.

Replies from: valley9

↑ comment by Ebenezer Dukakis (valley9) · 2025-02-13T00:12:33.525Z · LW(p) · GW(p)

The big one probably has to do with being able to corrupt the metrics so totally that whatever you think you made them unlearn actually didn't happen, or just being able to relearn the knowledge so fast that unlearning doesn't matter

I favor proactive approaches to unlearning which prevent the target knowledge from being acquired in the first place. E.g. for gradient routing, if you can restrict "self-awareness and knowledge of how to corrupt metrics" to a particular submodule of the network during learning, then if that submodule isn't active, you can be reasonably confident that the metrics aren't currently being corrupted. (Even if that submodule sandbags and underrates its own knowledge, that should be fine if the devs know to be wary of it. Just ablate that submodule whenever you're measuring something that matters, regardless of whether your metrics say it knows stuff!)

Unlearning techniques should probably be battle-tested in low-stakes "model organism" type contexts, where metrics corruption isn't expected.

while I wouldn't call it the best

Curious what areas you are most excited about!

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2025-02-13T00:40:19.227Z · LW(p) · GW(p)

I basically agree with this, and on this question:

Curious what areas you are most excited about!

My big areas of excitement are AI control (in a broad sense) and synthetic dataset making for AI alignment of successors.

↑ comment by Quinn (quinn-dougherty) · 2025-02-16T00:17:42.370Z · LW(p) · GW(p)

I'm working on making sure we get high quality critical systems software out of early AGI. Hardened infrastructure buys us a lot in the slightly crazy story of "self-exfiltrated model attacks the power grid", but buys us even more in less crazy stories about all the software modules adjacent to AGI having vulnerabilities rapidly patched at crunchtime.

↑ comment by yams (william-brewer) · 2025-02-12T01:46:22.858Z · LW(p) · GW(p)

Preliminary thoughts from Ryan Greenblatt on this here.

comment by davekasten · 2024-10-28T16:37:21.299Z · LW(p) · GW(p)

At LessOnline, there was a big discussion one night around the picnic tables with @Eliezer_Yudkovsky [LW · GW] , @habryka [LW · GW] , and some interlocutors from the frontier labs (you'll momentarily see why I'm being vague on the latter names).

One question was: "does DC actually listen to whistleblowers?" and I contributed that, in fact, DC does indeed have a script for this, and resigning in protest is a key part of it, especially ever since the Nixon years.

Here is a usefully publicly-shareable anecdote on how strongly this norm is embedded in national security decision-making, from the New Yorker article "The U.S. Spies Who Sound the Alarm About Election Interference" by David Kirkpatrick, Oct 21, 2024:
(https://archive.ph/8Nkx5)

The experts’ chair insisted that in this cycle the intelligence agencies had not withheld information “that met all five of the criteria”—and did not risk exposing sources and methods. Nor had the leaders’ group ever overruled a recommendation by the career experts. And if they did? It would be the job of the chair of the experts’ group to stand up or speak out, she told me: “That is why we pick a career civil servant who is retirement-eligible.” In other words, she can resign in protest.

Replies from: gwern

↑ comment by gwern · 2024-10-29T01:55:02.256Z · LW(p) · GW(p)

Also of relevance is the wave of resignations from the DC newspaper The Washington Post the past few days over Jeff Bezos suddenly exerting control.

Replies from: davekasten

↑ comment by davekasten · 2024-10-29T04:31:01.005Z · LW(p) · GW(p)

Yup. The fact that the profession that writes the news sees "I should resign in protest" as their own responsibility in this circumstance really reveals something.

comment by davekasten · 2025-03-18T18:40:02.824Z · LW(p) · GW(p)

The elites do want you to know it: you can just email a Congressional office and get a meeting

Replies from: Seth Herd

↑ comment by Seth Herd · 2025-03-18T20:56:33.962Z · LW(p) · GW(p)

Would I have to go to DC? Because I hate going to DC.

Not that I wouldn't to save the world, but I'd want to be sure it was necessary.

Only partly kidding. Maybe if people got a rationalist enclave in DC going we'd be less averse?

Replies from: davekasten

↑ comment by davekasten · 2025-03-18T22:43:43.577Z · LW(p) · GW(p)

You can definitely meet your own district's staff locally (e.g., if you're in Berkeley, Congresswoman Simon has an office in Oakland, Senator Padilla has an office in SF, and Senator Schiff's offices look not to be finalized yet but undoubtedly will include a Bay Area Office).

You can also meet most Congressional offices' staff via Zoom or phone (though some offices strongly prefer in-person meetings).

There is also indeed a meaningful rationalist presence in DC, though opinions vary as to whether the enclave is in Adams Morgan-Columbia Heights, Northern Virginia, or Silver Spring.*

*This trichotomy is funny, but hard to culturally translate unless you want a 15,000 word thesis on DC-area housing and federal office building policy since 1945 and its related cultural signifiers. Just...just trust me on this.

Replies from: alexander-gietelink-oldenziel

↑ comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-03-20T12:09:59.446Z · LW(p) · GW(p)

The people require it, sir.

comment by davekasten · 2024-10-27T14:13:52.511Z · LW(p) · GW(p)

Basic Q: has anyone written much down about what sorts of endgame strategies you'd see just-before-ASI from the perspective of "it's about to go well, and we want to maximize the benefits of it" ?

For example: if we saw OpenPhil suddenly make a massive push to just mitigate mortality at the cost of literally every other development goal they have, I might suspect that they suspect that we're about to all be immortal under ASI, and they're trying to get as many people possible to that future...

Replies from: Linch, Seth Herd

↑ comment by Linch · 2024-10-27T15:57:09.370Z · LW(p) · GW(p)

My guess is that we wouldn't actually know with high confidence before (and likely even some time after) things-will-definitely-be-fine.

E.g. 3 months after safe ASI people might still be publishing their alignment takes.

Replies from: davekasten

↑ comment by davekasten · 2024-10-27T18:01:28.272Z · LW(p) · GW(p)

Oh, to be clear I'm not sure this is at all actually likely, but I was curious if anyone had explored the possibility conditional on it being likely

↑ comment by Seth Herd · 2024-10-27T20:52:09.111Z · LW(p) · GW(p)

Endgame strategies from who?

A lot of powerful people would focus on being the ones to control it when it happens, so they'd control the future - and not be subject to some else's control of the future. OpenPhil is about the only org that would think first of the public benefit and not the dangers of other humans controlling it. And not a terribly powerful org, particularly relative to governments.

Replies from: davekasten

↑ comment by davekasten · 2024-10-27T21:20:30.162Z · LW(p) · GW(p)

I was being intentionally broad, here. I am probably less interested for purposes of this particular post only in the question of "who controls the future" swerves and more about "what else would interested, agentic actors do" questions.

It is not at all clear to me that OpenPhil is the only org who feels this way -- I can think of several non-EA-ish charities that if they genuinely 100% believed "none of the people you care for will die of the evils you fight if you can just keep them alive for the next 90 days" would plausibly do some interestingly agentic stuff.

comment by davekasten · 2025-04-09T18:36:13.758Z · LW(p) · GW(p)

We're hiring at ControlAI for folks who walk to work on UK and US policy advocacy. Come talk to Congress and Parliament and stop risks from unsafe superintelligences! controlai.com/careers

(Admins: I don't tend to see many folks posting this sort of thing here, so feel free to nuke this post if not the sort of content you're going for. But given audience here, figured might be of interest)

Replies from: felix-c

↑ comment by Felix C. (felix-c) · 2025-04-09T21:16:33.063Z · LW(p) · GW(p)

Thank you for posting this. Are there any opportunities for students about to graduate to apply themselves, particularly without a C.S background? My undergraduate experience was focused on Business and IR (Cold War history, Sino-U.S relations) before I pivoted my long term focus to AI safety policy, and it's been difficult to find good entry points for EA work in this field as a new grad.

I've been monitoring 80,000 hours and applying to research fellowships where I can so far, but I'm always looking for new positions. If you or anyone else knows an org looking to onboard some fresh talent, I'd be happy to help.

Edit: Application submitted.

comment by davekasten · 2024-07-24T20:16:13.706Z · LW(p) · GW(p)

A random observation from a think tank event last night in DC -- the average person in those rooms is convinced there's a problem, but that it's the near-term harms, the AI ethics stuff, etc. The highest-status and highest-rank people in those rooms seem to be much more concerned about catastrophic harms.

This is a very weird set of selection effects. I'm not sure what to make of it, honestly.

Replies from: habryka4, Dagon, AliceZ, rhollerith_dot_com

↑ comment by habryka (habryka4) · 2024-08-21T19:16:15.859Z · LW(p) · GW(p)

Random psychologizing explanation that resonates most with me: Claiming to address big problems requires high-status. A low-rank person is allowed to bring up minor issues, but they are not in a position to bring up big issues that might reflect on the status of many high-status people.

This is a pretty common phenomenon that I've observed. Many people react with strong social slap-down motions if you (for example) call in question whether the net-effect of a whole social community or economic sector is negative, where the underlying cognitive reality seems similar to "you are not high status enough to bring forward this grievance".

Replies from: davekasten

↑ comment by davekasten · 2024-08-22T21:21:17.398Z · LW(p) · GW(p)

I think this is plausibly describing some folks!

But I also think there's a separate piece -- I observe, with pretty high odds that it isn't just an act, that at least some people are trying to associate themselves with the near-term harms and AI ethics stuff because they think that is the higher-status stuff, despite direct obvious evidence that the highest-status people in the room disagree.

↑ comment by Dagon · 2024-07-25T15:23:21.356Z · LW(p) · GW(p)

There are (at least) two models which could partially explain this:
1) The high-status/high-rank people have that status because they're better at abstract and long-term thinking, and their role is more toward preventing catastrophe rather than nudging toward improvements. They leave the lesser concerns to the underlings, with the (sometimes correct) belief that it'll come out OK without their involvement.

2) The high-status/high-rank people are rich and powerful enough to be somewahat insulated from most of the prosaic AI risks, while the average member can legitimately be hurt by such things. So everyone is just focusing on the things most likely to impact themselves.

edit: to clarify, these are two models that do NOT imply the obvious "smarter/more powerful people are correctly worried about the REAL threats, and the average person's concerns are probably unimportant/uninformed". It's quite possible that this division doesn't tell us much about the relative importance of those different risks.

Replies from: davekasten

↑ comment by davekasten · 2024-07-25T18:06:58.696Z · LW(p) · GW(p)

Yup1 I think those are potentially very plausible, and similar things were on my short list of possible explanations. I would be very not shocked if those are the true reasons. I just don't think I have anywhere near enough evidence yet to actually conclude that, so I'm just reporting the random observation for now :)

↑ comment by ZY (AliceZ) · 2024-10-27T23:51:49.747Z · LW(p) · GW(p)

Does "highest status" here mean highest expertise in a domain generally agreed by people in that domain, and/or education level, and/or privileged schools, and/or from more economically powerful countries etc? It is also good to note that sometimes the "status" is dynamic, and may or may not imply anything causal with their decision making or choice on priorities.

One scenario is "higher status" might correlates with better resources to achieve those statuses, and a possibility is as a result they haven't experienced or they are not subject to many near-term harms. In other words, it is not really about the difference between "average" and "high status"'s people's intelligence, but more about what kind of world they are exposed to.

I do think it is good to hear all different perspectives to stay curious/open-minded.

edit: I just saw Dragon nicely listed two potential reasons, with scenario 2 mentioning something similar with my comment here. But something slightly specific in my thinking, is that these choices made by "average" and "high status" people may or may not be conscious, but rather from the experience from their lives and the world they are exposed to.

Replies from: davekasten

↑ comment by davekasten · 2024-10-28T02:02:43.136Z · LW(p) · GW(p)

Does "highest status" here mean highest expertise in a domain generally agreed by people in that domain, and/or education level, and/or privileged schools, and/or from more economically powerful countries etc?

I mean, functionally all of those things. (Well, minus the country dynamic. Everyone at this event I talked to was US, UK, or Canadian, which is all sorta one team for purposes of status dynamics at that event)

↑ comment by RHollerith (rhollerith_dot_com) · 2024-07-24T23:46:48.942Z · LW(p) · GW(p)

comment by davekasten · 2024-07-01T00:14:14.863Z · LW(p) · GW(p)

I really dislike the term "warning shot," and I'm trying to get it out of my vocabulary. I understand how it came to be a term people use. But, if we think it might actually be something that happens, and when it happens, it plausibly and tragically results the deaths of many folks, isn't the right term "mass casualty event" ?

Replies from: habryka4, robo

↑ comment by habryka (habryka4) · 2024-07-01T01:04:10.551Z · LW(p) · GW(p)

I think many mass casualty events would be warning shots, but not all warning shots would be mass casualty events. I think an agentic AI system getting most of the way towards escaping containment or a major fraud being perpetrated by an AI system would both be meaningful warning shots, but wouldn't involve mass casualties.

I do agree with what I think you are pointin at, which is that there is something Orwellian about the "warning shot" language. Like, in many of these scenarios we are talking about large negative consequences, and it seems good to have a word that owns that (in-particular in as much as people are thinking about making warning shots more likely before an irrecoverable catastrophe occurs).

Replies from: davekasten

↑ comment by davekasten · 2024-07-01T01:21:18.702Z · LW(p) · GW(p)

I totally think it's true that there are warning shots that would be non-mass-casualty events, to be clear, and I agree that the scenarios you note could maybe be those.

(I was trying to use "plausibly" to gesture at a wide range of scenarios, but I totally agree the comment as written isn't clearly meaning that).

I don't think folks intended anything Orwellian, just sort of something we stumbled into, and heck, if we can both be less Orwellian and be more compelling policy advocates at the same time, why not, I figure.

↑ comment by robo · 2024-07-01T08:00:57.673Z · LW(p) · GW(p)

I think a lot of people losing their jobs would probably do the trick, politics-wise. For most people the crux is "will AIs will be more capable than humans", not "might AIs more capable than humans be dangerous".

Replies from: davekasten

↑ comment by davekasten · 2024-07-01T19:20:58.257Z · LW(p) · GW(p)

You know, you're not the first person to make that argument to me recently. I admit that I find it more persuasive than I used to.

Put another way: "will AI take all the jobs" is another way of saying* "will I suddenly lose the ability to feed and protect those I love." It's an apocalypse in microcosm, and it's one that doesn't require a lot of theory to grasp.

*Yes, yes, you could imagine universal basic income or whatever. Do you think the average person is Actually Expecting to Get That ?

comment by davekasten · 2024-10-07T03:46:41.176Z · LW(p) · GW(p)

Has anyone thought about the things that governments are uniquely good at when it comes to evaluating models?

Here are at least 3 things I think they have as benefits:
1. Just an independent 3rd-party perspective generally

2. The ability to draw insights across multiple labs' efforts, and identify patterns that others might not be able to

3. The ability to draw on classified threat intelligence to inform its research (e.g., Country X is using model Y for bad behavior Z) and to test the model for classified capabilities (bright line example: "can you design an accurate classified nuclear explosive lensing arrangement").

Are there others that come to mind?

comment by davekasten · 2025-01-20T15:07:30.567Z · LW(p) · GW(p)

One point that maybe someone's made, but I haven't run across recently: if you want to turn AI development into a Manhattan Project, you will by-default face some real delays from the reorganization of private efforts into one big national effort. In a close race, you might actually see pressures not to do so, because you don't want to give up 6 months to a year on reorg drama -- so in some possible worlds, the Project is actually a deceleration move in the short term, even if it accelerates in the long term!

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-24T02:33:56.717Z · LW(p) · GW(p)

This is a point that's definitely come up in private discussions I've been a part of. I don't remember if I saw it said publicly somewhere.

Replies from: davekasten

↑ comment by davekasten · 2025-01-24T04:46:20.494Z · LW(p) · GW(p)

I am (sincerely!) glad that this is obvious to other people too and that they are talking about it already!

comment by davekasten · 2024-10-22T20:47:28.045Z · LW(p) · GW(p)

It seems like the current meta is to write a big essay outlining your opinions about AI (see, e.g., Gladstone Report, Situational Awareness, various essays recently by Sam Altman and Dario Amodei, even the A Narrow Path report I co-authored).

Why do we think this is the case?
I can imagine at least 3 hypotheses:
1. Just path-dependence; someone did it, it went well, others imitated

2. Essays are High Status Serious Writing, and people want to obtain that trophy for their ideas

3. This is a return to the true original meaning of an essay, under Montaigne, that it's an attempt to write thinking down when it's still inchoate, in an effort to make it more comprehensible not only to others but also to oneself. And AGI/ASI is deeply uncertain, so the essay format is particularly suited for this.

What do you think?

Replies from: gwern, Seth Herd

↑ comment by gwern · 2024-10-22T21:02:27.508Z · LW(p) · GW(p)

Well, what's the alternative? If you think there is something weird enough and suboptimal about essay formats that you are reaching for 'random chance' or 'monkey see monkey do' level explanations, that implies you think there is some much superior format they ought to be using instead. But I can't see what. I think it might be helpful to try to make the case for doing these things via some of the alternatives:

a peer-reviewed Nature paper which would be published 2 years from now, maybe, behind a paywall
a published book, published 3 years from starting the first draft now, which some people might get around to reading a year or two after that, and dropping halfway through (assuming you finish and didn't burn out writing it)
a 1 minute Tiktok video by an AI person with non-supermodel looks
a 5-minute heavily-excerpted interview on CNN
a 750-word WSJ or NYT op-ed
a 10-page Arxiv paper in the standard LaTeX template
a Twitter thread of 500 tweets (which can only be read by logged-in users)
a Medium post (which can't be read because it is written in a light gray font illegible to anyone over the age of 20. Also, it's paywalled 90% of the time.)
a 6 hour Lex Fridman podcast interview, about 4 hours in after Lex has finished his obligatory throatclearing questions (like asking you if aliens exist or the universe is made out of love)
interpretive dance in front of the Lincoln Memorial livestreamed on Twitch
...

(I'd also add in Karnofsky's blog post series.)

↑ comment by Seth Herd · 2024-10-22T22:19:03.525Z · LW(p) · GW(p)

I think those are the meta because they have just enough space to not only give opinions but to mention reasons for those opinions and expertise/background to support the many unstated judgment calls.

Note that the essays by Altman and Amodei are popular because their positions are central beyond the others because they have not only demonstrable backgrounds in AI but lots of name recognition (we're mostly assuming Altman has bothered learning a lot about how Transformers work even if we don't like him). And that the Gladstone report got itself commissioned by at least a little piece of the government.

A Narrow Path just demonstrates in the text that you and your co-authors have thought deeply about the topic. Shorter essays leave more guesswork on the authors' expertise and depth of consideration.

comment by davekasten · 2025-01-16T21:18:07.096Z · LW(p) · GW(p)

Incidentally, spurred by @Mo Putera [LW · GW]'s posting of Vernor Vinge's A Fire Upon The Deep annotations, I want to remind folks that Vinge's Rainbows End is very good and doesn't get enough attention, and will give you a less-incorrect understanding of how national security people think.

comment by davekasten · 2024-08-21T15:13:43.661Z · LW(p) · GW(p)

I'm pretty sure that I think "infohazard" is a conceptual dead-end concept that embeds some really false understandings of how secrets are used by humans. It is an orphan of a concept -- it doesn't go anywhere. Ok, the information's harmful. You need humans to touch that info anyways to do responsible risk-mitigation. So now what ?

Replies from: ABlue, Dagon, shankar-sivarajan, habryka4, None

↑ comment by ABlue · 2024-08-21T17:17:16.553Z · LW(p) · GW(p)

That "so now what" doesn't sound like a dead end to me. The question of how to mitigate risk when normal risk-mitigation procedures are themselves risky seems like an important one.

↑ comment by Dagon · 2024-08-21T16:18:38.136Z · LW(p) · GW(p)

I agree that it's not terribly useful beyond identifying someone's fears. Using almost any taxonomy to specify what the speaker is actually worried about lets you stop saying "infohazard" and start talking about "bad actor misuse of information" or "naive user tricked by partial (but true) information". These ARE often useful, even though the aggregate term "infohazard" is limited.

Replies from: zac-hatfield-dodds

↑ comment by Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-08-22T03:23:31.874Z · LW(p) · GW(p)

See e.g. Table 1 of https://nickbostrom.com/information-hazards.pdf

Replies from: davekasten

↑ comment by davekasten · 2024-08-22T21:49:36.656Z · LW(p) · GW(p)

Yeah, that's a useful taxonomy to be reminded of. I think it's interesting how the "development hazard", item 8, with maybe a smidge of "adversary hazard", is the driver of people's thinking on AI. I'm pretty unconvinced that good infohazard doctrine, even for AI, can be written based on thinking mainly about that!

↑ comment by Shankar Sivarajan (shankar-sivarajan) · 2024-08-21T15:40:50.572Z · LW(p) · GW(p)

I suggest there is a concept distinct enough to warrant the special term, but if it's expansive enough to include secrets, beneficial information that some people prefer others not know, that renders it worthless.

"Infohazard" ought to be reserved for information that harms the mind that contains it, with spoilers as the most mild examples, SCP-style horrors as the extreme fictional examples.

↑ comment by habryka (habryka4) · 2024-08-21T17:42:31.114Z · LW(p) · GW(p)

I think within a bayesian framework where in-general you assume information has positive value, it's useful to have an explicit term when that is not the case. It's a relatively rare occurrence, and as such your usual ways of dealing with information will probably backfire.

The obvious things to do is to not learn about that information in the first place (i.e. avoid dangerous research), understand and address the causes for why this information is dangerous (because e.g. you can't coordinate on not building dangerous technology), or as a last resort, silo the information and limit the spread of it.

I do think that it would be useful to have different words that distinguish between "infohazard to the average individual" and "societal infohazard". The first one is really exceedingly rare. The second one is still rare but more common because society has a huge distribution of beliefs and enough crazy people that if information can be used dangerously, there is a non-trivial chance it will.

Replies from: tailcalled, davekasten

↑ comment by tailcalled · 2024-08-21T19:09:41.077Z · LW(p) · GW(p)

I still like the term "recipe for destruction" when limiting it to stuff similar to dangerous technology.

↑ comment by davekasten · 2024-08-22T21:42:22.982Z · LW(p) · GW(p)

I think a lot of my underlying instinctive opposition to this concept boils down to thinking that we can and do coordinate on this stuff quite a lot. Arguably, AI is the weird counterexample of a thought that wants to be thunk -- I think modern Western society is very nearly tailor-made to seek a thing that is abstract, maximizing, systematizing of knowledge, and useful, especially if it fills a hole left by the collapse of organized religion.

I think for most other infohazards, the proper approach requires setting up an (often-government) team that handles them, which requires those employees to expose themselves to the infohazard to manage it. And, yeah, sometimes they suffer real damage from it. There's no way to analyze ISIS beheading videos to stop their perpetrators without seeing some beheading videos; I think that's the more-common varietal of infohazard I'm thinking of.

↑ comment by [deleted] · 2024-08-21T17:02:43.837Z · LW(p) · GW(p)

Ok, the information's harmful. You need humans to touch that info anyways to do responsible risk-mitigation. So now what ?

I think one of the points is that you should now focus on selective [LW · GW] rather than corrective or structural means to figure out who is nonetheless allowed to work on the basis of this information.

Calling something an infohazard, at least in my thinking, generally implies both that:

any attempts to devise galaxy-brained incentive structures [? · GW] that try to get large groups of people to nonetheless react in socially beneficial ways when they access this information are totally doomed and should be scrapped from the beginning.
you absolutely should not give this information to anyone that you have doubts would handle it well; musings along the lines of "but maybe I can teach/convince them later on what the best way to go about this is" are generally wrong and should also be dismissed.

So what do you do if you nonetheless require that at least some people are keeping track of things? Well, as I said above, you use selective methods instead. More precisely, you carefully curate a very short list of human beings that are responsible people and likely also share your meta views on how dangerous truths ought to be handled [LW · GW], and you do your absolute best to make sure the group never expands beyond those you have already vetted as capable of handling the situation properly.

Replies from: davekasten

↑ comment by davekasten · 2024-08-22T21:51:04.419Z · LW(p) · GW(p)

I think at the meta level I very much doubt that I am responsible enough to create and curate a list of human beings for the most dangerous hazards. For example, I am very confident that I could not 100% successfully detect a foreign government spy inside my friend group, because even the US intelligence community can't do that... you need other mitigating controls, instead.

comment by davekasten · 2024-12-17T18:51:49.429Z · LW(p) · GW(p)

I have a few weeks off coming up shortly, and I'm planning on spending some of it monkeying around AI and code stuff. I can think of two obvious tacks: 1. Go do some fundamentals learning on technical stuff I don't have hands-on technical experience with or 2. go build on new fun stuff.

Does anyone have particular lists of learning topics / syllabi / similar things like that that would be a good fit for "fairly familiar with the broad policy/technical space, but his largest shipped chunk of code is a few hundred lines of python" person like me?

Replies from: Josephm

↑ comment by Joseph Miller (Josephm) · 2024-12-17T19:22:21.079Z · LW(p) · GW(p)

The ARENA curriculum is very good.

comment by davekasten · 2024-09-23T05:00:46.912Z · LW(p) · GW(p)

Preregistering intent to write "Every Bay Area Walled Compound" (hat tip: Emma Liddell)

Unrelatedly, I am in Berkeley through Wednesday afternoon, let me know if you're around and would like to chat

Replies from: davekasten

↑ comment by davekasten · 2024-09-23T19:14:56.376Z · LW(p) · GW(p)

I suspect this won't get published until November at the earliest, but I am already delightfully pleased with this bit:

Canada geese fly overhead, honking. Your inner northeast Ohioan notices that you are confused; it’s the wrong season for them to migrate this far south, and they’re flying westwards, anyways.

A quick Google discovers that some Canada geese have now established themselves non-migratorily in the Bay Area:
"The Migratory Bird Treaty Act of 1918 banned hunting or the taking of eggs without a permit. These protections, combined with an increase in desirable real estate—parks, golf course and the like—spurred a dramatic turnaround for the species. Canada geese began breeding in the Bay Area—the southern end of their range – in the late 1950s."
You nod, approvingly; this clearly is another part of the East Bay’s well-known, long-term philanthropic commitment to mitigating Acher-Risks.

comment by davekasten · 2024-07-28T15:06:23.565Z · LW(p) · GW(p)

I feel like one of the trivially most obvious signs that AI safety comms hasn't gone actually mainstream yet is that we don't say, "yeah, superintelligent AI is very risky. No, I don't mean Terminator. I'm thinking more Person of Interest, you know, that show with the guy from the Sound of Freedom and the other guy who was on Lost and Evil?"

Replies from: None, davekasten

↑ comment by [deleted] · 2024-07-28T15:48:22.271Z · LW(p) · GW(p)

I agree (minor spoilers below).

In this context, it's actually kind of funny that (at least the latter half of) Person of Interest is explicitly about a misaligned superintelligent AI, which is misaligned because its creator did not take all the necessary safety precautions in building it (as opposed to one of the main characters, who did). Well, technically it's mostly intent-aligned [LW · GW]; it's just not value-aligned. But still... And although it's mostly just misuse risks, there still is a strong component of just how difficult it is to defend [LW · GW] the world from such AGI-caused threats.

Root in Season 2 is also kind-of just a more cynical and misandrist version of Larry Page, talking about AIs as the "successor species" to humanity and that us "bad apples" should give way to something more intelligent and pure.

↑ comment by davekasten · 2024-07-28T15:06:54.355Z · LW(p) · GW(p)

(This is not an endorsement of Jim Caviezel's beliefs, in case anyone somehow missed my point here.)

comment by davekasten · 2024-11-20T03:54:59.477Z · LW(p) · GW(p)

I'll be in Berkeley Weds evening through next Monday, would love to chat with, well, basically anyone who wants to chat. (I'll be at The Curve Fri-Sun, so if you're already gonna be there, come find me there between the raindrops!)

comment by davekasten · 2024-08-23T22:21:14.342Z · LW(p) · GW(p)

Why are we so much more worried about LLMs having CBRN risk than super-radicalization risk, precisely ?

(or is this just a expected-harm metric rather than a probability metric ?)

comment by davekasten · 2024-08-22T22:39:11.918Z · LW(p) · GW(p)

I am (speaking personally) pleasantly surprised by Anthropic's letter. https://cdn.sanity.io/files/4zrzovbb/website/6a3b14a98a781a6b69b9a3c5b65da26a44ecddc6.pdf

comment by davekasten · 2024-05-27T15:27:10.670Z · LW(p) · GW(p)

I'll be at LessOnline this upcoming weekend -- would love to talk to folks about what things they wish someone would write about to explain how DC policy stuff and LessWrong-y themes could be better connected.

comment by davekasten · 2024-05-01T22:07:25.092Z · LW(p) · GW(p)

Hypothesis, super weakly held and based on anecdote:
One big difference between US national security policy people and AI safety people is that the "grieving the possibility that we might all die" moment happened, on average, more years ago for the national security policy person than the AI safety person.

This is (even more weakly held) because the national security field has existed for longer, so many participants literally had the "oh, what happens if we get nuked by Russia" moment in their careers in the Literal 1980s...

comment by davekasten · 2024-10-31T20:57:49.379Z · LW(p) · GW(p)

Ok, so Anthropic's new policy post (explicitly NOT linkposting it properly since I assume @Zac Hatfield-Dodds [LW · GW] or @Evan Hubinger [LW · GW] or someone else from Anthropic will, and figure the main convo should happen there, and don't want to incentivize fragmenting of conversation) seems to have a very obvious implication.

Unrelated, I just slammed a big AGI-by-2028 order on Manifold Markets.

comment by davekasten · 2024-08-23T22:21:24.167Z · LW(p) · GW(p)

davekasten's Shortform

Contents

89 comments