I feel the question misstates the natsec framing by jumping to the later stages of AGI and ASI. This is important because it leads to a misunderstanding of the rhetoric that convinces normal non-futurists, who aren't spending their days thinking about superintelligence.
The American natsec framing is about an effort to preserve the status quo in which the US is the hegemon. It is a conservative appeal with global reach, which works because Pax Americana has been relatively peaceful and prosperous. Anything that threatens American dominance, including giving ground in the AI race, appears dangerously destabilizing. Any risks from AI acceleration are literally afterthoughts (a problem for tomorrow, not today).
Absurd as it is, the Trumpist effort to burn the American-led system of global cooperation to the ground is still branded as a conservative return to an imagined glorious past.
The challenge in defeating this conservative natsec framing lies in communicating that radical change is all that is on the menu, but with some options far worse than others. I, for one, currently believe that the fatal effect pre-AGI AI will have on democracy and other liberal values, regardless of who wields it, is a promising rhetorical avenue that should be amplified.
Richard, reading this piece alongside other pieces you've written or delivered about Effective Altruism, such as your "Lessons from my time in Effective Altruism" and your recent provocative talk at EA Global Boston, leads me to wonder what it is (if anything) that leads you to self-identify as an Effective Altruist. There may not be any explicit EA shibboleth, but it seems to me nevertheless to entail a set of particular methods, and once you have moved beyond enough of them it may not make sense to call yourself an Effective Altruist.
My mental model of EA has it intentionally operating at the margins, in the same way that arbitrageurs do, maximizing returns (specifically social ones) to a small number of actors willing to act not just contrary to, but largely in isolation from, the larger population. Once we recognize the wisdom of faith, it seems to me, we are moving back into the normie fold, and in that integration or synthesis the EA habit of exclusivity may be a hindrance.
A relevant, very recent opinion piece that has been syndicated around the country, explaining the universal value of faith:
https://www.latimes.com/opinion/story/2025-03-29/it-is-not-faith-that-divides-us
There's a gap in the Three Mile Island/Chernobyl/Fukushima analogy, because those disasters were all in the peaceful uses of nuclear power. I'm not saying that they didn't also impact the nuclear arms race, only that, for completeness, the arms race dynamics have to be considered as well.
There are at least 3 levers of social influence, and I suspect that we undervalue the 3rd one for getting anything done, especially when it comes to AI safety. They are 1) government policy, 2) the actions of firms, and 3) low-coordination behaviors of individuals influenced by norms. There is a subclass of #3 that is having its day in the sun: the behaviors of employees of the US federal government, a.k.a. the "Deep State." If their behaviors didn't matter, there wouldn't be a perceived need by the Trump administration to purge them en masse (and replace them with loyalists, or not). But if government employees' low-coordination individual choices matter, then so can the choices of members of the general population.
States and firms are modern instruments, whereas (at least if you trust some of the accounts in The Dawn of Everything) for about 100,000 years the more organic form of coordination was all humans had, and it worked surprisingly well (for example, people could and did travel long distances and count on receiving shelter from strangers).
As already stated, we rely on norm-observance by government employees performing their duties, and by everyone else to more or less comply with laws in functioning welfare states, but traditional norm enforcement is weakened by liberal laissez-faire values. But if one believes (à la Suleyman's The Coming Wave) that AI undermines the liberal welfare state, which is likely to be captured by powerful AI firms, then one shouldn't discount norm-enforced resistance emerging to fill the void for an increasingly disenfranchised population.
It is therefore a mistake to treat the race dynamics of AI development between firms and nation-states as an inevitable force pointing in only one direction. Given a critical mass of people recognizing that AI is bad for them, low-coordination resistance is possible, despite the absence of democratic policy-making.
On the flip side, this also suggests a tipping point at which AI-driven economic disruption turns extremely violent, pitting powerful government-capturing firms wishing to maintain control against general populations resisting them. Thus we should consider the existence of a hidden race between would-be powerful government-capturing firms and a would-be resistant population.
That summary doesn't sound to me to be in the neighborhood of the intended argument. I would be grateful if you pointed to passages that suggest that reading so that I can correct them (DM me if that's preferable).
Where I see a big disconnect is your conclusion that "AI will have an incentive to do X." The incentives that the essay discusses are human incentives, not those of a hypothetical artificial agent.
The subject of your post is the recurring patterns of avoidance you're observing, without mentioning the impact on those more receptive and willing to engage. Nevertheless, I figure you'd still appreciate examples of the latter:
A link to the GD website was sent to me by a relatively distant acquaintance. This is a person of not-insignificant seniority at one of the FAANGs, whose current job is now hitched to AI. They have no specific idea about my own thoughts about AI risk, so my inference is that they sent it to anyone they deemed sufficiently wonky. The tone of the message in which they sent the link was "holy smokes, we are totally screwed," suggesting strong buy-in to the paper's argument and that it had an emotional impact.
There are two kinds of beliefs: those that can be affirmed individually (true independently of what others do) and those that depend on others acting as if they believe the same thing. The latter are, in other words, agreements. One should be careful not to conflate the two.
What you describe as "neutrality" seems to me to be a particular way of framing institutional forbearance and similar terms of cooperation in the face of the possibility of unrestrained competition and mutual destruction. When agreements collapse, it is not because these terms were unworkable (except in the trivial sense that, well, they weren't invulnerable to gaming and so on) but because cooperation between humans can always break down.
@AnthonyC I may be mistaken, but I took @M. Y. Zuo to be offering a reductio ad absurdum response to your comment about not being indifferent between the two ways of dying. The 'which is a worse way to die' debate doesn't respond to what I wrote. I said
With respect to the survival prospects for the average human, this [whether or not the dying occurs by AGI] seems to me to be a minor detail.
I did not say that no one should care about the difference.
But the two risks are not in competition, they are complementary. If your concern about misalignment is based on caring about the continuation of the human species, and you don't actually care how many humans other humans would kill in a successful alignment(-as-defined-here) scenario, a credible humans-kill-most-humans risk is still really helpful to your cause, because you can ally yourself with the many rational humans who don't want to be killed either way to prevent both outcomes by killing AI in its cradle.
You have a later response to some clarifying comments from me, so this may be moot, but I want to call out that my emphasis is on the behavior of human agents who are empowered by automation that may fall well short of AGI. A "pivotal act" is a very germane idea, but rather than the pivotal act of the first AGI eliminating would-be AGI competitors, this act is carried out by humans taking out their human rivals.
It is pivotal because once the target population size has been achieved, competition ends, and further development of the AI technology can be halted as unnecessarily risky.
If an unaligned AI by itself can do near-world-ending damage, an identically powerful AI that is instead alignable to a specific person can do the same damage.
If you mean that as the simplified version of my claim, I don't agree that it is equivalent.
Your starting point, with a powerful AI that can do damage by itself, is wrong. My starting point is groups of people whom we would not currently consider to be sources of risk, who become very dangerous as novel weaponry, along with changes in relations of economic production, unlock the means and the motive to kill very large numbers of people.
And (as I've tried to clarify in my other responses) the comparison of this scenario to misaligned AI cases is not the point; the point is the threat from both sides of the alignment question.
Thanks, title changed.
I agree, and I attempted to emphasize the winner-take-all aspect of AI in my original post.
The intended emphasis isn't on which of the two outcomes is preferable, or how to comparatively allocate resources to prevent them. It's on the fact that there is no difference between alignment and misalignment with respect to the survival expectations of the average person.
The title was intended as an ironic allusion to a slogan used by the National Rifle Association in the U.S. to dismiss calls for tighter restrictions on gun ownership. I expected this allusion to be easily recognizable, but see now that it was probably a mistake.
An argument for danger of human-directed misuse doesn't work as an argument against dangers of AI-directed agentic activity.
I agree. But I was not trying to argue against dangers of AI-directed agentic activity. The thesis is not that "alignment risk" is overblown, nor is the comparison of the risks the point; it's that those risks accumulate such that the technology is guaranteed to be lethal for the average person. This is significant because the risk of misalignment is typically thought to be worth accepting because of rewards that will be broadly shared. "You or your children are likely to be killed by this technology, whether it works as designed or not" is a very different story from "there is a chance this will go badly for everyone, but if it doesn't it will be really great for everyone."
I'm surprised by the lack of follow-up to this post and the accompanying thread, which took place in the immediate aftermath of the October 7th massacre. A lot has happened since then -- new data against which the original thinking could be evaluated. Also time has provided opportunity to self-educate about the conflict, which a few people admitted to not knowing a lot about. Given the human misery that has only worsened since the OP started asking questions, I would think that a follow-up would be a worthy exercise. @Annapurna ?
Ever since first hearing the music of the Disney movie "Encanto" I've been sneering at the lyrics "stars don't shine they burn/ and constellations shift" because, no, of course constellations don't shift, without really stopping to think about it. Caught in my epistemic arrogance again!
Oops, Samuel beat me to the punch by 2 minutes.
You've already noted that it doesn't really matter, but I thought I'd help fill in the blanks.
The current global regime of sovereign nation-states that we take for granted is the product of the 20th century. It's not like an existing sovereign nation-state belonging to the Palestinians was carved up by external powers and arbitrarily handed to Jews. Rather, the disintegration of empires created opportunities for local nationalist movements to arise, creating new countries based on varying and competing unifying or dividing factors such as language, tribal associations, and sect. Palestinians and Zionist Jews both had nationalist aspirations during this period, and for various reasons the Zionists came out on top.
The idea that "the Palestinians were there first" is not particularly meaningful or accurate, especially given the historical fact of Judea and Israel as the birthplace of Judaism and the continuous presence of Jewish communities in the region, despite the many events contributing to the creation of a Jewish diaspora.
This response is not unreasonable, but the description of "WW2-style solution" seems ignorant of the fact that Israel did occupy Gaza for decades, and had something very similar to a "puppet government" there, in the form of the Fatah party in control of the Palestinian Authority in the West Bank. Israel unilaterally withdrew in 2005, and Hamas violently took over in 2007.
The rest of it operates under the hypothesis that Hamas is opposed to the objective interests of the Palestinians of Gaza. This ends up being tautological if objective self-interest is defined as 'not being killed during Israeli retaliation.' But this is a very narrow definition. There is a common saying that 'it is better to die on one's feet than to live on one's knees.' One need not drag in religious extremism or poor education as explanatory factors for an average Gazan viewing their own death under Israeli bombardment as an acceptable alternative to living with the indignity of the perpetual Israeli blockade of Gaza, to say nothing of the evisceration of dreams of Palestinian sovereignty.
Note that the preceding was not an endorsement of last week's attack; I'm just calling out the weaknesses in the depiction of Gazans as nothing but uneducated bomb fodder for the Hamas regime.
A helpful counterpoint.
Why would the human beings have to be suicidal, if they can also have the AI provide them with a vaccine?
Thank you. If I understand your explanation correctly, you are saying that there are alignment solutions that are rooted in more general avoidance of harm to currently living humans. If these turn out to be the only feasible solutions to the not-killing-all-humans problem, then they will produce not-killing-most-humans as a side-effect. Nuke analogy: if we cannot build/test a bomb without igniting the whole atmosphere, we'll pass on bombs altogether and stick to peaceful nuclear energy generation.
It seems clear that such limiting approaches would be avoided by rational actors under winner-take-all dynamics, so long as other approaches remain that have not yet been falsified.
Follow-up question: does the "any meaningfully useful AI is also potentially lethal to its operator" assertion hold under the significantly different usefulness requirements of a much smaller human population? I'm imagining limited AI that can only just "get the (hard) job done" of killing most people under the direction of its operators, and then support a "good enough" future for the remaining population, which isn't the hard part because the Earth itself is pretty good at supporting small human populations.
Hi all, writing here to introduce myself and to test the waters for a longer post.
I am an experienced software developer by trade. I have an undergraduate degree in mathematics and a graduate degree in a field of applied moral and political philosophy. I am a Rawlsian by temperament but an Olsonian by observation of how the world seems to actually work. My knowledge of real-world AI is in the neighborhood of the layman's. I have "Learning Tensorflow.js" on my nightstand, which I promised my spouse not to start until after the garage has been tidied.
Now for what brings me here: I have a strong suspicion that the AI Risk community is under-reporting the highly asymmetric risks from AI in favor of symmetric ones. Not to deny that a misaligned AI that kills everyone is a scary problem that people should be worrying about. But it seems to me that the distribution of the "reward" in the risk-reward trade-off that gives rise to the misalignment problem in the first place needs more elucidation. If, for most people, the difference between AI developers "getting it right" and "getting it wrong" is being killed at the prompting of people instead of the self-direction of AI, the likelihood of the latter vs. the former is rather academic, is it not?
To use the "AI is like nukes" analogy, if misalignment is analogous to the risk of the whole atmosphere igniting in the first fission bomb test (in terms of a really bad outcome for everyone; yes, I know the probabilities aren't equivalent), then successful alignment is the analogue of a large number of people getting a working-exactly-as-intended bomb dropped on their city, which, one would think, would on July 15, 1945 have been predicted as the likely follow-up to a successful Trinity test.
The disutility of a large human population in a world where human labor is becoming fungible with automation seems to me to be the fly in the ointment for anyone hoping to enjoy the benefits of all that automation.
That's the gist of it, but I can write a longer post. Apologies if I missed a discussion in which this concern was already falsified.