Beyond Defensive Technology

post by ejk64 · 2024-10-14T11:34:24.595Z

Contents

    Claim 1: Defensive technologies are status-quo preserving
    Claim 2: Not all defensive technologies are inherently good
    Claim 3 (bonus): There are lots of other considerations that might be valuable
      Offensive de-escalation
      Coordination engineering
      Anti risk-compensation engineering
  Conclusion

I’ve been pretty confused by what it means for a technology or operation to be ‘defensive’. 

Technologies do things. What does it mean for a technology to, like, be something that stops bad things? Is an anti-missile missile the same as a missile? Can we come up with a classification system that feels a bit more systematic than Vitalik Buterin’s classification based on whether things are big or not? Can we extend the great work being done around resilience and adaptation to risks from AI to technology more broadly? 

And perhaps most crucially: if we can make technologies that stop bad things, does that mean they're inherently good and we should go full def/acc and build them? 

In this short blog post, I make three claims. 

1. Good ‘defensive’ technologies are status-quo preserving or entropy minimising. 

The most obvious view of defensive technologies is as tools that counter deliberate attempts to disrupt a valuable system. However, we often want to design systems that defend against disruption without a conscious perpetrator (e.g. sprinkler systems against accidental fires). 

For this reason, I’ve started thinking about ‘narrowly defensive’ technologies as a subcategory of ‘broadly defensive’ technologies (e.g. sprinklers), which work to preserve the status quo. For convenience, I’ll refer to the latter here as ‘defensive technologies’, but you could call them ‘anti-entropic’ or ‘status-quo preserving’ as preferred.

Specifically, this set of technologies either helps to 1) secure the status quo against interventions that would change it, 2) identify interventions that would change it or 3) return the situation to normal as quickly as possible after an intervention has changed it. 

2. As such, not all defensive technologies are inherently ‘good’. 

Making the status quo harder to change is not always a good thing. But it’s not inherently a bad thing, either. 

Most defensive technologies defend things that most people agree are good. But some defend things that you might think are bad, and may make it harder for people who share your value system to change them. As a basic example, you can encrypt sensitive biosecurity information to prevent people from hacking and revealing it, but you can also encrypt child sexual abuse material.

Broadly, I think most defensive technologies are net-good, because I think there’s a lot of consensus about good directions for humanity. However, I think there might be times when defensive technologies might make things worse (like technologically-enhanced dictatorship, below). Carefully considering the potential negative consequences of developing defensive technologies and actively creating strategies to mitigate them remains essential.

Moreover, defensive technologies equip their holders with power that can incentivise them to initiate conflict. If you know that you’ve got better bunkers, you might want to start more wars. Your improved defences might even trigger your enemies to attack preemptively. 

3. If you want technology to make the world a better place, there are at least three other types of technology you should consider: offensive de-escalation, coordination engineering, and anti-risk-compensation engineering.

Quick note before we begin: this post builds on existing theories of defensive technologies, but does not go into them in detail. To learn more about other theories, see:

Claim 1: Defensive technologies are status-quo preserving

There are lots of technologies that I often see described as ‘defensive’. What unites them is that they are useful for minimising uncertainty or protecting the status quo. To make that clearer, I’ve split them into three categories: securing, assuring, and insuring technologies.

Let’s break these down in more detail.

Securing technologies make sure that a pre-decided series of actions does in fact happen. For instance: encryption that keeps private information private, or sprinkler systems that stop accidental fires from spreading. 

Assuring technologies help humans to work out what is going to happen in areas we don’t directly control, so we can keep the status quo stable. For instance: prediction and forecasting tools, or red-teaming that surfaces vulnerabilities before attackers find them. 

Finally, insuring technologies help humans to ‘bounce back’ in scenarios where bad things do happen, restoring the status quo as quickly as possible. For instance: underwriting and insurance products that keep organisations solvent after failures. 

It’s worth noticing two things here. First, these technologies work in an ecosystem together, which we might think of as a defensive ecosystem. So effective prediction and effective red-teaming might support effective underwriting, or effective security interventions. Effective underwriting in turn might keep companies solvent in the case of failures, allowing for further investment in effective security development, generating revenues that flow into companies doing further prediction, and so on. This might take the form of a virtuous cycle, propagating through a network of organisations. 

Second—and perhaps obviously—defensive technologies do seem really good. I think to a large degree they could be. I’m skeptical, however, that they would always improve the world. In the next section I suggest why. 

Claim 2: Not all defensive technologies are inherently good

A lot of the applications of defensive technologies seem very close to inherently good. Humans aren’t that great at getting things right, or having what they want to happen happen, and most deviations from intended pathways are just bad and serve no purpose other than chaos. Technologies that secure against non-human hazards like fires, pandemics, and the natural breakdown of machines over time seem like very good things. Same with insurance. Perhaps one reason why I’m so optimistic about defensive technologies is that I’m a big believer that there are quite a few things that almost everyone agrees on and which should be preserved by default. 

However, some categories of defensive technologies might serve to protect human systems against the efforts of humans who would like to dismantle those systems. Indeed, this seems most true to the ‘defender’ framing, which implies an intentional attack. 

This would be an okay state of affairs if the attacker/defender framing were not subjective. 

Just as one man’s terrorist is another man’s freedom fighter, so can people use defensive technology to defend ‘the wrong thing’. When you develop a technology, you might have a very clear idea of who you want to use it and why. This might work out, but your ideas might also be copied by other groups and used to defend value systems different from your own. And when you publish open science, it can be very hard to stop ‘defensive science’ from being used by other groups to create ‘defensive technology’ that undermines your values. We can’t always be confident that the work we do to support defensive science and technology will be used to support values that we approve of.

Take an area of research that is as obviously defensive as it gets: research to make general-purpose AI models more resilient to jailbreaking attacks, also known as AI robustness. This research is critical to ensuring that powerful systems are defended against malicious actors who might try to disable or misuse them, and it is a key pillar of AI safety research for good reason.

Yet at the same time, AI robustness might assist nefarious aims. Imagine the case of a totalitarian dictator who uses snooping models pre-installed in national hardware to monitor how his subjects use the internet, or agents which ‘clean up’ undesirable opinions from online platforms. His subjects dream of building digital spaces where they can plan their insurrection, but they cannot jailbreak the models or otherwise divert them from their rigid paths. ‘Defensive technology’ can serve to calcify the power difference between oppressor and oppressed. 

This might seem a slightly contrived example, but it remains the case that for almost any defensive technology you can imagine, there exists a potential negative application. 

Basically, if you’re unhappy with the state of affairs you’re living in, then you probably don’t want anyone to develop technology that makes that state of affairs harder to change. Whilst many people around the world might broadly support their regimes becoming harder to challenge, others might find this less desirable. And when you develop science or technology, it’s really hard to stop that information from leaking out into the world. 

This isn’t to say that defensive technologies are never useful. There might be a lot of values that most humans agree on, and developing technologies to defend these is a robustly good thing. However, people should think carefully about both the values that they are attempting to protect and the extent to which these technologies might be used to protect values that contradict them. The situation is often more complicated than ‘helping defenders do their job is good’. 

Some cases where it might still be worth releasing a defensive technology that could help malicious actors to defend values antithetical to yours:

Claim 3 (bonus): There are lots of other considerations that might be valuable

Defensive or ‘status-quo preserving’ technologies aren’t the only way to develop technologies that can improve the future. I’m interested in specific interventions that make the future better by making suffering less bad and less likely. 

I nearly ended the piece here: what follows is more notional and uncertain. However, I’d be super interested in people’s comments on this (What’s most effective? Where is the low-hanging fruit?), so I’m including it as a bonus.

Offensive de-escalation

Developing substitute technologies that achieve the intended outcomes with minimal suffering and maximal reversibility seems like a robustly good thing. These sorts of technologies have been well mapped out around weapons (guns → rubber bullets, tasers, bean-bag bullets etc.). But I haven’t seen as much literature around what substitutes would look like for cyberattacks, sanctions, landmines (e.g. ones that deactivate automatically after a period of time or biodegrade), missiles etc. Maybe this is something I should look out for? 

(Note: less harmful substitutes for offensive technologies may encourage greater use: see ‘anti-risk compensation engineering’ for thoughts on this, below).

Coordination engineering

I’m interested in the ways in which AI technologies might help to resolve conflicts and solve public goods problems. 

One area I’m interested in relates to AI-based conflict-resolution systems. These might work at the interpersonal level (solicitor-agents), the inter-corporate level, or the international level. Consider the benefits of a system capable of organising complex multi-national coordination in conflict scenarios: it might help countries to organise more complex treaties more easily, thereby ensuring that countries get closer to their ideal arrangements between two parties. It might be that there are situations in which two actors are in conflict, but the optimal arrangement between the two groups relies on coordination from a third or a fourth party, or many more; such systems could organise these multilateral agreements more cost-effectively. 

I think that these systems could become quite effective at searching the problem space to find the optimal outcome for all parties. This might take conflicts from the point of ‘costly, but negotiation is out of reach’ to ‘resolvable’. They might also be able to review or flag potential unintended side effects of clauses, helping to reduce the likelihood of future conflicts.
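To make ‘searching the problem space’ concrete, here’s a toy sketch of what agreement search might look like. Everything in it is a made-up stand-in: the parties, their utility functions, and the grid of candidate treaty terms are all hypothetical, and a real system would need to elicit or learn preferences rather than hard-code them.

```python
# Toy 'agreement search': given each party's utility over candidate treaty
# terms, find terms that every party prefers to the status quo, then pick
# the terms maximising the worst-off party's gain (a simple fairness rule).
from itertools import product

# Hypothetical utilities over two treaty variables (invented for illustration).
parties = {
    "A": lambda tariff, patrols: -tariff + 2 * patrols,
    "B": lambda tariff, patrols: tariff - patrols,
}
status_quo = {"A": 0, "B": 0}  # utility each party gets with no agreement

candidates = list(product(range(11), range(11)))  # coarse grid over terms

def acceptable(terms):
    """An agreement is viable only if every party strictly gains over the status quo."""
    return all(u(*terms) > status_quo[p] for p, u in parties.items())

viable = [t for t in candidates if acceptable(t)]
best = max(viable, key=lambda t: min(u(*t) - status_quo[p]
                                     for p, u in parties.items()))
print(f"Proposed terms (tariff, patrols): {best}")
```

Even this toy hints at why the multilateral case is promising territory: adding a third party just means adding another utility function and re-running the search, whereas the cost of human negotiation grows much faster with the number of parties.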

Maybe at the larger scale this looks more like markets. For instance, consider a global platform that tracks carbon emissions in real-time and automatically rewards or penalises countries and companies based on their carbon footprint. In this version, rather than mapping actors’ (countries’) preferences onto a specific treaty, they’re projected onto a market which rewards and penalises actors in real time. Maybe these sorts of systems could help incentivise actors to reduce emissions and coordinate global environmental efforts.
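Here’s a minimal sketch of what the settlement layer of such a platform might look like. The price, baselines, and actors are all invented for illustration, and real-world measurement and enforcement are the hard parts this toy ignores.

```python
# Minimal sketch of a real-time carbon settlement layer: emitters report
# emissions, and the platform settles rewards/penalties against a baseline.
from dataclasses import dataclass

PRICE_PER_TONNE = 50.0  # hypothetical carbon price, in dollars

@dataclass
class Actor:
    name: str
    baseline_tonnes: float  # agreed emissions baseline for the period
    reported_tonnes: float  # emissions measured/reported this period

def settle(actor: Actor) -> float:
    """Positive = reward paid to the actor; negative = penalty owed.
    Under-emitting relative to baseline earns a credit, over-emitting a charge."""
    return (actor.baseline_tonnes - actor.reported_tonnes) * PRICE_PER_TONNE

for actor in [Actor("Country X", 100.0, 80.0), Actor("Firm Y", 20.0, 25.0)]:
    print(f"{actor.name}: {settle(actor):+.2f}")
```

The design choice worth noticing is that nothing here is negotiated as treaty text: the incentive is recomputed mechanically from reported data each period.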

Anti risk-compensation engineering

Instead of thinking about constitutional AI, let’s think about making the human use case constitutional. In this world, if a system detected that it was being used to perpetrate harms, it might shut down or limit the user’s ability to deploy it. 

For instance, if a country develops highly advanced autonomous weapon systems, they might become more likely to escalate conflicts, believing they have an upper hand. A global safeguard could ensure that if such weapons are used to provoke unjust conflict, they automatically malfunction or turn off, maintaining a balance of power.
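To make this concrete, here’s a very rough sketch of what such a safeguard could look like in software. The misuse check is a keyword placeholder and every name is invented for illustration; making a check like this reliable and tamper-proof is precisely the unsolved problem, which this toy does not attempt to solve.

```python
# Sketch of 'making the use case constitutional': a wrapper that refuses to
# execute a capability when a misuse check trips. All names are hypothetical.
from typing import Callable

def looks_like_misuse(request: str) -> bool:
    # Placeholder policy check; a real system would need far more than
    # keyword matching (and the check itself would be an attack surface).
    banned = ["unprovoked strike", "civilian target"]
    return any(term in request.lower() for term in banned)

def constitutional(capability: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a capability so it refuses rather than serve a flagged request."""
    def guarded(request: str) -> str:
        if looks_like_misuse(request):
            return "REFUSED: request violates use constitution"
        return capability(request)
    return guarded

@constitutional
def targeting_system(request: str) -> str:
    return f"executing: {request}"

print(targeting_system("defensive patrol route"))       # executes
print(targeting_system("plan unprovoked strike on X"))  # refused
```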

In practice, I think doing this in examples such as the above would be extremely difficult, as organisations are very unlikely to accept technologies with backdoors, or which might render them useless in certain situations. However, there might still be examples in different domains where this was appropriate, or certain geopolitical scenarios that this would be relevant to. 

Conclusion

Defensive technology is technology that defends something, and whether that something is good or bad is often a difficult question. We cannot abdicate the responsibility of thinking through difficult value trade-offs simply by saying that ‘defense is good’. Specific defensive technologies might be good, but we should have robust theories as to why that is the case. Most importantly, it might not always be useful to ‘accelerate’ through these value debates, lest we wind up helping actors to defend values that we never subscribed to in the first place. 

If there’s one thing you should take away from this, it’s that building technologies that differentially improve the future might be really hard. It’s important to have clear theories of change for the technologies you build, a clear view of the negative consequences that they might have, and strategies for mitigating them. On the other hand, there may be options—like coordination engineering, anti-risk-compensation engineering, and substitute technologies—that present ways to improve the future beyond defensive technologies.

 

Thanks to Jamie Bernardi, Jack Miller and Tom Reed for their comments on this piece.

1 comment


comment by lsusr · 2024-10-14T18:28:58.089Z · LW(p) · GW(p)

In terms of preserving a status quo in an adversarial conflict, I think a useful dimension to consider is First Strike vs. Second Strike. The basic idea is that technologies which incentivise a preemptive strike are offensive, whereas technologies which enable retaliation are defensive.

However, not all status-quo preserving technologies are defensive. Consider disruptive[1] innovations which flip the gameboard. Disruptive technologies are status-quo-destroying, but can advantage the incumbent or the underdog. They can make attacks more or less profitable. I think "disruptive vs sustaining" is a different dimension that should be considered orthogonal to "offensive vs defensive".

But I haven’t seen as much literature around what substitutes would look like for cyberattacks, sanctions, landmines (e.g. ones that deactivate automatically after a period of time or biodegrade), missiles etc.

Here's a video by Perun, a popular YouTuber who makes hour-long PowerPoint lectures about defense economics. In it, cyberattack itself is considered a substitute technology used to achieve political aims through an aggressive act less provocative than war.

They might help countries to organise more complex treaties more easily, thereby ensuring that countries got closer to their ideal arrangements between two parties…. It might be that there are situations in which two actors are in conflict, but the optimal arrangement between the two groups relies on coordination from a third or a fourth, or many more. The systems could organise these multilateral agreements more cost-effectively.

Smart treaties have existed for centuries, though they didn't involve AI. Western powers used them to coordinate against Asian conquests. Of course, they didn't find the optimal outcome for all parties. Instead, they enabled enemies to coordinate the exploitation of a mutual adversary.


  1. I'm using the term "disruptive" the way Clayton Christensen defined it in his book The Innovator's Dilemma, where "disruptive technologies" are juxtaposed against "sustaining technologies". ↩︎