Protecting agent boundaries
post by Chipmonk · 2024-01-25T04:13:50.993Z
If the preservation of an agent's boundary is necessary for that agent's safety, how can that boundary/membrane be protected?
How agent boundaries get violated
In order to protect boundaries, we must first understand how they get violated.
Let’s say there’s a cat, and it gets stabbed by a sword. That’s a boundary violation (a.k.a. membrane piercing). In order for that to have happened, three conditions must have been met:
- There was a sword.
- The cat and the sword collided.
- The cat wasn’t strong enough to resist penetration from the sword.
More generally, in order for any existing membrane to be pierced, three conditions must have all been met:
- There was a potential threat. (E.g., a sword, or a person with a sword.)
- The moral patient and the threat collided.
- The victim failed to adequately defend itself. (Because if the cat was better at self-defense — if its skin was thicker or if it was able to dodge — then it would not have been successfully stabbed.)
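The three-condition claim can be read as a logical conjunction: a membrane is pierced only when all three conditions hold at once, so negating any single one is enough to prevent the violation. Below is a minimal sketch of that reading in Python. The `Encounter` class, its field names, and the example cats are hypothetical illustrations, not something from the original framing, and real situations are obviously not clean booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encounter:
    """Hypothetical record of one situation; field names are illustrative only."""
    threat_exists: bool   # condition 1: there was a potential threat
    collision: bool       # condition 2: the moral patient and the threat collided
    defense_failed: bool  # condition 3: the victim failed to defend itself


def membrane_pierced(e: Encounter) -> bool:
    # Assumption: a violation requires all three conditions simultaneously.
    return e.threat_exists and e.collision and e.defense_failed


# The stabbed cat from the example: all three conditions are met.
stabbed_cat = Encounter(threat_exists=True, collision=True, defense_failed=True)
# Same sword, but the cat and the sword are kept apart.
distant_cat = Encounter(threat_exists=True, collision=False, defense_failed=True)
# Same sword, same collision, but the cat's skin is thick enough.
armored_cat = Encounter(threat_exists=True, collision=True, defense_failed=False)

for name, e in [("stabbed", stabbed_cat), ("distant", distant_cat), ("armored", armored_cat)]:
    print(name, membrane_pierced(e))
```

Only the first encounter counts as a piercing under this model; each of the other two breaks exactly one condition, which is the sense in which each condition suggests its own way of preventing violations.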
Protecting agent boundaries
Each of these three conditions then implies ways of preventing boundary violations (a.k.a. membrane piercing):
1. There was a potential threat.
- → Minimize potential threats
2. There was a collision.
- → Minimize dangerous collisions
- → Predict and prevent collisions before they occur.
- → Prevent collisions by putting distance between threats and moral patients.
- → Prevent premeditated collisions by pre-committing to retribution.
3. The victim failed to defend itself.
- → Empower the membranes of humans and other moral patients to be better at self-defense.
How human societies already try to solve this problem
As a helpful analogy, here are some examples of how modern human societies try to solve this problem:
Minimize potential threats
- Restrict access to weapons (e.g., nukes, bioweapons, etc.)
- Minimize potential perpetrators (e.g., some fictional societies predict and eliminate potential psychopaths).
Minimize dangerous collisions
- Protect high-risk individuals, e.g. put them in witness protection.
- Prevent collisions before they occur, e.g. predictive policing, traffic lights.
- Police crimes after they occur.
Empower membranes to be better at self-defense
- Infosec defense: Use good security practices and strong encryption.
- Biological defense: Develop and use beneficial vaccines.
- Manipulation defense: Reduce unhelpful cognitive biases and emotional insecurities.
How this applies to AI safety:
Minimize potential AI threats
(this is obvious/boring so I'm omitting it)
Minimize dangerous AI collisions
(this is obvious/boring so I'm omitting it)
Empower membranes to be better at self-defense
Empower the membranes of humans and other moral patients to be more resilient to collisions with threats. Examples:
- Manipulation defense: You have an AI assistant that filters potentially-adversarial information for you.
- Crime defense: Police have AI assistants that help them predict, deduce, investigate, and prevent crime.
- Physical threat defense: (If nanotech works out) You have an AI assistant that shields you from physical threats.
- Biological defense: Faster, better vaccines, personal antibody printers, etc.
- Cybersecurity defense: Good security practices and strong encryption. Software encryption can be arbitrarily strong.
- Legal defense: personal AI assistants for e.g. interfacing with contracts and the legal system.
- Bargaining: personal AI assistants for negotiation.
- Human intelligence enhancement
- Cyborgism
- Mark Miller and Allison Duettmann (Foresight Institute) outline more ideas in the form of “Active Shields” in “7. DEFEND AGAINST PHYSICAL THREATS | Multipolar Active Shields”. Cf. Engines of Creation by Eric Drexler.
- Related: We have to Upgrade – Jed McCaleb
6 comments
comment by the gears to ascension (lahwran) · 2024-01-25T04:42:32.543Z
solid, but I still think you're missing structure that makes this approach less effective than it seems on the face:
in full generality, what's a "threat"?
in full generality, what's a "dangerous" collision?
I worry that the current failure mode of attempting to empower in order to defend is that the defense is actually used to strike inside another's boundary, as has been the case for ~all weapons
↑ comment by Chipmonk · 2024-01-25T05:02:25.282Z
"is that the defense is actually used to strike inside another's boundary, as has been the case for ~all weapons"
Yeah, I am worried about this.
This is notably not the case for infosec and encryption, where defensive capability doesn't imply offensive capability. However, I'm unsure if this is also true for any physical interventions. (e.g.: Vaccines? No, bioweapons… Nanotech? No…)
That said, physical interventions do seem to be defense-dominant when there is coordination among a sufficiently large portion of society/power.
↑ comment by the gears to ascension (lahwran) · 2024-01-25T18:11:12.468Z
I don't think I'm convinced physical interactions are defense dominant. The easiest-to-formally-certify defense is to enclose something in a hunk of impenetrable matter, and that only can be certified up to a given impact energy level. Above that energy level, the defense will simply be stripped away. Only MAD seems able to be game theoretically durable, and certifying that a MAD situation will endure requires proving through a simulation of the opposition.
comment by VojtaKovarik · 2024-01-25T19:58:36.100Z
Might be obvious, but perhaps seems worth noting anyway: Ensuring that our boundaries are respected is, at least with a straightforward understanding of "boundaries", not sufficient for being safe.
For example:
- If I take away all food from your local supermarkets (etc etc), you will die of starvation --- but I haven't done anything with your boundaries.
- On a higher level, you can wipe out humanity without messing with our boundaries, by blocking out the sun.
↑ comment by Chipmonk · 2024-01-25T21:16:33.858Z
Yes, see Agent membranes/boundaries and formalizing “safety” and davidad's comment.
(Also, I'm not necessarily agreeing that your examples are not violations of boundaries. The first one isn't a violation of the end person's boundary (although it probably is a violation of the farmer's). The second one could be.)