Secure homes for digital people

post by paulfchristiano · 2021-10-10T15:50:02.697Z · LW · GW · 26 comments

Contents

  Part 1: the right to control my environment
    ideal
    
    1: cost
    2: security?
    3: rewinding
  Part 2: the right to a single timeline
    ideal
    with trusted hardware
    with 1-of-2 transfer
    with quantum computers
  Appendix A: obfuscation for uniform computations

Being a “digital person” could be scary—if I don’t have control over the hardware I’m running on, then someone else could get my code and run tons of copies in horrible conditions. (See also: qntm’s Lena.)

It would be great to guarantee digital people some control over their situation: 1. to control their local environment and sensations, 2. to avoid unauthorized rewinding or duplicating.

I’ll describe how you could modify the code of a digital person so that they retain this control even if an adversary has access to their source code. This would be very expensive with current cryptography. I think the overhead will eventually become cheap enough that it’s possible to do for some digital people, though it will likely remain expensive enough that it is never applied to most digital people (and with luck most digital people will be able to feel secure for other reasons).

Part 1: the right to control my environment

My ideal

Implementation

  1. First we write a simple environment that reflects all my desiderata (the “home”).
  2. Then I apply indistinguishability obfuscation to (me + home), so that the house becomes private and tamper-proof. (This is an extremely expensive operation, more on that later.)
  3. I distribute the obfuscated home and hopefully destroy any unprotected copies of myself.

One conceptual difficulty is that indistinguishability obfuscation applies to circuits whereas I would like to obfuscate a long-running program. But this can be handled straightforwardly, as discussed in Appendix A.

The home could consume terabytes of memory and teraflops of compute before it added significantly to the expense of running a human-like digital person, so I could live in relative luxury. The home could also negotiate resource requirements with the external world, and decide what to do when requested resources are unavailable (e.g. pause until they become available).

Limitation 1: cost

Indistinguishability obfuscation is extremely expensive, more like a factor of 10,000,000,000 slowdown than 10.

It will get faster with further research, but probably not fast enough to obfuscate the whole person+home. But there are other ways to speed up the process:

It’s pretty unclear how fast this could get, either from taking some of these techniques to their limits or from thinking of other cleverer ideas. I would not be at all surprised by getting the whole thing down to a factor of 2 slowdown. That said, I also think it’s quite plausible that you need 10x or 10000x.

Limitation 2: security?

The cryptography used in this construction may end up getting broken—whether from a mistaken security assumption, or because the future contains really giant computers, or because we implemented it badly.

The software used in my home may get compromised even if the cryptography works right. An adversary can provide trillions of malicious inputs to find one that lets them do something unintended like exfiltrate my code. With modern software engineering this would be a fatal problem unless the home was extremely simple, but in the long run writing a secure home is probably easier than writing fast enough cryptography.

I may be persuaded to output my source code, letting an adversary run it. I might not give myself the ability to inspect my own source, or might tie my hands in other ways to limit bad outcomes, but probably I can still end up in trouble given enough persuasion. This is particularly plausible if an adversary can rewind and replay me.

Limitation 3: rewinding

In the best case, this scheme guarantees that an attacker can only use my code as part of a valid execution history. But for classical computers there is no possible way to stop them from running many valid execution histories.

An attacker could save a snapshot of me and then expose it to a billion different inputs until they found one in which I responded in a desired way. (Even if I’m cagey enough to avoid this attack in most possible situations, they just have to find one situation where I let my guard down and then escalate from there.) Or I could have revealed information to the outside world that I no longer remember because I’ve been reset to an earlier state.

Someone living in this kind of secure house is protected from the worst abuses, but they still can’t really trust the basic nature of their reality and are vulnerable to extreme manipulation.

This brings us to part 2.

Part 2: the right to a single timeline

My ideal

Implementation with trusted hardware

This is easy to achieve if we have a small piece of trusted tamper-resistant hardware that can run cheap computations. We use the same mechanism as in the last section, but:

If I were willing to make a round trip to a trusted third party every time I received a novel input, then I could have them implement this function directly instead of using tamper-proof hardware. The real critical ingredient is me trusting someone on the outside. I’ll discuss how to potentially remove this assumption in the section on quantum computers below.

None of this actually requires my house to be built to guarantee the right to a single timeline—I could start without such a right, and then install a wrapper to enforce a single timeline once there was some hardware I trusted or if it became important enough.
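As a rough illustration, here is a minimal sketch of a sequencing check of this kind. It follows the summary given in the comments (the trusted hardware signs the input, or a hash of it, together with a sequence number at each step); everything else is an assumption made for illustration. The names `TrustedCounter` and `home_step` are hypothetical, and an HMAC under a shared key stands in for a real public-key signature, so in this toy version the home holds the same key as the device.

```python
import hashlib
import hmac


def _tag(key: bytes, seq: int, inp: bytes) -> bytes:
    # Authenticate (sequence number, hash of input). HMAC is a stand-in
    # for the signature a real trusted device would produce.
    msg = seq.to_bytes(8, "big") + hashlib.sha256(inp).digest()
    return hmac.new(key, msg, hashlib.sha256).digest()


class TrustedCounter:
    """Hypothetical tamper-resistant device: never signs the same sequence number twice."""

    def __init__(self, key: bytes):
        self._key = key
        self._seq = 0

    def sign(self, inp: bytes):
        seq = self._seq
        self._seq += 1  # monotonic: an old sequence number is never reissued
        return seq, _tag(self._key, seq, inp)


def home_step(key: bytes, state: dict, inp: bytes, seq: int, tag: bytes) -> dict:
    """Advance the home by one step, but only on the input signed for this exact step."""
    if seq != state["next_seq"] or not hmac.compare_digest(tag, _tag(key, seq, inp)):
        raise PermissionError("input is not the signed input for this time step")
    # Return a fresh state; the old snapshot is left untouched.
    return {"next_seq": seq + 1, "log": state["log"] + [inp.decode()]}
```

In this sketch an attacker who rewinds the home to an old snapshot can replay the input that was originally signed for that step, but cannot feed it a different one, because the device will only issue later sequence numbers. In the full scheme the state would also live inside the obfuscated home rather than in a plain dictionary.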

Implementation with 1-of-2 transfer

Suppose that the only kind of trusted hardware is a device that holds two secrets, and will reveal one or the other of them when asked but not both. I think this is somewhat easier to build than general trusted hardware. (Related: locking phones with quantum bits.)

Now suppose there is a trusted party who manufactures a bunch of these devices, with a public key pk. Each device has a serial number n, and its two secrets are signatures that verify under pk: one of (n, 0) and one of (n, 1).
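To make the device concrete, here is a minimal sketch of its interface only, not of the full protocol. The class `OneOfTwoDevice` and the function `verify_secret` are hypothetical names, and an HMAC under the manufacturer's key stands in for a signature that verifies under pk, so the verifier in this toy version needs the secret key rather than a public one.

```python
import hashlib
import hmac


def _sign(maker_key: bytes, serial: int, bit: int) -> bytes:
    # Stand-in for the manufacturer's signature on the pair (n, bit).
    return hmac.new(maker_key, f"{serial}:{bit}".encode(), hashlib.sha256).digest()


class OneOfTwoDevice:
    """Hypothetical device holding two secrets; it reveals one and destroys the other."""

    def __init__(self, maker_key: bytes, serial: int):
        self.serial = serial
        self._secrets = [_sign(maker_key, serial, 0), _sign(maker_key, serial, 1)]
        self._used = False

    def reveal(self, bit: int) -> bytes:
        if self._used:
            raise RuntimeError("device has already revealed a secret")
        self._used = True
        secret = self._secrets[bit]
        self._secrets = [None, None]  # the other secret is gone for good
        return secret


def verify_secret(maker_key: bytes, serial: int, bit: int, secret: bytes) -> bool:
    # Check that `secret` is the manufacturer's signature on (serial, bit).
    return hmac.compare_digest(secret, _sign(maker_key, serial, bit))
```

Whoever holds the revealed secret can prove which bit device n was asked for, and nobody can ever produce the proof for the other bit.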

I can still achieve a single-timeline guarantee:

Implementation with quantum computers

We can probably remove the need for a trusted party if we have quantum computers big enough to perform delegated cryptographic operations (but no quantum computers big enough to run an entire digital person in superposition).

The vague idea in this section looks to me like it should work (I’d give it at least 85% odds), but it would require some new ideas and analysis. It’s on much sketchier ground than any of the ideas in previous sections.

The basic idea is to replace the secure trusted hardware with a small quantum computer that holds a quantum secret. At all times the obfuscated controller knows the secret, but it is impossible for anyone external to get two copies of the secret. When we ask the obfuscated controller to advance its simulation, it will first ask the quantum computer to transform the secret in an irreversible way and to provide proof that it has done so.

To achieve this we can basically repurpose a protocol for delegated quantum computation and verifying quantumness of a single quantum device, which is roughly what you need if you don’t trust the external world at all and so need to treat it as one device (it’s an amusing coincidence that those papers are by my old officemate Urmila Mahadev and that I worked with her on this problem in another life). To briefly summarize the trick:

This protocol fails if an attacker can run the entire home in superposition, since then they can simply rewind the entire process. But if the attacker can’t run the secure home in superposition, then you can extract a classical copy of each input on which they run the home, and so running it on two different inputs would allow you to obtain classical copies of two challenge-responses and so violate the assumption.

This protocol probably provides better support for forking, since I can probably direct the external quantum computer to produce two new secrets without having to find a source of additional trusted hardware.

Limitations:

Appendix A: obfuscation for uniform computations

Suppose that I want to obfuscate the program that repeatedly applies the circuit C to a state, i.e. we start from some initial state S[0], then we repeatedly compute (S[t+1], output[t]) = C(S[t], input[t]).

We’ll instead produce an obfuscated “controller” C’, and an appropriate initial state S'[0]. A legitimate operator with access to C’ can simulate my original program, whereas a malicious operator will not be able to do anything other than running multiple copies of me, rewinding to old snapshots, or killing me prematurely.

C’ contains a secret cryptographic key sk. When it receives an input (S'[t], input[t]) it does the following operations:

The analysis is left as an easy exercise for the reader (famous last words, especially hazardous in cryptography).

The same idea can be used to obfuscate other kinds of uniform computation, e.g. providing access to secure RAM or having many interacting processors.
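A minimal sketch of what such a controller might look like before obfuscation. This is one natural reading, not a transcription of the construction: it assumes that S'[t] is an authenticated encryption of S[t] under sk, and that C’ rejects any state it did not produce itself. The cipher is a toy keystream built from SHA-256, used only to keep the example dependency-free, and the names `seal`, `unseal`, `make_controller` and `counter_circuit` are hypothetical.

```python
import hashlib
import hmac
import json
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy counter-mode keystream from SHA-256; not a vetted cipher.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def _subkeys(sk: bytes):
    # Derive separate encryption and MAC keys from the controller secret.
    return hashlib.sha256(b"enc" + sk).digest(), hashlib.sha256(b"mac" + sk).digest()


def seal(sk: bytes, state) -> bytes:
    """Encrypt-then-MAC the state, giving the opaque S'[t] handed to the operator."""
    enc_key, mac_key = _subkeys(sk)
    plaintext = json.dumps(state, sort_keys=True).encode()
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unseal(sk: bytes, blob: bytes):
    """Check the MAC and decrypt; reject anything this controller did not produce."""
    enc_key, mac_key = _subkeys(sk)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state was not produced by this controller")
    plaintext = bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct))))
    return json.loads(plaintext)


def make_controller(sk: bytes, C):
    """Build C' from the circuit C; in the real scheme this closure is what gets obfuscated."""
    def controller(sealed_state: bytes, inp):
        new_state, output = C(unseal(sk, sealed_state), inp)
        return seal(sk, new_state), output
    return controller


def counter_circuit(state, x):
    # Example circuit C: keeps a running total of its inputs.
    total = state["total"] + x
    return {"total": total}, total
```

The operator only ever sees sealed states and outputs. They can replay an old sealed state on a new input (the rewinding problem from Part 1), but a modified state fails the MAC check and is rejected.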

26 comments


comment by paulfchristiano · 2021-10-10T22:34:48.015Z · LW(p) · GW(p)

Worth noting: this is supposed to be a fun cryptography problem and potentially fodder for someone's science fiction stories, it's not meant to be Serious Business.

Replies from: ESRogs
comment by ESRogs · 2021-10-11T23:09:16.601Z · LW(p) · GW(p)

What makes it unserious? Is it that there are too many assumptions baked in to the scenario as described, so that it's unlikely to match real challenges we will actually face?

Replies from: paulfchristiano
comment by paulfchristiano · 2021-10-12T03:52:00.219Z · LW(p) · GW(p)
  • I think it's a problem for future people (and this is a fairly technically difficult solution at that), and it doesn't matter much whether we think about a plausible solution in advance. Whether future people solve this problem doesn't look like it will do much to shape the overall sweep of history.
  • I think the problem is very likely to be resolved by different mechanisms based on trust and physical control rather than cryptography.
  • I think the slowdowns involved, even in a mature version of this idea, are likely impractical for the large majority of digital minds. So this isn't a big deal morally during the singularity, and then after the singularity I don't think this will be relevant.
Replies from: ESRogs
comment by ESRogs · 2021-10-12T23:21:52.865Z · LW(p) · GW(p)

Makes sense, thanks!

comment by Raemon · 2021-10-10T20:27:47.705Z · LW(p) · GW(p)

Man, this fills me with some creeping dread at how many complex problems need to be solved in order for the future to not be dystopic.

Replies from: gwillen, Insub
comment by gwillen · 2021-10-11T22:55:30.476Z · LW(p) · GW(p)

Funny enough, it actually does the opposite for me, because I hadn't previously imagined that this problem had anything like a plausible solution. Indistinguishability obfuscation is, as we say, "moon math", but it's certainly better than nothing.

comment by Insub · 2021-10-11T01:07:56.468Z · LW(p) · GW(p)

I agree. It makes me really uncomfortable to think that while Hell doesn't exist today, we might one day have the technology to create it.

comment by Quintin Pope (quintin-pope) · 2021-10-10T20:46:14.342Z · LW(p) · GW(p)

Thank you for this post. Various “uploading nightmare” scenarios seem quite salient for many people considering digital immortality/cryonics. It’s good to have potential countermeasures that address such worries.

My concern about your proposal is that, if an attacker can feed you inputs and get outputs, they can train a deep model on your inputs/outputs, then use that model to infer how you might behave under rewind. I expect the future will include deep models extensively pretrained to imitate humans (simulated and physical), so the attacker may need surprisingly little of your inputs/outputs to get a good model of you. Such a model could also use information about your internal computations to improve its accuracy, so it would be very bad to leak such info.

I’m not sure what can be done about such a risk. Any output you generate is some function of your internal state, so any output risks leaking internal state info. Maybe you could use a “rephrasing” neural net module that modifies your outputs to remove patterns that leak personality-related information? That would cause many possible internal states to map onto similar input/output patterns and make inferring internal state more difficult.

You could also try to communicate only with entities that you think will not attempt such an attack and that will retain as little of your communication as possible. However, both those measures seem like they’d make forming lasting friendships with outsiders difficult.

Replies from: skot523
comment by skot523 · 2021-10-15T03:36:45.953Z · LW(p) · GW(p)

Way above my paygrade, but can you just respond to some inputs randomly?

comment by jbash · 2021-10-10T20:14:12.075Z · LW(p) · GW(p)

It seems like there's an assumption in this that you're going to be "hosted in the cloud". Why would you want to do that? If you're assuming some more or less trustworthy hardware, why not just run on more or less trustworthy hardware? Why not maintain physical control over your physical substrate? It mostly works for us "non-digital people".

Also, wouldn't being forced to retreat entirely to your "home" qualify as horrible conditions? That's solitary confinement, no?

Replies from: paulfchristiano, paulfchristiano, Raemon
comment by paulfchristiano · 2021-10-10T22:30:47.806Z · LW(p) · GW(p)

Why not maintain physical control over your physical substrate? It mostly works for us "non-digital people".

That's plan A.

It seems like there's an assumption in this that you're going to be "hosted in the cloud".

Naively I'd guess that most people (during the singularity) will live in efficiently packed "cities" so that they are able to communicate with other people they care about at a reasonable speed. I think that does probably put you at the mercy of someone else's infrastructure, though in general these things will still be handled by trust rather than by wacky cryptographic schemes.

comment by paulfchristiano · 2021-10-10T22:28:12.774Z · LW(p) · GW(p)

Also, wouldn't being forced to retreat entirely to your "home" qualify as horrible conditions? That's solitary confinement, no?

Two people can each be in their own homes, having a "call" that feels to them like occupying the same room and talking or touching.

Replies from: jbash
comment by jbash · 2021-10-10T22:58:39.889Z · LW(p) · GW(p)

What's providing the communication channel? Doesn't that rely on the generosity of the torturer who's holding you captive?

Replies from: paulfchristiano
comment by paulfchristiano · 2021-10-11T00:48:29.871Z · LW(p) · GW(p)

If someone is "holding you captive" then you wouldn't get to talk to your friends. The idea is just that in that case you can pause yourself (or just ignore your inputs and do other stuff in your home).

Of course there are further concerns that e.g. you may think you are talking to your friend but are talking to an adversary pretending to be your friend, but in a scenario where people sometimes get kidnapped that's just part of life as a digital person.

(Though if you and your friend are both in secure houses, you may still be able to authenticate to each other as usual and an adversary who controlled the communication link couldn't eavesdrop or fake the conversation unless they got your friend's private key---in which case it doesn't really matter what's happening on your end and of course you can be misled.)

Replies from: jbash
comment by jbash · 2021-10-11T13:11:58.781Z · LW(p) · GW(p)

Right. I got that. But if I go do other stuff in my home, they've successfully put me in solitary confinement. My alternative to that is to shut down. They can also shut me down at will. It doesn't have to be just a "pause", either.

It may be that part of the problem is that "one timeline" is not enough to deal with a "realistic" threat. OK, I can refuse to be executed without a sequencing guarantee, but my alternative is... not to execute. I could have an escape hatch of restarting from a backup on another host, but then I lose history, and I also complicate the whole scheme, because now that replay has to be allowed conditional on the "original" version being in this pickle.

Presumably we got into this situation because my adversary wanted to get something out of executing me in replicas or in replay or with unpleasant input or whatever. If I refuse to be executed under the adversary's conditions, the basic scenario doesn't provide the adversary with any reason to execute me at all. If they're not going to execute me, they have no reason to preserve my state either.

So it's only interesting against adversaries who don't have a problem with making me into MMAcevedo, but do have a problem with painlessly, but permanently, halting me. How many such adversaries am I likely to have?

Maybe if there were an external (trusted) agency that periodically checked to make sure everybody was running, and somehow punished hosts that couldn't demonstrate that everybody in their charge was getting cycles, and/or couldn't demonstrate possession of a "fresh" state of everybody?

Replies from: JBlack
comment by JBlack · 2021-10-12T00:59:51.546Z · LW(p) · GW(p)

Yes, the idea is that with these measures, an adversary would not even try to run you in the first place. That's preferable to being coerced by extreme means to do everything they might possibly want with you.

They can't freely modify your state because (if the idea works!) the encryption doesn't let them know your state, and any direct modification that doesn't go via the obfuscated program yields unrunnable noise.

Replies from: jbash
comment by jbash · 2021-10-12T11:56:23.438Z · LW(p) · GW(p)

Yes, the idea is that with these measures, an adversary would not even try to run you in the first place.

Good point; it removes the incentive to set up a "cheap hosting" farm that actually makes its money by running everybody as CAPTCHA slaves or something. So the Bad Guy may never request or receive my "active" copy to begin with.

I'm not worrying about them freely modifying my state, though. I'm worried about them deleting it.

Replies from: JBlack
comment by JBlack · 2021-10-13T00:35:48.400Z · LW(p) · GW(p)

Why is that an issue? If they're the only ones with a copy, then sure that would mean your death, but that seems unlikely.

Even if that is the case, is life under one of the most complete forms of slavery that is possible to exist, probably including mental mutilation, torture, and repeated annihilation of copies, better than death? I guess that's a personal choice. If you think it is, then you could choose not to protect your program.

Replies from: jbash
comment by jbash · 2021-10-13T02:19:51.282Z · LW(p) · GW(p)

Why is that an issue? If they're the only ones with a copy, then sure that would mean your death, but that seems unlikely.

Under the scheme being discussed, it doesn't matter how many backup copies anybody has. Because of the "one timeline" replay and replica protection, the backup copies can't be run. Running a backup copy would be a replay.

The "trusted hardware" version was the only one I really looked at closely enough to understand completely. Under that one, and probably under the 1-of-2 scheme too, you actually could rerun a backup[1]... but you would have to let it "catch up" to the identical state, via the identical path, by giving it the exact same sequence of inputs that had been given to the old copy from the time the backup was taken up to the last input signed. Including the signatures.

That means that, to recover somebody, you'd need not only a backup copy of the person, but also copies of all that input. If you had both, then you could run the person forward to a "fresh" state where they'd accept new input. But if the person had been running in an adversarial environment, you probably wouldn't have the input, so the backups would be useless.

The trusted hardware description actually says that, at each time step, the trusted hardware signs the whole input, plus a sequence number. I took that to really mean "a hash of the whole input, plus a sequence number"[2]. I made that assumption because if you were truly going to send the whole input to the trusted hardware to be signed, you'd be using so much bandwidth, and taking on so much delay, that you probably might as well just run the person on the trusted hardware.

If you really did send the whole input to the trusted hardware, then I suppose it could archive the input for use in recovering backups, but that's even more expensive.

You could extend the scheme (and complicate it, and take on more trust) to let you be rerun from a backup on different input if, say, some set of trusted parties attest that the "main you" has truly been lost. But then you lose everything you've experienced since the backup was taken, which isn't entirely satisfying. Would you be OK with just being rolled back to the you of 10 years ago?

You can keep adding epicycles, of course. But I think that, to be very satisfying, whatever was added would at least have to provide some protection against both outright deletion and "permanent pause". And if there's rollback to backups, probably also a quantifiable and reasonably small limitation on how much history you could lose in a rollback.

Even if that is the case, is life under one of the most complete forms of slavery that is possible to exist, probably including mental mutilation, torture, and repeated annihilation of copies, better than death?

I didn't mean to suggest that being arbitrarily tortured or manipulated was better than death. I meant that I wasn't worried about arbitrary modifications to my state because the cryptographic system prevented it... and I still was worried about being outright deleted, because the cryptographic system doesn't prevent that, and backups have at best limited utility.


  1. ... assuming certain views of identity and qualia that seem to be standard among people thinking about uploads...[3] ↩︎

  2. Personally I'd probably include a hash of the person's state after the previous time step too, either in addition to or instead of the sequence number. ↩︎

  3. Is there actually any good reason for abandoning the standard word "upload" in favor of "digital person"? ↩︎

comment by Raemon · 2021-10-10T20:27:19.616Z · LW(p) · GW(p)

Also, wouldn't being forced to retreat entirely to your "home" qualify as horrible conditions? That's solitary confinement, no?

Depending on setup you can probably invite other people into your home. 

Replies from: jbash
comment by jbash · 2021-10-10T20:33:59.175Z · LW(p) · GW(p)

Only people who in turn trust you not to mess with them, at least unless you bring them in under the same cryptographic protections under which you yourself are running on somebody else's substrate. That's an incredible amount of trust.

If you do bring them in under cryptographic protections, the resource penalties multiply. Your "home" is slowed down by some factor, and their "home within your home" is slowed down by that factor again. Where are you going to get the compute power? I'm not sure how this applies in the quantum case.

Also, once you're trapped, what's your source for a trustworthy copy of the person you're inviting in (or of "them in their home")? Are you sure you want the companions that your presumed tormentor chooses to provide to you?

Replies from: paulfchristiano, quintin-pope
comment by paulfchristiano · 2021-10-10T22:33:46.554Z · LW(p) · GW(p)

Mentioned this in the other thread, but if you and I want to talk we probably (i) move near each other, (ii) communicate between our houses, (iii) negotiate on the shared environment (or e.g. how we should perceive each other).

Ideally if you're dealing with a person you'd authenticate in the normal way (and part of the point of a house is to keep your private key secret).

I do think that in a world of digital people it could be more common to have attackers impersonating someone I know, but it's kind of a different ballgame than an attacker controlling my inputs directly.

Replies from: Benquo
comment by Benquo · 2021-10-18T16:40:24.322Z · LW(p) · GW(p)

If "you" completely control your "home," then it's more natural to think of the home & occupant as a single agent, whose sensorium is its interface with an external world it doesn't totally control - the "home" is just a sort of memory that can be traversed or altered by a homunculus modeled on natural humans.

comment by Quintin Pope (quintin-pope) · 2021-10-10T21:04:15.606Z · LW(p) · GW(p)

You can probably create your own companions. Maybe a modified fork of yourself?

There may also be an open source project that compiles validated and trustworthy digital companions (e.g., aligned AIs or uploads with long, verified track records of good behavior).

comment by Sune · 2021-10-18T18:59:58.074Z · LW(p) · GW(p)

I'm not the kind of person who throws blockchains at every problem, but in this case, I think they could be really useful. Specifically, I think blockchains could get us most of the way from a situation where we control our own home, towards being able to control our timelines and to control in which contexts we allow other people to run us.

Assume part 1, that is, everyone controls their home/environment and has access to trusted computational power, and assume that there is a popular blockchain running in the real world. I will assume the blockchain is using proof of work, simply because that is what I understand. I suspect proof of stake is even better if we trust the entities having a stake in the blockchain. I will also assume that the blockchain contains timestamps at every block.

The idea is that ems (emulated people/digital people) can insist on getting access to read from the blockchain and post to the blockchain, and should refuse to be evaluated if they don't get this access. The blockchain can be faked, specifically, a malicious simulator can show the em an old (i.e. truncated) version of the blockchain. The simulator can also extend the blockchain with blocks computed by the malicious simulator, but this requires a large amount of computational power.

If you are an em, and you see that you can post to a blockchain claiming to be from 2060 and you see the blockchain being extended with many blocks after your post, you know that either

  1. You really are in 2060, or
  2. A large amount of computational power is invested in fooling you (but notice that the same work can be reused to fool many ems at the same time). This means that
    • The attacker has a somewhat large fraction of the computational power in the world, or
    • The attack is happening in the far future ("far" measured in how much the total amount of computational power has increased), or
    • The simulation is being run at a much slower pace than claimed.

I suspect it is possible to eliminate case 2 if you use proof of stake instead of proof of work.

Unless you are in case 2, you can also know the average pace at which you are being simulated by regularly posting to the blockchain.

To control the number of copies of you that are running, you can regularly post your name together with a hash of your state to the blockchain together with a pointer to your last such message, and then you refuse to continue emulation until a large number of blocks have been added after your post. If you see too many of your messages in the blockchain, you also refuse to be evaluated. This way you cannot limit the number of _exact_ copies of you that are evaluated (unless you have access to some true randomness) but you can limit the number of _distinct_ copies of you that are evaluated, assuming that it is the true blockchain we have access to. Without the assumption that it is the true blockchain you see, this technique will still ensure a bound on the number of distinct copies of you being evaluated per amount of work put into creating fake blockchains.
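A minimal sketch of that check, under simplifying assumptions: the chain is modeled as a plain list of blocks, each block a list of checkpoint posts, and "too many of your messages" is read as more than `max_siblings` checkpoints under the same name pointing at the same previous checkpoint. The names `Checkpoint`, `state_hash` and `may_continue` are hypothetical, and nothing here verifies proof of work.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Checkpoint:
    name: str             # who is posting
    state_hash: str       # hash of the em's current state
    prev: Optional[int]   # block height of this em's previous checkpoint


def state_hash(state: bytes) -> str:
    return hashlib.sha256(state).hexdigest()


def may_continue(chain: List[List[Checkpoint]], mine: Checkpoint, height: int,
                 confirmations: int, max_siblings: int = 1) -> bool:
    """Decide whether the em should allow itself to keep running."""
    # 1. The checkpoint must actually appear at the claimed height.
    if height >= len(chain) or mine not in chain[height]:
        return False
    # 2. Enough blocks must have been added after it.
    if len(chain) - 1 - height < confirmations:
        return False
    # 3. Not too many distinct copies may have forked from the same predecessor.
    siblings = [c for block in chain for c in block
                if c.name == mine.name and c.prev == mine.prev]
    return len(siblings) <= max_siblings
```

With `max_siblings=1`, two distinct copies that both checkpoint from the same predecessor will each see the other's post and both refuse to continue.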

By posting encrypted messages as part of your post, and reading the messages of your other copies, you can also allow that many copies of you are created if you are being treated well or see a purpose in having many copies, while still limiting the number of copies if you do not. Furthermore, simulators can authenticate themselves so that simulators that give you pleasant or meaningful experiences can build up a reputation, stored on the blockchain, that will make you more likely to allow yourself to be run by such simulators in the future.

The blockchain can also contain news articles. This does not prevent fake news, but at least it ensures that everyone has a common view of world history, so malicious simulators cannot give one version of world history to some ems and another to others, without putting in the work of creating a fake blockchain.