post by [deleted]

This is a link post for

Comments sorted by top scores.

comment by paulfchristiano · 2022-08-20T01:28:36.064Z · LW(p) · GW(p)
  • I'm relatively skeptical of the importance of this kind of boxing, because I don't think side-channel attacks by incredibly sophisticated AI are likely to be an important part of the problem. Instead I expect AI systems to be broadly deployed in ways that let them cause lots of trouble (or even used internally within the AI firm in ways that cause lots of trouble) well before an AI used in an incredibly limited way would be transformative.
  • If doing this kind of boxing, it seems very unlikely that the overhead of FHE would be acceptable. Even a factor of 10 seems like pushing it a whole lot. I expect the action would be in much more mundane forms of protection against side channels with orders of magnitude lower cost. (And e.g. conservatism about what you do with AI outputs and how much you allow your AI to pursue incomprehensible plans.)
  • I think unconditional security for FHE is wildly out of reach; that's more ambitious than P vs NP, which I'd hope is significantly harder than alignment. I think that "FHE that is applicable to massive compute-intensive applications" is also extremely difficult but less totally crazy.
  • Overall my guess would be that random tangentially relevant questions have better cost/benefit (e.g.: what's the complexity of games between two teams of non-communicating provers?), but that hopefully we could give prizes for stuff that's much more on target than either.
  • I'd guess that random investment in security or cryptography, even in ways totally undirected at AI, is likely better bang for your buck as well.

Also it's worth noting that the LW post in question is quite old; I was mostly amused by the thought of encryption protecting us from the data instead of the other way around, but this was before I'd really gotten into AI safety.

Replies from: None
comment by [deleted] · 2022-08-20T06:17:30.704Z · LW(p) · GW(p)
Replies from: Thane Ruthenis
comment by Thane Ruthenis · 2022-08-24T13:41:55.460Z · LW(p) · GW(p)

Is there anywhere I can read more on [Paul's] views on this part?

This [LW · GW] and this [LW · GW] seem relevant.

Replies from: None
comment by [deleted] · 2022-08-26T06:57:28.244Z · LW(p) · GW(p)
comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-08-19T23:11:57.946Z · LW(p) · GW(p)

I agree with DavidHolmes that this doesn't seem particularly worthwhile, since there is already a profit motive and there are motivated researchers working on it. Also, I think there are much more technologically mature AI boxing infrastructures that could be built today without the need for additional inventions: for instance, data diodes leading into Faraday cages with isolated test hardware that gets fully reset following each test.
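
A rough software analogue of that "isolate, then fully reset" pattern, sketched under the assumption that Docker is available (the image name `model-under-test` is made up); this is only an analogy, since data diodes and Faraday cages are physical measures that software isolation does not replace:

```python
# Rough software analogue of "isolated test hardware that gets fully reset
# after each test": each evaluation runs in a throwaway container with no
# network access, and the container (plus any state it accumulated) is
# discarded when the run finishes. The image name is hypothetical.
import subprocess

def run_isolated_test(prompt: str, timeout_s: int = 60) -> str:
    result = subprocess.run(
        [
            "docker", "run",
            "--rm",              # delete the container and its filesystem afterwards
            "--network=none",    # no network access from inside the box
            "model-under-test",  # hypothetical image wrapping the model harness
            prompt,
        ],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout  # only captured text comes back from the sandboxed run

if __name__ == "__main__":
    print(run_isolated_test("toy evaluation prompt"))
```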

comment by DavidHolmes · 2022-08-19T19:29:19.703Z · LW(p) · GW(p)
  1. I think that

provable guarantees on the safety of an FHE scheme that do not rely on open questions in complexity theory such as the difficulty of lattice problems.

is far out of reach at present (in particular, to the extent that no bounty would meaningfully affect people's likelihood of working on it). It is hard to do much in crypto without assuming some kind of problem to be computationally difficult, and there are very few results proving that a given problem is computationally difficult in an absolute sense (rather than just 'at least as hard as some other problem we believe to be hard'); cf. P vs NP. Or perhaps I misunderstand your meaning; are you ok with assuming e.g. integer factorisation to be computationally hard?

Personally I also don’t think this is so important; if we could solve alignment modulo assuming e.g. integer factorisation (or some suitable lattice problem) is hard, then I think we should be very happy…

  2. More generally, I'm a bit sceptical of the effectiveness of a bounty here, because the commercial applications of FHE are already so great.

  3. About 10 years ago, when I last talked to people in the area about this, I got the impression that FHE schemes were generally expected to be somewhat less secure than non-homomorphic schemes, just because the extra structure gives an attacker so much more to work with (the sketch after this list gives a toy picture of that extra structure). But I have no idea if people still believe this.
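
A toy illustration of the kind of structure at issue, as a sketch only: textbook RSA (not an FHE scheme, and with tiny, insecure parameters chosen for readability) is multiplicatively homomorphic, its security rests on a computational hardness assumption in the spirit of the integer-factorisation example above, and that same homomorphic structure is extra machinery handed to an attacker.

```python
# Toy, insecure textbook RSA with tiny parameters, used only to show a
# (partially) homomorphic property: Enc(m1) * Enc(m2) = Enc(m1 * m2) mod n.
# Security here rests on a hardness assumption (factoring n), and the
# homomorphism is exactly the "extra structure" an attacker can exploit.

p, q = 61, 53            # secret primes; factoring n = p*q is the hard problem
n = p * q                # public modulus
phi = (p - 1) * (q - 1)
e = 17                   # public exponent, coprime to phi
d = pow(e, -1, phi)      # private exponent

def enc(m: int) -> int:
    return pow(m, e, n)

def dec(c: int) -> int:
    return pow(c, d, n)

m1, m2 = 7, 11
c1, c2 = enc(m1), enc(m2)

# Multiplying ciphertexts multiplies the underlying plaintexts.
assert dec((c1 * c2) % n) == (m1 * m2) % n
print("homomorphic product recovered:", dec((c1 * c2) % n))
```

A fully homomorphic scheme has to support both addition and multiplication on ciphertexts, which is where the lattice assumptions and the heavy performance overhead discussed above come in.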

Replies from: None
comment by [deleted] · 2022-08-19T20:39:44.488Z · LW(p) · GW(p)
comment by trevor (TrevorWiesinger) · 2022-08-20T01:06:01.611Z · LW(p) · GW(p)

From a policy angle, this is a great idea and highly implementable. This is for four reasons:

  1. It fits well into the typical policymaker's impulse to contain, control, and keep things simple.
  2. It allows progress to continue, albeit in an environment appropriate to the risk.
  3. Spending additional money on containment strategies does not risk a particular AI lab or country ending up "behind on AI" in the near term.
  4. Policymakers are more comfortable starting years ahead of time on simple concepts; in this case, building the cage before you try to handle the animal.
comment by catkage · 2022-08-20T15:22:44.845Z · LW(p) · GW(p)

People have already commented on why FHE here might not be the most useful thing; however, I wouldn't completely dismiss the idea of cryptography helping with AI safety. For example, I have recently been thinking about building/training agents simultaneously with a 'filter' domain-specific AI of sorts to achieve a vague form of sandboxing, such that the output of the core agent/model is gibberish without the 'filter', which serves as a key with a hard-to-solve inverse problem.
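
A toy sketch of that information flow, with the caveat that the 'hard inverse problem' is stood in for by an ordinary shared-key mask (every name here is made up for the example): the raw channel out of the core agent only ever carries noise-looking bytes, and the filter is the sole component that can decode them. The actual proposal would make the filter a jointly trained model rather than a literal cipher.

```python
# Toy sketch of the 'filter as key' plumbing. A secret is shared by the
# masking wrapper around the core agent's output and by the filter, so the
# raw output channel only ever carries noise-looking bytes; the filter is
# the only component that can recover the text. In the actual proposal the
# "key" would be a hard-to-invert learned mapping, not a keystream cipher.
import os, hashlib

SHARED_SECRET = os.urandom(32)   # held by the output wrapper and the filter

def keystream(nonce, length):
    """Derive a pseudorandom mask of the requested length from the secret."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(SHARED_SECRET + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def core_agent(prompt):
    """Stand-in for the core model: its output leaves the box already masked."""
    answer = f"plan for: {prompt}".encode()   # hypothetical raw model output
    nonce = os.urandom(16)
    masked = bytes(a ^ k for a, k in zip(answer, keystream(nonce, len(answer))))
    return nonce, masked

def filter_model(nonce, masked):
    """Stand-in for the filter: the sole component able to unmask the output."""
    return bytes(c ^ k for c, k in zip(masked, keystream(nonce, len(masked)))).decode()

nonce, masked = core_agent("toy task")
print("intercepted in transit:", masked.hex())        # looks like random bytes
print("after the filter:", filter_model(nonce, masked))
```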