Machine Learning vs Differential Privacy
post by Ilio · 2023-01-05T15:14:05.489Z · LW · GW
This is a question post.
edit: see below for clarifications from domain expert rpglover64 and a good pick of references from the gears to ascension. My one (cheatingly long) sentence takeaway: it’s clear that training does not automatically lead to DP; it’s unclear whether DP can always, or only seldom, help training; it’s likely that easy algorithms are not available yet; and it’s unlikely that finding one is low-hanging fruit.
From Wikipedia: « an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation ». In other words, if some training process asymptotically converges toward generalisable knowledge only, then it should tend to become differentially private.
…or so it seems to me, but actually I have no idea whether that’s common knowledge among ML- or crypto-educated folks, or whether it’s a pure personal guess with no reason to believe it. What do you see as the best argument for or against this idea? Any guess on how to prove or disprove it?
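For reference, here is (as far as I understand it) the standard formal statement behind that Wikipedia sentence: a randomized algorithm $M$ is $\varepsilon$-differentially private if, for every pair of datasets $D$ and $D'$ differing in one individual's record, and every set $S$ of possible outputs,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S].$$

The smaller $\varepsilon$ is, the less any single person's data can change what an observer sees.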
Extra Good Samaritan point: my English sucks, so any comment rewriting this post in good English, even for minor details, is a great help. Thank you.
This is version 0.1
Answers
I've been working in differential privacy for the past 5 years; given that it doesn't come up often unprompted, I was surprised and pleased by this question.
Short answer: no, generalizability does not imply differential privacy, although differential privacy does imply generalizability, to a large degree.
The simplest reason for this is that DP is a property that holds for all possible datasets, so if there is even one pathological data set for which your algorithm overfits, it's not DP, but you can still credibly say that it generalizes.
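Schematically (my gloss, not a precise theorem statement): DP quantifies over every pair of neighboring datasets, whereas generalization bounds typically hold only in expectation (or with high probability) over datasets drawn from the training distribution,

$$\text{DP:}\ \forall\, D \simeq D',\ \forall\, S \qquad\text{vs.}\qquad \text{generalization:}\ \mathbb{E}_{D \sim \mathcal{D}^n}\!\left[L_{\text{test}} - L_{\text{train}}\right]\ \text{small},$$

so a single pathological dataset violates the first while barely affecting the second.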
(I have to go, so I'm posting this as is, and I will add more later if you're interested.)
↑ comment by Ilio · 2023-01-05T18:33:55.357Z · LW(p) · GW(p)
Thanks, and yes, please add! For example, if DP implies generalization, then why isn’t everyone trying to supplement backprop with DP principles to get a learning strategy that is more robust and better at privacy? Or is that what everyone tried, and it’s trickier than it seems?
Replies from: alex-rozenshteyn, lahwran
↑ comment by rpglover64 (alex-rozenshteyn) · 2023-01-06T01:22:54.635Z · LW(p) · GW(p)
As mentioned in the other reply, DP gives up performance (though with enough data you can overcome that, and in many cases, you'd need only a little less data for reliable answers anyway).
Another point is that DP is fragile and a pain:
- you have to carefully track the provenance of your data (so you don't accidentally include someone's data twice without explicitly accounting for it)
- you usually need to clip data to a priori bounds or something equivalent (IIRC, the standard DP algorithm for training NNs requires gradient clipping; see the sketch after this list)
- you can "run out of budget"--there's a parameter that bounds the dissimilarity, and at some point, you have run so many iterations that you can't prove that the result is DP if you keep running; this happens way before you stop improving on the validation set (with current DP techniques)
- hyperparameter tuning is an active research area, because once you've run with one set of hyperparameters, you've exhausted the budget (see previous point); you can just say "budget is per hyperparameter set", but then you lose generalization guarantees
- DP bounds tend to be pessimistic (see previous example about single pathological dataset), so while you might in principle be able to continue improving without damaging generalization, DP on its own won't be informative about it
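To make the clipping and budget points concrete, here is a rough sketch (my own, not from any particular library) of the core DP-SGD update in the style of Abadi et al.'s "Deep Learning with Differential Privacy"; `per_example_grads` is a stand-in for however your framework computes one gradient per training example:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One sketched DP-SGD update: clip each example's gradient, then add noise.

    per_example_grads: shape (batch_size, n_params), one gradient per example.
    clip_norm: the a priori bound C that every per-example gradient is clipped to.
    noise_multiplier: sigma; together with the batch size and the number of
        steps taken, it determines how fast the privacy budget is spent.
    """
    rng = rng or np.random.default_rng()
    # 1. Clip each per-example gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # 2. Sum the clipped gradients and add Gaussian noise scaled to the clipping bound.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    # 3. Average over the batch and take an ordinary SGD step.
    return params - lr * noisy_sum / len(per_example_grads)
```

Every call to a step like this spends privacy budget; the accounting for how much (the moments accountant in the original paper, Rényi-DP accounting in most current libraries) is exactly the bookkeeping that "runs out" in the third bullet above.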
There are relaxations of DP (here's a reference that tries to give an overview of over 200 papers; since its publication, even more work in this vein has come out), but they're not as well-studied, and even figuring out which one has the properties you need is a difficult problem.
It's also not exactly what you want, and it's not easy, so it tends to be better to look for the thing that correlates more closely with what you actually want (i.e. generalization).
Replies from: Ilio
↑ comment by Ilio · 2023-01-06T14:55:27.523Z · LW(p) · GW(p)
Thanks, this kind of inside knowledge from practice is both precious and the hardest to find. I also like the review very much: yes, it may not be exactly what I wanted, but it feels like exactly what I should have wanted, in that it helped me realize that each and every one of the numerous DP variants discussed involves a slightly different notion of generalizability. In retrospect, my last question was a bit like « Ice cream is good, DP is good, so why not use DP to improve ice cream? » => because this false logic stops working as soon as « good » is properly defined.
I need some time to catch up with the excellent reading list below and try some code myself, but I will keep your name in mind if I have more technical questions. In the meantime, and as I’m not very familiar with the habits here: should I do something like rewriting the main post to mention the excellent answers I got, or can I click on some « accept this answer » button somewhere?
Replies from: alex-rozenshteyn
↑ comment by rpglover64 (alex-rozenshteyn) · 2023-01-06T17:42:45.975Z · LW(p) · GW(p)
should I do something like rewriting the main post to mention the excellent answers I got, or can I click on some « accept this answer » button somewhere?
I don't think there's an "accept answer" button, and I don't think you're expected to update your question. I personally would probably edit it to add one sentence summarizing your takeaways.
↑ comment by the gears to ascension (lahwran) · 2023-01-05T18:46:17.617Z · LW(p) · GW(p)
because DP gives up performance for guaranteed generalization.
Replies from: Ilio
↑ comment by Ilio · 2023-01-05T20:00:34.132Z · LW(p) · GW(p)
Is « guaranteed » important in your answer? E.g. do you know of some code that shows this is real in practice, or is it more of a theoretical result?
Replies from: lahwran
↑ comment by the gears to ascension (lahwran) · 2023-01-05T21:23:27.317Z · LW(p) · GW(p)
It's well studied. I'm not an expert in differential privacy and would need to read multiple papers in depth to be sure I'd answered precisely, but I know that at an English level of mathematical description, what's guaranteed is that there is definitely not memorization of individual datapoints, so any successful performance on a test set is definitely generalization. That doesn't mean it's causal generalization though. And the accuracy is usually worse - getting both differential privacy and capabilities pushes non-differentially-private capabilities more, usually, I think, or something. I'd have to go paper hunting to find how well it performs.

Instead of doing that, I'll post my usual pitch: I strongly encourage you to do your level best to find some papers, because that shit is not always trivial and attempting and failing is a really good finding-stuff workout. Tools I'd recommend include (in order of recommendation): the wiki page on differential privacy; opening papers on semantic scholar -> a result and browsing forward through the citations they've received, especially sorted by latest (same link as "a result" but with sort); maybe a metaphor.systems query on the topic (sign in required but worth it).

Problem is, these results don't directly answer your question without doing some reading; I'd suggest opening papers, skimming through them fast to see if they answer, and browsing the paper citation graph forwards and backwards towards titles that sound relevant. It might help to put the papers that seem relevant into papermap.xyz [edit: giving me 500 errors now] (via reddit u/Neabfi) or https://my.paperscape.org/ (which helps with forward and backward browsing a lot but is a jankier UI than papermap).
some results from the metaphor query:
- https://www.borealisai.com/research-blogs/tutorial-12-differential-privacy-i-introduction/
- https://differentialprivacy.org/
- https://opendp.org/about
↑ comment by rpglover64 (alex-rozenshteyn) · 2023-01-06T01:29:34.309Z · LW(p) · GW(p)
getting both differential privacy and capabilities pushes non-differentially-private capabilities more, usually, I think, or something
I don't think it does in general, and every case I can think of right now did not, but I agree that it is a worthwhile thing to worry about.
tools [for finding DP results] I'd recommend include
I'd add clicking through citations and references on arxiv and looking at the litmap explorer in arxiv.
↑ comment by Ilio · 2023-01-06T14:19:04.649Z · LW(p) · GW(p)
Thx for these links. I’ll need some time for a deeper reading, but after a few hours the first bits of my take-home message are probably settled: there’s no theoretical guarantee that pursuing DP (or, more precisely, pursuing any of DP’s numerous variants) leads to worse results, except that’s what everyone reports in practice, so if it’s a purely technical issue it probably won’t be trivial to fix.
I like your note that DP generalizability might not be causal generalizability: that’s both a contender for explaining why these seemingly technical difficulties arise and a potentially key thought for improving this strategy.