Confirming that efficiently finding a small circuit (you don't actually need any restriction beyond size) from its values on a fixed collection of test inputs is indeed known to imply this; see this paper.
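For concreteness, the search problem I have in mind is roughly the following (my phrasing, not necessarily the paper's):

$$\text{given } (x_1, y_1), \ldots, (x_m, y_m) \text{ and a size bound } s, \text{ find a circuit } C \text{ with } |C| \le s \text{ and } C(x_i) = y_i \text{ for all } i \le m,$$

whenever such a circuit exists.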
This is a fact worth knowing and a lucid explanation --- thanks for writing this!
I know it's not the main point of the post, but I found myself a little lost when you talked about complexity theory; I would be interested to hear more details. In particular:
- When you introduce those complexity classes ("NP-invertible" and the like): what are their definitions? Are these decision problems in the usual sense, or is this a statement about learning theory? I haven't been able to either track down a definition of e.g. "NP-invertible" or invent one that would make sense; if this is a renamed version of something people have studied, I would be curious to know the original name.
- You claim "the assumption P ≠ BPP implies that there is no way, in polynomial time, to get reliable information about A's secret circuit C (beyond some statistical regularities)". I suspect this is not quite what you mean. I'm not sure exactly what formal statement you're making here, but it would be pretty unusual for P ≠ BPP to be the relevant assumption; in fact, most people in TCS strongly believe P = BPP. You call this "the probabilistic version [of P ≠ NP]", so I'm guessing this was a typo and you meant NP ⊄ BPP? But I would also be impressed if you could get something like this statement from NP ⊄ BPP alone. My best guess is that you do need the cryptographic assumptions you mentioned for this part: if you want to say "P/poly is not PAC learnable" (in other words, no polynomial-time algorithm can, given random input-output samples from an arbitrary poly-size circuit, find a circuit computing a nearby function, except with tiny success probability; spelled out below), I'm pretty sure this is only known conditional on the existence of one-way functions.
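To spell out the PAC statement I mean (my formalization, not a quote from the post): for every polynomial-time learner $L$ there should be a poly-size circuit $C$ and an input distribution $D$ such that, for $m = \mathrm{poly}(n)$ samples,

$$\Pr_{x_1, \ldots, x_m \sim D}\Big[\, \Pr_{x \sim D}\big[h(x) = C(x)\big] \ge 1 - \varepsilon \;\text{ for } h = L\big((x_1, C(x_1)), \ldots, (x_m, C(x_m))\big) \Big] \le \mathrm{negl}(n).$$

As far as I know, the known routes to this are cryptographic: e.g. pseudorandom functions (which exist if one-way functions do) are poly-size circuits whose input-output behavior no polynomial-time learner can approximate.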
Again though, I know these are quibbles and the main point of this section is "you don't need any unproven complexity assumptions to get lower bounds against statistical query learning".
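(For anyone curious why that part needs no assumptions: the SQ lower bound for parities is just orthogonality plus Parseval. Under the uniform distribution the parities $\chi_S$ form an orthonormal basis, so for any query function $\varphi$ with $\|\varphi\|_2 \le 1$,

$$\sum_{S \subseteq [n]} \langle \varphi, \chi_S \rangle^2 \le 1 \quad\Longrightarrow\quad \#\{S : |\langle \varphi, \chi_S \rangle| > \tau\} < \tau^{-2},$$

so each tolerance-$\tau$ query rules out fewer than $\tau^{-2}$ of the $2^n$ candidate parities, and, glossing over how queries depend on the label, any SQ learner needs more than about $\tau^2 (2^n - 1)$ queries; with inverse-polynomial tolerance, that's $2^{\Omega(n)}$.)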
(Separately, a minor nit with the presentation: I found the decision to use "learning algorithm" to refer specifically to "weight-based ML architecture that learns via SGD" slightly confusing. The takeaway "there exists an algorithm that learns parities, but there doesn't exist a learning algorithm that learns parities" is a little slippery; I think it would be worth replacing "learning algorithm" with e.g. "gradient-based learning algorithm" in the writeup. The sketch below illustrates the contrast.)
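To make the contrast concrete, here is a minimal sketch (mine, not the post's) of the non-gradient algorithm: Gaussian elimination over GF(2) recovers a hidden parity from O(n) random labeled examples in polynomial time, which is exactly what gradient-based and statistical-query learners provably cannot do.

```python
import random

def sample(n, secret):
    """One labeled example: uniform x in {0,1}^n, label = parity of x on `secret`."""
    x = [random.randint(0, 1) for _ in range(n)]
    y = sum(x[i] for i in secret) % 2
    return x, y

def learn_parity(examples, n):
    """Recover the secret subset by solving x . s = y (mod 2) via Gaussian elimination."""
    rows = [x + [y] for x, y in examples]  # augmented matrix over GF(2)
    pivot_cols = []
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue  # no pivot in this column (free variable, set to 0 below)
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    s = [0] * n
    for row, c in enumerate(pivot_cols):
        s[c] = rows[row][n]
    return {i for i in range(n) if s[i]}

n = 20
secret = {2, 5, 11, 17}                               # the hidden parity, arbitrary
examples = [sample(n, secret) for _ in range(3 * n)]  # O(n) samples suffice whp
print(learn_parity(examples, n) == secret)            # True with high probability
```

(With $3n$ random examples the linear system has full rank with overwhelming probability, so the solution is unique and equals the hidden set.)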