A computational no-coincidence principle

post by Eric Neyman (UnexpectedValues) · 2025-02-14T21:39:39.277Z · 9 comments

This is a link post for https://www.alignment.org/blog/a-computational-no-coincidence-principle/

Contents

  Our no-coincidence conjecture
  How we came up with the statement
  Thoughts for theoretical computer scientists
  Why we care

This post presents a conjecture formulated at the Alignment Research Center in 2023. Our belief in the conjecture is at least partly load-bearing for our belief in ARC's overall agenda. We haven't directly worked on the conjecture for a while now, but we believe the conjecture is interesting in its own right.

 

In a recent paper in Annals of Mathematics and Philosophy, Fields medalist Timothy Gowers asks why mathematicians sometimes believe that unproved statements are likely to be true. For example, it is unknown whether $\pi$ is a normal number (which, roughly speaking, means that every digit appears in $\pi$ with equal frequency), yet this is widely believed. Gowers proposes that there is no sign of any reason for $\pi$ to be non-normal -- especially not one that would fail to reveal itself in the first million digits -- and in the absence of any such reason, any deviation from normality would be an outrageous coincidence. Thus, the likely normality of $\pi$ is inferred from the following general principle:

No-coincidence principle (Gowers): If an apparently outrageous coincidence happens in mathematics, then there is a reason for it.

In other words: suppose that the digit 3 accounts for not 10% but 11% of digits in the decimal expansion of $\pi$. Intuitively, there ought to be an explanation of this fact: maybe not a proof, but something that someone could say to me that would make me say, "Oh, I get it now, it makes sense that 3 appears more than 10% of the time in $\pi$."

(For an apparently outrageous coincidence that turns out to be true, check out Chebyshev's bias: in a sense, primes that are 3 mod 4 are more common than primes that are 1 mod 4. As expected by Gowers' no-coincidence principle, we have a heuristic understanding of why this should be true. However, we only have a proof conditional on strong forms of the Riemann hypothesis.)

The no-coincidence principle is profound, but also informal. The purpose of this post is to propose a formalization, so that we can at least ask whether or not it's true. In this post, I will:

  • state our no-coincidence conjecture;
  • explain how we came up with the statement;
  • share some thoughts for theoretical computer scientists; and
  • explain why we care.

 

Our no-coincidence conjecture

Here is our best attempt to capture the spirit of Gowers' no-coincidence principle in a formal mathematical statement:

Computational no-coincidence conjecture: For a reversible[1] circuit $C: \{0,1\}^{3n} \to \{0,1\}^{3n}$, let $P(C)$ be the property that there is no input $x$ to $C$ that ends in $n$ zeros, such that $C(x)$ also ends in $n$ zeros. There exists a polynomial-time verification algorithm $V$ that receives as input:

  • a reversible circuit $C$, and
  • an advice string $\pi$,

such that:

  • If $P(C)$ is true, then there is a polynomial-length[2] advice string $\pi$ such that $V(C, \pi) = 1$.
  • For 99% of random reversible circuits $C$,[3] there is no advice string $\pi$ such that $V(C, \pi) = 1$.
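To make the property concrete, here is a minimal brute-force sketch of checking $P(C)$ at toy sizes, representing a circuit simply as a function on bit-tuples. This is purely illustrative and exponential-time; the point of the conjecture is that $V$ runs in polynomial time with help from $\pi$.

```python
from itertools import product

def ends_in_zeros(bits, n):
    """True if the last n bits are all zero."""
    return all(b == 0 for b in bits[-n:])

def property_P(circuit, n):
    """Brute-force check of P(C): no input ending in n zeros is
    mapped to an output ending in n zeros. Exponential time; for
    illustration only."""
    for prefix in product([0, 1], repeat=2 * n):
        x = prefix + (0,) * n  # an input ending in n zeros
        if ends_in_zeros(circuit(x), n):
            return False
    return True

# The circuit that negates every bit satisfies P: inputs ending in
# n zeros map to outputs ending in n ones.
negate_all = lambda x: tuple(1 - b for b in x)
assert property_P(negate_all, n=3)

# The identity circuit violates P.
assert not property_P(lambda x: x, n=3)
```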

Here, $P(C)$ plays the role of the "apparently outrageous coincidence". To see why it's so outrageous, let's do some math.

It is conjectured that random reversible circuits of sufficient depth are pseudorandom permutations (see e.g. Section 1.5 here), so let's model a random reversible circuit as an actually random permutation of $\{0,1\}^{3n}$. There are $2^{2n}$ inputs $x$ that end in $n$ zeros, and for each of these, the probability that $C(x)$ ends in $n$ zeros is $2^{-n}$ (approximately independently). So the probability that none of these $2^{2n}$ inputs are mapped to an output that ends in $n$ zeros is roughly

$$\left(1 - 2^{-n}\right)^{2^{2n}} \approx \exp\left(-2^n\right).$$
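For small $n$, we can compare this heuristic (which pretends the $2^{2n}$ events are independent) to the exact probability under the random-permutation model. The sketch below is a sanity check, not part of the argument:

```python
def exact_vs_heuristic(n):
    """Probability that a uniformly random permutation of {0,1}^(3n)
    satisfies P: exact value vs the heuristic (1 - 2^-n)^(2^(2n))."""
    N = 2 ** (3 * n)  # size of the domain
    K = 2 ** (2 * n)  # number of inputs (and of outputs) ending in n zeros
    exact = 1.0
    for i in range(K):
        # The i-th special input must avoid the K special outputs,
        # given the images already used by earlier special inputs.
        exact *= (N - K - i) / (N - i)
    heuristic = (1 - 2 ** -n) ** K
    return exact, heuristic

for n in (1, 2, 3):
    print(n, exact_vs_heuristic(n))
```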
Now, obviously the actual probability that a randomly chosen reversible circuit satisfies $P(C)$ is much higher than that (there are only exponentially many polynomial-size circuits, and e.g. the circuit that negates all bits satisfies the property). This discrepancy comes from our choice to model random reversible circuits as truly random permutations.

However, the fact that a random circuit is so unlikely to satisfy $P(C)$ "by chance" suggests that any circuit that does satisfy $P(C)$ has some kind of special structure that explains "why" it satisfies $P(C)$. (For example, the aforementioned bit-negating circuit, implemented as $3n$ parallel NOT gates, has a very clear structure.) In this context, our no-coincidence conjecture states: such structure can be expressed in a polynomial-length advice string.[4]

In other words, we believe that there is a "formal language" for expressing different kinds of structure that can give rise to circuits that satisfy $P(C)$.[5] All circuits that satisfy $P(C)$ have some such structure, while most random circuits have no such structure.[6]

 

How we came up with the statement

Gowers' no-coincidence principle speaks of "apparently outrageous coincidences". We wanted some simple yet expansive domain in which we could speak of outrageous coincidences, and boolean circuits were a natural one.[7] Our first attempt at formalization was to consider the family of circuits $C: \{0,1\}^n \to \{0,1\}$ and to let the "outrageous coincidence" be the property $P(C)$ that $C$ outputs 1 on every input.

The resulting no-coincidence conjecture was then: There exists a polynomial-time verification algorithm $V$ that receives as input a boolean circuit $C$ and an advice string $\pi$, such that for any $C$ that always outputs 1, there is a polynomial-length $\pi$ such that $V(C, \pi) = 1$, but for 99% of random circuits, no such $\pi$ exists.

Unfortunately, this statement was uncompelling for two reasons:

  • First, random boolean circuits with a single output bit are not believed to behave like random functions, so the naive analysis suggesting that $P(C)$ is an "outrageous coincidence" does not go through.
  • Second, the statement is trivialized by a sampling-based verifier: a random circuit outputs 0 on a large fraction of inputs, so $V$ can simply evaluate $C$ on a few random inputs and reject if it ever sees a 0, with no advice string needed.

Our next attempt was to consider the family of circuits $C: \{0,1\}^{2n} \to \{0,1\}^n$ and the property $P(C)$ that $C$ never outputs the all-zeros string. (On a naive analysis, this is unlikely to happen "by chance", since there are way more inputs than possible outputs.) This modification resolved the second issue (even if $C$ outputs the all-zeros string on some inputs, finding any such input by sampling may be infeasible). However, the first issue remained.

To the best of our knowledge, the "reversible circuits" formulation above (which is similar in spirit to the previous formulation but inherits the nice properties of random reversible circuits) resolves the first issue as well.

Ultimately, though, we are not wedded to our particular formulation. Perhaps there is some clever sampling-based verifier that "trivializes" our conjecture as well, in which case we would want to revise it. We are ultimately interested in the informal claim that if a circuit exhibits a very surprising property, it is possible to point out some internal structure of the circuit that will "explain" to a verifier why the surprising property holds.

 

Thoughts for theoretical computer scientists

I think that complexity theorists ought to find our conjecture really compelling and deep. This is for a few reasons:

  • Deciding whether $P(C)$ holds is coNP-complete,[8] so the conjecture can be read as a statement about certificates for coNP properties: a weakening of NP = coNP in which the verifier only needs to reject most random circuits, rather than being sound for all circuits.
  • If the conjecture is true, we think it is probably independent of ZFC; despite that, it seems realistic to exhibit a specific verifier $V$ that satisfies it.[9]
  • The conjecture is stated entirely in terms of standard complexity-theoretic objects.[10]

 

Why we care

At ARC, we are interested in finding explanations of neural network behavior. Concretely, a trained neural net (such as GPT-4) exhibits a really surprising property: it gets low loss on the training set (far lower than a random neural net). We are interested in understanding the structural properties of the neural net that result in it getting low loss.

Our notion of an explanation is related to, but different from, interpretations sought after by mechanistic interpretability researchers. The main commonality is that we also seek a mechanistic understanding of neural nets via a close analysis of their internals. Our methods might also be similar: we are interested in learning generative models of neural activations, much as Anthropic's interpretability team learned sparse autoencoders for neural features using dictionary learning. However, there are important differences:

  • We do not require explanations to be understandable to humans.[11]
  • We are aiming for a precise, formal notion of explanation, with completeness and soundness conditions like those in our conjecture, rather than an informal mechanistic account of parts of the network.

One might reasonably ask: why do we believe that such explanations exist? In other words, maybe we can understand some bits and pieces of neural network behavior, but not fully explain the internal structural properties that result in the model's behavior.

Our belief in the existence of such explanations follows from a more general belief: that all surprising mathematical structure has an explanation, in the sense of Gowers' no-coincidence principle.

From our discussions with other researchers, we have gotten the impression that some agree with this overarching intuition while others do not. At the same time, it seems difficult to argue about the truth or falsehood of such an informal statement. Thus, our attempt at formalization is in part motivated by wanting to explain more concretely what we believe.

If our conjecture is false,[12] we would like to know. It may cause us to lose faith in our belief that neural networks are explainable (in the way that we are using the word "explainable"), and to pivot to a new research direction.

 

  1. ^

A circuit is reversible if every gate computes a bijective function from its inputs to its outputs; see e.g. the Toffoli gate.

  2. ^

In fact, we believe that this is probably true if $\pi$ must be quasi-linear in length, i.e. $|\pi| = \tilde{O}(|C|)$.

  3. ^

We think that the distribution over random circuits probably doesn't matter too much for the formal statement, so long as the circuits are deep enough to ensure good mixing. But to be concrete, we could say that a random circuit consists of polynomially many layers. Each layer has $n$ gates with three inputs and three outputs, such that each of the $3n$ wires is an input to exactly one gate (with wires randomly assigned to gates). Each gate is uniformly chosen among the $8!$ bijective functions from $\{0,1\}^3$ to $\{0,1\}^3$.

    For some prior work on random reversible circuits, see He & O'Donnell (2024).
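Here is a minimal sketch of sampling from this distribution. The number of layers is left as a free parameter, since all that matters is that it is large enough for good mixing:

```python
import itertools
import random

# All 8! bijections from {0,1}^3 to {0,1}^3, as permutations of 0..7.
THREE_BIT_GATES = list(itertools.permutations(range(8)))

def random_reversible_circuit(n, num_layers):
    """Each layer randomly partitions the 3n wires into n triples and
    applies an independent uniform 3-bit bijection to each triple."""
    layers = []
    for _ in range(num_layers):
        wires = list(range(3 * n))
        random.shuffle(wires)
        layers.append([
            (wires[3 * i : 3 * i + 3], random.choice(THREE_BIT_GATES))
            for i in range(n)
        ])
    return layers

def run_circuit(layers, bits):
    """Apply the sampled circuit to a tuple of 3n bits."""
    bits = list(bits)
    for layer in layers:
        for (a, b, c), gate in layer:
            out = gate[4 * bits[a] + 2 * bits[b] + bits[c]]
            bits[a], bits[b], bits[c] = (out >> 2) & 1, (out >> 1) & 1, out & 1
    return tuple(bits)
```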

  4. ^

Technically, our conjecture makes a weaker claim. For instance, there might be a verifier for which the advice string $\pi$ does not necessarily have a natural interpretation as pointing out structural properties of $C$. Ultimately, though, we are interested in finding a verifier that accepts or rejects $(C, \pi)$ based on a structural explanation of the circuit; our no-coincidence conjecture is our best attempt to formalize that claim, even if it is imperfect.

  5. ^

This is analogous to how there is a formal language for mathematical proofs, and a proof verifier that checks whether a proof is valid. But in our case, $\pi$ is valid if it provides "compelling evidence" that $P(C)$ may be true. A verifier can sometimes accept $\pi$ even if $P(C)$ is false, so long as it does not do so for too many random circuits; thus, our soundness condition is weaker than the soundness condition for proofs.

  6. ^

Some circuits that do not satisfy $P(C)$ may have a structure that causes $V$ to accept. This is inevitable. Consider for instance a reversible circuit $C$ that does not do anything with the first $n$ bits of the input, while having no discernible structure on the last $2n$ bits. The heuristic estimate we gave before now suggests a roughly $(1 - 2^{-n})^{2^n} \approx e^{-1} \approx 37\%$ chance that $P(C)$ is true. Since $V$ must be able to output 1 on every $C$ for which $P(C)$ is true, the structural observation that the first $n$ bits are left alone ought to be enough to cause $V$ to output 1.

  7. ^

After all, many mathematical propositions can be written as statements about boolean circuits. Consider the Goldbach conjecture: every even number greater than 2 can be written as a sum of two primes. We can write this as: for all $n$, there exist $p, q > 1$ such that $p + q = 2n$ and, for all $a, b > 1$, $ab \neq p$ and $ab \neq q$. Consider a circuit $C(n, p, q, a, b)$ that outputs 1 if $p + q = 2n$, $ab \neq p$, and $ab \neq q$. Then Goldbach's conjecture is equivalent to "For all $n$ there exist $p, q$ such that for all $a, b$: $C(n, p, q, a, b) = 1$." (Or to be more precise, there is a family of such circuits, one for every possible size of $n$, and Goldbach's conjecture is equivalent to this statement being true for all circuits in the family.)
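As a toy rendering of this quantifier structure (a sketch using arbitrary-precision integers rather than the fixed-width bit-strings a real circuit family would use; the helper names are hypothetical):

```python
def C(n, p, q, a, b):
    """The footnote's predicate: 1 if p + q = 2n, ab != p, and ab != q."""
    return int(p + q == 2 * n and a * b != p and a * b != q)

def goldbach_instance(n):
    """'There exist p, q such that for all a, b: C(n,p,q,a,b) = 1',
    brute-forced over the relevant finite range."""
    rng = range(2, 2 * n)
    return any(
        all(C(n, p, q, a, b) for a in rng for b in rng)
        for p in rng for q in rng
    )

assert all(goldbach_instance(n) for n in range(2, 50))
```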

  8. ^

We reduce from Circuit-SAT. Let $C$ be a boolean circuit on $2n$ inputs. For $y \in \{0,1\}^{2n}$ and $z \in \{0,1\}^n$, let $C'(y, z)$ be equal to $(y, z)$ if $C(y) = 1$ and $(y, \bar{z})$ if $C(y) = 0$, where $\bar{z}$ is the bitwise NOT of $z$. Then $C'$ is reversible, and $\neg P(C')$ is true if and only if $C$ is satisfiable.
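A sketch of this construction (as reconstructed above) in code, with a brute-force sanity check at a toy size:

```python
from itertools import product

def reduction(sat_circuit, n):
    """Given a boolean function on 2n bits, build the reversible map
    C'(y, z) = (y, z) if sat_circuit(y) = 1, else (y, bitwise-NOT z),
    acting on 3n-bit tuples (y: first 2n bits, z: last n bits)."""
    def c_prime(x):
        y, z = x[: 2 * n], x[2 * n :]
        if sat_circuit(y):
            return y + z
        return y + tuple(1 - b for b in z)
    return c_prime

# Sanity check: not-P(C') holds iff sat_circuit is satisfiable.
n = 2
unsat = lambda y: 0                       # unsatisfiable
sat = lambda y: int(all(y))               # satisfied by y = (1, 1, 1, 1)

def violates_P(circuit):
    """Exists an input ending in n zeros whose image ends in n zeros?"""
    for prefix in product([0, 1], repeat=2 * n):
        x = prefix + (0,) * n
        if all(b == 0 for b in circuit(x)[-n:]):
            return True
    return False

assert not violates_P(reduction(unsat, n))
assert violates_P(reduction(sat, n))
```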

  9. ^

If our conjecture is true, we think it is probably independent of ZFC. (This is because we expect it to be impossible to rule out that some circuit $C$ satisfies $P(C)$ just by chance, despite the incredible implausibility.) Despite that, it seems realistic to us to find a specific verifier $V$ that would likely satisfy the conjecture statement (even if this could never be proven).

  10. ^

    Save perhaps for the "random circuit" bit, although that's not unprecedented either.

  11. ^

    This is mainly because we do not think it is safe to assume that safety-relevant internal structures will be understandable to humans.

  12. ^

At least assuming that we have not messed up the conjecture statement. Perhaps it will be false for a reason that has nothing to do with the existence of explanations for circuit behavior, in a way that we have not anticipated. (But we think that our conjecture is more likely to be true for unsatisfying reasons than false for unsatisfying reasons.)

9 comments


comment by Jacob_Hilton · 2025-02-14T21:45:55.686Z

Before reversible circuits, we first considered a simpler setting: triangle counting. The no-coincidence principle in that setting turned out to be true, but for a relatively uninteresting reason, because the domain was not rich enough. Nevertheless, I think this result serves as a helpful exercise for people trying to get to grips with our definitions, as well as providing more of the story about how we ended up with our reversible circuits statement.

In the triangle counting setting, we consider the distribution $\mathcal{D}$ over undirected 3-partite graphs on 3 groups of $n$ vertices obtained by taking each of the $3n^2$ possible edges to be present independently with probability $p$. We take $1/n \ll p \ll 1$, so there are many more edges than vertices, but much fewer than the total number of possible edges.

For a randomly sampled $G \sim \mathcal{D}$, one can check that the number of triangles in $G$ has mean $n^3p^3$, with standard deviation much smaller than this. So if $G$ has noticeably more triangles than that (say, $2n^3p^3$), then we consider this to be an "outrageous coincidence", since this exceeds the mean by a huge number of standard deviations. (This is "outrageous" because if the number of triangles were normally distributed with this mean and variance, then the probability of such an excess for any of the $2^{3n^2}$ possible graphs would tend to zero.) This motivates the following statement, which turns out to be true:

Proposition (No-coincidence principle for triangle counting). For any $\varepsilon > 0$, there is a linear-time verification algorithm $V$ that receives as input:

  • A 3-partite graph $G$ on 3 groups of $n$ vertices, represented as a list of edges
  • An advice string $\pi$

such that:

  • For all graphs $G$ with at least $(1 + \varepsilon)n^3p^3$ triangles, if $n$ is sufficiently large then there exists $\pi$ with length linear in the number of edges of $G$ such that $V(G, \pi) = 1$.
  • For $G \sim \mathcal{D}$, the probability that there is any $\pi$ with $V(G, \pi) = 1$ tends to zero.

(Note that the result would be trivial if we allowed $V$ to run in polynomial time, since it could then directly count the number of triangles, so we need to be more fine-grained than that.)

Dmitry Vaintrob and Aryan Bhatt were able to generalize this result to polygon counting (note that we consider graphs on $k$ groups of $n$ vertices arranged in a cycle, not $k$-partite graphs in general, so there are $kn^2$ possible edges, not $\binom{k}{2}n^2$) and to permanents. So we concluded that neither of these settings seemed to be rich enough to produce an interesting no-coincidence principle (even though permanents are #P-hard to compute), and moved on to considering circuits instead.

Hint for polygon counting:

Write the number of polygons as the cyclic trace $\operatorname{tr}(A_1 A_2 \cdots A_k)$, where $A_i$ is the biadjacency matrix between groups $i$ and $i+1$ (indices mod $k$).
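To spell out that computation, here is a small illustrative sketch (assuming numpy; for $k = 3$ it counts triangles, with expected count $n^3p^3$):

```python
import numpy as np

def polygon_count(biadjacency):
    """Number of k-gons through k groups arranged in a cycle: the trace
    of the cyclic product A_1 A_2 ... A_k of biadjacency matrices."""
    prod = np.eye(biadjacency[0].shape[0], dtype=np.int64)
    for A in biadjacency:
        prod = prod @ A
    return int(np.trace(prod))

# Triangle example (k = 3): three groups of n vertices, edge prob p.
rng = np.random.default_rng(0)
n, p = 50, 0.1
A = [(rng.random((n, n)) < p).astype(np.int64) for _ in range(3)]
print(polygon_count(A))  # compare to n^3 p^3 = 125
```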

comment by Gurkenglas · 2025-02-15T10:11:14.658Z

Ultimately, though, we are interested in finding a verifier that accepts or rejects $(C, \pi)$ based on a structural explanation of the circuit; our no-coincidence conjecture is our best attempt to formalize that claim, even if it is imperfect.

Can you say more about what made you decide to go with the 99% clause? Did you consider any alternatives?

comment by Alibi · 2025-02-15T19:29:09.050Z

Reading the post, I also felt like 99% was kind of an arbitrary number. I would have expected it to be something like: for all $\epsilon > 0$ there exists a $V$ such that ... $1-\epsilon$ of random circuits satisfy ...

comment by Gurkenglas · 2025-02-15T20:27:16.320Z

I'm hoping more for some stepping stones between the pre-theoretic concept of "structural" and the fully formalized 99%-clause. If we could measure structuralness more directly we should be able to get away with less complexity in the rest of the conjecture.

comment by tailcalled · 2025-02-15T21:08:00.832Z

Would "the neural network has learned a lookup table with a compressed version of the dataset and interpolates on that in order to output its answers" count as an explanation of the low dataset loss?

(Note, this phrasing kind of makes it sound too simple. Since the explanations you are seeking presumably don't come with the dataset baked-in as a thing they can reference primitively, presumably the actual formal statement would need to include this entire compressed lookup table. Also, I'm imagining a case where there isn't really a "compression algorithm" because the compression is intimately tied up with the neural network itself, and so it's full of ad-hoc cases.)

Like I guess from an alignment perspective this could still be useful because it would be nice to know to what extent "bag of heuristics" holds, and this is basically a formalization of that. But at the same time, I already roughly speaking (with lots of asterisks, but not ones that seem likely to be addressed by this) expect this to hold, and it doesn't really rule out other dangers (like those heuristics could interact in a problematic way), so it seems kind of like it would just lead to a dead-end from my perspective.

comment by tailcalled · 2025-02-15T21:14:11.504Z

Maybe to expand: In order to get truly good training loss on an autoregressive training objective, you probably need to have some sort of intelligence-like or agency-like dynamic. But much more importantly, you need a truly vast amount of knowledge. So most of the explanation for the good performance comes from the knowledge, not the intelligence-like dynamic.

(Ah, but intelligence is more general, so maybe we'd expect it to show up in lots of datapoints, thereby making up a relatively big chunk of the training objective? I don't think so, for two reasons: 1) a lot of datapoints don't really require much intelligence to predict, 2) there are other not-very-intelligence-requiring things like grammar or certain aspects of vocabulary which do show up in a really big chunk.)

comment by tailcalled · 2025-02-15T10:03:37.168Z

If this is meant to be a weakening of NP vs co-NP, what do you make of the stronger statement that NP = co-NP? As I understand it, most complexity theorists think this is false. Do you have any reason to think that your conjecture is much much more likely to hold than NP = co-NP, or do you also think NP = co-NP could hold?

comment by Jacob_Hilton · 2025-02-15T17:33:18.240Z

Good question! We also think that NP ≠ co-NP. The difference between 99% (our conjecture) and 100% (NP = co-NP) is quite important, essentially because 99% of random objects "look random", but not 100%. For example, consider a uniformly random string $x \in \{0,1\}^n$ for some large $n$. We can quite confidently say things like: the number of 0s in $x$ is close to $n/2$; there is no long streak of alternating 0s and 1s; etc. But these only hold with 99% confidence (more precisely, with probability tending to 1), not 100%.

Going back to the conjecture statement, the job of the verification algorithm $V$ is much harder for 100% than 99%. For 100%, $V$ has to definitively tell (with the help of a certificate) whether a circuit has property $P$. Whereas for 99%, it simply has to spot (again with the help of a "certificate" of sorts) any structure at all that reveals the circuit to be non-random in a way that could cause it to have property $P$. For example, $V$ could start by checking the proportions of different types of gates, and if these differed too much from a random circuit, immediately accept the circuit out of hand for being "possibly structured". Footnote 6 has another example of structure that could cause a circuit to have property $P$, which seems much harder for a 100%-$V$ to deal with.

comment by Capybasilisk · 2025-02-15T08:02:55.742Z

Isn't this just the problem of induction in philosophy?

E.g., we have no actual reason to believe that the laws of physics won't completely change on the 3rd of October 2143, we just assume they won't.