Permanents: much more than you wanted to know
post by Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-16T08:04:00.510Z · LW · GW
Today's "nanowrimo [LW · GW]" post is a fun longform introduction to permanents and their properties, organized in the way I wish it had been explained to me. Note before going on: this is not my field, and this post is even more likely than usual to contain incorrect statements or bugs.
Recall that given an $n \times n$ matrix $A$ with coefficients $a_{ij}$, its determinant is a sum over permutations $\sigma$ on $n$ letters: $$\det(A) = \sum_{\sigma \in S_n} (-1)^{\sigma} \prod_{i=1}^n a_{i,\sigma(i)}.$$ Here the notation $(-1)^{\sigma}$ denotes the parity of a permutation. Determinants are ubiquitous and extremely useful. They are also computable in polynomial time.
The permanent is defined the same way as the determinant, but without the signs $(-1)^{\sigma}$: $$\mathrm{perm}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^n a_{i,\sigma(i)}.$$ It is neither ubiquitous nor computable in polynomial time.
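For concreteness, here is a minimal brute-force sketch in Python computing both invariants directly from their defining sums (the function names are mine, purely for illustration):

```python
# Brute force: both sums run over all n! permutations; the point is that the
# determinant *also* has a polynomial-time algorithm (Gaussian elimination),
# while the permanent (conjecturally) does not.
from itertools import permutations
from math import prod

def parity(p):
    """(-1)^p: the sign of a permutation p, computed by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(A):
    n = len(A)
    return sum(parity(p) * prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def permanent(A):
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

A = [[1, 2], [3, 4]]
print(det(A), permanent(A))  # -2 and 10: the same terms, with and without signs
```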
When I first heard of it in the pure math community, it was described to me as the ugly stepsister of the determinant: an object with “no elegant meaning”, the result of an idle question of “whoa, what if no signs”. The most famous result about the permanent is Leslie Valiant’s proof that it is NP-hard (and more specifically, #P-complete; #P is a more sophisticated version of NP that we’ll discuss later). I knew about this result, but thought of it as one of the dime-a-dozen theorems that show that inelegant problems lie in inelegant complexity classes.
I’ve since updated. As I’ll hopefully get across, permanents are actually quite elegant artifacts if thought about correctly, namely in the context of multiparameter polynomial algebra. I’ll explain how to think about their properties in this context, and what interesting things they say about high-dimensional polynomial algebra in general.
But first, as a teaser, I’ll explain the #P class and why Valiant’s proof, far from being a routine “complexity check” of an arbitrary problem, should be thought of as a pretty mind-blowing insight.
#P and Santa’s factory
In order to introduce the #P complexity class, let me tell you a story about Santa and his elves. I promise this story will be relevant. Namely, as is widely known, there is a singularity at the North pole of the Earth. At this singularity lies Santa’s workshop, which produces exponential quantities of content for the exponential quantum branches of boys and girls in the multiverse. In order to meet demand, Santa keeps a factory of exponentially many elves. The elves are very well-treated: they receive 3 soylents a day, and Santa’s family-oriented work ethic has that special personalized touch: each elf has a unique individual name, b, which is a boolean string of n 0’s and 1’s (here the length of the string n is a “large but not too large” number controlling the “polynomial-sized” scale in complexity theory; the number of elves is the exponential of that, $2^n$, which is why a singular dimension is needed to accommodate this many happy elves).

Now in the modern world, instead of toys Santa’s factory produces computations. Namely, each morning Santa’s reindeer write a program that depends on a boolean string b (of length n) and outputs an integer, F(b) (as well as any additional global variables that the reindeer introduce into the program). The program’s size – its memory requirements and runtime – is assumed polynomial in n (as is standard in complexity theory). Upon receiving the morning’s instructions, each elf executes the program F(b) associated to his name and writes down the resulting integer. At the end of the day, all of these integers are collected by friendly factory foremen and added together, to produce the day’s output, $$S = \sum_{b \in \{0,1\}^n} F(b).$$ While the resulting number might be large in absolute value (note that it can be positive or negative!), it can’t be exponentially large: indeed, because of the problem complexity restrictions, each elf’s output must have polynomially-in-n many digits. Since the number of elves is $2^n$, the number of digits of the combined day’s result can be at most n digits longer than the largest single output, hence is still polynomial. In most cases of practical interest, this will be a single number that easily fits on a page.
The set of all quantities that can be outputted as $S = \sum_b F(b)$ for a program run by Santa’s factory is called #P. With this definition:
- Note that the result of a single polynomial-time algorithm A is obviously in #P: it can be produced by having the elf named $00\dots 0$ run the algorithm A and having all other elves take a day off and output 0. Thus $P \subseteq \#P$.
- More generally, any problem in NP also gives a computation in #P. Here recall that an NP problem is associated to a program F(b) of a boolean input that produces a single-bit boolean output, $F(b) \in \{0,1\}$. The associated NP problem is to check whether there exists (at least one) satisfying input b such that F(b) = 1 – but in this context this is equivalent to checking whether or not the sum of outputs $\sum_b F(b)$ is positive.
Note that conventionally, #P is defined as the class of computations of the form $\sum_b F(b)$ for boolean-valued functions F (as in the second bullet above), where the sum “counts” the number of satisfying assignments (hence the “#” in the name). It is straightforward to see that the two definitions are equivalent. Indeed, to see this it’s enough to check that a single polynomial-time algorithm A that outputs a positive integer can be converted to counting the number of 1’s output by a polynomial-time algorithm on elves; to do this, simply have each elf with name b (viewed as a positive integer) first run the same algorithm A, then output the truth value b < A.[1]
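As a sanity check on these definitions, here is a toy brute-force sketch (the function F, the value of A, and the tiny n are all made-up illustrations; real instances are of course exactly the ones we can’t enumerate):

```python
from itertools import product

n = 4  # "large but not too large" in real life; tiny here so we can enumerate every elf

def F(b):
    # An arbitrary polynomial-time function of the boolean string b (toy example).
    return sum(b) * (1 if b[0] == b[-1] else 2)

# Santa's factory: every elf b runs F, and the foremen add everything up.
santa_total = sum(F(b) for b in product((0, 1), repeat=n))

# Conventional definition of #P: count satisfying assignments of a 0/1-valued program.
# To express a positive integer A as such a count, have elf b output the truth value
# "b < A" (reading the name b as an integer): exactly A elves output 1.
A = 11
count = sum(1 for b in product((0, 1), repeat=n)
            if int("".join(map(str, b)), 2) < A)
assert count == A
print(santa_total, count)
```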
It is generally assumed – but currently not proven (this is in the same class of assumptions as $P \neq NP$) – that there exist some #P computations that are not in NP, i.e., that the containment $NP \subseteq \#P$ is strict.
Now Santa has a younger brother, Chris Kringle, who takes the efficiency-oriented family ethos very seriously. Recently Chris has proposed a revolutionary new method to reduce the elves’ workload to a simple repetitive computation, to make everyone even more happy. In fact, it will reduce their need for work so much that the new factory will be able to really cut the margins and let the elves perform their important tasks on only one soylent per day!
In order to implement his exciting cost-cutting idea, Chris builds a factory at the singularity at the South pole. His elves have a new format of exciting personalized names: namely, the name of each elf is a length-n string of values equal to $\pm 1$ (instead of 0 and 1).
Now instead of running a single complicated algorithm in a single day, Chris Kringle’s factory works in a massively simplified format, with the following simple triple of steps:
Step 1. At every clock cycle t, Chris sends all his elves a single length-n vector of integers, $v_t$.
Step 2. At each clock step, each elf simply takes the dot product of $v_t$ with her name $b$, and writes down the value, $\langle v_t, b \rangle$.
Step 3. At the end of the day, each elf outputs the product of these values, $\prod_t \langle v_t, b \rangle$.
Finally, at the end of the day the foremen take the results of step 3 from all the elves and sum them together (I didn’t include this as “step 4” because this is the same summing procedure that we’ve seen at the North pole).
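Here is a minimal sketch of the South pole factory as just described (the helper name and the example task vectors are mine, purely for illustration):

```python
from itertools import product
from math import prod

def south_pole(vs):
    """vs: a list of T length-n integer task vectors, one per clock cycle."""
    n = len(vs[0])
    total = 0
    for name in product((-1, 1), repeat=n):                    # one term per elf
        total += prod(sum(v[i] * name[i] for i in range(n)) for v in vs)
    return total

# Two clock cycles, three-letter elf names: sum over the 8 elves of the product
# of the two dot products.
print(south_pole([[1, 2, 3], [0, 1, -1]]))
```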
In other words, the computation done by each elf in Kringle’s factory is in some sense extremely shallow. Phrased slightly differently, what elf b does can be understood as:
- Take the product of the vector b with the matrix M whose columns are the vectors $v_t$ (here T is the total number of clock cycles). The result is a length-T vector.
- Take the product of the entries of this length-T vector.
Obviously, any program implemented by the South pole elves can also be implemented by the North pole ones: they’re just applying a bunch of linear functions to their ID and multiplying them together, which is a valid polynomial-time program.
The surprising fact, equivalent to Valiant’s result on permanents (in a way that shouldn’t be obvious to you; I’ll explain it later), is the following theorem.
Theorem (Valiant). Any computation performed by the North pole elves can also be performed by the South pole elves[2].
I’ll explain and discuss the result and its connection to permanents in the next sections. Before going on, note that there is a suggestive analogy between arithmetic operations in programs that output a general number and logical operations in programs that output a boolean. Namely, one can think of “linear combination” for non-boolean functions as analogous to the OR operation and “multiplication” as analogous to the AND operation. To make this precise, we’ve seen above that there is a natural way to convert any positive integer-valued program F(b) to a $\{0,1\}$-valued function; under this reduction, addition and multiplication map to OR and AND, respectively. More generally, “linear combination” of boolean inputs corresponds to the OR of some subset of inputs or their negations. Putting this all together, we get the informal statement that the South pole elves are performing a #P variant of the SAT circuit. (Don’t worry if this isn’t familiar to you – this is just an intuition pump, and we won’t look at the SAT context in the future.)

Now in this (not at all rigorous) analogy, the theorem above corresponds to the famous Cook-Levin theorem that the SAT problem is NP-complete. Note, actually, that there is a direct analog of the Cook-Levin theorem in #P (that any #P expression can be reduced to counting satisfying inputs of SAT), but the “South pole theorem” above is a more “natively #P” version of this result. Both the Cook-Levin theorem and Valiant’s theorem (in either the above form, or the form involving permanents) are pretty nontrivial to prove – with the Valiant result being quite a bit trickier than Cook-Levin. Nevertheless it seems reasonable to think of these two results as distant cousins.
Permanents and polynomial algebra
Let’s leave Santa’s elves alone for a while and finally get back to talking about permanents. Now let me say this straight off the bat: the formula for the permanent in terms of matrices and permutations is not the right way to think about it. Let’s forget it – throw it in the garbage. Instead, let’s think about polynomials: specifically, multiplying polynomials.
Let’s consider the set of polynomials in n (real) variables, $\mathbb{R}[x_1, \dots, x_n]$. This is obviously a vector space, and in fact since polynomials are closed under multiplication, it’s a commutative ring (if you don’t know what that is don’t worry: we will not be using the theory of rings below). As a vector space it has a basis of monomials, $x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$. To avoid writing out big products, we’ll compress monomials into tuple indices: so write $x^a := x_1^{a_1} \cdots x_n^{a_n}$, where $x = (x_1, \dots, x_n)$ is a vector of variables and $a = (a_1, \dots, a_n)$ is a tuple of exponents. Now for an algebraist, multiplying polynomials is the easiest thing ever. We have $x^a \cdot x^b = x^{a+b}$. Done and dusted. What if the polynomial is not a monomial but a linear combination, you ask? Easy, just open parentheses, a.k.a. expand!
However, if instead of an algebraist, you find yourself talking to a computer programmer, you will notice her starting to get concerned. Sure, opening parentheses is fine – but it creates more terms. Indeed, if you have a pair of polynomials which both have N terms, their product may have up to $N^2$ terms upon expanding. So far this is ok if you’re a complexity theorist: it’s still a polynomially bounded value. But things get really messy if you multiply many terms. Indeed, if you multiply n polynomials each with N terms (something we’ll encounter in a minute), the result might have up to $N^n$ terms if you’re unlucky. Note that interestingly, computing values of the product only takes polynomial time: you’re just multiplying together n polynomially computable quantities. However even writing down the resulting polynomial in terms of monomials (assuming someone has precomputed them for you) may take exponential time. Bad news for your programmer friend.
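To see the blowup concretely, here is a quick sketch (the dictionary representation and the specific polynomials are my own illustrative choices): multiplying n linear polynomials in disjoint blocks of variables, each with N terms, produces exactly $N^n$ monomials.

```python
from itertools import product

def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent_tuple: coefficient} dicts."""
    out = {}
    for (e1, c1), (e2, c2) in product(p.items(), q.items()):
        e = tuple(a + b for a, b in zip(e1, e2))
        out[e] = out.get(e, 0) + c1 * c2
    return out

n, N = 8, 3
nvars = n * N
# n polynomials, the k-th one using its own block of N variables, so no terms ever combine.
polys = []
for k in range(n):
    p = {}
    for j in range(N):
        e = [0] * nvars
        e[k * N + j] = 1
        p[tuple(e)] = 1
    polys.append(p)

f = polys[0]
for p in polys[1:]:
    f = poly_mul(f, p)
print(len(f), "monomials in the product; N**n =", N ** n)  # both are 6561
```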
Well fine, you might ask – products of polynomials in n variables are hard to store in the monomial basis. But if we care so much about their coefficients, presumably we only need a few of them at a time. So how hard is it to retrieve a single coefficient, say the one in front of the monomial $x^a$, of our product? Enter the permanent.
In fact, the permanent is the answer to a specific instance of the above procedure (and this is the correct way to think of permanents; everyone else is lying to you). Namely, recall that the permanent is an invariant of an $n \times n$ matrix A. Let’s now use lower-case letters $a_{ij}$ for the coefficients of A. Then the following result is straightforward, but worth checking for yourself:
Proposition. The permanent $\mathrm{perm}(A)$ is equal to the coefficient of the monomial $x_1 x_2 \cdots x_n$ (exponent multi-index $(1,1,\dots,1)$) in the big product $$f(x_1, \dots, x_n) := \prod_{k=1}^n \left( a_{k1} x_1 + a_{k2} x_2 + \dots + a_{kn} x_n \right).$$
Define the function $L_k(x_1, \dots, x_n) := \sum_{j=1}^n a_{kj}\, x_j$, the kth multiplicand in the product expression for $f$ above; it is a linear polynomial. Then of course we have $$f = L_1 \cdot L_2 \cdots L_n.$$ Now if you’ve checked the proposition above, you see that the permanent is a particular monomial coefficient in this product.
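Here is a quick numerical check of the proposition on a small matrix (the representation and the example matrix are mine; the brute-force permutation sum serves as the reference):

```python
from itertools import permutations, product
from math import prod

def permanent(A):
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent_tuple: coefficient} dicts."""
    out = {}
    for (e1, c1), (e2, c2) in product(p.items(), q.items()):
        e = tuple(a + b for a, b in zip(e1, e2))
        out[e] = out.get(e, 0) + c1 * c2
    return out

A = [[1, 2, 0], [3, 1, 4], [2, 2, 1]]
n = len(A)

# f = L_1 * L_2 * ... * L_n, where L_k = sum_j a_{kj} x_j.
f = {(0,) * n: 1}                                  # the constant polynomial 1
for k in range(n):
    L_k = {tuple(1 if i == j else 0 for i in range(n)): A[k][j] for j in range(n)}
    f = poly_mul(f, L_k)

coeff = f.get((1,) * n, 0)                         # coefficient of x_1 x_2 ... x_n
assert coeff == permanent(A)                       # both are 31 for this matrix
print(coeff)
```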
Note that this expression immediately makes manifest some obvious properties of the permanent. Namely:
- If we permute the rows of the matrix, the answer doesn’t change (we’re just permuting the factors in the product).
- If we permute the columns of the matrix, the answer doesn’t change (we’re just permuting the summands in each factor).
- If we multiply the kth row by a scalar $\lambda_k$, then the permanent scales by $\prod_k \lambda_k$ (we’re just rescaling each factor of the product).
Interestingly, something that isn’t manifestly obvious in this expression, but is clear from the standard definition, is that the permanent is invariant under transposition. We’ll see the right way to understand this later.
So ok, how hard is it to extract this coefficient? According to our friend the algebraist, all you have to do is expand. But when you expand, as you’ve checked in the proposition above (have you checked it? I can wait), you exactly get the expression for the permanent: contributions from $n!$ different terms, each a product of some subset of the entries $a_{ij}$. That’s a lot of terms!
So can we make our life easier? It turns out we can, a bit. Let’s think a little more about polynomials. Remember that one way to extract a monomial from a polynomial is to recursively differentiate: $$\mathrm{coeff}_{x_1 x_2 \cdots x_n}(f) = \partial_1 \partial_2 \cdots \partial_n f \,\big|_{x = 0}.$$ Here I’m using the shorthand $\partial_i := \frac{\partial}{\partial x_i}$ for partial derivatives. This is nice, but it’s not clear how this makes our life easier. But now, let’s think a little more carefully about all the possible terms of the product f. Since f is a product of n linear factors, each monomial in f has degree n. This means (by the pigeonhole principle) that either it is the monomial $x_1 x_2 \cdots x_n$ whose coefficient we want to extract, or it is a monomial that doesn’t depend on some specific coordinate $x_i$. Now obviously this monomial gets killed by $\partial_i$. But notice that more generally, it gets killed by any manipulation that takes a difference of values that differ only in the coordinate $x_i$ (whether infinitesimally close or not). In particular, instead of the derivative $\partial_i$ we can take what’s called the discrete derivative $$D_i f := \frac{f|_{x_i = 1} - f|_{x_i = -1}}{2}.$$ On the other hand, because of the factor of 2 in the denominator, applying this operator to our monomial of interest does not kill it, but has the same effect as $\partial_i$, namely it returns $x_1 \cdots \widehat{x_i} \cdots x_n$ (where the “hat” notation here means “skip” – we’re multiplying all but the ith term). Iterating this argument, we see that instead of recursively applying the usual derivatives $\partial_i$ and evaluating at zero, we can iteratively apply the discrete derivatives $D_i$. Now an application of the discrete derivative evaluates the function at two values. By induction, after applying k discrete derivatives, we will be evaluating the function at $2^k$ values. At the end of the day and once all the smoke clears, we see (check this!) that the discrete derivative formula for the permanent is a linear combination of values of $f$ at the $2^n$ vertices of the cube $\{\pm 1\}^n$. Specifically, we get: $$\mathrm{perm}(A) = \frac{1}{2^n} \sum_{\sigma \in \{\pm 1\}^n} \Big(\prod_{i=1}^n \sigma_i\Big)\, f(\sigma).$$ Here the sum is over the $2^n$ points $\sigma \in (\pm 1, \pm 1, \dots, \pm 1)$ of the cube, i.e. points with coordinates $\pm 1$, and the scalar $\prod_i \sigma_i$ is the product of the coordinates of $\sigma$.[3]
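Here is a sketch of this cube formula in code, checked against the brute-force permutation sum (names are mine; Fraction is only there to keep the division by $2^n$ exact):

```python
from itertools import permutations, product
from math import prod
from fractions import Fraction

def perm_bruteforce(A):
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def perm_cube(A):
    """perm(A) = 2^{-n} * sum over sigma in {-1,+1}^n of (prod_i sigma_i) * f(sigma),
    where f(sigma) is the product of the n linear factors evaluated at sigma."""
    n = len(A)
    total = Fraction(0)
    for sigma in product((-1, 1), repeat=n):
        sign = prod(sigma)                          # the discrete-derivative signs
        total += sign * prod(sum(A[k][j] * sigma[j] for j in range(n)) for k in range(n))
    return total / 2 ** n                           # one factor of 1/2 per discrete derivative

A = [[1, 2, 0], [3, 1, 4], [2, 2, 1]]
assert perm_cube(A) == perm_bruteforce(A)
print(perm_cube(A))
```

Note that this is still $2^n$ work overall, but each individual term is now just n dot products and a product – exactly the shape of a South pole elf's job, which is where we're headed next.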
Back to the factory
Returning to our happy factory elves, we can now interpret the problem of finding permanents as a problem solvable by the South pole elves. Namely, recall that South pole elves have names that are n-length strings of $\pm 1$’s. The South pole’s calculation works by sending all the elves a sequence of length-n vectors $v_1, v_2, \dots$ (in general, there can be more or fewer of these vectors, but for the permanent we want exactly n). Each elf now iteratively takes the dot product of each $v_k$ with his name $\sigma$ and multiplies the results for $k = 1, \dots, n$.
But now if we take $v_k$ to be the kth row of A, then $\langle v_k, \sigma \rangle$ is the value on a point $\sigma$ of the cube ($\sigma \in \{\pm 1\}^n$) of the linear factor $L_k$ from our formula for $f$. Multiplying, each elf now outputs $f(\sigma) = \prod_k \langle v_k, \sigma \rangle$. Note that so far this is missing the sign $\prod_i \sigma_i$, but we can recover this by padding A by the identity matrix to get a $2n \times n$ matrix of task vectors: the result of adding the extra n “task vectors” $e_1, \dots, e_n$ multiplies the result of each elf by $\prod_i \langle e_i, \sigma \rangle = \prod_i \sigma_i$. So at the end of the day, adding the result of this big product of linear terms over all elves returns the permanent (up to a constant factor of $2^n$ from remembering the denominators).
Thus we’ve shown that the “South pole algorithm” (for those of you keeping track, remember that this is a #P analogue of SAT) can compute any permanent. What next?
Enter our hero Valiant
At this point, all that remains to prove is Valiant’s theorem that every #P calculation – i.e., every calculation performable by the North pole elves – can be expressed as a permanent. The standard proof is actually a reinterpretation of Valiant’s original proof by Ben-Dor and Halevi. They in fact prove a stronger result: namely, that the subproblem of computing “01 permanents”, i.e., permanents of matrices with only 0 and 1 entries, is #P-hard (note this implies that it is also #P-complete, as we’ve seen the permanent is in #P). The proof has an easy part and a hard part.
The easy part
Let G be a directed graph on n vertices labeled $1, \dots, n$. A cycle cover of G is a directed subgraph of G that contains all vertices, and consists of a disjoint collection of cycles. So for n = 7, a potential example of a cycle cover would be the disjoint union of cycles $1 \to 2 \to 3 \to 1$, $4 \to 5 \to 4$ and $6 \to 7 \to 6$ (assuming every directed edge is in the graph).
Now a directed graph on n vertices is exactly an $n \times n$ 0-1 matrix with zero diagonal (to go between the graph and the matrix: put a 1 in entry $(i,j)$ if and only if there is an edge from i to j – in other words, take the adjacency matrix). The following lemma gives a way to interpret the permanent of a 0-1 matrix in a combinatorial way (note: the specific lemma isn’t crucial, the key idea is that there exists a nice combinatorial interpretation).
Lemma. The number of cycle covers of a directed graph G is equal to the permanent of its adjacency matrix.
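Here is a sketch checking the lemma on a small made-up digraph: cycle covers are exactly the permutations $\sigma$ all of whose edges $i \to \sigma(i)$ are present in the graph, so counting them term by term reproduces the permanent of the adjacency matrix.

```python
from itertools import permutations
from math import prod

def permanent(A):
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def count_cycle_covers(adj):
    """A cycle cover picks one outgoing edge per vertex so that every vertex also
    receives exactly one edge -- i.e. a permutation supported on edges of the graph."""
    n = len(adj)
    return sum(1 for p in permutations(range(n)) if all(adj[i][p[i]] for i in range(n)))

# A small directed graph on 4 vertices (adjacency matrix, zero diagonal).
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
assert count_cycle_covers(adj) == permanent(adj)   # 3 cycle covers for this graph
print(count_cycle_covers(adj))
```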
The hard part
Now the hard part of the proof is to convert another #P-complete problem (that I briefly mentioned above), namely the problem of counting satisfying inputs of a SAT instance, to a count of cycle covers. Here the proof amounts to packaging the circuits in the SAT calculation, and the ways of combining them, into so-called “graph gadgets”: certain hand-constructed graphs that organize the logical expression. I won’t try to explain more of the proof here – nor do I know it – but picture a scary sign saying “here be combinatorics”.
The quantum computing frame
No discussion of permanents is complete nowadays without also referencing Scott Aaronson’s paper “A Linear-Optical Proof that the Permanent is #P-Hard”. This is an updated proof of the #P-completeness that reduces the result to certain complexity results that appear naturally in quantum computing theory. Readers who have interacted with quantum computing shouldn’t be surprised that it comes up, since the kinds of “large polynomial” computations we’re doing are the bread and butter of quantum algorithms. Specifically, if I understand the argument correctly, Aaronson reduces #P-hardness of the permanent to a different hardness result originating from quantum computing work, called “boson sampling”. Boson sampling is a quantum sampling problem for which the associated density-estimation problem is #P-hard – while this is not an inherently quantum statement, according to Scott the quantum computing literature gives a standard and elegant way to prove it. Now boson sampling is a process on bosonic Fock space, which is just another name for the vector space of polynomials. Once this intuition is made precise, one can then reduce the problem of density estimation for boson sampling to finding coefficients of products of polynomials[4].
Sampling sampling sampling
As we discussed at the very beginning, the #P class is (conjecturally) strictly harder than the NP class. However, when a #P-type problem appears in real life, it is frequently much better news than if an NP problem appears. The reason is that, in many contexts, while exactly solving a #P problem is blocked by complexity theory, approximating it is tractable (say to 1% accuracy – or more generally, to within any multiplicative error of size $1/\mathrm{poly}(n)$). This is not always possible – indeed, there exist cases where even approximation is NP-hard. At the same time it’s possible remarkably often. In particular for the permanent problem, probabilistically convergent algorithms for approximating the permanent exist in cases where the matrix A either has nonnegative entries, or is symmetric and positive-definite.
This remarkable susceptibility of #P problems to approximation is provided by sampling algorithms such as MCMC, and often reduces to the social dynamics of elves. Indeed, going back to our original elf-themed definition of the #P complexity class – a population of elves running parallel computations that get added together at the end – we can formally interpret the problem of evaluating the sum of the parallel computations as computing a (scaled) expectation of the output produced by a random elf. A naive next thing to do is now to “directly sample” the elf population: i.e., repeatedly pick a random elf, see his calculation result, write it down, and average the results at the end to get an estimate. However for most interesting #P algorithms this will not work, as one expects the vast majority of the sum to be provided by an ~exponentially (or at least superpolynomially) small fraction of lucky elves. This in particular is true in the “South pole-style” calculation involved in the permanent. Thus if you run your “naive” sampler for a polynomial amount of time, you have a vanishingly small probability of ever stumbling on a lucky elf, and will end up (assuming that all outputs are positive) with a vastly underestimated guess for the average. However statistical physics provides a class of methods that one uses in precisely these situations. Namely, instead of doing the naive sampling, it is often possible to use methods like MCMC to sample from a modified probability distribution that is concentrated almost exclusively on the “lucky” elves (and adjusts their outputs appropriately to keep the mean the same).
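Here is a tiny sketch of the failure mode of “direct sampling” (the function F and the numbers are made up purely to exhibit the problem):

```python
import random

n = 20

def F(b):
    # Toy example where almost all of the sum sits on a single "lucky" elf.
    return 10 ** 9 if all(b) else 1

exact = (2 ** n - 1) + 10 ** 9                     # the true elf sum

# Naive unbiased estimator: 2^n times the average of F over uniformly random elves.
samples = [F([random.randint(0, 1) for _ in range(n)]) for _ in range(1000)]
estimate = 2 ** n * sum(samples) / len(samples)
print(exact, estimate)   # the estimate almost always misses the spike and undershoots badly
```

The estimator is unbiased, but with 1000 samples the chance of ever drawing the lucky elf is about $1000/2^{20} \approx 0.1\%$, so the typical output is off by a factor of nearly a thousand. Methods like MCMC or importance sampling fix this, when they work, by preferentially sampling the lucky elves and reweighting.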
In the case of the permanent, it turns out that depending on elves is not the only way to convert the computation to an expectation value. Indeed (as I briefly mentioned above), a number of continuous probability distributions can be integrated against the product polynomial that determines the permanent in order to get the correct expectation, yielding analytic integral expressions for the permanent that can (at least potentially) be computed using sampling tricks. For positive-definite matrices, one such continuous sampling algorithm provably converges to an approximate solution (whereas for matrices with positive entries, there is a different, discrete sampling method that provably converges, used for estimating invariants of graphs).
One payoff of the analytic expressions for the permanent is based on the relationship between a function and its Fourier dual. Since I’ve likely already bored most of my readers to death, I’ll only sketch it out very approximately, in order to fulfill a promise I made earlier in the piece. Namely, note that we can express the permanent either as an iterated partial derivative of a product polynomial, or as an L2 dot product of the same polynomial against any of a big class of probability distributions, including Gaussian probability distributions. After a change of basis, integrating the distribution against our polynomial (which is a product of linear terms) amounts to taking an iterated moment of a suitably reparameterized Gaussian. Once we notice this, we can apply a Fourier transform to convert the derivative computation for the permanent in the coordinates $x_i$ to a moment computation in the Fourier-transformed coordinates $\xi_i$, and vice versa. This “duality” between the two pictures ultimately relates permanent expressions on the two sides – but these permanent expressions are transposed. Now I promised you that there is a “correct” way of thinking about the transposition-invariance of the permanent, and this is it. Indeed, while this may seem like overkill for proving an identity that is obvious in coordinates, the Fourier transform expression is much more adaptable to results in other contexts (such as non-square matrices).
- ^
If A might have negative outputs, then for this conversion Santa technically needs to subtract the results of two days’ worth of algorithms, once with only the positive values and once with only the negative ones, but this doesn’t change the complexity class.
- ^
Here we allow using a larger but still polynomial-in-n number of elves for the South pole calculation.
- ^
Note that the formula I wrote down for the iterated discrete derivative is equivalent to taking the dot product of $f$ with a certain singular distribution supported on the cube $\{\pm 1\}^n$ (which is in fact the probability distribution of a random variable, since it integrates to 1). The fact that this returns the permanent is equivalent to the fact that the dot product of this distribution with any degree-n monomial other than $x_1 x_2 \cdots x_n$ – equivalently, any nth-order moment of the associated distribution other than the $(1,1,\dots,1)$ moment – is zero. Using this observation in different contexts leads to a number of alternative “standard” formulae for the permanent – see this note.
- ^
Note that I am very far from an expert in quantum computing, and have likely missed some important steps here.
2 comments
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-16T08:57:29.546Z · LW(p) · GW(p)
Hope this will be answered in a later post, but why should I care about the permanent for alignment ?
↑ comment by Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-16T08:59:20.273Z · LW(p) · GW(p)
The elves care, Alex. The elves care.