When performing a dimensionality reduction on tensors, the trace is often zero.
post by Joseph Van Name (joseph-van-name) · 2023-08-02T21:06:55.423Z · LW · GW
In this post, we shall define my new dimensionality reduction for tensors in $K^n\otimes\dots\otimes K^n$ where $K\in\{\mathbb{R},\mathbb{C}\}$, and we shall make an empirical observation about the structure of the dimensionality reduction. There are various simple ways of adapting this dimensionality reduction algorithm to tensors in $K^{n_1}\otimes\dots\otimes K^{n_r}$ and even mixed quantum states (mixed states are just positive semidefinite matrices with trace 1), but that will be a topic for another post.
This dimensionality reduction shall represent tensors in $K^n\otimes\dots\otimes K^n$ ($d$ factors) as tuples of matrices $(X_1,\dots,X_n)$. Computer experiments indicate that, in many cases, we have $\operatorname{Tr}(X_{i_1}\cdots X_{i_e})=0$ whenever $d\nmid e$.
If $A$ is a matrix, then the spectral radius of $A$ is the value $\rho(A)=\max\{|\lambda|:\lambda\text{ is an eigenvalue of }A\}$.
If $A$ is a matrix, then define the conjugate matrix $\overline{A}$; this is the matrix obtained from $A$ by replacing each entry with its complex conjugate.
If $(A_1,\dots,A_r)$ is a tuple of real or complex $n\times n$ matrices, then define the $L_2$-spectral radius $\rho_2$ by setting
$\rho_2(A_1,\dots,A_r)=\rho(A_1\otimes\overline{A_1}+\dots+A_r\otimes\overline{A_r})^{1/2}$.
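To make these definitions concrete, here is a minimal NumPy sketch (the function names are my own; it computes $\rho_2$ via the $A_j\otimes\overline{A_j}$ formula above):

```python
import numpy as np

def spectral_radius(M):
    """Spectral radius: the largest absolute value of an eigenvalue of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def l2_spectral_radius(As):
    """L_2-spectral radius of a tuple of square matrices (A_1, ..., A_r)."""
    S = sum(np.kron(A, A.conj()) for A in As)
    return spectral_radius(S) ** 0.5

# Example: three random 4 x 4 complex matrices.
rng = np.random.default_rng(0)
As = [rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)) for _ in range(3)]
print(l2_spectral_radius(As))
```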
Suppose that $K$ is either the field of real numbers or the field of complex numbers. Suppose that $p(x_1,\dots,x_n)$ is a non-commutative homogeneous polynomial of degree $d$ with coefficients in $K$ (it is easier to define the dimensionality reduction in terms of homogeneous non-commutative polynomials than in terms of tensors).
Then define a fitness function $F$ on $n$-tuples $(X_1,\dots,X_n)$ of $k\times k$ matrices over $K$ by setting
$F(X_1,\dots,X_n)=\rho(p(X_1,\dots,X_n))^{1/d}/\rho_2(X_1,\dots,X_n)$.
This function is bounded, and it has a maximum value, but to prove that it attains its maximum value, we need to use quantum channels.
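As an illustration, here is a sketch of this fitness function for the degree 3 case. The coefficient tensor $T$ of shape $(n,n,n)$ encodes $p=\sum_{i,j,l}T_{i,j,l}x_ix_jx_l$; the helper names and this coefficient-tensor representation are my own choices for the sketch:

```python
import numpy as np

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def l2_spectral_radius(Xs):
    return spectral_radius(sum(np.kron(X, X.conj()) for X in Xs)) ** 0.5

def evaluate_cubic(T, Xs):
    """p(X_1, ..., X_n) = sum_{i,j,l} T[i,j,l] X_i X_j X_l for a degree 3 polynomial."""
    n = len(Xs)
    out = np.zeros_like(Xs[0], dtype=complex)
    for i in range(n):
        for j in range(n):
            for l in range(n):
                out = out + T[i, j, l] * (Xs[i] @ Xs[j] @ Xs[l])
    return out

def fitness(T, Xs, d=3):
    """F(X_1, ..., X_n) = rho(p(X_1, ..., X_n))^(1/d) / rho_2(X_1, ..., X_n)."""
    return spectral_radius(evaluate_cubic(T, Xs)) ** (1.0 / d) / l2_spectral_radius(Xs)
```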
We shall call a tuple $(X_1,\dots,X_n)$ where $F(X_1,\dots,X_n)$ is maximized an $L_2$-spectral radius dimensionality reduction (LSRDR) of the non-commutative polynomial $p$. The motivation behind the notion of an LSRDR is that it is easier to work with the variables $x_1,\dots,x_n$ represented as the matrices $X_1,\dots,X_n$ than it is to work with the non-commutative polynomial $p$ itself. The $k\times k$ matrices $X_1,\dots,X_n$ have $nk^2$ parameters in total, while the non-commutative polynomial $p$ could have up to $n^d$ parameters where $d$ is the degree of the polynomial $p$.
We observe that if $p$ is a quadratic non-commutative homogeneous polynomial, then LSRDRs of $p$ reduce to a standard problem in matrix theory involving the Frobenius norm $\|\cdot\|_F$. In other words, we already have a well-developed theory of matrices, and LSRDRs do not improve the theory of matrices, but LSRDRs help us analyze tensors of order 3 in several different ways.
Given $n\times n$ square matrices $A_1,\dots,A_r$, define a completely positive superoperator $\Phi(A_1,\dots,A_r):M_n(K)\to M_n(K)$ by setting $\Phi(A_1,\dots,A_r)(X)=A_1XA_1^*+\dots+A_rXA_r^*$. The operator $\Phi(A_1,\dots,A_r)$ is similar to the matrix $A_1\otimes\overline{A_1}+\dots+A_r\otimes\overline{A_r}$.
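Here is a quick sketch of this superoperator together with a numerical check of the stated similarity (with row-major vectorization, $\Phi(A_1,\dots,A_r)$ is represented exactly by the matrix $A_1\otimes\overline{A_1}+\dots+A_r\otimes\overline{A_r}$, so in particular the two have the same spectrum):

```python
import numpy as np

def phi_apply(As, X):
    """Apply the completely positive superoperator Phi(A_1, ..., A_r) to X."""
    return sum(A @ X @ A.conj().T for A in As)

def phi_matrix(As):
    """Matrix representing Phi(A_1, ..., A_r) on row-major vectorized inputs."""
    return sum(np.kron(A, A.conj()) for A in As)

rng = np.random.default_rng(1)
As = [rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)) for _ in range(2)]
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

# vec(A X A^*) = (A kron conj(A)) vec(X) for row-major vectorization.
print(np.allclose(phi_apply(As, X).flatten(), phi_matrix(As) @ X.flatten()))  # True
```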
Observation: Suppose that $p$ is a non-commutative homogeneous polynomial of degree $d$ with random complex coefficients. Let $(X_1,\dots,X_n)$ be an $L_2$-spectral radius dimensionality reduction of $p$. Then we often have $\operatorname{Tr}(q(X_1,\dots,X_n))=0$ whenever $q$ is a homogeneous non-commutative polynomial of degree $e$ where $d\nmid e$. Furthermore, the set of eigenvalues of $\Phi(X_1,\dots,X_n)$ is invariant under rotation by the angle $2\pi/d$. Said differently, $e^{2\pi i/d}\lambda$ is an eigenvalue of $\Phi(X_1,\dots,X_n)$ whenever $\lambda$ is.
I currently do not have an adequately developed explanation for why the traces vanish and the spectrum of $\Phi(X_1,\dots,X_n)$ has this rotational symmetry so often (more experimentation is needed), but such an explanation is probably within reach. The observation does not occur 100 percent of the time since we get these vanishing traces only when the conditions are right.
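For readers who want to experiment with something like this observation, here is a rough sketch. It is not the code behind the reported experiments: it replaces each spectral radius with the smooth surrogate $\|M^m\|_F^{1/m}$ so that ordinary gradient ascent in PyTorch can be applied, and the sizes, step count, and learning rate are illustrative choices only.

```python
import torch

torch.manual_seed(0)
n, k, d, m = 4, 2, 3, 16       # n variables, k x k matrices, degree d, power m

# Random complex degree 3 coefficient tensor: p = sum_{i,j,l} T[i,j,l] x_i x_j x_l.
T = torch.randn(n, n, n, dtype=torch.cdouble)

def approx_rho(M, power=m):
    """Smooth surrogate for the spectral radius: ||M^power||_F^(1/power)."""
    return torch.linalg.matrix_power(M, power).norm() ** (1.0 / power)

def fitness(Xs):
    P = sum(T[i, j, l] * Xs[i] @ Xs[j] @ Xs[l]
            for i in range(n) for j in range(n) for l in range(n))
    S = sum(torch.kron(X, X.conj()) for X in Xs)
    return approx_rho(P) ** (1.0 / d) / approx_rho(S) ** 0.5

Xs = torch.randn(n, k, k, dtype=torch.cdouble, requires_grad=True)
lr = 0.05
for step in range(2000):
    f = fitness(Xs)
    f.backward()
    with torch.no_grad():
        Xs += lr * Xs.grad     # gradient ascent on the surrogate fitness
        Xs.grad = None
        Xs /= Xs.norm()        # the fitness is scale-invariant; renormalize for stability

# Compare trace magnitudes for products of length 2 (not a multiple of d = 3)
# with those of length 3 (a multiple of d = 3).
with torch.no_grad():
    t2 = max(torch.trace(Xs[i] @ Xs[j]).abs().item()
             for i in range(n) for j in range(n))
    t3 = max(torch.trace(Xs[i] @ Xs[j] @ Xs[l]).abs().item()
             for i in range(n) for j in range(n) for l in range(n))
    print("max |Tr(X_i X_j)|    :", t2)
    print("max |Tr(X_i X_j X_l)|:", t3)
```

When the phenomenon occurs, the length-2 traces should be near zero while the length-3 traces generally are not.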
If $e\geq 1$, then
$$\operatorname{Tr}(\Phi(X_1,\dots,X_n)^e)=\sum_{i_1,\dots,i_e}\operatorname{Tr}(X_{i_1}\cdots X_{i_e})\cdot\overline{\operatorname{Tr}(X_{i_1}\cdots X_{i_e})}=\sum_{i_1,\dots,i_e}|\operatorname{Tr}(X_{i_1}\cdots X_{i_e})|^2.$$
Therefore, $\operatorname{Tr}(\Phi(X_1,\dots,X_n)^e)=0$ precisely when $\operatorname{Tr}(X_{i_1}\cdots X_{i_e})=0$ for all indices $i_1,\dots,i_e$. Furthermore, if $q$ is a homogeneous non-commutative polynomial of degree $e$, then $\operatorname{Tr}(q(X_1,\dots,X_n))$ is a linear combination of the traces $\operatorname{Tr}(X_{i_1}\cdots X_{i_e})$, so $\operatorname{Tr}(\Phi(X_1,\dots,X_n)^e)=0$ precisely when $\operatorname{Tr}(q(X_1,\dots,X_n))=0$ whenever $q$ is a homogeneous non-commutative polynomial of degree $e$. Finally, $\operatorname{Tr}(\Phi(X_1,\dots,X_n)^e)$ is the sum of the $e$-th powers of the eigenvalues of $\Phi(X_1,\dots,X_n)$. Therefore, the set of eigenvalues of $\Phi(X_1,\dots,X_n)$ is invariant under rotation by the angle $2\pi/d$ precisely when $\operatorname{Tr}(\Phi(X_1,\dots,X_n)^e)=0$ whenever $d\nmid e$.
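A quick numerical sanity check of the first identity above (a NumPy sketch of my own):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, k, e = 3, 2, 4
Xs = [rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k)) for _ in range(n)]

# Left-hand side: Tr(Phi(X_1, ..., X_n)^e) using the Kronecker representation of Phi.
Phi = sum(np.kron(X, X.conj()) for X in Xs)
lhs = np.trace(np.linalg.matrix_power(Phi, e))

# Right-hand side: sum over all words i_1, ..., i_e of |Tr(X_{i_1} ... X_{i_e})|^2.
rhs = sum(abs(np.trace(np.linalg.multi_dot([Xs[i] for i in word]))) ** 2
          for word in itertools.product(range(n), repeat=e))

print(np.isclose(lhs.real, rhs), lhs.real, rhs)
```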
Remark:
LSRDRs of tensors are well-behaved in other ways besides having trace zero. For example, if we train an LSRDR of the same tensor multiple times from different random initializations, then we typically end up with essentially the same LSRDR each time (but this does not happen 100 percent of the time either). After training, the resulting LSRDR therefore does not have any random information left over from the initialization or the training, and any random information present in an LSRDR was originally in the tensor itself.
Remark:
We have some room to modify our fitness function while still retaining the properties of LSRDRs of tensors. For example, suppose that $p$ is a homogeneous non-commutative polynomial of degree $d$, and define a modified fitness function by replacing the spectral radius of $p(X_1,\dots,X_n)$ with the Schatten norm $\|p(X_1,\dots,X_n)\|_q$ (the Schatten $q$-norm of a matrix is the $\ell_q$-norm of its singular values). Then if $p$ is a random homogeneous non-commutative complex polynomial and $(X_1,\dots,X_n)$ maximizes this modified fitness function, then (if everything works out right) we still would have $\operatorname{Tr}(X_{i_1}\cdots X_{i_e})=0$ whenever $d\nmid e$.
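A sketch of such a modified fitness, with an illustrative Schatten exponent $q=4$ and the same degree 3 coefficient-tensor representation used in the earlier sketch (the particular choices here are mine, not prescribed by the remark):

```python
import numpy as np

def schatten_norm(M, q):
    """Schatten q-norm: the l_q norm of the singular values of M."""
    return np.linalg.norm(np.linalg.svd(M, compute_uv=False), ord=q)

def modified_fitness(T, Xs, q=4, d=3):
    """Variant fitness with the Schatten q-norm of p(X_1, ..., X_n) in the numerator."""
    n = len(Xs)
    P = sum(T[i, j, l] * (Xs[i] @ Xs[j] @ Xs[l])
            for i in range(n) for j in range(n) for l in range(n))
    S = sum(np.kron(X, X.conj()) for X in Xs)
    rho2 = np.max(np.abs(np.linalg.eigvals(S))) ** 0.5
    return schatten_norm(P, q) ** (1.0 / d) / rho2
```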
Conclusion:
Since LSRDRs of tensors do not leave behind any random information that is not already present in the tensors themselves, we should expect LSRDRs to be much more interpretable than machine learning systems like neural networks that do retain much random information left over from the initialization. Since LSRDRs of tensors give us so many trace-zero operators, one should consider LSRDRs of tensors to be very well-behaved systems, and well-behaved systems should be much more interpretable than poorly behaved systems.
I look forward to using LSRDRs of tensors to interpret machine learning models and to produce new highly interpretable machine learning models. I do not see LSRDRs of tensors replacing deep learning, but LSRDRs have properties that are hard to reproduce using deep learning, so I look forward to exploring the possibilities with LSRDRs of tensors. I will make more posts about LSRDRs of tensors and other objects produced with similar objective functions.
Edits (10/12/2023): I originally claimed that my dimensionality reduction does not work well for tensors of a certain shape, but after reexperimentation, I was able to reduce such random tensors to tuples of small matrices, and the dimensionality reduction performed well.
Edited 1/10/2024
1 comment
comment by Joseph Van Name (joseph-van-name) · 2023-08-07T09:50:12.632Z · LW(p) · GW(p)
Massively downvoting mathematics without commenting at all just shows that the people on this site are very low quality specimens who do not care about rationality at all but who just want to pretend to be smart.