[Link] Wavefunctions: from Linear Algebra to Spinors

post by sen · 2022-12-07T12:44:33.522Z · LW · GW · 12 comments

This is a link post for https://paperclip.substack.com/p/understanding-wavefunctions

I wrote this blogpost because I thought it took an excessive amount of digging to understand what a spinor was. My original motivation was to understand wavefunctions more concretely since I recently discovered that wavefunctions are spinor-valued, not (necessarily) complex-valued. That took me down a rabbit hole of gamma matrices, geometric algebra, quaternions, and about a dozen other topics.

I think physics is taught very badly. Modern physical theories are built on some very heavy and very powerful mathematical machinery. That machinery is absolutely worth learning, but expositions on physical phenomena seem to have no middle ground between "breadth-first" (require all the background before being able to understand anything), "assembly-level" (discuss the raw equations without any intuition), and "vague analogies." It seems entirely possible to introduce slices of very abstract math as needed so people can go deep without having to go wide and without having to sacrifice either intuition or precision.

Anyway. This blogpost was a proof of concept. It assumes a background in linear algebra, no more than what's taught to a STEM freshman. I try to explain a vertical slice of the mathematical machinery needed to understand spinors. I'm not a physicist, nor do I have access to one, so I might have gotten something wrong. If you notice any errors, please let me know.

12 comments

Comments sorted by top scores.

comment by interstice · 2022-12-07T19:19:55.356Z · LW(p) · GW(p)

[Also not a physicist] This makes sense but seems a bit unintuitive. I like to think of spinors as being generalizations of vector fields. Consider, what makes a vector field different from 3 scalar fields? They can store the same amount of information. The answer is that when you tilt your head, the vectors tilt with you -- but in the opposite direction, from your perspective -- while the scalar fields stay fixed. In other words, the vector field transforms according to a 3-dimensional representation of the rotation group. You can get spinors by generalizing from the ordinary rotation group to the Lorentz group of metric-preserving transformations of spacetime, and noticing that, in addition to the "obvious" 4-dimensional representation, there are 2-dimensional representations as well.

Replies from: sen
comment by sen · 2022-12-07T21:45:07.487Z · LW(p) · GW(p)

EDIT: This post is incorrect. See the reply chain below. After correcting my misunderstanding, I agree with your explanation.

The difference you're describing between vector fields and scalar fields, mathematically, is the difference between composition and precomposition. Here it is more precisely:

  • Pick a change-of-perspective function P(x). The output of P(x) is a matrix that changes vectors from the old perspective to the new perspective.
  • You can apply the change-of-perspective function either before a vector field V(x) or after a vector field. The result is either V(x)P(x) or P(x)V(x).
  • If you apply P(x) before, the vector field applies a flow in the new perspective, and so its arrows "tilt with your head."
  • If you apply P(x) after, the vector field applies a flow in the old perspective, and so the arrows don't tilt with your head.
  • You can replace the vector field V(x) with a 3-scalar field and see the same thing.

Since both composition and precomposition apply to both vector fields and scalar fields in the same way, that can't be something that makes vector fields different from scalar fields.

As far as I can tell, there's actually no mathematical difference between a vector field in 3D and a 3-scalar field that assigns a 3D scalar to each point. It's just a choice of language. Any difference comes from context. Typically, vector fields are treated like flows (though not always), whereas scalar fields have no specific treatment.

Spinors are represented as vectors in very specific spaces, specifically spaces where there's an equivalence between matrices and spatial operations. Since a vector is something like the square root of a matrix, a spinor is something like the square root of a spatial operation. You get Dirac Spinors (one specific kind of spinor) from "taking the square root of Lorentz symmetry operations," along with scaling and addition between them.

As far as spinors go, I think I prefer your Lorentz Group explanation for the "what" though I prefer my Clifford Algebra one for the "how". The Lorentz Group explanation makes it clear how to find important spinors. For me, the Clifford Algebra makes it clear how the rest of the spinors arise from those important spinors, and it makes it clear that they're the "correct" representation when you want to sum spatial operations, as you would with wavefunctions. It's interesting that the intuition doesn't transfer as I expected. I guess the intuition transfer problem here is more difficult than I expected.

Note: Your generalization only accounts for unit vectors, and spinors are NOT restricted to unit vectors. They can be scaled arbitrarily. If they couldn't, ψ†ψ would be uniform at every point. You probably know this, but I wanted to make it explicit.

Replies from: interstice
comment by interstice · 2022-12-08T00:05:26.982Z · LW(p) · GW(p)

> As far as I can tell, there’s actually no mathematical difference between a vector field in 3D and a 3-scalar field that assigns a 3D scalar to each point.

The difference is in how they transform under coordinate changes. To physicists, a vector field is defined by how it transforms. So this:

> You can replace the vector field V(x) with a 3-scalar field and see the same thing

is not correct; by definition, a 3-scalar field should transform trivially under coordinate changes.

Replies from: sen, sen
comment by sen · 2022-12-08T00:52:52.985Z · LW(p) · GW(p)

Reading the Wikipedia page on scalar fields, I think I understand the confusion here. Scalar fields are supposed to be invariant under changes in reference frame, assuming a canonical coordinate system for space.

Take two reference frames P(x) and G(x). A scalar field S(x) needs to satisfy:

  • S(x) = P'(x)S(x)P(x) = G'(x)S(x)G(x)
  • Where P'(x) is the inverse of P(x) and G'(x) is the inverse of G(x).

Meaning the inference of S(x) should not change with reference frame. A scalar field is a vector field that commutes with perspective transformations. Maybe that's what you meant?

I wouldn't use the phrase "transforms trivially" here since a "trivial transformation" usually refers to the identity transformation. I wouldn't use a head tilt example either since a lot of vector fields are going to commute with spatial rotations, so it's not good for revealing the differences. And I think you got the association backwards in your original explanation: scalar fields appear to represent quantities in the underlying space unaffected by head tilts, and so they would be the ones "transforming in the opposite direction" in the analogy since they would remain fixed in "canonical space".

Replies from: interstice
comment by interstice · 2022-12-08T01:33:37.352Z · LW(p) · GW(p)

> I wouldn’t use the phrase “transforms trivially” here since a “trivial transformation” usually refers to the identity transformation

No, I do mean the identity transformation. Scalar fields do not transform at all under coordinate changes. To be precise, if we have a coordinate change matrix $A$, a scalar field $f$ transforms like

$$f(x) \mapsto f(A^{-1}x)$$

Whereas a vector field $v$ transforms like

$$v(x) \mapsto A\,v(A^{-1}x)$$

For more details check out these wikipedia pages.
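A small numerical sketch may make the distinction concrete. It assumes the common convention that under a coordinate change matrix A a scalar field becomes f(A^-1 x) and a vector field becomes A v(A^-1 x); the fields `scalar_field` and `vector_field` and the 2D rotation are hypothetical choices made only for illustration.

```python
import math

def rotation(theta):
    """2x2 counterclockwise rotation matrix, as nested tuples."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(m, p):
    """Multiply a 2x2 matrix by a 2D point or vector."""
    return (m[0][0] * p[0] + m[0][1] * p[1],
            m[1][0] * p[0] + m[1][1] * p[1])

def scalar_field(p):
    """Hypothetical scalar field: a value that depends only on position."""
    return p[0] + 2.0 * p[1]

def vector_field(p):
    """Hypothetical vector field: a constant arrow pointing along +x."""
    return (1.0, 0.0)

def transformed_scalar(a_inv, p_new):
    # f'(x) = f(A^-1 x): look up the old point, return the value unchanged.
    return scalar_field(apply(a_inv, p_new))

def transformed_vector(a, a_inv, p_new):
    # v'(x) = A v(A^-1 x): look up the old point, then rotate the components too.
    return apply(a, vector_field(apply(a_inv, p_new)))

theta = math.pi / 2
A, A_inv = rotation(theta), rotation(-theta)

p_old = (1.0, 0.0)
p_new = apply(A, p_old)  # the same physical point, in the new coordinates

f_new = transformed_scalar(A_inv, p_new)      # same value as scalar_field(p_old)
v_new = transformed_vector(A, A_inv, p_new)   # components rotated by A
# Treating the three (here two) components as independent scalars leaves them unrotated:
three_scalars = vector_field(apply(A_inv, p_new))

print(f_new, v_new, three_scalars)
```

The scalar value at the physical point is unchanged, while the vector's components are mixed by A. Treating the components as separate scalars skips that mixing, which is the difference under discussion.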

Replies from: sen
comment by sen · 2022-12-08T02:12:16.810Z · LW(p) · GW(p)

Ah. Thank you, that is perfectly clear. The Wikipedia page for Scalar Field makes sense with that too. A scalar field is a function that takes values in some canonical units, and so it transforms only on the right of f under a perspective shift. A vector field (effectively) takes values both on and in the same space, and so it transforms both on the left and right of v under a perspective shift.

I updated my first reply to point to yours.

comment by sen · 2022-12-08T00:25:01.364Z · LW(p) · GW(p)

Interesting. That seems to contradict the explanation for Lie Algebras, and it seems incompatible with commutators in general, since with commutators all operators involved need to be compatible with both composition and precomposition (otherwise AB - BA is undefined). I guess scalar fields are not meant to be operators? That doesn't quite work since they're supposedly used to describe energy, which is often represented as an operator. In any case, I'll have to keep that in mind when reading about these things.

comment by Charlie Steiner · 2022-12-07T15:03:11.530Z · LW(p) · GW(p)

Yeah, physics tends to be taught as if you're going to use it. So you don't just get told what a Christoffel symbol is, it's assumed you're going to spend a few hours calculating them.

I found the post itself a bit confusing. The connection of quaternions to rotations wasn't clear to me (what does the real part do? If nothing, isn't this a violation of one of the desiderata for representations? How does this relate to spinors - don't spinors use all the degrees of freedom? Etc.). I think there's an interesting comparison to be made between the representation as size-2 vectors of quaternions versus size-4 vectors of complex numbers, both practically (spinor calculations do seem to involve duplicated effort in the size-4 representation) and in interpretation (antimatter!).

Replies from: sen
comment by sen · 2022-12-07T15:42:18.524Z · LW(p) · GW(p)

In the 2x2 matrix representation, the basis element corresponding to the real part of a quaternion is the identity matrix. So scaling the real part results in scaling the (real part of the) diagonal of the 2x2 matrix, which corresponds to a scaling operation on the spinor. It incidentally plays the same role on 3D objects: it scales them. Plus, it plays a direct role in rotations when it's -1 (180 degree rotation) or 1 (0 degree rotation). As with i, j, and k, the exact effect of changing the real part of the quaternion isn't obvious from inspection when it's summed with other non-zero components. For example, it's hard to tell by inspection what the 2 or the 3j is doing in the quaternion 2+3j.

In total, quaternions represent scaling, rotation, and any mix of the two. I should have been clearer about that in the post. Spinors for quaternions do include any "state changes" resulting from the real part of the quaternion as well as any changes resulting from i, j, and k components, so the spinor does use all degrees of freedom.
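As a rough sketch of the matrix picture, the code below maps a quaternion a + bi + cj + dk to a 2x2 complex matrix and checks that quaternion multiplication matches matrix multiplication. The specific mapping in `to_matrix` is one common convention and is an assumption here, not necessarily the one used in the blogpost; the helper names are hypothetical.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def to_matrix(q):
    """Assumed convention: a + bi + cj + dk -> [[a+bi, c+di], [-c+di, a-bi]]."""
    a, b, c, d = q
    return ((complex(a, b), complex(c, d)),
            (complex(-c, d), complex(a, -b)))

def matmul2(m, n):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def matvec2(m, s):
    """Apply a 2x2 matrix to a 2-component spinor."""
    return tuple(sum(m[i][k] * s[k] for k in range(2)) for i in range(2))

p = (2, 0, 3, 0)    # the quaternion 2 + 3j from the comment above
q = (1, 2, -1, 4)   # an arbitrary second quaternion

# Multiplying quaternions and multiplying their matrices agree.
homomorphism_holds = to_matrix(qmul(p, q)) == matmul2(to_matrix(p), to_matrix(q))

# A purely real quaternion is a multiple of the identity, so it just scales a spinor.
spinor = (1 + 1j, 2 + 0j)
scaled = matvec2(to_matrix((2, 0, 0, 0)), spinor)

print(homomorphism_holds, scaled)
```

With integer components the comparison is exact, so no tolerance is needed in this example.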

The change in representation between 2-quaternion and 4-complex spinors is purely notational. It doesn't affect any of the math or underlying representations. Since a quaternion operation can be represented by a 2x2 complex matrix, you can represent a 2-quaternion operation as the tensor product of two 2x2 complex matrices, which would give you a 4x4 complex matrix. That's where 4x4 gamma matrices come from: each is a tensor product of two 2x2 Pauli matrices. For all calculations and consequences, you get the exact same answers whether you choose to represent the operations and spinors as quaternions or complex numbers.
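The tensor-product construction can be checked directly. The sketch below assumes the Dirac representation, where each gamma matrix is a Kronecker product of two 2x2 factors drawn from the Pauli matrices (counting the identity, and allowing a factor of i), and assumes the metric signature (+, -, -, -). It verifies the defining relation that the anticommutator of gamma^mu and gamma^nu equals 2 eta^{mu nu} times the identity. All helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def anticommutator(a, b):
    """Return ab + ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] + ba[i][j] for j in range(n)] for i in range(n)]

I2 = [[1, 0], [0, 1]]
sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]
i_sy = [[0, 1], [-1, 0]]  # i times sigma_y

# Dirac representation (an assumption): gamma^0 = sz (x) I, gamma^k = (i sy) (x) sigma_k.
gamma = [kron(sz, I2)] + [kron(i_sy, s) for s in (sx, sy, sz)]
eta = [1, -1, -1, -1]  # assumed metric signature (+, -, -, -)

def clifford_ok():
    """Check {gamma^mu, gamma^nu} = 2 * eta^{mu nu} * I for all index pairs."""
    for mu in range(4):
        for nu in range(4):
            coeff = 2 * eta[mu] if mu == nu else 0
            expected = [[coeff if i == j else 0 for j in range(4)] for i in range(4)]
            if anticommutator(gamma[mu], gamma[nu]) != expected:
                return False
    return True

print(clifford_ok())
```

Other representations (Weyl, Majorana) use different 2x2 factors but satisfy the same relation.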

Replies from: Charlie Steiner
comment by Charlie Steiner · 2022-12-07T16:34:49.685Z · LW(p) · GW(p)

> Plus, it plays a direct role in rotations when it's -1 (180 degree rotation) or 1 (0 degree rotation)

Isn't -1 inversion? Inverting the axis of rotation makes total sense (while "180 degree rotation" with no axis is nonsense) - and inverting the scale of the object also makes sense, but is nonphysical. (This is why physicists talk about SO(3), not O(3)).

Replies from: Korz
comment by Mart_Korz (Korz) · 2022-12-07T18:31:04.246Z · LW(p) · GW(p)

> Isn't -1 inversion?

I think for quaternions, $-1$ corresponds both to inversion and a 180 degree rotation.

When using quaternions to describe rotations in 3D space however, one can still represent rotations with unit-quaternions $q = \cos(\alpha/2) + n \sin(\alpha/2)$ where $n$ is a 'unit vector' distributed along the directions $i, j, k$ and indicates the rotation axis, and $\alpha$ is the 3D rotation angle. If one wishes to rotate any orientation $v$ (same type of object as $n$) by $q$, the result is $q v q^{-1}$. Here, $q = -1$ corresponds to $\alpha = 2\pi$ and is thus a full 360 degree turn.

I have tried to read up on explanations for this a few times, but unfortunately never with full success. But usually people start talking about describing a "double cover" of the 3D rotations.

Maybe a bit of intuition about this relation can come from thinking about measured quantities in quantum mechanics as 'expectation values' of some operator $X$ written as $\langle \psi | X | \psi \rangle$: Here it becomes more intuitive that replacing $X \to q^{-1} q X q^{-1} q$ (rotating the measured quantity back and forth by $\alpha$ around the axis $n$) results in $\langle \psi | q^{-1} \, (q X q^{-1}) \, q | \psi \rangle$, which is an $\alpha$-rotated X measured on an $\alpha$-rotated wavefunction.
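The half-angle behaviour can be checked numerically. This is a minimal sketch assuming the standard form of a rotation quaternion, cos(alpha/2) + n sin(alpha/2), acting on a vector v by q v q^-1; the function names `rotor` and `rotate` are hypothetical.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def conj(q):
    """Quaternion conjugate; equals the inverse for unit quaternions."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def rotor(axis, alpha):
    """Unit quaternion cos(alpha/2) + n sin(alpha/2) for a unit axis n."""
    s = math.sin(alpha / 2)
    return (math.cos(alpha / 2), axis[0] * s, axis[1] * s, axis[2] * s)

def rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q via q v q^-1."""
    w = qmul(qmul(q, (0.0, v[0], v[1], v[2])), conj(q))
    return w[1:]

z_axis = (0.0, 0.0, 1.0)
x_unit = (1.0, 0.0, 0.0)

q90 = rotor(z_axis, math.pi / 2)
minus_q90 = tuple(-c for c in q90)

rotated = rotate(q90, x_unit)             # approximately (0, 1, 0)
rotated_neg = rotate(minus_q90, x_unit)   # same rotation: q and -q agree
q360 = rotor(z_axis, 2 * math.pi)         # approximately (-1, 0, 0, 0)

print(rotated, rotated_neg, q360)
```

Because q appears twice in q v q^-1, the sign of q cancels, so q and -q describe the same 3D rotation. That is the two-to-one relationship usually called the double cover.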

Replies from: sen
comment by sen · 2022-12-07T23:15:31.230Z · LW(p) · GW(p)

Thanks for the explanation. I found this post that connects your explanation to an explanation of the "double cover." I believe this is how it works:

  • Consider a point on the surface of a 3D sphere. Call it the "origin".
  • From the perspective of this origin point, you can map every point of the sphere to a 2D coordinate. The mapping works like this: Imagine a 2D plane going through the middle of the sphere. Draw a straight line (in the full 3D space) from the selected origin to any other point on the sphere. Where the line crosses the plane, that's your 2D vector representation of the other point. Under this visualization, the origin point should be mapped to a 2D "point at infinity" to make the mapping smooth. This mapping gives you a one-to-one conversion between 2D coordinate systems and points on the sphere.
  • You can create a new 2D coordinate system for sphere surface points using any point on the sphere as the origin. All of the resulting coordinate systems can be smoothly deformed into one another. (Points near the origin are always large, points on the opposite side of the sphere are always close to the 0,0,0, and the changes are smooth as you move the origin smoothly.)
  • Each choice of origin on the surface of the sphere (and therefore each 2D coordinate system) corresponds to two unit-length quaternions. You can see this as follows. Pick any choice of i,j,k values from a unit quaternion. There are now either 1 or 2 choices for what the real component of that quaternion might have been. If i,j,k alone have unit length, then there's only one choice for the real component: zero. If i,j,k alone do not have unit length, then there are two choices for the real component since either a positive or a negative value can be used to make the quaternion unit length again.
  • Take the set of unit quaternions that have a real component close to zero. Consider the set of 2D coordinate systems created from those points. In this region, each coordinate system corresponds to two quaternions EXCEPT at the points where the quaternion's real component is 0. This exceptional case prevents a one-to-one mapping between coordinate transformations and quaternion transformations.
  • As a result, there's no "smooth" way to reduce the two-to-one mapping from quaternions to coordinate systems down to a one-to-one mapping. Any mapping would require either double-counting some quaternions or ignoring some quaternions. Since there's a one-to-one mapping between coordinate systems and candidate origin points on the surface of the sphere, this means there is also no one-to-one mapping between quaternions and points on the sphere.
  • No matter what smooth mapping you choose from SU(2), unit quaternions, to SO(3), unit spheres, the mapping must do the equivalent of collapsing distinctions between quaternions with positive and negative real components. And so the double cover corresponds to the two sets of covers: one of positive-real-component quaternions over the sphere, and one of the negative-real-component quaternions over the sphere. Within each cover, there's a smooth one-to-one conversion between quaternion-coordinates mappings, but across covers there is not.
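The sphere-to-plane mapping described in the second bullet is usually called stereographic projection. Here is a minimal sketch of it, assuming a unit sphere, the projection point ("origin" in the bullets above) at the north pole (0, 0, 1), and the plane z = 0 through the centre; `project` and `unproject` are hypothetical names.

```python
import math

def project(p):
    """Stereographic projection from the north pole (0, 0, 1) onto the plane z = 0."""
    x, y, z = p
    if z >= 1.0:
        return None  # the projection point itself maps to the "point at infinity"
    return (x / (1.0 - z), y / (1.0 - z))

def unproject(c):
    """Inverse map: a 2D coordinate back to a point on the unit sphere."""
    X, Y = c
    r2 = X * X + Y * Y
    return (2 * X / (r2 + 1), 2 * Y / (r2 + 1), (r2 - 1) / (r2 + 1))

south_pole = (0.0, 0.0, -1.0)
eps = 0.01
near_north = (math.sin(eps), 0.0, math.cos(eps))  # a point close to the projection point

opposite = project(south_pole)    # lands at the centre of the plane
far_away = project(near_north)    # lands far from the centre

sample = (0.6, 0.0, 0.8)
round_trip = unproject(project(sample))

print(opposite, far_away, round_trip)
```

Points near the projection point map to large coordinates and the opposite point maps to the centre of the plane, matching the description in the third bullet.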