# Should correlation coefficients be expressed as angles?

post by Sniffnoy · 2012-11-28T00:05:50.347Z · LW · GW · Legacy · 25 comments**Edit 11/28:** Edited note at bottom to note that the random variables should have finite variance, and that this is essentially just L². Also some formatting changes.

This is something that has been bugging me for a while.

The correlation coefficient between two random variables can be interpreted as the cosine of the angle between them[0]. The higher the correlation, the more "in the same direction" they are. A correlation coefficient of one means they point in exactly the same direction, while -1 means they point in exactly opposite directions. More generally, a positive correlation coefficient means the two random variables make an acute angle, while a negative correlation means they make an obtuse angle. A correlation coefficient of zero means that they are quite literally orthogonal.

Everything I have said above is completely standard. So why aren't correlation coefficients commonly *expressed as angles* instead of as their cosines? It seems to me that this would make them more intuitive to process.

Certainly it would make various statements about them more intuitive. For instance "Even if A is positive correlated with B and B is positively correlated with C, A might be negatively correlated with C." This sounds counterintuitive, until you rephrase it as "Even if A makes an acute angle with B and B makes an acute angle with C, A might make an obtuse angle with C." Similarly, the geometric viewpoint makes it easier to make observations like "If A and B have correlation exceeding 1/√2 and so do B and C, then A and C are positively correlated" -- because this is just the statement that if A and B make an angle of less than 45° and so do B and C, then A and C make an angle of less than 90°.

Now when further processing is to be done with the correlation coefficients, one wants to leave them as correlation coefficients, rather than take their inverse cosines just to have to take their cosines again later. (I don't know that the angles you get this way are actually useful mathematically, and I suspect they mostly aren't.) My question rather is about when correlation coefficients are *expressed to the reader*, i.e. when they are considered as an end product. It seems to me that expressing them as angles would give people a better intuitive feel for them.

Or am I just entirely off-base here? Statistics, let alone the communication thereof, is not exactly my specialty, so I'd be interested to hear if there's a good reason people don't do this. (Is it assumed that anyone who knows about correlation has the geometric point of view completely down? But most people can't calculate an inverse cosine in their head...)

[0]Formal mathematical version: If we consider real-valued random variables with finite variance on some fixed probability space Ω -- that is to say, L²(Ω) -- the covariance is a positive-semidefinite symmetric bilinear form, with kernel equal to the set of essentially constant random variables. If we mod out by these we can consider the result as an inner product space and define angles between vectors as usual, which gives us the inverse cosine of the correlation coefficient. Alternatively we could just take L²(Ω) and restrict to those elements with zero mean; this is isomorphic (since it is the image of the "subtract off the mean" map, whose kernel is precisely the essentially constant random variables).

## 25 comments

Comments sorted by top scores.

## comment by RichardKennaway · 2012-11-28T10:18:00.018Z · LW(p) · GW(p)

It's interesting to compare these angles with those that come up in engineering.

One second of arc (the Ramsden theodolite): r = 0.99999999925

1/5000 to 1/1000 inch per inch (accuracy of a try-square): r = 0.99999950 to 0.999999980

1 minute of angle (the accuracy that a rifle should hold to) = 1 inch at 100 yards: r = 0.999999970

1 degree: r = 0.99985

1 compass point (1/32 of a revolution, limit of navigational accuracy in the days of sail): r = 0.981

And in contrast:

Correlation that implies 1 bit of mutual information (for a bivariate Gaussian): r = 0.866, angle = 30 degrees (exactly).

Correlation considered high in the softer disciplines: r = 0.8, angle = 36.9 degrees.

Correlation considered publishable: r = 0.2, angle = 78.5 degrees. This means that if the truth is due North, you're proceeding East by North (one compass point away from East).

## comment by Kindly · 2012-11-28T03:15:30.332Z · LW(p) · GW(p)

Among mathematicians, of course, one ought to be able to go from one interpretation to the other as necessary (I think it boils down to switching from an algebraic interpretation of your random variables to a geometric one).

If non-mathematicians were exposed to the angle version... that might help a lot of people understand how correlations interact. But, ultimately, the result is still a black box: in practice, I expect people to memorize "0.99 very good, 0.7 okay, 0.05 bad" which just turns into "8 degrees good, 45 degrees okay, 87 degrees bad".

In some cases, a (positive) correlation coefficient can be thought of as a probability, which is another point in its favor.

## comment by RichardKennaway · 2012-11-28T08:44:05.826Z · LW(p) · GW(p)

In some cases, a (positive) correlation coefficient can be thought of as a probability, which is another point in its favor.

I haven't come across that. Could you amplify it?

## comment by Kindly · 2012-11-28T13:04:00.993Z · LW(p) · GW(p)

*Edit: I started out giving a somewhat confusing example so I'm just going to edit this comment to say something less confusing.*

Let X be any random variable, and define Y as follows: with probability R, Y=X, and otherwise Y is independently drawn from the same distribution as X. Then X and Y are identically distributed and have correlation R.

## comment by Douglas_Knight · 2012-11-28T22:25:04.118Z · LW(p) · GW(p)

What definition of correlation are you using?

## comment by Kindly · 2012-11-28T22:39:16.047Z · LW(p) · GW(p)

This one. As far as I know, it's the only kind of correlation that the cosine interpretation is valid for.

If you want to verify my claim, it will help to assume that your distribution has mean 0 and variance 1, in which case the correlation between X and Y is just E[XY]. But correlation is invariant under shifting and scaling, so this is fully general.

Edit: I suppose I was imprecise when referring to the correlation between bit strings; what I mean there is simply the correlation between any pair of corresponding bits. This sort of confusion appears to be standard.

## comment by Douglas_Knight · 2012-11-28T23:31:32.244Z · LW(p) · GW(p)

Yes, I was wondering what you meant by the correlation of a string. Also, the definition doesn't apply to binary variables unless you somehow interpret bits as numbers, but as you say, it doesn't matter how you do so. There's a reason I asked what definition *you* were using, not the original post.

This sort of confusion appears to be standard.

Which confusion? The confusion between a random variable and sequence of iid draws from it? And what do you mean by standard?

## comment by Kindly · 2012-11-28T23:55:20.343Z · LW(p) · GW(p)

By "this sort of confusion" I meant the ambiguity between the correlation of two bit strings and the correlations between the individual bits. By "standard" I mean that I didn't make this up; I've seen other people do this.

Anyway, perhaps it's better to focus on the more general (and less notation-abusing) example I gave.

## comment by Decius · 2012-11-28T04:44:35.591Z · LW(p) · GW(p)

All's well and good, until you get to adding angles together, unless you visualize A,B,C existing on a sphere and the angles between them being from the center of the sphere- are the possible angles of the sphere *identical* to the inverse cosines of the possible correlations? (If A and B are .707 correlated, and B and C are .707 correlated, are A and C necessarily somewhere between 0 and 1 correlated?)

For four variables (six angles), you would require visualizing them on a four-dimensional surface analogous to a sphere. I can do some of the math involved in calculating the possible angles there, but I can't visualize it beyond 'there are four points on that surface such that all of them are 90 degrees apart.' I know that any three of those points uniquely define a sphere, and each of those three spheres is a cross-section of the four-dimensional surface.

## comment by Sniffnoy · 2012-11-28T05:46:42.055Z · LW(p) · GW(p)

All's well and good, until you get to adding angles together, unless you visualize A,B,C existing on a sphere and the angles between them being from the center of the sphere- are the possible angles of the sphere identical to the inverse cosines of the possible correlations? (If A and B are .707 correlated, and B and C are .707 correlated, are A and C necessarily somewhere between 0 and 1 correlated?)

Yes. If you have n random variables, you can restrict to the subspace spanned by them (or their equivalence classes). For instance, if we have 3 random variables whose equivalence classes are linearly independent, then the span of these equivalence classes will be a 3-dimensional real inner product space, which will then necessarily be isomorphic to **R**^3 with the dot product.

Actually, it occurs to me that I've never actually sat down and checked that angle addition obeys the triangle inequality in 3 dimensions -- I suppose if nothing else it can be done by lots of inequality grinding -- but that's not relevant. The point is, it does hold in dimension 3, and hence by the argument above, it holds regardless of dimension, finite or infinite.

## comment by Decius · 2012-11-28T17:27:42.752Z · LW(p) · GW(p)

That's half of it- does there exist any set of angles which *are* mutually compatible angles on the n-dimensional surface but *not* inverse cosines of correlations?

## comment by Kindly · 2012-11-28T18:04:42.901Z · LW(p) · GW(p)

No. Given any mutually compatible angles (which means we can choose unit vectors that have those angles) we can generate appropriately correlated Gaussian variables as follows: take these unit vectors, generate an n-dimensional Gaussian, and then take its dot product with each of the unit vectors.

## comment by Decius · 2012-11-29T17:24:04.352Z · LW(p) · GW(p)

Now the hard question: Is there a finite number n such that all finite combinations of possible correlations can be described in n-dimensional space as mutually compatible angles?

My gut says no, n+1 uncorrelated variables would require n+1 right angles, which appears to require n+1 dimensions. I'm only about 40% sure that that line of thought leads directly to a proof of the question I tried to ask.

## comment by Kindly · 2012-11-29T17:38:15.772Z · LW(p) · GW(p)

Your gut is right, both about the answer and about its proof (n+1 nonzero vectors, all at right angles to each other, always span an n+1-dimensional space). You should trust it more!

## comment by Decius · 2012-11-30T01:52:26.960Z · LW(p) · GW(p)

I think that my 40% confidence basis for the very specific claim is proper. Typically I am wrong about three times out of five when I reach beyond my knowledge to this degree.

I was hoping that there would be some property true of 11-dimensional space (or whatever the current physics math indicates the dimensionality of meatspace is) that allows an arbitrary number of fields to fit.

## comment by RichardKennaway · 2012-11-29T14:20:32.756Z · LW(p) · GW(p)

Actually, it occurs to me that I've never actually sat down and checked that angle addition obeys the triangle inequality in 3 dimensions -- I suppose if nothing else it can be done by lots of inequality grinding

The ordinary triangle inequality is immediate from -- is practically identical to -- the statement that a straight line is the shortest distance between two points.

The spherical triangle inequality, in any number of dimensions, is the same thing with "straight line" replaced by "great circle". A detail that doesn't arise for flat space is that there are two angles between two lines (whereas there is only one distance between two points in flat space), and you have to choose one that is no more than pi.

## comment by faul_sname · 2012-11-28T18:25:00.574Z · LW(p) · GW(p)

Apparently, some people can visualize more than 3 dimensions fairly easily. As for me, I use a little trick that engages the ability of my visual-spatial mind to visualize more than one object at a time.

To visualize a 6-sphere, I usually visualize a sphere of fixed size with three axes going through it. This sphere and the axes represent the higher-order dimensions. I then imagine myself somewhere in this three-dimensional space (specifically, somewhere inside the fixed sphere). I note the distance directly from the x, y, and z axes directly through me to the edge of the sphere. Each of these distances defines the radius of the 'visible surface' of the 6-sphere from that point in higher-order space looking from the x, y, or z axis respectively. By rotating the axes, the relative sizes of these surfaces change, and I'm guessing you can already visualize a rotating normal sphere in your mind. So you can rotate the 6-sphere in two sets of 3 dimensions fairly easily. Rotating between higher and lower dimensions is a bit more challenging, but still doable. For a 4- or 5- sphere, replace the 3-dimensional higher-order sphere with a circle, or just ignore the z axis.

If you can figure out how to do that, you can get a feeling for the possible orientations of the angles to one another. And that actually is fairly interesting, and the first time I really understood on a gut level why r^2 is used instead of r for correlations.

## comment by **[deleted]** ·
2012-12-03T13:20:37.200Z · LW(p) · GW(p)

The Springer Series in Statistics has a book called *Statistical Methods: The Geometric Approach* by Saville and Wood, which covers geometric intuition behind statistical concepts. Section 15.4 has correlation coefficients. Here is a screenshot of the relevant section:

## comment by **[deleted]** ·
2012-11-28T11:33:00.814Z · LW(p) · GW(p)

So why aren't correlation coefficients commonly expressed as angles instead of as their cosines?

Dunno, I guess hysterical raisins.

I *have* taken the arccosine of correlation coefficients in my mind occasionally.

## comment by iDante · 2012-11-28T01:05:41.215Z · LW(p) · GW(p)

I don't think correlation coefficient is the cosine of the angle between them. In this picture you can see that the middle row has 1 or -1 and yet different angles. Instead I think it's a measure of how well they correlate with each other. Maybe I'm crazy-wrong though. Edit: For only two variables you might be right, but I don't see correlation coefficients reported between just two variables.

I often see the entire covariance matrix reported, not just the correlation coefficient. This matrix has some handy properties. Its eigenstuff contains all the information about the arrows in this picture.

## comment by Sniffnoy · 2012-11-28T01:33:11.804Z · LW(p) · GW(p)

I don't think correlation coefficient is the cosine of the angle between them. In this picture you can see that the middle row has 1 or -1 and yet different angles.

Sorry, I guess I was unclear on what I meant. I didn't mean the angles pictured in that graph. Rather I meant what was described in the footnote -- consider covariance as an inner product and consider the angles you get this way.

In other words, I'm not claiming any sort of theorem; I'm not claiming that the correlation coefficient in fact tells you some other information that isn't obviously already in there. I've defined the angles here such that "the correlation coefficient is the cosine of the angle between them" is a tautology. I'm just suggesting that this geometric viewpoint might help with the intuition.

## comment by iDante · 2012-11-28T02:41:49.249Z · LW(p) · GW(p)

Whoa. I didn't have this geometric point of view down so I didn't understand at first what you meant, but now that I've thought about it for a few minutes it makes sense.

The prevalent intuitive meaning of correlation coefficient (at least the one I learned in the watered-down statistics-for-physics-students class) is "a measure of how well two variables correlate. 1 is well, 0 is not at all, -1 is backwards correlation." Hence, the first thing I thought of was that image. Many people who need to use this coefficient won't have taken linear algebra and it'll be a complication for them to learn that it's "the inner product of two random variablesies, so 0 means lots of correlation, pi means the opposite direction, and pi/2 means no correlation." Or maybe you use degrees, idk.

I like it, thanks for making this thread :D