post by [deleted]

This is a link post for


comment by gjm · 2021-01-08T09:54:53.152Z

You've dropped a factor of 2 about half-way through your calculation. And then you've multiplied by $X^{-1}$ between two lines separated by "="; the idea is that both sides are zero so it kinda-sorta makes sense, but it's super-misleading. If you restore the factor of 2, it carries through to your last equation as well.

But even this is wrong, I'm afraid. You can't multiply by $X^{-1}$ there at all. There is no $X^{-1}$: $X$ is not (except by coincidence, and in an ML application if this coincidence happens then you don't have anything like enough data) a square matrix, and in general it has no inverse.
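To see this concretely, here's a quick NumPy check (the shapes and seed are arbitrary, just for illustration): a tall data matrix has no inverse at all, though it does have a Moore–Penrose pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 data points, 3 features: not square

try:
    np.linalg.inv(X)             # inv is only defined for square matrices
    inverted = True
except np.linalg.LinAlgError:
    inverted = False
print(inverted)                  # False

# What does always exist is the Moore–Penrose pseudoinverse:
X_pinv = np.linalg.pinv(X)       # shape (3, 100)
# Here it acts as a left inverse, since X has full column rank:
print(np.allclose(X_pinv @ X, np.eye(3)))  # True
```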

There are problems earlier in the derivation, too, which I think are encouraged by some of your nonstandard notation. E.g., you write products like $wX$ rather than $Xw$ or $w^TX^T$, and this has fooled you into writing down something wrong for what you write as $\frac{dL}{dw}$. That's also nonstandard notation; it's defensible, but again it makes it easy to get things wrong by mixing up left and right multiplications. Let's do it with more standard and explicit notation, which will make it harder to make mistakes:

Expanding $L(w)=\|Xw-y\|^2+\lambda\|w\|^2$ gives $w^TX^TXw - y^TXw - w^TX^Ty + y^Ty + \lambda w^Tw$. The $y^Ty$ is constant and its derivative is zero. The terms linear in $w$ are one another's transposes and readily yield $-2w^TX^Ty$, whose $\partial/\partial w_i$ is $-2(X^Ty)_i$. The second quadratic term is just $\lambda\sum_j w_j^2$, whose $\partial/\partial w_i$ is $2\lambda w_i$. The first quadratic term is similarly $w^TX^TXw$, which equals $\sum_{j,k} w_j (X^TX)_{jk} w_k$, whose $\partial/\partial w_i$ is $2(X^TXw)_i$.

So what ends up being zero is the $i$th component of $2X^TXw - 2X^Ty + 2\lambda w$, and if you like you can write $\frac{dL}{dw} = 2X^TXw - 2X^Ty + 2\lambda w$. But again you need to be very clear about what you mean by that; $\frac{dL}{dw}$ means "the row vector $v$ such that to first order $L(w+h) \approx L(w) + vh$", and so actually the Right Thing to use for the "derivative" is the transpose of what I wrote down above.

Finishing off the correct derivation, we have

$2X^TXw - 2X^Ty + 2\lambda w = 0$, so $(X^TX + \lambda I)w = X^Ty$, so $w = (X^TX + \lambda I)^{-1}X^Ty$.
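If you want to double-check that algebra numerically, here's a small NumPy sketch (the data and the value of $\lambda$ are arbitrary): the closed-form $w$ makes the gradient of $\|Xw-y\|^2 + \lambda\|w\|^2$ vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))     # arbitrary synthetic data
y = rng.normal(size=50)
lam = 0.1                        # arbitrary regularization strength

# closed-form solution w = (X^T X + λI)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# gradient 2X^T(Xw - y) + 2λw of the regularized loss should vanish at w
grad = 2 * X.T @ (X @ w - y) + 2 * lam * w
print(np.allclose(grad, 0))      # True
```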

comment by gjm · 2021-01-08T09:58:15.072Z

I think it genuinely doesn't make sense to say that $\lambda$ reflects our prior expectation about the data; the acolyte is correct. What $\lambda$ reflects is our prior on $w$; that regularization term corresponds exactly (taking $e^{-L(w)}$ as the unnormalized posterior density) to a prior that makes $w$ (multivariate) normally distributed with mean zero and covariance $\frac{1}{2\lambda}$ times the identity (i.e., components independent and each component having variance $\frac{1}{2\lambda}$).
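Here's a quick NumPy illustration of that correspondence (the data are arbitrary): the ridge solution coincides with the posterior mean/MAP of a Bayesian linear regression with noise variance $\frac{1}{2}$ (matching $e^{-\|Xw-y\|^2}$) and prior variance $\frac{1}{2\lambda}$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))     # arbitrary synthetic data
y = rng.normal(size=30)
lam = 0.5

# ridge estimate w = (X^T X + λI)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Bayesian linear regression: likelihood y|w ~ N(Xw, σ²I) with σ² = 1/2,
# prior w ~ N(0, τ²I) with τ² = 1/(2λ); the posterior is Gaussian, so
# its mean is also its mode (the MAP estimate).
sigma2, tau2 = 0.5, 1.0 / (2 * lam)
posterior_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(3) / tau2)
w_map = posterior_cov @ (X.T @ y / sigma2)

print(np.allclose(w_ridge, w_map))  # True
```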