Comments

Comment by adzcai (alexander-cai) on No convincing evidence for gradient descent in activation space · 2023-04-13T06:03:35.349Z · LW · GW

Regarding "GD++": this is almost identical to the dynamics you'd expect when doing gradient descent on linear regression. See p 10 of these lecture notes for an explanation.

Granted, here they're applying this linear transformation to the input data rather than as an operator on the weights, but my intuition says there has to be some connection here: it's "removing" (part of) the component of $x$ that can be written as a linear combination of the data. (Apologies for a half-formed response; happy to hear any connections others make.)
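To make the analogy a bit more concrete, here's a minimal numpy sketch (my own construction, with arbitrary dimensions and step size, not code from the paper): the same operator $A = I - \eta X^\top X$ that contracts the weight error under GD on linear regression is what a GD++-style preprocessing step applies to each input.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.normal(size=(n, d))    # data matrix, rows are inputs x_i
w_star = rng.normal(size=d)    # "true" weights
y = X @ w_star                 # noiseless targets

eta = 0.01
A = np.eye(d) - eta * X.T @ X  # the shared linear operator

# (1) Gradient descent on L(w) = 0.5 * ||X w - y||^2:
#     w <- w - eta * X.T @ (X w - y), so the error e = w - w* evolves as e <- A e.
w = np.zeros(d)
e = w - w_star
w = w - eta * X.T @ (X @ w - y)
assert np.allclose(w - w_star, A @ e)  # the weight error is mapped by A

# (2) GD++-style preprocessing: the same operator applied to the inputs,
#     i.e. each row x_i <- A x_i (A is symmetric, so X @ A.T = X @ A).
X_pp = X @ A.T
```

(I haven't checked whether the step size and sign conventions line up exactly with the paper's parameterization; this is just the shape of the correspondence.)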

(Edited to fix link formatting.)

Comment by adzcai (alexander-cai) on Cognitive Emulation: A Naive AI Safety Proposal · 2023-02-26T05:54:33.263Z · LW · GW

What do you see as the key differences between this and research in (theoretical) neuroscience? The goals you've mentioned seem roughly the same as that field's: interpreting human brain circuitry, often by modelling neural circuits with artificial neural networks. For example, see research like "Correlative Information Maximization Based Biologically Plausible Neural Networks for Correlated Source Separation".

Comment by adzcai (alexander-cai) on Video/animation: Neel Nanda explains what mechanistic interpretability is · 2023-02-23T01:36:42.976Z · LW · GW

Or even better, fine-tuning an LLM to automate writing the code!