Posts

Reflections on Trusting Trust & AI 2023-01-16T06:36:24.761Z
Analogies between Software Reverse Engineering and Mechanistic Interpretability 2022-12-26T12:26:57.880Z

Comments

Comment by Itay Yona (itay-yona) on Inferring the model dimension of API-protected LLMs · 2024-04-06T16:17:10.463Z · LW · GW

The true rank is revealed because the output dimensionality is vocab_size, which is much larger than hidden_dim. It is unclear how to get something equivalent to that from the cortex. It is possible, though, to record a population of neurons and use dimensionality reduction (usually some sort of manifold learning) to estimate the true dimensionality of the population activity. This is useful in some areas of the brain, such as the hippocampal formation.
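To make the rank argument concrete, here is a minimal sketch on synthetic data, not the method from the post being discussed. It assumes the logits are a purely linear function of the final hidden state; the toy matrices `H` and `W`, the sizes, and the helper `estimate_hidden_dim` are all hypothetical names chosen for illustration.

```python
import numpy as np

def estimate_hidden_dim(logits, rel_tol=1e-8):
    """Count singular values of a (num_queries, vocab_size) matrix
    that are non-negligible relative to the largest one."""
    s = np.linalg.svd(logits, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Toy setup (assumed sizes): vocab_size is much larger than hidden_dim.
rng = np.random.default_rng(0)
hidden_dim, vocab_size, num_queries = 16, 200, 64

W = rng.standard_normal((hidden_dim, vocab_size))   # stand-in unembedding matrix
H = rng.standard_normal((num_queries, hidden_dim))  # stand-in final hidden states
logits = H @ W                                      # what an API would expose

# The logits span at most hidden_dim directions, even though each
# row lives in a vocab_size-dimensional space.
estimated = estimate_hidden_dim(logits)
print(estimated)
```

The estimate only works when the number of queries exceeds hidden_dim and the outputs are observed with little noise, which is exactly what is hard to arrange with cortical recordings.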

Comment by Itay Yona (itay-yona) on Analogies between Software Reverse Engineering and Mechanistic Interpretability · 2022-12-27T22:33:03.180Z · LW · GW

Thanks, that's a good insight. In my opinion, the graph representation of code is very different from automated decompilation such as Hex-Rays. I agree that graph representation is probably the most critical step towards higher-level analysis and understanding. I am not sure why you claim it required decades of tooling, because Turing machines have been described with graphs since the dawn of computer science.

In any case, this is an interesting point, as it suggests we might want to focus on finding graph-like concepts that will be useful for describing the different states of a neural network computation, and later on developing an IDA-like tool :)

Since we share similar backgrounds and aspirations, feel free to reach out:

https://www.linkedin.com/in/itay-yona-b40a7756/

Comment by Itay Yona (itay-yona) on Analogies between Software Reverse Engineering and Mechanistic Interpretability · 2022-12-27T22:19:45.257Z · LW · GW

I strongly agree! When you study RE, it is critical to understand lots of details about how the machine works, and most people I knew were already familiar with those. What they lacked was the skill of using their low-level understanding to actually conduct useful research effectively.

It is natural to pay much less attention to the 1->2 phase, since there are many more intermediate researchers than complete newbies or experts. It is interesting because, when talking with intermediate researchers, they might think they are talking with person 1 instead of person 3.

 

Thanks, you gave me something to think about :)

Comment by Itay Yona (itay-yona) on What if memes are common in highly capable minds? · 2022-06-06T22:10:27.417Z · LW · GW

[In my opinion]

Memes are self-replicating concepts (given that there are enough humans to spread them). Highly capable minds are different, as they contain predictive models of the world, the self, and others. This allows them to manipulate both objects in the world and other people to fulfill their needs. Since memes don't have these capacities, they should not be counted as the cause of human behavior, even though they are related to it. Even if the best way to explain human behavior is through memes, they don't necessarily account for most of the decision-making process.

[/In my opinion]