praneetneuro

Comments

Comment by PraneetNeuro (praneetneuro) on Bridging the VLM and mech interp communities for multimodal interpretability · 2024-10-28T21:01:13.446Z · LW · GW

I'd be keen to see the TEXTSPAN method applied to the attention heads of CLIP's text encoder.

It'd also be interesting to see the same applied to the audio encoder of CLAP. I'm really curious to know your thoughts on mech interp efforts in the audio space, which seems to be largely ignored.

P.S. Thank you for the excellent post.

Comment by PraneetNeuro (praneetneuro) on Bridging the VLM and mech interp communities for multimodal interpretability · 2024-10-28T20:48:38.421Z · LW · GW

However, GPT-4o gave totally off results, such as "the faces and bodies of various birds, the face of a rabbit, and the body of a dog"

Trying the same image and prompt with Claude 3.5 seems to work. Here's the response:

Important concepts:

Tree branches and foliage, particularly bright yellow-lit sections
Ground/grass in several upper images
Some small patches of sky

Comment by PraneetNeuro (praneetneuro) on Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems · 2024-03-15T17:24:21.259Z · LW · GW

I agree that in-context learning is not entirely explainable yet, but we're not completely in the dark about it. We have some understanding of where this ability might stem from, and a direction for explaining it further, so it should only get clearer from here.

Comment by PraneetNeuro (praneetneuro) on Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems · 2024-03-14T04:08:39.209Z · LW · GW

However, it feels pretty odd to me to describe branching out into other modalities as crucial when we haven't yet really done anything useful with mechanistic interpretability in any domain or for any task.

I think the objective of interpretability research is to demystify the mechanisms of AI models, not to push the boundaries in terms of tangible results or state-of-the-art performance (though I do think interpretability research indirectly helps push those boundaries as well, because we'd design better architectures and train models better as we understand them better). I see it as crucial, especially as we delve into models with emergent abilities. For instance, in-context learning by language models used to be considered a black box; now it has been explained through interpretability efforts. This progress is not trivial; it lays the groundwork for safer and more aligned AI systems by ensuring we have a clearer grasp of how these models make decisions and adapt.

I also think there are key differences in how these architectures function across modalities: attention is causal for language but bidirectional for vision, and even though tokens and image patches are analogous, they are consumed and processed differently, among other things. These subtle differences change how the models operate across modalities even though the underlying architecture is the same, and this is exactly what necessitates mech interp efforts across modalities.
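
To make the causal vs. bidirectional point concrete, here is a minimal sketch in plain Python. It is only a toy illustration, not how any particular model implements attention: the function name `attention_weights` and the uniform scores are made up for the example, and real models work on learned query/key projections across many heads.

```python
import math

def attention_weights(scores, causal):
    """Row-wise softmax over raw attention scores.

    Hypothetical helper for illustration only.
    causal=True  -> position i may attend only to positions 0..i (language-model style).
    causal=False -> position i may attend to every position (ViT style).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Pick the positions that row i is allowed to look at.
        allowed = [j for j in range(n) if (not causal) or j <= i]
        # Subtract the row max before exponentiating for numerical stability.
        row_max = max(scores[i][j] for j in allowed)
        exps = {j: math.exp(scores[i][j] - row_max) for j in allowed}
        total = sum(exps.values())
        # Masked-out positions get exactly zero weight.
        weights.append([exps.get(j, 0.0) / total for j in range(n)])
    return weights

# Assumed toy input: identical scores for a sequence of 3 tokens / patches.
scores = [[0.0] * 3 for _ in range(3)]

causal = attention_weights(scores, causal=True)
bidirectional = attention_weights(scores, causal=False)

for name, w in (("causal", causal), ("bidirectional", bidirectional)):
    print(name)
    for row in w:
        print("  ", [round(x, 3) for x in row])
```

With identical scores, the first position in the causal case can only attend to itself, while every position in the bidirectional case spreads its attention evenly over the whole sequence. So even with the same architecture, the information available at each position differs, which is part of why circuits found in one modality may not transfer directly to another.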
