Comments
Comment by amirrahnama on Decoding intermediate activations in llama-2-7b · 2024-01-11T10:50:28.546Z
Thanks, Nina, for sharing how the Hugging Face forward pass works. I now realize I was skipping the input layer norm calculation; with that fixed, I can reproduce your numbers :)
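For anyone else hitting the same thing, here is a rough sketch of the fix as I understand it (the checkpoint name, prompt, and variable names below are mine, assuming a standard transformers LlamaForCausalLM): the attention block receives the hidden states after the layer's input RMSNorm, not the raw token embeddings.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # load on GPU / in half precision if memory is tight

inputs = tokenizer("The capital of France is", return_tensors="pt")

with torch.no_grad():
    hidden = model.model.embed_tokens(inputs.input_ids)        # raw token embeddings
    normed = model.model.layers[0].input_layernorm(hidden)     # the step I was skipping
    position_ids = torch.arange(hidden.shape[1]).unsqueeze(0)  # needed for the rotary embeddings
    # Depending on the transformers version, self_attn may also expect an attention_mask
    # or precomputed position_embeddings.
    attn_out = model.model.layers[0].self_attn(normed, position_ids=position_ids)[0]

print(attn_out.shape)  # (1, seq_len, hidden_size)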
Comment by amirrahnama on Decoding intermediate activations in llama-2-7b · 2024-01-10T14:38:07.456Z
Thanks for the nice tutorial.
I have a problem understanding your code (I am new to PyTorch). When you calculate the attention activations:
def forward(self, *args, **kwargs):
    output = self.attn(*args, **kwargs)
    if self.add_tensor is not None:
        output = (output[0] + self.add_tensor,) + output[1:]
    self.activations = output[0]
    return output
What is the argument that is passed to the self.attn function?
I tried passing the following:
- model.layers.layers[0].self_attn(past_key_values[0][0].reshape(1, 10, 32 * 128))[0]
- model.model.embed_tokens(inputs.input_ids.to(device))
Neither of these reproduces your results. Can you clarify this?
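One way I tried to inspect what actually gets passed to self_attn is a forward pre-hook (a rough sketch; the hook and variable names are mine, assuming a standard transformers LlamaForCausalLM), though I am still unsure how to reproduce the numbers from it:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # load on GPU / in half precision if memory is tight

inputs = tokenizer("The capital of France is", return_tensors="pt")

captured = {}

def capture_attn_input(module, args, kwargs):
    # Record whatever the decoder layer passes to self_attn (positional and keyword).
    captured["args"] = args
    captured["kwargs"] = kwargs

hook = model.model.layers[0].self_attn.register_forward_pre_hook(
    capture_attn_input, with_kwargs=True
)
with torch.no_grad():
    model(**inputs)
hook.remove()

# Depending on the transformers version, hidden_states may arrive positionally or as a kwarg.
print([a.shape for a in captured["args"] if torch.is_tensor(a)])
print({k: v.shape for k, v in captured["kwargs"].items() if torch.is_tensor(v)})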