Comments
Comment by amirrahnama on Decoding intermediate activations in llama-2-7b · 2024-01-11T10:50:28.546Z
Thanks, Nina, for sharing how the Hugging Face forward pass works. I now realize I was skipping the input layer norm calculation; with that fixed, I can reproduce your numbers :)
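For anyone else hitting the same thing, here is a rough sketch of the fix as I understand it (the checkpoint name, prompt, and variable names below are mine, assuming a standard transformers LlamaForCausalLM): the attention block receives the hidden states after the layer's input RMSNorm, not the raw token embeddings.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # load on GPU / in half precision if memory is tight

inputs = tokenizer("The capital of France is", return_tensors="pt")

with torch.no_grad():
    hidden = model.model.embed_tokens(inputs.input_ids)        # raw token embeddings
    normed = model.model.layers[0].input_layernorm(hidden)     # the step I was skipping
    position_ids = torch.arange(hidden.shape[1]).unsqueeze(0)  # needed for the rotary embeddings
    # Depending on the transformers version, self_attn may also expect an attention_mask
    # or precomputed position_embeddings.
    attn_out = model.model.layers[0].self_attn(normed, position_ids=position_ids)[0]

print(attn_out.shape)  # (1, seq_len, hidden_size)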
Comment by amirrahnama on Decoding intermediate activations in llama-2-7b · 2024-01-10T14:38:07.456Z
Thanks for the nice tutorial.
I have a problem understanding your code (I am new to PyTorch). When you calculate the attention activations:
def forward(self, *args, **kwargs):
    output = self.attn(*args, **kwargs)
    if self.add_tensor is not None:
        output = (output[0] + self.add_tensor,) + output[1:]
    self.activations = output[0]
    return output
What is the argument that is passed to the self.attn function?
I tried passing the following:
- model.layers.layers[0].self_attn(past_key_values[0][0].reshape(1, 10, 32 * 128))[0]
- model.model.embed_tokens(inputs.input_ids.to(device))
Neither of these reproduces your results. Can you clarify this?
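One way I tried to inspect what actually gets passed to self_attn is a forward pre-hook (a rough sketch; the hook and variable names are mine, assuming a standard transformers LlamaForCausalLM), though I am still unsure how to reproduce the numbers from it:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # load on GPU / in half precision if memory is tight

inputs = tokenizer("The capital of France is", return_tensors="pt")

captured = {}

def capture_attn_input(module, args, kwargs):
    # Record whatever the decoder layer passes to self_attn (positional and keyword).
    captured["args"] = args
    captured["kwargs"] = kwargs

hook = model.model.layers[0].self_attn.register_forward_pre_hook(
    capture_attn_input, with_kwargs=True
)
with torch.no_grad():
    model(**inputs)
hook.remove()

# Depending on the transformers version, hidden_states may arrive positionally or as a kwarg.
print([a.shape for a in captured["args"] if torch.is_tensor(a)])
print({k: v.shape for k, v in captured["kwargs"].items() if torch.is_tensor(v)})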