Task vectors & analogy making in LLMs

post by Sergii (sergey-kharagorgiev) · 2024-01-08T15:17:58.992Z · LW · GW · 1 comments

This is a link post for https://grgv.xyz/blog/copycat2/

Contents

    Meaningful intermediate embeddings?
    Task vectors
    Examples of task vectors
    Task vectors for copycat-like problems
    Interpretability of the task vectors
    References

In the previous post, I described the problem of interpreting analogy-making: given examples of transformed sequences of numbers, what is the mechanism behind figuring this transformation out and applying it correctly to the incomplete (test) sequence?

prompt: "0 1 2 to 2 1 0, 1 2 3 to 3 2 1, 4 5 6 to ", output: “6 5 4”

It was easy to check on which layer the correct answer appears, but tracing the sources of that answer to earlier layers turned out to be challenging.

Meaningful intermediate embeddings?

When I applied the logit lens [LW · GW] [1] to the output of the attention blocks, for a prompt containing reversed sequences of numbers, I noticed that the output contained a “reverse” token (at the last token position of layer 15).

I’m using a llama.cpp-based app (described in the previous post) to show the logit lens output. Each row corresponds to a token position and lists the top 5 tokens sorted by logit score:

./mia -m llama2.gguf --prompt "0 1 2 to 2 1 0, 1 2 3 to 3 2 1, 4 5 6 to" -n 5 --logit-lens kqv_out 5

Layer #15 kqv_out-15:
0: дар 0.35|oure 0.35|kar 0.33| Según 0.33|aki 0.3|
1:  dust 0.36|textt 0.36|elde 0.35|azzo 0.34| retro 0.34|
2: 典 0.37| Censo 0.35|oure 0.35| Aires 0.35| pó 0.34|
3: ḷ 0.39|ket 0.39| estaven 0.39|öß 0.39|oure 0.38|
4: zerw 0.62| estaven 0.51|cita 0.5| alberga 0.49|łow 0.48|
[...]
16: shal 0.84|ket 0.73|Assert 0.72|ając 0.66|sono 0.66|
17: ipt 0.95|кта 0.88|inal 0.86| inform 0.85| advanced 0.85|
18: кта 0.85|minipage 0.83| Mean 0.77|Assert 0.75| meaning 0.74|
19: ipt 0.78|Duration 0.76|zug 0.75|gemeinde 0.75|mannschaft 0.72|
20: shal 0.64|agy 0.64|prev 0.62| SA 0.6| Gay 0.58|
21:  revers 0.75| reverse 0.68|mat 0.67|shal 0.66|vat 0.66|

Although the task is about reversing, “reverse” is not mentioned explicitly anywhere in either the input or the output.

I have tried subtracting the embedding of the “reverse” token, effectively removing it, to check whether it is part of the analogy-making mechanism. It did not affect the output, which means that this token is not critical for the generation and might just be a side effect of some other mechanism.
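As a side note on methodology: the logit lens simply projects intermediate activations through the model’s final normalization and unembedding matrix, as if they were the last layer’s output. Here is a minimal Python sketch using Hugging Face transformers (an illustration only, not the llama.cpp-based implementation; the model name is an assumption, and it reads the residual stream rather than the attention outputs (kqv_out) inspected above):

# Minimal logit-lens sketch (illustrative; not the llama.cpp app used above).
# It projects the residual stream after a chosen layer through the model's
# final RMSNorm and unembedding matrix, and prints the top-5 tokens per position.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "0 1 2 to 2 1 0, 1 2 3 to 3 2 1, 4 5 6 to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs, output_hidden_states=True).hidden_states

layer = 15                                    # hidden_states[0] is the embedding layer
normed = model.model.norm(hidden_states[layer])
logits = model.lm_head(normed)                # (1, seq_len, vocab_size)

top = logits[0].topk(5, dim=-1)
for pos in range(logits.shape[1]):
    print(pos, [tokenizer.decode(int(t)) for t in top.indices[pos]])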

Task vectors

While looking for more information, I found that this topic is covered by two recent papers on so-called “task vectors” [2, 3]. These papers concurrently explore similar ideas, approaching the problem differently and complementing each other.

So, what’s a task vector? Suppose that the prompt is a set of examples of some transformation. For instance, pairs of English and French words, implying a translation task:

“hello -> bonjour, one -> un”

We expect that, by analogy, the model will complete subsequent English words with their French translations as well. In the case of a prompt that has an incomplete test query in addition to the training examples:

“hello -> bonjour, one -> un, yes -> ”, the output should be “oui”.

The idea behind “task vectors” is that in this case the model is working in two stages.

First, in the earlier layers, the model creates an abstracted and compressed representation of the “translate into French” task, based on the several training examples. This task description (the task vector) is stored in the embedding space. Then, later layers use this task vector as guidance for which transformation to apply to subsequent completions.

What follows is that these two stages can be split and applied separately: it’s possible to extract the task vector and use it instead of the training examples to get correct test predictions.

In [2], extracting and applying a task vector is as simple as copying and pasting the embedding vector at a specific token position. In [3], the methodology is more involved, but the effect is similar in both cases.
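To make the simpler approach concrete, here is a rough sketch of extract-and-patch using Hugging Face transformers and forward hooks (the model name, layer choice, and prompts are my own illustrative assumptions, not the exact setup from either paper or from the mia app):

# Sketch of task-vector extraction and patching (illustrative assumptions:
# model name, layer choice, and prompts are mine, not from the papers).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
LAYER = 14  # a middle layer; the useful range has to be found empirically

def last_token_state(prompt: str, layer: int) -> torch.Tensor:
    """Residual-stream vector at the last prompt token, after `layer`."""
    ids = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states
    return hs[layer][0, -1].clone()

# 1) Extract: run a prompt made of examples plus a dummy query, and save the
#    hidden state at the final "->" token.
task_vector = last_token_state("hello -> bonjour, one -> un, yes ->", LAYER)

# 2) Apply: patch that vector into the last token of a zero-shot prompt.
prompt = "goodbye ->"
prompt_len = len(tokenizer(prompt)["input_ids"])

def patch_hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    # Patch only during the initial pass over the full prompt; later decoding
    # steps feed a single token and should be left untouched.
    if hidden.shape[1] == prompt_len:
        hidden[:, -1] = task_vector.to(hidden.dtype)
    return output

# hidden_states[LAYER] corresponds to the output of decoder layer LAYER - 1.
handle = model.model.layers[LAYER - 1].register_forward_hook(patch_hook)
ids = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=5)
handle.remove()
print(tokenizer.decode(out[0], skip_special_tokens=True))  # hoped for: a French translation

The sections below do the same save-and-patch with the llama.cpp-based app.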

Another interesting point is the observation in [2] that task vectors contain tokens that describe the task:

In multiple cases, we observe tokens that directly describe the task. Importantly, these terms never explicitly appeared in the context. For example in the task of translation from French to English, we observe tokens such as “English” and “translate”. This supports our view that θ carries significant, non-trivial semantic information about the task [2]

This explains the “reverse” token I found in my logit lens experiments.

Examples of task vectors

For an example of using a task vector, let’s look at the prompt “France ->”. Without any interventions, Llama 2’s output is “Italy” (again, using the llama.cpp-based application):

./mia --model llama2.gguf --prompt "France ->"
output: " Italy"

Let’s use a task vector to modify the model’s state so that it instead outputs the capital city of a given country: “France -> Paris”.

First, to create the task vector, we need several examples of the target transformation (country -> capital):

"Egypt -> Cairo, Norway -> Oslo, Estonia ->"

With these examples as the input, we can save the vector that corresponds to the last token, from the model’s residual stream on layer #14[1]:

./mia --model llama2.gguf --prompt "Egypt -> Cairo, Norway -> Oslo, Estonia ->" --save l_out-14 ~/tmp/l_out-14
output: " Tallinn"

And finally, to apply the task vector to the initial prompt, we need to patch the last token’s vector with the saved one, at the same layer #14.

./mia --model llama2.gguf --prompt "France ->" --patch l_out-14 ~/tmp/l_out-14 --from-token-idx 13 --to-token-idx 2
output: " Paris"

It worked: the output was successfully modified, and patching in the task vector induced generation of the country’s capital.
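A note on the --from-token-idx and --to-token-idx arguments: they point at the last token position of the example prompt and of the short prompt, respectively, so the saved vector is read from one position and written to the other. The exact indices depend on the tokenization (e.g. whether a BOS token is prepended), but they can be computed rather than hard-coded; a small sketch, assuming a Hugging Face Llama 2 tokenizer that may differ slightly from the one used by the app:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed

def last_token_index(prompt: str) -> int:
    # Position of the final prompt token, where the task vector is saved/patched.
    return len(tokenizer(prompt)["input_ids"]) - 1

print(last_token_index("Egypt -> Cairo, Norway -> Oslo, Estonia ->"))  # source index
print(last_token_index("France ->"))                                   # target index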

Task vectors for copycat-like problems

Does it work for the copycat-like problems from the previous post? For example:

prompt: "0 1 2 to 2 1 0, 1 2 3 to 3 2 1, 4 5 6 to ", output: “6 5 4”

In this case, based on several training examples, the model reverses the last, incomplete example by analogy. Let’s check whether the reversal can be induced using a task vector instead of the examples.

Without any examples, the output is a straightforward continuation of the sequence:

./mia --model llama2.gguf --prompt "4 5 6 to 6 "
output: "7 8"

Creating a task vector based on several examples:

./mia --model llama2.gguf --prompt "1 2 3 to 3 2 1, 5 6 7 to 7 " --save l_out-14 ~/tmp/l_out-14
output: "6 5"

And finally, applying the task vector:

./mia --model llama2.gguf --prompt "4 5 6 to 6 " --patch l_out-14 ~/tmp/l_out-14 --from-token-idx 24 --to-token-idx 10
output: "5 4"

It works correctly for this example as well.

Interpretability of the task vectors

The next question is: what are the mechanisms behind task vectors? Both parts are open: how the task vector is created from the training examples, and how it is applied to novel test examples.

In [2], the authors stop at analyzing the effects of applying a task vector, while [3] goes further, finding a set of attention heads that strongly affect the task vector. However, there is still no understanding of the underlying sub-circuits and specific computational structures.

There are many other open questions as well.

References

  1. Interpreting GPT: the logit lens [LW · GW]
  2. R. Hendel, M. Geva, A. Globerson, In-Context Learning Creates Task Vectors. arXiv, doi:10.48550/arXiv.2310.15916 (2023). https://arxiv.org/abs/2310.15916
  3. E. Todd, M. L. Li, A. S. Sharma, A. Mueller, B. C. Wallace, D. Bau, Function Vectors in Large Language Models (2023). https://paperswithcode.com/paper/function-vectors-in-large-language-models

Layer #14 is selected based on experimental data from [2], and on testing several options for the selected examples. ↩︎

1 comment


comment by Chris_Leong · 2024-02-22T05:50:45.879Z · LW(p) · GW(p)

"Previous post" links to localhost.