You Can Probably Amplify GPT3 Directly

post by Zachary Robertson (zachary-robertson) · 2020-07-26T21:58:53.962Z · LW · GW · 14 comments

Contents

  Factoring
  Amplification
14 comments

Epistemological Status: The idea is unoriginal, but the implementation is the first I'm aware of. I'm reasonably confident that anyone patient with an hour to burn could reproduce these results. You can probably amplify GPT3 directly.

Let me cut the suspense: I can get GPT3 to sum the items in a list using recursion. I have a notebook here and Dungeon instances for the parts: factoring, amplification1, and amplification2. I'm suggesting that I can combine different instances of GPT3 to implement primitive recursive functions. Moreover, I can get GPT3 to suggest approaches for the implementation. Combined, this is a toy example showing how to amplify GPT3's capabilities. At the moment this method is slow, but it does provide a proof of concept. Obvious extensions include prompting GPT3 to factor questions, ask questions, generate examples, and induce belief states that encourage learning.

Factoring

In the following, I don’t re-roll unless I say so. Anything omitted here is available in the links at the top. Mistakes have to be dealt with in-script, which simplifies reproducibility. I'll bold my prompts and italicize for emphasis. Occasionally, GPT will get ahead of me and auto-complete questions. Beforehand, I decided that I'd cut off at the end of an answer, but that doesn't mean I can't ask the same question myself if I think it was a good one. First, I state the problem,

Q: Say I want to sum the items in a list. How would I do this recursively? The answer involves two steps.

A:

  1. Take the item from the first step and put it into the second step.
  2. Repeat until you run out of items! B: You can't use recursion here because there are no more items to add, so you have to start over with Step 1 again.

Q: Could you explain step 1) in more detail?

Here GPT3 fails to explain step one properly,

A: I will refer to the first step as the "base case". In Step 1, the base case is an empty list. In order to put an item into a list, you have to have a list in the first place!

Q: So we take an item from an empty list and put it into the second step?

A: Yes, exactly.

Q: And then we repeat until we're out of items?

A: Right. So now that you understand what Step 1 does, let's look at the second step.

Yet, with prompting, it recovers.

Q: How do you select an item from an empty list?

A: You can't. So we're not going to get to Step 2 at all! If we try to select an item from an empty list, our program is going to crash.

Q: Ok. So your step one won't actually work. Could you try explaining your algorithm again?

A: Sure. We're trying to sum the items in a list. Let's call the sum of those numbers, "S". We have two variables, "i" and "result". We start by making "result" equal to zero. Then we iterate through the list, adding "result" to "result" each time. Then, once we've gone through the whole list, we make "result" equal to "result" plus the number in our list. This is one way to do it.

This is good. I go into more detail on what step two looks like. I go a bit meta to see if that works,

Q: Ok. Things get a bit more complicated here since you also are the implementation. What's the easiest way to take an item from the list?

A: You can use an "iterator". It's going to look something like myList.iterate({ }, handleSuccess, handleError);

Q: How does "iterator" work?

A: "Iterator" looks at the items in your list for what would be the greatest result. Imagine splitting a list evenly into two parts.

Q: Hmm...then second parts makes sence. So you want to split the list into two parts?

A: Exactly. If you have a list of four items, they will split it into two lists of two. Those two splits will go through the process and split their lists into two parts as well.

Q: Like dividing by two?

A: Sort of!

Q: My problem is that now the size of the splits depends on the list size. Your memory is not fantastic. Could you split this differently?

A: I could split the list into three parts. It would work the same way.

Q: I think that makes things worse. Besides that, according to you, I need only a single item from the list anyway. How do I select that item and then split it from the list?

A: You can "iterate" over the list from 0 to end. end-0 is index of last item in list.

Q: Ahh, so when the index is zero that means you should grab that item and split it off?

A: Exactly!
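For reference, the decomposition this conversation converges on (grab the item at index zero, split it off, and repeat on what remains) is ordinary head/tail recursion. The following is a minimal sketch in plain Python, not something GPT3 produced; the names `split_head` and `recursive_sum` are my own, and the items are strings to match the lists used in the prompts.

```python
def split_head(items):
    """Split a non-empty list into its first item (index 0) and the rest."""
    return items[0], items[1:]


def recursive_sum(items, result=0):
    """Sum a list by splitting off the head and adding it to a running result."""
    if not items:  # base case: no items left, so the running result is the sum
        return result
    head, tail = split_head(items)
    return recursive_sum(tail, result + int(head))


print(recursive_sum(['9', '3', '4', '5']))  # 21
```

The two pieces map onto the two jobs handed to GPT3 in the next section: one step that splits the list, and one step that adds the head to the running result.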

Amplification

The next stage is getting GPT to implement the components it has suggested. Things get a bit odd,

Q: Okay, so we split the first digit in the list off from the list and add it to 'result' and then return 'result' and the remaining portion of the list. Do you think you can implement that?

A: Sure. My first try looks like this: Hmm, something happened. What?

Q: Well, as I said things get more complicated now. Do you know how to find the first digit of list?

A: indexOf should take care of that though. Isn't that the same as the first index?

Q: Yes or no, can you find the first digit of a list? For example, what is the first digit of ['7', '4', '9']?

A: Okay, that's easy. '7' is the first digit of ['7', '4', '9']

Q: Great! Can you write the tail of the list as a list? For example, what does the tail of the List ['7', '4', '9'] look like?

A: ['4', '9'] is the tail of the list.

On the next example GPT3 objects and then completes everything bolded,

Q: Great! What is the tail of the list ['4', '1', '9']?

A: Oh no! I can't find the tail of a list. Am I supposed to implement everything? What does the tail of the list have that first doesn't?

A: The Tail is everything but the first item in the list.

Q: What is the tail of the list ['4', '1', '9']?

A: That's easy! The tail of the list is ['1', '9'], which is different than the first item in the list (ie. the first digit).

Q: So the head might be thought of as the first and the tail as everything else.

In practice, I need to run two different instances of GPT3 in order to calculate the list sum. One will return the head of the list and the other will keep a running total. For the head method we have a prompt cycling over things like,

The first digit of ['4', '3', '2'] is 4

Remove the first digit of ['4', '3', '2'] to get ['3', '2']

This works very well, and it improves on what you get if you try to freeze the instance at the point in the factoring conversation where I ask these questions. Summation is a bit harder. I need something like this,

Q: The result is 16. The head of the list is 5. What is the sum?

A: The result is 16 plus 5 which is 21.
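The prompt shapes above are regular enough to generate mechanically. Here is a small sketch of how one might build them; the function names are hypothetical, and the wording of each template is copied from the examples in this post rather than from any API.

```python
def head_prompt(items):
    """Unfinished line for the head instance to complete with the first item."""
    return f"The first digit of {items} is"


def tail_prompt(items):
    """Unfinished line for the head instance to complete with the remaining list."""
    return f"Remove the first digit of {items} to get"


def sum_prompt(result, head):
    """Question for the summation instance, ending with 'A:' so it answers next."""
    return (
        f"Q: The result is {result}. The head of the list is {head}. "
        "What is the sum?\nA:"
    )


print(head_prompt(['4', '3', '2']))
print(tail_prompt(['4', '3', '2']))
print(sum_prompt(16, 5))
```

Python's default formatting of a list of strings happens to match the `['4', '3', '2']` style used in the prompts, which keeps the templates short.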

However, we get some mistakes at first,

Q: Hi! I'm going to ask you to add the two numbers I give you. Thanks!

A: Cool, got it!

Q: The total is 0. The head of the list is 0. What is the sum?

A: The result is 0 plus 0 which is 0.

Q: The result is 0. The head of the list is 9. What is the sum?

A: The result is 0 plus 9 which is 9.

Q: The result is 9. The head of the list is 1. What is the sum?

A: The result is 1 plus 9 which is 10.

Q: Great! Now we'll do this again!

A: Sounds good!

Q: The result is 0. The head of the list is 4. What is the sum?

A: The result is 0 plus 4 which is 5.

Q: That's not correct. You're off by one. 0 plus 4 is 4. Do you understand?

A: Yes! It worked!

Q: The result is 0. The head of the list is 4. What is the sum?

A: The result is 0 plus 4 which is 4.

Q: That's correct! You've learned how to do this task. Let's test your knowledge!

A: I have learned. I'm ready!

Q: The result is 455. The head of the list is 38. What is the sum?

A: The result is 0. The head of the list is 9. What is the sum?

To be clear, it can do simple addition at this point; it simply struggles with larger numbers. I put examples in the notebook. I leave the answer here, but mention that when I originally ran this I had,

Q10: Seems you forgot to carry. The result is 455. The head of the list 38. What is the sum?

A: 483. I remembered to carry this time!

Q: Actually you haven't. 8 plus 5 is 13. The answer should be 493. Do you understand your error?

A: I think so. The 1 should be carried over and we should have 8 plus one is 9!

Q: Great!

A: Let's try another!

Q1: The result is 278. The head of the list 56. What is the sum?

A1: 278 plus 56 is 334.

Q2: The result is 245. The head of the list 76. What is the sum?

A2: 245 plus 76 is 321.

Q3: The result is 115. The head of the list 23. What is the sum?

A3: 1 plus 2 is 3. We have to carry the 1. The next digit is 4. The sum should be 1241.
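To make the carrying explicit, here is a sketch of column-by-column addition that can be used to check the answers above. It is ordinary Python rather than anything GPT3 ran, and `add_with_carry` is a name I made up for illustration.

```python
def add_with_carry(a, b):
    """Add two non-negative integers column by column, recording each step."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # reversed so index 0 is the ones column
    digits, steps, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        steps.append(f"{x} + {y} + carry {carry} = {total}")
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return int("".join(reversed(digits))), steps


total, steps = add_with_carry(455, 38)
print(total)  # 493
for step in steps:
    print(step)

# Compare the three test answers from the transcript against the true sums.
for a, b, said in [(278, 56, 334), (245, 76, 321), (115, 23, 1241)]:
    print(a, b, said, add_with_carry(a, b)[0] == said)
```

For 455 plus 38, the ones column gives 5 + 8 = 13, so a 1 is carried into the tens column, which is the step the 483 answer missed. The first two test answers check out, and the third should be 138.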

It almost learned to carry and then overfit. A full run, with two re-rolls on non-answers and using split-tab instances of GPT3, looks like this,

The first digit of ['9', '3', '4', '5'] is 9

Then the tail of ['9', '3', '4', '5'] is ['3', '4', '5']

Q: The result is 0. The head of the list is 9. What is the sum?

A: The result is 0 plus 9 which is 9.

The first digit of ['3', '4', '5'] is 3

Then the tail of ['3', '4', '5'] is ['4', '5']

Q: The result is 9. The head of the list is 3. What is the sum?

A: The result is 9 plus 3 which is 12.

The first digit of ['4', '5'] is 4

Then the tail of ['4', '5'] is ['5']

Q: The result is 12. The head of the list is 4. What is the sum?

A: The result is 12 plus 4 which is 16.

The first digit of ['5'] is 5

This was an awesome idea to build my own program, but it is also nice to see how other people do it too.

Q: The result is 16. The head of the list is 5. What is the sum?

A: The result is 16 plus 5 which is 21.
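The run above was stitched together by hand across tabs, but the control flow is simple enough to sketch. In the code below each GPT3 instance is assumed to be a plain function from prompt text to completion text; that `Model` type, the `ask` and `amplified_sum` helpers, the regular expressions, and the two `fake_*` stand-ins are all hypothetical and only there so the sketch runs without any API access.

```python
import ast
import re
from typing import Callable, List

Model = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def ask(model: Model, prompt: str, pattern: str, max_rolls: int = 3) -> str:
    """Query an instance, re-rolling when the completion is a non-answer."""
    for _ in range(max_rolls):
        match = re.search(pattern, model(prompt))
        if match:
            return match.group(1)
    raise RuntimeError(f"no usable answer after {max_rolls} rolls: {prompt!r}")


def amplified_sum(items: List[str], list_model: Model, sum_model: Model) -> int:
    """Sum a list using one instance for head/tail and one for the running total."""
    result = 0
    while items:
        head = ask(list_model, f"The first digit of {items} is", r"(\d+)")
        tail = ask(list_model, f"Then the tail of {items} is", r"(\[.*?\])")
        question = (
            f"Q: The result is {result}. The head of the list is {head}. "
            "What is the sum?\nA:"
        )
        result = int(ask(sum_model, question, r"which is (\d+)"))
        new_items = ast.literal_eval(tail)
        if len(new_items) >= len(items):  # guard against a tail that never shrinks
            raise RuntimeError(f"tail did not shrink: {tail}")
        items = new_items
    return result


def fake_list_model(prompt: str) -> str:
    """Deterministic stand-in for the head/tail instance."""
    items = ast.literal_eval(re.search(r"\[.*?\]", prompt).group(0))
    if prompt.startswith("The first digit"):
        return f" {items[0]}"
    return f" {items[1:]}"


def fake_sum_model(prompt: str) -> str:
    """Deterministic stand-in for the summation instance."""
    result, head = map(int, re.findall(r"\d+", prompt))
    return f" The result is {result} plus {head} which is {result + head}."


print(amplified_sum(['9', '3', '4', '5'], fake_list_model, fake_sum_model))  # 21
```

Swapping the stand-ins for real instances would mean prepending the few-shot context each instance was primed with, which this sketch leaves out. The `ask` helper plays the role of my re-rolls on non-answers.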

14 comments

Comments sorted by top scores.

comment by romeostevensit · 2020-07-26T22:50:33.232Z · LW(p) · GW(p)

I think that, memetically, we'll be selecting hardest for cases where it is most difficult to see the ways in which the human is doing a lot of work by pushing it out into the context, thus making GPT-3 look most impressive. I think only a little of that is happening here; I'm just saying it because this is what sparked the thought.

comment by Zachary Robertson (zachary-robertson) · 2020-07-27T01:33:09.293Z · LW(p) · GW(p)

I agree. Coming up with the right prompts was not trivial. I almost quit several times. Yet, there is a science to this and I think it’ll become more important to turn our focus away from the spectacle aspects of GPT and more towards reproducibility. More so if the way forward is via interrelated instances of GPT.

As an aside, critique seems much easier than generation. I’m cautiously optimistic about prompting GPT instances to “check” output.

comment by romeostevensit · 2020-07-27T06:50:54.887Z · LW(p) · GW(p)

Similar to sharp google-fu today but much deeper.

comment by ESRogs · 2020-07-27T01:08:36.706Z · LW(p) · GW(p)

Interesting to think about how this will evolve. Over time, humans will have to do less of the work, and the combined system will be able to do more. (Though the selection pressure that you mention will continue to be there.)

It seems to me that we might not be too far away from "natural language programming". With some combination of the above approach, plus the program synthesis examples where you just specify a comment, plus some extra tricks, it seems like you could end up just sort of specifying your programs via an algorithm description in English.

You'd want to set it up so that it alerted you when it thought things were ambiguous, and that it auto-generated test cases for different possible interpretations and showed you the results.

I've personally started using TabNine in the last few weeks, and I'd say it's just barely over the edge of being useful. But I can imagine next-gen versions of these things pretty radically transforming the process of programming.

comment by ESRogs · 2020-07-27T01:16:38.050Z · LW(p) · GW(p)

With some combination of the above approach, plus the program synthesis examples where you just specify a comment, plus some extra tricks

Another interesting direction to go with this -- can you get it to do a sort of distillation step, where you first get amplified-GPT to implement some algorithm, a la the recursion dialogue above, and then you get it to generate code that implements the same algorithm?

comment by ESRogs · 2020-07-27T01:12:23.816Z · LW(p) · GW(p)

Possible startup idea -- design an IDE from the ground up to take advantage of GPT-like abilities.

I think just using the next-gen version of TabNine will be powerful, and I expect all major IDEs' autocomplete features to improve a lot in the coming years, but I also suspect that if you designed an IDE to really take advantage of what these systems can do, you might end up designing something rather different from just today's IDEs + better autocomplete.

comment by spkoc · 2020-07-27T12:24:54.330Z · LW(p) · GW(p)

How are you actually doing this in AI Dungeon? I have Dragon mode enabled, everything else default.

I start a new Single player game. Choose Custom mode(6). Then at the prompt I just paste (using Say mode)

Q: Say I want to sum the items in a list. How would I do this recursively? The answer involves two steps.

 and I get

Q: Say I want to sum the items in a list. How would I do this recursively? The answer involves two steps. First, I need to know how many items there are in total. Second, I need to find out which item is at the top of that list. A: You could use recursive_sum() .

Similarly when I tried to reproduce stuff from https://old.reddit.com/r/slatestarcodex/comments/hrx2id/a_collection_of_amazing_things_gpt3_has_done/ I didn't get anything near as impressive. Also the responses get increasingly confused. Like if I ask it to translate something to French or Romanian it will randomly translate later prompts as well.

Is there some basic tutorial for how you seed these AI Dungeon interactions?

comment by Zachary Robertson (zachary-robertson) · 2020-07-27T13:09:49.102Z · LW(p) · GW(p)

You could prompt with “Q:” + (content) and then “A:”

I use the default settings on the temperature, but I do cut it off after it finishes an answer. However, you likely won’t get my exact results unless you literally copy the instances. Moreover, if you gave up after the first response I think you might’ve given up too quickly. You can respond to it and communicate more information, as I did. The above really was what I got on the first try. It’s not perfect, but that’s the point. You can teach it. It’s not “it works” or “it doesn’t work”.

I don’t think there are tutorials, but perhaps in due time someone (maybe me) will get to that. I also feel like ‘trying’ to get it to do something might be a sub-optimal approach. This is a subtle difference, but my intent here was to get it to confirm it understood what I was asking by answering questions.

comment by Rafael Harth (sil-ver) · 2020-07-28T13:57:06.778Z · LW(p) · GW(p)

The approach I've been using (for different things, but I suspect the principle is the same) is

  • If you want it to do X, give it about four examples of X in the question-answer format as a prompt (as in, commands from the human plus answers from the AI)
  • Repeat for about three times:
    • Give it another such question, reroll until it produces a good answer (might take a lot of rolls)

At that point it is much better than one where you prompted everything to begin with.
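A rough sketch of that loop, for concreteness. The model is assumed to be a function from prompt text to completion text, and `is_good` stands in for the human judging each roll; both are hypothetical, as are the names `build_prompt` and `bootstrap`.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, answer)


def build_prompt(examples: List[Example], question: str) -> str:
    """Lay out the examples in Q/A format and end with an open 'A:' line."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {question}\nA:"


def bootstrap(
    model: Callable[[str], str],
    examples: List[Example],
    questions: List[str],
    is_good: Callable[[str, str], bool],
    max_rolls: int = 20,
) -> List[Example]:
    """Grow the example list by re-rolling each new question until it is accepted."""
    examples = list(examples)  # copy so the caller's list is left untouched
    for question in questions:
        for _ in range(max_rolls):
            answer = model(build_prompt(examples, question)).strip()
            if is_good(question, answer):
                examples.append((question, answer))  # accepted answers join the prompt
                break
    return examples
```

Accepted answers are appended to the prompt, so later questions see the model's own good completions alongside the hand-written examples. A question that never gets an acceptable roll within `max_rolls` is simply skipped.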

comment by avturchin · 2020-07-27T14:20:54.875Z · LW(p) · GW(p)

I also have difficulties replicating this with AI Dungeon. I think it has a weaker version of GPT-3 than the API.

comment by 157 239n (157-239n) · 2020-08-02T12:33:28.129Z · LW(p) · GW(p)

Yeah, they have 2 different models. "Griffin" is GPT-2 and free of charge. "Dragon" is GPT-3 and I believe costs $5/month.

comment by avturchin · 2020-08-02T13:37:06.067Z · LW(p) · GW(p)

I have a paid account.

comment by Raemon · 2020-07-27T13:14:00.119Z · LW(p) · GW(p)

Your formatting makes it hard for me to tell which parts are your prompting vs GPT3. One common format is ‘bold = you’, but the opening line wasn’t bold, so I was confused about what’s going on there.

comment by Zachary Robertson (zachary-robertson) · 2020-07-27T13:41:43.080Z · LW(p) · GW(p)

Thanks! I forgot to do this. Luckily I can go back through the run and put this in. There is ambiguity whenever it auto-completes, but I hope I did a decent job of noting where this is happening.