Analyzing the Problem GPT-3 is Trying to Solve

post by adamShimi · 2020-08-06T21:58:56.163Z · 2 comments

Contents

  The $Prompt_p$ Search Problem
  The $BestPrompt_p$ Optimization Problem
  Solving the Task
  Conclusion
2 comments

I think that taking a theoretical computer science perspective on GPT-3 might help clarify some parts of the debate.

My background is in theoretical computer science, and that's my preferred perspective for making sense of the world. So in this post, I try to rephrase the task we ask GPT-3 to solve in terms of search and optimization problems, and I propose different criteria for solving such a task. I thus consider GPT-3 as a black box, and only ask how to decide whether it is good at its task or not.

Epistemic status: probably nothing new, but maybe an interesting perspective. This is an attempt to understand the intuitions behind different statements about whether GPT-3 "solves some task".

The $Prompt_p$ Search Problem

Let's fix a language (English, for example). Then a very basic description of the task thrown at GPT-3 is that it's given a prompt $p$ and it needs to give a follow-up text $a$ such that $a$ is a good answer to the prompt. For example, $p$ might be a question about parentheses or addition, or a story to complete, and $a$ is GPT-3's answer. I thus define $Prompt_p$ to be this search problem for a given prompt $p$.
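To make this concrete, here is a minimal Python sketch, with a hypothetical `is_good_answer` predicate standing in for the informal notion of a "good answer" (which is exactly the part we don't know how to write down):

```python
from typing import Callable

# Prompts and answers are both just strings in the fixed language.
Prompt = str
Answer = str

def make_search_problem(p: Prompt,
                        is_good_answer: Callable[[Prompt, Answer], bool]):
    """Return the search problem Prompt_p: for the fixed prompt p,
    an answer a is a solution iff is_good_answer(p, a) holds."""
    def is_solution(a: Answer) -> bool:
        return is_good_answer(p, a)
    return is_solution

# Toy instance: an addition prompt with a single good answer.
prompt_p = make_search_problem("2 + 2 = ",
                               lambda p, a: a.strip() == "4")
print(prompt_p("4"))  # True
print(prompt_p("5"))  # False
```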

Now, what does it mean for an algorithm $A$ to solve $Prompt_p$? Multiple possibilities come to mind:

(Deterministic Solution) $A$ solves $Prompt_p$ if $A(p)$ is always a good answer to $p$.

(Non-deterministic Solution) $A$ solves $Prompt_p$ if $A(p)$ is a good answer to $p$ with non-zero probability.

(Probabilistic Solution) $A$ solves $Prompt_p$ if $A(p)$ is a good answer to $p$ with probability at least $\frac{2}{3}$ (as usual for randomized algorithms, any constant $> \frac{1}{2}$ works, since the success probability can be amplified by repetition).

The deterministic criterion is too strong for an algorithm that, like GPT-3, samples its answers; and the non-deterministic one is too weak, since producing a good answer once in a blue moon shouldn't count.
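A minimal sketch of how one might test the probabilistic criterion empirically, reusing the toy search problem from above (the sampler and the exact threshold here are illustrative assumptions, not a claim about how GPT-3 is actually evaluated):

```python
import random

def is_probabilistic_solution(A, p, is_solution,
                              trials=1000, threshold=2/3):
    """Empirically estimate whether the sampler A, given prompt p,
    outputs a good answer with probability at least `threshold`."""
    successes = sum(is_solution(A(p)) for _ in range(trials))
    return successes / trials >= threshold

# Toy sampler: answers correctly 80% of the time.
def toy_sampler(p):
    return "4" if random.random() < 0.8 else "banana"

print(is_probabilistic_solution(toy_sampler, "2 + 2 = ",
                                lambda a: a.strip() == "4"))
# True with high probability, since 0.8 >= 2/3
```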

Okay, so GPT-3 is solving $Prompt_p$ iff it is a probabilistic solution to it. Is it enough?

Not exactly. In some cases, we don't want a binary choice between good and bad answers; there might be a scale of answers which are more and more correct. That is to say, it should be an optimization problem.

The $BestPrompt_p$ Optimization Problem

To go from a search problem to an optimization problem, we need a measure of how good a solution is. Let's say we have a function $m$ from answers to $[0,1]$ that measures the quality of an answer. The optimization problem $BestPrompt_p$ can then be defined with our probabilistic criterion in multiple ways:

(Best Answer) $A$ solves $BestPrompt_p$ if $A(p)$ maximizes $m$ with probability at least $\frac{2}{3}$.

(Good Enough Answer) $A$ solves $BestPrompt_p$ if $m(A(p))$ is above some fixed threshold with probability at least $\frac{2}{3}$.
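A sketch of both flavors, under the assumption that such a hypothetical measure $m$ exists (for the "Best Answer" flavor, the maximal value of $m$ is passed in, since computing it is generally infeasible):

```python
def satisfies_best_answer(A, p, m, max_m, trials=1000, threshold=2/3):
    """'Best Answer' flavor: A(p) reaches the maximal measure max_m
    with probability at least `threshold`."""
    successes = sum(m(A(p)) >= max_m for _ in range(trials))
    return successes / trials >= threshold

def satisfies_good_enough(A, p, m, quality, trials=1000, threshold=2/3):
    """'Good Enough Answer' flavor: m(A(p)) is at least `quality`
    with probability at least `threshold`."""
    successes = sum(m(A(p)) >= quality for _ in range(trials))
    return successes / trials >= threshold
```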

This new kind of criterion feels much closer to what I want GPT-3 to do when I give it a prompt. But I don't know how to write a measure function that captures what I want (after all, that would be a utility function for the task!). We can do without one by instead comparing two answers with each other. I probably have a couple of answers in mind when writing my prompt; and when I receive an answer, I feel that comparing it to one of my expected answers should be possible.

In consequence, I give the following criterion. Assume the existence of a set $Exp$ of expected answers and of a comparison relation $\succeq$ between answers. Then

(As Good As Expected) $A$ solves $BestPrompt_p$ if $\Pr[\exists e \in Exp : A(p) \succeq e] \geq \frac{2}{3}$.
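An empirical version of this check might look like the following sketch (the comparison `succeq` and the set `Exp` are whatever the evaluator brings to the table; replacing `any` with `all` gives the $\forall$ variant mentioned just below):

```python
def is_as_good_as_expected(A, p, Exp, succeq,
                           trials=1000, threshold=2/3):
    """Empirical As Good As Expected check: with probability at
    least `threshold`, A(p) compares favorably (succeq) to SOME
    expected answer in Exp -- the exists-quantifier version."""
    def success():
        a = A(p)
        return any(succeq(a, e) for e in Exp)
    return sum(success() for _ in range(trials)) / trials >= threshold
```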

I am personally happy with that criterion. And I think it is rather uncontroversial (with maybe the subtlety that some people would prefer a $\forall$ in place of the $\exists$).

But we're not done yet. This only constrains what the algorithm does for a fixed prompt. A big part of the discussion around GPT-3 turns on the number of prompts for which GPT-3 should give an interesting answer. This requires a step back.

Solving the Task

When evaluating GPT-3, what I really care about is its answers for a specific task (like addition or causality). And this task can be encoded through many prompts. Let's say that we have a set $T$ of prompts that somehow captures our task (from our perspective). What constraints should an algorithm $A$ satisfy to be said to solve the task? We already have a meaningful criterion for answering a prompt correctly: the "As Good As Expected" one. What's left to decide is for which prompts in $T$ the algorithm should satisfy this criterion.

Contrary to the part about solving a prompt, the answer doesn't look obvious to me here. I also believe that much of the disagreement about GPT-3's competence is based on a disagreement about this part.
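One way to make the disagreement explicit: the same evaluation data supports different verdicts depending on the quantifier over $T$. A sketch with hypothetical helpers (`solves_prompt` would be an empirical per-prompt check like the As Good As Expected one above):

```python
def fraction_solved(A, T, solves_prompt):
    """Fraction of prompts in T on which A meets the per-prompt
    criterion (e.g. the As Good As Expected check above)."""
    return sum(solves_prompt(A, p) for p in T) / len(T)

# Three possible readings of "A solves the task encoded by T":
def solves_all(A, T, criterion):            # every prompt
    return fraction_solved(A, T, criterion) == 1.0

def solves_most(A, T, criterion, bar=0.9):  # the bar is arbitrary
    return fraction_solved(A, T, criterion) >= bar

def solves_some(A, T, criterion):           # at least one prompt
    return fraction_solved(A, T, criterion) > 0.0
```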

Conclusion

Thinking about GPT-3, and the sort of problem we want it to solve, through the lens of theoretical computer science helped me understand the debate surrounding it slightly better. I hope it can be helpful to you too. And I'm also very curious about any errors you might find in this post, whether conceptual or technical.

2 comments


comment by avturchin · 2020-08-07T10:26:38.895Z

What do you think about a possible comparison between GPT-3 and AIXI?

Oversimplification: if AIXI gets a sequence, it will produce a set of hypotheses about the most probable explanations of the sequence, and based on them, a set of possible continuations can be generated. GPT-3 skips the hypothesis-generating part and directly generates the continuations.

AIXI is known to be a model of truly general intelligence.

Could GPT-3 be used or regarded as an approximation of AIXI?