Invocations: The Other Capabilities Overhang?

post by Robert_AIZI · 2023-04-04T13:38:14.315Z · LW · GW · 4 comments

This is a link post for https://aizi.substack.com/p/invocations-the-other-capabilities

Contents

  Defining Invocations, and Examples
  Invocations Affect Capabilities
  AI Safety Implications

Abstract: An LLM’s invocation is the non-model code around it that determines when and how the model is called. I illustrate that LLMs are already used under widely varying invocations, and that a model’s capabilities depend in part on its invocation. I discuss several implications for AI safety work including (1) a reminder that the AI is more than just the LLM, (2) discussing the possibility and limitations of “safety by invocation”, (3) suggesting safety evaluations use the most powerful invocations, and (4) acknowledging the possibility of an “invocation overhang”, in which an improvement in invocation leads to sudden capability gains on current models and hardware.

Defining Invocations, and Examples

An LLM’s invocation is the framework of regular code around the model that determines when the model is called, which inputs are passed to the LLM, and what is done with the model’s output. For instance, the invocation in the OpenAI playground might be called “simple recurrence”:

  1. A user provides an input string. The input to the LLM is this string, unchanged except for tokenization.
  2. Run the LLM on this input, producing logits.
  3. Predict the next token as some probabilistic function of the logits (e.g., at temperature 0 the next token prediction is the argmax of the logits).
  4. Append this token to the end of the user’s input string.
  5. Repeat steps 2-4 with the new string until you get an [END_OF_STRING] token or reach the max token limit.
  6. Display the result as plain text.
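
To make the control flow concrete, here is a minimal Python sketch of that loop. Everything here is a hypothetical placeholder (the `model` function, the `END_OF_STRING` token id) rather than any particular vendor's API; the point is just how much of the invocation is ordinary code around a single forward-pass call:

```python
import numpy as np

# Hypothetical placeholders -- a real invocation would use an actual tokenizer,
# model, and special-token ids.
END_OF_STRING = 0

def simple_recurrence(tokens, model, temperature=0.0, max_new_tokens=256):
    """Run the "simple recurrence" invocation on a list of token ids:
    call the model, turn its logits into one new token, append it, repeat."""
    for _ in range(max_new_tokens):
        logits = np.asarray(model(tokens))          # step 2: one forward pass -> logits over the vocabulary
        if temperature == 0.0:
            next_token = int(np.argmax(logits))     # step 3: greedy decoding at temperature 0
        else:
            scaled = logits / temperature           # step 3: sample from the softmax at temperature > 0
            probs = np.exp(scaled - scaled.max())
            probs /= probs.sum()
            next_token = int(np.random.choice(len(probs), p=probs))
        tokens = tokens + [next_token]              # step 4: append the predicted token
        if next_token == END_OF_STRING:             # step 5: stop on [END_OF_STRING] or the token limit
            break
    return tokens                                   # step 6: caller detokenizes and displays the result
```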

Note how many steps in “using the LLM” do not involve the actual model! Here are some ways this invocation can be varied in practice. For example, the GPT-4 system card describes a more elaborate invocation used to reduce closed-domain hallucinations[1] (quoting the card):

For closed-domain hallucinations, we are able to use GPT-4 itself to generate synthetic data. Specifically, we design a multi-step process to generate comparison data:

  1. Pass a prompt through GPT-4 model and get a response
  2. Pass prompt + response through GPT-4 with an instruction to list all hallucinations
    1. If no hallucinations are found, continue
  3. Pass prompt + response + hallucinations through GPT-4 with an instruction to rewrite the response without hallucinations
  4. Pass prompt + new response through GPT-4 with an instruction to list all hallucinations
    1. If none are found, keep (original response, new response) comparison pair
    2. Otherwise, repeat up to 5x
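
As a rough sketch (not OpenAI's actual implementation), that multi-step process could be wired up as follows. The `llm` function (mapping a prompt string to a completion string) and the `no_hallucinations_found` check are hypothetical stand-ins:

```python
def generate_comparison_pair(prompt, llm, no_hallucinations_found, max_attempts=5):
    """Sketch of the system card's multi-step invocation for reducing
    closed-domain hallucinations. Returns an (original, rewritten) comparison
    pair, or None if the original response was already clean or no clean
    rewrite was found."""
    response = llm(prompt)                                                     # step 1: get a response
    hallucinations = llm(f"List all hallucinations in this response.\n"
                         f"Prompt: {prompt}\nResponse: {response}")            # step 2: list hallucinations
    if no_hallucinations_found(hallucinations):
        return None                                                            # step 2.1: nothing to fix
    for _ in range(max_attempts):                                              # step 4.2: repeat up to 5x
        new_response = llm(f"Rewrite the response without these hallucinations.\n"
                           f"Prompt: {prompt}\nResponse: {response}\n"
                           f"Hallucinations: {hallucinations}")                # step 3: rewrite the response
        check = llm(f"List all hallucinations in this response.\n"
                    f"Prompt: {prompt}\nResponse: {new_response}")             # step 4: re-check the rewrite
        if no_hallucinations_found(check):
            return (response, new_response)                                    # step 4.1: keep the pair
    return None
```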

As another example, ARC combined GPT-4 with a simple read-execute-print loop that allowed the model to execute code, do chain-of-thought reasoning, and delegate to copies of itself.
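
In the same spirit, a read-execute-print style invocation can be sketched as a loop in which the model proposes an action and non-model code executes it. This is an illustrative toy, not ARC's actual harness; `llm` and the `tools` dictionary (e.g. "run_code", "delegate") are hypothetical:

```python
def agent_loop(task, llm, tools, max_steps=20):
    """Toy read-execute-print invocation: the model emits either free-form
    reasoning, a tool call like "run_code: print(2 + 2)", or "FINAL: <answer>";
    non-model code executes tool calls and feeds results back into the context."""
    transcript = task
    for _ in range(max_steps):
        action = llm(transcript).strip()
        if action.startswith("FINAL:"):
            return action[len("FINAL:"):].strip()                 # the model declares it is done
        tool_name, _, argument = action.partition(":")
        if tool_name.strip() in tools:
            result = tools[tool_name.strip()](argument.strip())   # execute code, delegate to a copy, etc.
            transcript += f"\n{action}\nOBSERVATION: {result}"
        else:
            transcript += f"\n{action}"                           # plain chain-of-thought text, no tool call
    return transcript
```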

Invocations Affect Capabilities

In this section I want to establish that invocations can improve capabilities. First, our prior from analogy to humans should support this claim: when solving math problems, for example, access to scratch paper and a calculator makes a difference, as do “habits” such as checking your work rather than going with your first guess.

Furthermore, here are three examples of invocations affecting capabilities in the literature:

  1. The example of GPT-4 recognizing and correcting its own hallucinations (above) seems to be an “in the wild” admission that a more complicated invocation can improve a capability (in this case, reducing hallucinations).
  2. Chain-of-Thought prompting “improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks”.
  3. In Reflexion, an LLM agent can “reflect” based on a heuristic, allowing the agent to add to its working memory for the next run through an environment. This improved its performance in “decision-making tasks in AlfWorld environments” and “knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments”.
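
To underline that example 2 is an invocation-level change rather than a model-level one: chain-of-thought prompting amounts to wrapping the same model call in a different prompt template. The template wording below is illustrative, not taken from the paper, and `llm` is again a hypothetical prompt-to-completion function:

```python
def answer_directly(question, llm):
    # Baseline invocation: ask for the answer with no intermediate reasoning.
    return llm(f"Q: {question}\nA:")

def answer_with_chain_of_thought(question, llm):
    # Chain-of-thought invocation: same model and weights, different surrounding code.
    # The prompt elicits intermediate reasoning steps before the final answer.
    return llm(f"Q: {question}\nA: Let's think step by step.")
```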

AI Safety Implications

As summarized in the abstract, I see four implications for AI safety work: (1) the AI is more than just the LLM, (2) “safety by invocation” is possible but has limitations, (3) safety evaluations should use the most powerful invocations available, and (4) there is the possibility of an “invocation overhang”, in which an improvement in invocation leads to sudden capability gains on current models and hardware.

  1. ^

    From the system card: “Closed domain hallucinations refer to instances in which the model is instructed to use only information provided in a given context, but then makes up extra information that was not in that context.”

4 comments

Comments sorted by top scores.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-04T22:31:01.701Z · LW(p) · GW(p)

What you call invocations, I called 'bureaucracies' back in the day [LW · GW], and before that I believe they were called amplification methods. It's also been called scaffolding and language model programs and factored cognition. The kids these days are calling it langchain and ReAct and stuff like that.

I think I agree with your claims. ARC agrees also, I suspect; when I raised these concerns with them last year they said their eval had been designed with this sort of thing in mind and explained how.

Replies from: Robert_AIZI
comment by Robert_AIZI · 2023-04-05T13:54:14.013Z · LW(p) · GW(p)

I'm not surprised this idea was already in the water! I'm glad to hear ARC is already trying to design around this.

comment by Max H (Maxc) · 2023-04-04T14:21:36.001Z · LW(p) · GW(p)

Yes, systems comprised of chains of calls to an LLM can be much more capable than a few individual, human-invoked completions. The effort needed to build such systems is usually tiny compared to the effort and expense needed to train the underlying foundation models.

Role architectures [? · GW] provides one way of thinking about and aligning such systems.

My post on steering systems [LW · GW] also has some potentially relevant ways for thinking about these systems.

comment by jacopo · 2023-04-04T20:06:47.040Z · LW(p) · GW(p)

Well stated. I would go even further: the only short timeline scenario I can imagine involves some unholy combination of recursive LLM calls, hardcoded functions or non-LLM ML stuff, and API calls. There would probably be space to align such a thing. (Sort of. If we start thinking about it in advance.)