Will early transformative AIs primarily use text? [Manifold question]

post by Fabien Roger (Fabien) · 2023-10-02T15:05:07.279Z · LW · GW · 0 comments

Contents

  Operationalization
    AI power criteria
    “Primarily using text”
  How the answer to this question affects research prioritization
None
No comments

Will early transformative AIs primarily use text to store intermediate thoughts and memories?

I think the answer to this question matters for alignment research prioritization, and getting some common knowledge about what are the most commonly held views about this question would be helpful.

You can make bets on this question here: https://manifold.markets/FabienRoger/will-early-transformative-ais-prima

To make this problematic more concrete, here are two questions which will probably resolve sooner:

Operationalization

AI power criteria

By early transformative AIs I mean the first publicly announced AI system which can speed up AI capabilities progress by 30x: if cognitive labor done at major AI labs (including hardware manufacturers) was only done by humans assisted with 2020-level-AI, and resources were rearranged to maximize productivity (within AI labs, allowing new hires but within the same financial constraints), then it would take 300 days for AI labs to produce “stronger AIs” it actually created in 10 days, as judged by me.

The operationalization below are inspired by this Metaculus question, and aim at describing AIs weaker than early TAIs.

By an AI able to pass a 2-hour adversarial Turing test, I mean an AI system able to reliably pass a 2-hour, adversarial Turing test during which the participants can send text, images, and audio files (as is done in ordinary text messaging applications) during the course of their conversation. An 'adversarial' Turing test is one in which the human judges are AI experts (who understand AI weaknesses) instructed to ask interesting and difficult questions, designed to advantage human participants, and to successfully unmask the computer as an impostor. A demonstration of an AI passing such a Turing test, or one that is sufficiently similar, will be sufficient for this condition, so long as the test is well-designed to the estimation of me. Several tests might be required to indicate reliability.

By an AI achieving 90% accuracy at hard APPS problems, I mean an AI system interacting with APPS like a human would (with access to any tool which doesn’t require access to the internet or to a human) which is able to submit a correct program on the first submission attempt in 90% of interview-level problems found in the APPS benchmark introduced by Dan Hendrycks, Steven Basart et al.

“Primarily using text”

By primarily using text, I mean the conjunction of the following:

I will use speculations about AIs architectures to resolve this question. For example, GPT-4 generating 10k tokens would qualify as primarily using text.

(For reference, if it takes 1ms for a human neuron to process the incoming signal and fire, then the human brain can do 100k serial operations in 1’40’’, and 1M serial operations in 16’40’’.)

How the answer to this question affects research prioritization

  1. Some research directions are text-specific, such as research on steganography and chain-of-thought.
  2. Some plans for safe AIs are much more realistic when AIs primarily use text. For example, plans relying on monitoring AI "thoughts are much more likely to succeed if AIs primarily use text.
  3. Some research directions are more relevant if done on the same kind of AIs as the ones used for the first AIs which really matter, such as most research on the structure of activations and research on inductive biases. If the early TAIs are not primarily using text, such research should make sure to get results which transfer across modalities and architectures, or their conclusions might not be relevant.

You can find my takes about the implications, as well as arguments for and against future AIs primarily using text in this related post of mine: The Translucent Thoughts Hypotheses and Their Implications [LW · GW], which is about similar (but narrower) claims.

0 comments

Comments sorted by top scores.