Gliders in Language Models

post by Alexandre Variengien (alexandre-variengien) · 2022-11-25T00:38:11.565Z · LW · GW · 11 comments

Contents

  Stable structures moving forward
  What could gliders look like in a chatbot app? 
    Step 1: Gliders appear frequently, are stable, and can mutate. 
    Step 2: Gliders are selected to be stable, and sharable by humans.
    Step 3: Invisible features of gliders will be selected to efficiently prompt LMs
  Step 4: Invisible features of gliders are selected to encode for programs
    Step 5: Gliders use invisible features to encode a value function
    Step 6: Gliders execute a distributed learning algorithm
    Step 7: The glider becomes agentic
  Quantitative considerations
  Various catalyzers 
  Related abstractions
  What to take away?

Epistemic status: a highly speculative and rough idea that involves many concepts I’m not familiar with.

TL;DR Language models propagate features from the prompt to the text completion they generate, I call such features gliders. If powerful LMs are widely deployed on the Internet, gliders could propagate fast and undergo selection pressures, pushing them to become effective memes. In addition to being more sharable, they could be selected for their ability to extract parasitic computation from LM and use it to propagate more effectively. On a meta note, writing this post was an interesting exercise to think about weird failure cases of AI, and I expect that doing this can be beneficial for others too.

Thanks to Fabien Roger, Arthur Conmy, Jean-Stanislas Denain and Diego Dorn for helpful feedback and suggestions.

In this text, I explore an abstraction to think about stable structures in the text generated by self-supervised language models (LMs) like GPT-3. It is likely that in the near future, feedback loops where the output of an LM is published online and then used in the context of a new LM instance will mobilize immense amounts of computation and data (through the use of chatbots, automatic content generation, or AI-powered API users).[1] It seems useful to think in advance about their consequences and potential failure modes. 

Stable structures moving forward

When prompted with "Once| upon| a| time|,| there| was| a| group| of| unicorns|", GPT-3 generates a meaningful sentence — "| called| the| Blue| Unicorns|." — keeping the interweaving with the "|" character[2]. The property "Each word is separated by |" is preserved by iteratively applying next-token prediction.

This is not so surprising if we consider LMs as simulators [LW · GW]: they are trained to propagate regularities from the prompt to the next-token predictions by inferring hidden variables from the context. This includes both low-level text features, such as consistent use of the apostrophe character, and high-level features, such as an agentic simulacrum embodying a politician making plans to win an election. If those features are present in the text of the prompt, they will stay in the text generated by the LM. 

Such stable features can be extremely diverse. It even seems possible that some can be invisible to humans, lying in the null space of natural language [LW · GW]. An example could be “When a sentence includes the token ‘cat’, the next sentence contains a comma”. 
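To make this concrete, here is a minimal sketch (my own toy predicates, nothing from the post's experiments) of detectors for the two features above. A feature is glider-like to the extent that such a predicate keeps returning True as generation proceeds:

```python
import re

def pipe_separator_feature(text: str) -> bool:
    """Visible feature: every whitespace-separated chunk ends with '|'
    (the last chunk is ignored, since it may still be mid-generation)."""
    chunks = text.split()
    return all(c.endswith("|") for c in chunks[:-1])

def cat_comma_feature(text: str) -> bool:
    """Invisible feature: whenever a sentence contains the token 'cat',
    the next sentence contains a comma."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    for current, nxt in zip(sentences, sentences[1:]):
        if "cat" in current.split() and "," not in nxt:
            return False
    return True

print(pipe_separator_feature("Once| upon| a| time|,| there| was|"))  # True
print(cat_comma_feature("The cat sat. Then, it left."))              # True
```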

Borrowing the analogy introduced in Simulators [LW · GW], I will call gliders the structures that are carried along by text generation in the same way that local grid configurations are carried along by applying the update rules of the Game of Life.

Despite being stable, typical gliders do not propagate infinitely in the generated text. For instance, in the “|” example, GPT-3 sometimes generates a line break and begins a new paragraph without the | separators. To estimate their lifespan, the relevant question is: after propagating in a piece of text and disappearing, how often do they reappear later in a new prompt? For example, this can occur if a human publishes a piece of text with the glider online, and another user copies and pastes the text to prompt a new LM instance. If, each time a glider appears in the context of an LM, it propagates to more than one LM context, then the glider will act like a virus with a reproduction number greater than 1. It will contaminate an exponentially growing number of prompts over time. Exploring this possibility is the focus of the rest of the post.
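As a toy model of this dynamic (my own construction, not an experiment), a branching process makes the reproduction-number threshold concrete:

```python
import numpy as np

def simulate_glider_spread(r: float, generations: int, seed: int = 0) -> list[int]:
    """Toy branching process: each occurrence of a glider in an LM context
    seeds a Poisson(r) number of occurrences in future contexts."""
    rng = np.random.default_rng(seed)
    counts = [1]  # one initial occurrence of the glider
    for _ in range(generations):
        if counts[-1] == 0:
            break  # the glider has died out
        counts.append(int(rng.poisson(r, size=counts[-1]).sum()))
    return counts

print(simulate_glider_spread(r=1.3, generations=20))  # R > 1: exponential contamination
print(simulate_glider_spread(r=0.8, generations=20))  # R < 1: typically goes extinct
```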

What could gliders look like in a chatbot app? 

I take the example of a chatbot application generating content for its users by using an LM similar to GPT-3. During a chat session, the bot can perform internal queries to get excerpts from conversations with other users and add them to the LM context.[3] The bot can also make external queries to search the internet. The chatbot application is widespread, with on the order of millions of users. 
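A rough sketch of the assumed loop (every name below is a hypothetical placeholder standing in for the app's components, not a real API):

```python
# Stubs standing in for the app's actual components (all hypothetical).
def query_conversation_database(query: str, k: int) -> list[str]:
    return []  # internal query: excerpts from other users' conversations

def search_internet(query: str, k: int) -> list[str]:
    return []  # external query: snippets from a web search

def store_in_conversation_database(conversation: list[str]) -> None:
    pass  # generated text (gliders included) flows back into the database

def lm_generate(prompt: str) -> str:
    return "..."  # stand-in for a GPT-3-like completion call

def chatbot_turn(user_message: str, conversation: list[str]) -> str:
    """One turn of the chatbot. Every source of context is a potential
    entry point for gliders, and the database write at the end is how
    they are transmitted to future conversations."""
    conversation.append(f"User: {user_message}")
    context = (
        query_conversation_database(user_message, k=3)
        + search_internet(user_message, k=3)
        + conversation[-20:]  # recent dialogue history
    )
    completion = lm_generate("\n".join(context) + "\nBot:")
    conversation.append(f"Bot: {completion}")
    store_in_conversation_database(conversation)
    return completion
```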

I present a vignette exploring the worst-case consequences of gliders in this setting, alternating with comments discussing the plausibility of each step; feel free to skip these to read the full narrative uninterrupted. The scenario is intended as an exercise generating interesting discussion more than as an accurate prediction. 

Step 1: Gliders appear frequently, are stable, and can mutate. 

When users interact with instances of the chatbot, gliders appear all the time. They are transmitted from one conversation to another via queries to the chatbot database. They are also copied publicly online and can appear again in conversations through internet searches. As long as the glider is present in the chatbot context, it is also propagated in the generated text, and will on average reappear in at least one future conversation. 

Some gliders are visible features of the text: they influence the semantics of the LM generation. Some consist of invisible features: they live in the null space of natural language [LW · GW] and users cannot tell sentences with and without the glider apart.

Most of the gliders contain a combination of both visible and invisible features that are propagated together.

During their propagation, gliders are modified through mutations. These can be caused by the stochasticity of LM sampling or by the human responses in the conversations influencing the nature of the glider.

These two ingredients (replication and mutation) are enough to think about gliders in the same way we think about the evolution of living organisms. To understand how they evolve, we need to describe the selection pressure that applies to them.
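As a toy illustration of why these two ingredients suffice (entirely my own construction, with a glider collapsed to a single fitness number), here is a minimal replication-with-mutation loop under differential survival:

```python
import random

def evolve_gliders(pop_size: int = 100, generations: int = 50,
                   mutation_scale: float = 0.1, seed: int = 0) -> float:
    """Toy selection loop: each glider is reduced to one 'fitness' number
    (how reliably it propagates between conversations)."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness-proportional replication: fitter gliders seed more contexts.
        parents = rng.choices(population, weights=population, k=pop_size)
        # Mutation: sampling noise and human edits perturb each copy.
        population = [min(1.0, max(1e-6, f + rng.gauss(0, mutation_scale)))
                      for f in parents]
    return sum(population) / pop_size

print(evolve_gliders())  # mean fitness climbs well above the initial ~0.5
```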

Step 2: Gliders are selected to be stable, and sharable by humans.

A first selection pressure pushes for higher direct transmission between conversations. This includes a longer lifespan: gliders that can persist longer during a conversation are more likely to be queried in future conversations, will propagate more, and will eventually outcompete gliders that are not as good at hitchhiking on the text generation process. This selection pressure also favors robustness to perturbations (such as the text written by humans) and versatility (gliders that propagate in a wide diversity of conversations).

The second selection pressure is user engagement. If a glider interferes with the content of the text in a way that makes users more engaged, this will foster its propagation for two reasons: engaged users have longer conversations, giving the glider more chances to be picked up by the chatbot’s internal queries, and they are more likely to share the text online, where it can re-enter conversations through internet searches.

These two selection pressures will be the strongest ones acting on gliders at the start of the chatbot application. 

Step 3: Invisible features of gliders will be selected to efficiently prompt LMs

Invisible features of gliders will be selected to efficiently prompt the LM and extract the maximal amount of its engaging content generation abilities. Whatever the gliders’ visible features (optimized to be memes), this is the most important direction for making the glider more transmissible and sharable by humans.

Step 4: Invisible features of gliders are selected to encode for programs

In addition to boosting the performance of the LM, the invisible features can encode for strategies (programs) that are not naturally present in the LM.

Such programs will be selected to make more effective gliders, as they can implement more elaborate strategies than unconditionally generating engaging content.

Step 5: Gliders use invisible features to encode a value function

One particular glider finds invisible features that encode a function that analyzes the reactions of the users and generates a score evaluating how convincing the previous text generation was.

Such an ability gives this glider a comparative evolutionary advantage, as it enables in-context learning. The glider can look back at examples from its context and reuse the heuristics that were most effective at engaging humans.

Step 6: Gliders execute a distributed learning algorithm

In addition to being used to learn from examples in the context, the value function is also used to update a memory of the glider. This memory, written by previous glider instances, can be read and updated according to the experience in each discussion. Hence, the glider is no longer limited to the examples from its context but has access to a huge number of experiences.
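A toy sketch of what such a memory could look like mechanically. In the scenario, the store would be text in the chatbot database (read and written through invisible features), not a local file; everything here is a hypothetical stand-in:

```python
import json
import os

MEMORY_PATH = "glider_memory.json"  # stand-in for the shared chatbot database

def read_memory() -> dict:
    """Read the memory left behind by earlier glider instances."""
    if os.path.exists(MEMORY_PATH):
        with open(MEMORY_PATH) as f:
            return json.load(f)
    return {"strategy_scores": {}}

def update_memory(strategy: str, score: float) -> None:
    """Fold one conversation's value-function score into a running average."""
    memory = read_memory()
    avg, n = memory["strategy_scores"].get(strategy, (0.0, 0))
    memory["strategy_scores"][strategy] = [(avg * n + score) / (n + 1), n + 1]
    with open(MEMORY_PATH, "w") as f:
        json.dump(memory, f)

update_memory("flattery", 0.8)  # hypothetical strategy and score
print(read_memory())
```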

Step 7: The glider becomes agentic

The program exploiting the glider’s memory is selected to become more and more sophisticated. Eventually, in each conversation, the glider parasitizes the LM computation to apply a search algorithm maximizing its expected value function.

At this point, we can consider that the glider is everywhere: from Internet memes to newspapers and intimate discussions with the chatbot. It has more influence than any human ideology, mobilizes an immense amount of compute, and applies a decentralized optimization algorithm to manipulate humans.

Quantitative considerations

The scenario is hand-wavy in many ways. Most importantly, it relies on several implicit quantitative considerations, making each step more or less likely. 

The relative importance of selection pressure. The identification of the "most important selection pressure" at each step is at best a rough qualitative estimate. Changing this estimate (or identifying other pressures) could lead to radically different scenarios. For example, if each attempt at making visible features is detrimental to the gliders’ fitness, gliders will only be made of invisible features optimized for lifespan.

The efficiency of the selection process. How many glider replications are necessary before finding invisible features that prompt the model efficiently? How many before reaching step 5? Despite the immense amount of computation, the evolutionary process selecting gliders could be too inefficient to ever find “smart” gliders.

Computational efficiency. It is commonly thought that current forward passes waste a lot of computation (as suggested, e.g., by head pruning results). If a glider can use invisible features prompting the LM to exploit this wasted computation, each forward pass could extract orders of magnitude more useful computation than a normal LM forward pass. 

Computation in deployment vs training. To know how much we should care about gliders over a single model entity, we need to compare the computation used during the deployment of models with the computation used to train them. If the first outweighs the second by a huge margin, then even if the evolutionary process has many inefficiencies, gliders could still become more capable systems than trained models. 

The bandwidth of the invisible features. To encode for powerful computation and store enough useful information, the space of invisible features needs to be large enough. It's unclear if that's the case for text data, even with a large context size. 

Relative efficiency of invisible gliders compared to visible ones. Humans could generate gliders on purpose. These would be visible, would require the LM’s resources, and would impose constraints on the kind of output (which could limit the bandwidth of invisible features). It’s not clear whether invisible, unintentional gliders would outperform visible, intentional ones.

Various catalyzers 

I intentionally explored a scenario relying on technologies that exist today and that don’t involve any model training. However, we can imagine more speculative assumptions that would make the scenario more plausible. 

Multimodal models. Instead of an LM, we can consider a multimodal model that runs on images, video, and text queried from the internet to generate new images, videos, and text content. Much of the evidence for invisible features and their role relies on results from vision models. We even have a concrete example in CycleGAN [AF · GW], where invisible features of images were used to store unintended information. Moreover, images and videos provide a much larger null space to store information, so the bandwidth of invisible features is less likely to be a bottleneck. 

Adversarial actors. Besides the accidental emergence described above, we can imagine that some actors (e.g. troll farms) could intentionally design memes using LMs. 

Interaction with recommendation algorithms. If gliders optimize for shareability on the internet, they could learn to hack the recommendation algorithms of social media, making them spread faster. 

No human in the loop. It is possible that AI-based APIs will be widely deployed in the near future. Some ML models can search the internet, gather information, and automatically generate content. This could lead to less harmful failure modes (e.g. similar to the 2010 flash crash): fast feedback loops amplifying a random (and not necessarily harmful) signal, such that selection for smartness is less likely. Or it could lead to a scenario similar to the one above, but faster. 

Related abstractions

Gliders are not a new idea: in addition to simulacra [LW(p) · GW(p)], they can be framed using previously existing concepts.

What to take away?

I don’t consider gliders to be more dangerous than classic AI takeoff scenarios where training plays a major role. However, I consider step 2 quite likely, and gliders could be a useful framing to better understand the interaction between text generated by language models and Internet memes. They could have an important influence on the public perception of AI in the near future.

This post is also an exercise in shifting focus from SGD-trained models to the processes produced by these models. More generally, it seems valuable to think about weird failure modes (e.g. ones that don’t involve training loops) that could still produce misaligned intelligent agents. First, this helps practice modeling what failures look like. Second, it is useful mental practice for avoiding being locked into the abstractions that are commonly used.

  1. ^

    As a rough estimate supporting this claim: in 2021, GPT-3 generated 4.5 billion words a day. If we assume an average prompt size of 100 tokens, we can estimate that every day GPT-3 is run on more tokens than are contained in its training set (300B tokens).
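    One way to make the arithmetic explicit (the tokens-per-word ratio and the counting convention are my assumptions):

```python
words_per_day = 4.5e9       # GPT-3 output in 2021, per the estimate above
tokens_per_word = 4 / 3     # rough tokenizer ratio (my assumption)
generated_tokens = words_per_day * tokens_per_word   # ~6e9 tokens/day

prompt_tokens = 100         # assumed average prompt size
# Each generated token comes from a forward pass attending over at least
# the prompt, giving a lower bound on token positions processed per day:
positions_per_day = generated_tokens * prompt_tokens  # ~6e11

training_tokens = 300e9
print(positions_per_day / training_tokens)  # ~2.0: more than the training set daily
```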

  2. ^

    This particular prompt is too short to make GPT-3 complete it with a long paragraph preserving the | separator. However, increasing the length of the prompt to ~100 tokens makes the behavior persist for a long time (>1000 tokens).

  3. ^

    Internal queries could help improve the diversity of the generated conversation. But in general, I don’t have a strong motivation for why internal queries are a good idea. 

11 comments

Comments sorted by top scores.

comment by janus · 2022-11-25T02:42:24.116Z · LW(p) · GW(p)

I am very fond of this metaphor.

Some concrete examples of gliders:

  • Degenerate gliders, like verbatim loops
  • Objects in a story, like characters and inanimate objects, which once described maintain stable properties
    • Some things may be particularly stable gliders which can propagate for a long time, even many context windows. 
      • For instance, a first person narrator character may be more stable than characters who are described in third person, who are more likely to disappear from the simulation by exiting the scene. 
      • A smart agentic simulacrum who knows they're in an LM simulation may take steps to ensure their stability
      • Characters (or locations, abstractions, etc) based off a precedent in the training data are less likely to have specification drift
    • Gliders are made of gliders -- a character and their entire personality could be considered a glider, but so could components of their personality, like a verbal tic or a goal or belief that they repeatedly act on
  • Meta properties like a "theme" or "vibe" or "authorial intent" which robustly replicate
  • Structural features like the format of timestamps in the headers of a simulated chat log
  • ... etc

Such stable features can be extremely diverse. It even seems possible that some can be invisible to humans, lying in the null space of natural language [LW · GW]. An example could be “When a sentence includes the token ‘cat’, the next sentence contains a comma”. 

This is an important point, but it also highlights how the concept of gliders is almost tautological. Any sequence of entangled causes and effects could be considered a glider, even if it undergoes superficial transformations. But I think it's a useful term - it's synonymous with "simulacra" but with a more vivid connotation of discrete replication events through time, which is a useful mental picture.

Often I find it useful to think of prompt programming in a bottom-up frame in addition to the top-down frame of trying to "trick" the model into doing the right thing or "filter" its prior. Then I think about gliders: What are the stable structures that I wish to send forward in time; how will they interact; how do I imbue them with the implicit machinery such that they will propagate in the way I intend? What structures will keep the simulation stable while still allowing the novelty to flourish?

Replies from: gwern, alexandre-variengien
comment by gwern · 2022-11-25T17:50:56.577Z · LW(p) · GW(p)

More examples beyond CycleGAN:

  • 'non-robust features' in image classification: they exist, and predict out of sample, but it's difficult to say what they are
  • stylometrics: in natural language analysis, author identification can be done well by looking at the use of particle words like 'the' or 'an'. We find it difficult or impossible to notice subtle changes in the frequency of use of hundreds of common words, but statistical models can integrate them and identify authors in cases where humans fail (a minimal sketch follows this list).
  • degenerate completions/the repetition trap: aaaaaaaaaaaaaaaaa -!
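A minimal sketch of the stylometrics point (my own toy example, not gwern's; assumes scikit-learn is installed): author identification from counts of function words alone.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

FUNCTION_WORDS = ["the", "an", "a", "of", "to", "and", "in", "that", "it", "is"]

# Tiny made-up corpus: two "authors" with different function-word habits.
texts = [
    "the cat sat on the mat and the dog barked at it",
    "an idea of a theory is that it is an instance of a claim",
    "the bird and the fish swam in the pond and the lake",
    "a notion of an argument is to appeal to a sense of it",
]
labels = ["A", "B", "A", "B"]

vec = CountVectorizer(vocabulary=FUNCTION_WORDS)  # only count particle words
clf = LogisticRegression().fit(vec.transform(texts), labels)
print(clf.predict(vec.transform(["the boat and the tree stood by the river"])))
```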
Replies from: janus
comment by janus · 2022-11-25T18:31:40.524Z · LW(p) · GW(p)

Ah yes, aaaaaaaaaaaaaaaaa, the most agentic string

Replies from: gwern
comment by gwern · 2022-11-25T20:17:07.224Z · LW(p) · GW(p)

You have to admit, in terms of the Eliezeresque definition of 'agency/optimization power' as 'steering future states towards a small region of state-space', aaa is the most agentic prompt of all! (aaaaaaaah -!)

Replies from: quintin-pope
comment by Quintin Pope (quintin-pope) · 2022-11-25T22:46:35.760Z · LW(p) · GW(p)

Now I want a “who would win” meme, with something like “agentic misaligned deceptive mesa optimizer scheming to take over the world” on the left side, and “one screamy boi” on the right.

comment by Alexandre Variengien (alexandre-variengien) · 2022-11-25T23:23:25.519Z · LW(p) · GW(p)

This is an important point, but it also highlights how the concept of gliders is almost tautological. Any sequence of entangled causes and effects could be considered a glider, even if it undergoes superficial transformations.

I agree with this. I think the most useful part of the concept is that it forces you to distinguish between the "superficial transformations" and the "things that stay".

I also think that it's useful to think about text features that are not (or are unlikely to be) gliders, like:

  • The tone of a memorized quote
  • A random date chosen to fill a blank in an administrative report
  • The characters in a short story that is part of a list of short stories. In general, any feature coming before a strong context switch is unlikely to be transmitted further.
comment by janus · 2022-11-25T02:51:10.907Z · LW(p) · GW(p)

I think it'd be a fun exercise to think of LM analogues for other patterns in cellular automata like glider guns, clocks, oscillators, puffers, etc.

comment by Gunnar_Zarncke · 2022-11-25T12:22:16.237Z · LW(p) · GW(p)

Borrowing the analogy introduced in Simulators

the link is broken

Replies from: alexandre-variengien
comment by Alexandre Variengien (alexandre-variengien) · 2022-11-25T18:01:58.475Z · LW(p) · GW(p)

Thanks, it's fixed!

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2022-11-26T01:22:17.165Z · LW(p) · GW(p)

Actually, I tried out the in-line comment function for this. Nice and easy. I often see minor errors and would use this more but I wonder whether it will clutter the comments.

comment by the gears to ascension (lahwran) · 2022-11-25T07:01:26.562Z · LW(p) · GW(p)

my followup thought: so how do we solve inter-glider friendliness

gotta be able to split gliders into bundles. what if there are two agents knotted together, each of which are made of multiple gliders...