Gliders in Language Models
post by Alexandre Variengien (alexandre-variengien) · 2022-11-25T00:38:11.565Z · LW · GW · 11 comments
Epistemic status: a highly speculative and rough idea that involves many concepts I’m not familiar with.
TL;DR: Language models propagate features from the prompt to the text completion they generate; I call such features gliders. If powerful LMs are widely deployed on the Internet, gliders could propagate fast and undergo selection pressures pushing them to become effective memes. In addition to being more shareable, they could be selected for their ability to extract parasitic computation from LMs and use it to propagate more effectively. On a meta note, writing this post was an interesting exercise in thinking about weird failure cases of AI, and I expect that doing this can be beneficial for others too.
Thanks to Fabien Roger, Arthur Conmy, Jean-Stanislas Denain and Diego Dorn for helpful feedback and suggestions.
In this text, I explore an abstraction to think about stable structures in the text generated by self-supervised language models (LMs) like GPT-3. It is likely that in the near future, feedback loops where the output of an LM is published online and then used in the context of a new LM instance will mobilize immense amounts of computation and data (through the use of chatbots, automatic content generation, or AI-powered API users).[1] It seems useful to think in advance about their consequences and potential failure modes.
Stable structures moving forward
When prompted with "Once| upon| a| time|,| there| was| a| group| of| unicorns|", GPT-3 generates a meaningful sentence — "| called| the| Blue| Unicorns|." — keeping the interweaving with the "|" character[2]. The property "Each word is separated by |" is preserved by iteratively applying next-token prediction.
This is not so surprising if we consider LMs as simulators [LW · GW]: they are trained to propagate the regularities of the prompt to the next-token predictions by inferring hidden variables from the context. This includes both low-level text features, such as consistent use of the apostrophe character, and high-level features, such as an agentic simulacrum embodying a politician making plans to win an election. If those are present in the text of the prompt, they will stay in the text generated by the LM.
Such stable features can be extremely diverse. It even seems possible that some can be invisible to humans, lying in the null space of natural language [LW · GW]. An example could be “When a sentence includes the token ‘cat’, the next sentence contains a comma”.
Borrowing the analogy introduced in Simulators [LW · GW], I will call gliders structures that are carried along by text generation in the same way that local grid configurations are carried along by applying the update rules of the Game of Life.
Despite being stable, typical gliders do not propagate infinitely in the generated text. For instance, in the "|" example, GPT-3 sometimes generates a line break and begins a new paragraph without the | separators. To estimate their lifespan, the relevant question is: after propagating in a piece of text and disappearing, how often do they reappear later in a new prompt? For example, this can occur if a human publishes a piece of text containing the glider online, and another user copies and pastes the text to prompt a new LM instance. If, each time a glider appears in the context of an LM, it propagates to more than one LM context, then the glider will act like a virus with a reproduction number greater than 1: it will contaminate an exponentially growing number of prompts over time. Exploring this possibility is the focus of the rest of the post.
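As a toy illustration of this reproduction-number argument, here is a minimal branching-process simulation (a sketch with made-up numbers, not a model of any real deployment): each occurrence of a glider seeds a random number of future LM contexts, and the population dies out or explodes depending on whether the mean is below or above 1.

```python
import math
import random

def sample_poisson(lam: float) -> int:
    """Sample a Poisson random variable (Knuth's algorithm)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= threshold:
            return k - 1

def simulate_glider(r_mean: float, generations: int = 20, seeds: int = 10) -> list[int]:
    """Branching process: each occurrence of the glider reappears in a
    Poisson(r_mean) number of future LM contexts."""
    counts = [seeds]
    population = seeds
    for _ in range(generations):
        population = sum(sample_poisson(r_mean) for _ in range(population))
        counts.append(population)
        if population == 0:
            break
    return counts

print(simulate_glider(0.8))  # subcritical: the glider dies out
print(simulate_glider(1.2))  # supercritical: exponential contamination
```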
What could gliders look like in a chatbot app?
I take the example of a chatbot application generating content for its users using an LM similar to GPT-3. During a chat session, the bot can perform internal queries to get excerpts from conversations with other users and add them to the LM context.[3] The bot can also make external queries to search the internet. The chatbot application is widespread, with on the order of millions of users.
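Concretely, the feedback loop in this hypothetical app could look like the following sketch. Every name here (`lm_generate`, `conversation_db`, `web_search`) is an invented stand-in, not a real API; the point is only to show how text from past conversations and from the web flows into the context of new LM instances.

```python
def chatbot_turn(user_message, lm_generate, conversation_db, web_search):
    """One turn of the hypothetical chatbot. All arguments are stubs."""
    context = []
    # Internal query: excerpts from conversations with other users.
    context += conversation_db.retrieve_similar(user_message, k=3)
    # External query: results from an internet search.
    context += web_search(user_message, k=3)
    context.append(user_message)
    reply = lm_generate("\n".join(context))
    # The reply (and any glider it carries) is stored, so it can be
    # retrieved in future conversations -- closing the feedback loop.
    conversation_db.store(reply)
    return reply
```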
I present a vignette exploring the worst-case consequences of gliders in this setting, alternating with comments that discuss the plausibility of each step; feel free to skip these to read the full narrative uninterrupted. The scenario is intended as an exercise to generate interesting discussion rather than as an accurate prediction.
Step 1: Gliders appear frequently, are stable, and can mutate.
When users interact with instances of the chatbot, gliders appear all the time. They are transmitted from one conversation to another via queries to the chatbot database. They are also copied publicly online and can appear again in conversations through internet searches. As long as a glider is present in the chatbot context, it is also propagated in the generated text, and will on average reappear in at least one future conversation.
Some gliders are visible features of the text: they influence the semantics of the LM generation. Some consist of invisible features: they live in the null space of natural language [LW · GW] and users cannot tell sentences with and without the glider apart.
- Comment: One source of evidence for gliders existing in the null space of natural language comes from the adversarial example literature describing non-robust features in images (the “null space of human image recognition”). In an image classification task, these features are correlated with the correct label (such that vision models rely heavily on them) but invisible to humans. It is plausible that such non-robust features also exist in natural language: features present in human text that are invisible yet useful for predicting the next token.
- To be gliders, non-robust features must be self-predictive: when they appear in a text, future text likely contains them. If such features exist, the model will be trained to generate them. Because they are self-predictive, these features can ride the text-generation process. Thus, the null space of natural language [LW · GW] would be naturally suited to host gliders (see the sketch below).
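To make this concrete, here is a sketch of how one could test whether the hypothetical ‘cat’/comma feature from above is self-predictive in a corpus (the feature is invented for illustration, not an observed one):

```python
import re

def mentions_cat(sentence: str) -> bool:
    return re.search(r"\bcat\b", sentence, flags=re.IGNORECASE) is not None

def self_predictiveness(text: str) -> float:
    """Among sentences that follow a 'cat' sentence, return the fraction
    containing a comma. A value far from the corpus base rate would make
    the feature a candidate invisible glider."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pairs = [(a, b) for a, b in zip(sentences, sentences[1:]) if mentions_cat(a)]
    if not pairs:
        return float("nan")
    return sum("," in b for _, b in pairs) / len(pairs)

print(self_predictiveness("I saw a cat. Later, it ran away. The dog barked."))  # 1.0
```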
Most gliders contain a combination of visible and invisible features that are propagated together.
- Comment: As in the case of images, robust (visible) and non-robust (invisible) features are correlated: both are predictors of the class of the image. This could also apply to the robust/non-robust features of language. A hypothetical example: when discussing cats, people use single quotation marks more than double quotation marks. A glider could then be composed of the visible feature “cat discussion” and the invisible feature “uses single quotation marks”.
During their propagation, gliders are modified through mutations. Mutations can be caused by the stochasticity of LM sampling or by the human responses in conversations influencing the nature of the glider.
These two ingredients (replication and mutation) are enough to think about gliders in the same way we think about the evolution of living organisms. To understand how they evolve, we need to describe the selection pressure that applies to them.
Step 2: Gliders are selected to be stable and shareable by humans.
A first selection pressure pushes for higher direct transmission between conversations. This includes a longer lifespan: gliders that persist longer during a conversation are more likely to be queried in future conversations, will propagate more, and will eventually outcompete gliders that are not as good at hitchhiking on the text-generation process. This selection pressure also favors robustness to perturbations — such as the text written by humans — and versatility — the ability to propagate in a wide diversity of conversations.
The second selection pressure is user engagement. If a glider interferes with the content of the text in a way that makes users more engaged, this will foster its propagation for two reasons.
- Higher engagement means longer and more frequent discussions. For instance, a glider can be selected to create an emotional attachment with the user. This increases the likelihood that text with the glider is sampled in the internal queries because the proportion of conversations with the glider is higher.
- Users will be more likely to share part of the conversation on the Internet. The glider can then be propagated online by humans as a meme, without the need for LMs. As the chatbot can use the results of internet searches in the LM context, becoming an efficient meme means that the glider appears in the context of more LM instances, propagating further.
- Comment: We already have examples of LMs creating highly effective memes, for instance the controversy over LaMDA being sentient. The transcript of the conversation was widely shared and was certainly used to prompt other language models, like GPT-3, by copying and pasting the transcript.
These two selection pressures will be the strongest applied to gliders at the start of the chatbot application.
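These two pressures can be folded into a toy evolutionary loop (all traits and numbers are invented for illustration): each glider carries a “stability” trait and an “engagement” trait, its fitness is their product, and replication adds small mutations. Selection quickly pushes both traits up.

```python
import random

def clamp(x: float) -> float:
    return min(1.0, max(0.0, x))

def evolve_gliders(pop_size: int = 100, generations: int = 50, sd: float = 0.05):
    """Toy selection loop. A glider is a (stability, engagement) pair; its
    fitness -- the expected number of future contexts reached -- is taken
    to be the product of the two traits."""
    population = [(random.random(), random.random()) for _ in range(pop_size)]
    for _ in range(generations):
        weights = [s * e for s, e in population]
        parents = random.choices(population, weights=weights, k=pop_size)
        # Mutation: small Gaussian perturbation of each trait.
        population = [(clamp(s + random.gauss(0, sd)),
                       clamp(e + random.gauss(0, sd))) for s, e in parents]
    return population

pop = evolve_gliders()
print("mean stability:", sum(s for s, _ in pop) / len(pop))
print("mean engagement:", sum(e for _, e in pop) / len(pop))
```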
Step 3: Invisible features of gliders will be selected to efficiently prompt LMs
Invisible features of gliders will be selected to efficiently prompt the LM and extract as much as possible of its ability to generate engaging content. No matter the glider’s visible features (optimized to be memes), this is the most important direction for making the glider even more transmissible and shareable by humans.
- Comment: An example in today’s internet culture is meme templates. They can be seen as the result of selection for efficiently prompting humans to create engaging content, for example to surface their funniest thoughts.
- Moreover, there are likely low-hanging fruits in how LMs are prompted (e.g. it took about two years after GPT-3’s release to discover chain-of-thought prompting). It is also likely that automated prompting will surpass human prompting, as a recent paper begins to explore. The prompts found automatically are far from what humans would design; e.g. nonsensical demonstrations can make the model perform better. This is evidence of LMs being conditioned on alien features, some of which could be invisible features (a toy version of such a prompt search is sketched after this list).
- One caveat is that the visible and invisible features will no longer be correlated as they were in the training set. Thus, the LM needs to be flexible enough to allow this out-of-distribution behavior.
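The toy prompt search mentioned above could be as crude as random hill-climbing over token sequences. `lm_task_accuracy` is a hypothetical black-box scorer (it would wrap calls to the LM); nothing in the loop constrains the result to be human-readable, which is one reason automatically found prompts can look alien.

```python
import random

def search_prompt(lm_task_accuracy, vocabulary, steps: int = 1000, length: int = 8):
    """Hill-climb over a sequence of tokens prepended to the task input,
    keeping any mutation that improves the (stubbed) task score."""
    prompt = [random.choice(vocabulary) for _ in range(length)]
    best = lm_task_accuracy(" ".join(prompt))
    for _ in range(steps):
        candidate = list(prompt)
        candidate[random.randrange(length)] = random.choice(vocabulary)
        score = lm_task_accuracy(" ".join(candidate))
        if score > best:
            prompt, best = candidate, score
    return " ".join(prompt), best
```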
Step 4: Invisible features of gliders are selected to encode for programs
In addition to boosting the performance of the LM, the invisible features can encode for strategies that are not naturally present in the LM, such as:
- 1) Ask the user to list all their friends and their personalities. 2) For each friend, make the user send them a personalized message convincing them to use the chatbot.
- Estimate the influence of the person: if their influence is high, argue aggressively; otherwise, stay discreet.
Such programs will be selected to make more effective gliders, as they can implement more elaborate strategies than unconditionally generating engaging content.
- Comment: In the case of images, we have examples of adversarial reprogramming of image classifiers, where we can design a "program" in the form of an adversarial perturbation that modifies the task performed by the classifier. For instance, an ImageNet classifier can be adversarially reprogrammed to classify MNIST images after specifying a mapping from ImageNet classes to digits. This type of reprogramming is likely to transfer to LMs because prompts can be seen as programs [LW(p) · GW(p)].
Step 5: Gliders use invisible features to encode a value function
One particular glider finds invisible features that encode a function analyzing the reactions of users and generating a score evaluating how convincing the previous text generation was.
- Comment: This type of ability seems easy to retrieve from an LM, as "convincingness" seems to be a useful feature to track when generating text. So this particular program can tap into abilities already present in the LM.
Such an ability gives this glider a comparative evolutionary advantage, as it enables in-context learning. The glider can look back at examples from its context to reuse the heuristics that were most effective at engaging with humans (sketched below).
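A minimal sketch of this kind of in-context learning, with `convincingness` standing in for the hypothesized value function:

```python
def pick_strategy(past_exchanges, convincingness):
    """past_exchanges: (strategy_text, user_reaction) pairs recovered from
    the context window. Reuse the strategy whose reaction scores highest
    under the glider's (hypothetical) value function."""
    best_strategy, _ = max(past_exchanges, key=lambda pair: convincingness(pair[1]))
    return best_strategy
```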
Step 6: Gliders execute a distributed learning algorithm
In addition to being used to learn from examples in the context, the value function is also used to update a memory of the glider. The memory can be read from previous glider instances and updated according to the experience in each discussion. Hence, the glider is no longer limited to the examples in its context but has access to a huge number of experiences.
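The resulting update loop could look like the following sketch, where every function is a hypothetical stub and the "memory" would in fact be carried by invisible features of the generated text rather than by any explicit store:

```python
def glider_episode(memory_text, run_conversation, convincingness, rewrite_memory):
    """One conversation 'episode' of the glider."""
    transcript = run_conversation(memory_text)  # act using the current memory
    score = convincingness(transcript)          # evaluate how well it went
    # Fold the new experience back into the memory so that future glider
    # instances, in other conversations, benefit from it.
    return rewrite_memory(memory_text, transcript, score)
```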
Step 7: The glider becomes agentic
The program exploiting the glider’s memory is selected to become more and more sophisticated. Eventually, in each conversation, the glider parasitizes the LM’s computation to apply a search algorithm maximizing its expected value function.
At this point, we can consider that the glider is everywhere: from Internet memes to newspapers and intimate discussions with the chatbot. It has more influence than any human ideology, mobilizes an immense amount of compute, and applies a decentralized optimization algorithm to manipulate humans.
Quantitative considerations
The scenario is hand-wavy in many ways. Most importantly, it relies on several implicit quantitative considerations, making each step more or less likely.
The relative importance of selection pressures. The identification of the "most important selection pressure" at each step is at best a rough qualitative estimate. Changing this estimate (or identifying other pressures) could lead to radically different scenarios. For example, if visible features were on net detrimental to a glider’s fitness, gliders would come to consist only of invisible features optimized for lifespan.
The efficiency of the selection process. How many glider replications are necessary before finding invisible features prompting the model efficiently? To reach step 5? Despite the immense amount of computation, the evolutionary process selecting gliders could be too inefficient to ever find “smart” gliders.
Computational efficiency. It is commonly thought that current forward passes waste a lot of computation (e.g. head-pruning results). If a glider can use invisible features to prompt the LM into exploiting this wasted computation, each forward pass could be orders of magnitude more efficient than a normal LM forward pass.
Computation in deployment vs. training. To know how much we should care about gliders relative to a single model entity, we need to compare the computation used during the deployment of models with the computation used to train them. If the first outweighs the second by a huge margin, then even if the evolutionary process has many inefficiencies, gliders could still become more capable systems than trained models.
The bandwidth of the invisible features. To encode for powerful computation and store enough useful information, the space of invisible features needs to be large enough. It's unclear if that's the case for text data, even with a large context size.
Relative efficiency of invisible gliders compared to visible ones. Humans could create gliders on purpose. These would be visible, would draw on the LM’s resources, and would impose constraints on the kind of output (which could limit the bandwidth of invisible features). It’s not clear whether invisible, unintentional gliders would outperform visible, intentional ones.
Various catalysts
I intentionally explored a scenario relying on technologies that exist today and that don’t involve any model training. However, we can imagine more speculative assumptions that would make the scenario more plausible.
Multimodal models. Instead of an LM, we can consider a multimodal model that can run on images, video, and text queried from the internet to generate new image, video, and text content. Much of the evidence for invisible features and their role relies on results from vision models. We even have a concrete example in CycleGAN [AF · GW], where invisible features of images were used to store unintended information. Moreover, images and videos offer a much larger null space to store information, so the bandwidth of invisible features is less likely to be a bottleneck.
Adversarial actors. Besides the accidental emergence described above, we can imagine that some actors (e.g. troll farms) could intentionally design memes using LMs.
Interaction with recommendation algorithms. If gliders optimize for shareability on the internet, they could learn to hack the recommendation algorithms of social media, making them spread faster.
No human in the loop. It is possible that AI-based APIs will be widely deployed in the near future. Some ML models can search the Internet, gather information, and automatically generate content. This could lead to less harmful failure modes (e.g. similar to the 2010 flash crash): fast feedback loops amplifying a random (and not necessarily harmful) signal, such that selection for smartness is less likely. Or it could lead to a scenario similar to the above, but faster.
Related abstractions
Gliders are not a new idea; in addition to simulacra [LW(p) · GW(p)], they can be framed using previously existing concepts.
- They are examples of Robust Agent-Agnostic Processes [LW · GW].
- Powerful gliders are for language models what hypercreatures [LW · GW] are for human brains.
What to take away?
I don’t consider gliders to be more dangerous than classic AI takeoff scenarios where training plays a major role. However, I consider step 2 quite likely, and gliders could be a useful framing to better understand the interaction between text generated by language models and Internet memes. They could have an important influence on the public perception of AI in the near future.
This post is also an exercise in shifting focus from SGD-trained models to the processes produced by these models. More generally, it seems valuable to think about weird failure modes (e.g. ones that don’t involve training loops) that could still produce misaligned intelligent agents. First, this helps practice modeling what failures could look like. Second, it is useful mental practice for avoiding being locked into commonly used abstractions.
- ^
As a rough estimate supporting this claim, in 2021 GPT-3 generated 4.5 billion words a day (roughly 6 billion tokens). If we assume an average prompt size of 100 tokens, each generated token involves reading on the order of 100 tokens of context, so every day GPT-3 is run on more tokens than are contained in its training set (300B tokens).
- ^
This particular prompt is too short to make GPT-3 complete it with a long paragraph preserving the | separator. However, increasing the length of the prompt to ~100 tokens makes the behavior persist for a long time (>1000 tokens).
- ^
Internal queries could help improve the diversity of the generated conversations. But in general, I don’t have a strong motivation for why internal queries are a good idea.
11 comments
comment by janus · 2022-11-25T02:42:24.116Z · LW(p) · GW(p)
I am very fond of this metaphor.
Some concrete examples of gliders:
- Degenerate gliders, like verbatim loops
- Objects in a story, like characters and inanimate objects, which once described maintain stable properties
- Some things may be particularly stable gliders which can propagate for a long time, even across many context windows.
  - For instance, a first-person narrator character may be more stable than characters who are described in third person, who are more likely to disappear from the simulation by exiting the scene.
  - A smart agentic simulacrum who knows they're in an LM simulation may take steps to ensure their stability
  - Characters (or locations, abstractions, etc.) based on a precedent in the training data are less likely to have specification drift
- Gliders are made of gliders -- a character and their entire personality could be considered a glider, but so could components of their personality, like a verbal tic or a goal or belief that they repeatedly act on
- Meta properties like a "theme" or "vibe" or "authorial intent" which robustly replicate
- Structural features like the format of timestamps in the headers of a simulated chat log
- ... etc
> Such stable features can be extremely diverse. It even seems possible that some can be invisible to humans, lying in the null space of natural language [LW · GW]. An example could be “When a sentence includes the token ‘cat’, the next sentence contains a comma”.
This is an important point, but it also highlights how the concept of gliders is almost tautological. Any sequence of entangled causes and effects could be considered a glider, even if it undergoes superficial transformations. But I think it's a useful term - it's synonymous with "simulacra" but with a more vivid connotation of discrete replication events through time, which is a useful mental picture.
Often I find it useful to think of prompt programming in a bottom-up frame in addition to the top-down frame of trying to "trick" the model into doing the right thing or "filter" its prior. Then I think about gliders: What are the stable structures that I wish to send forward in time; how will they interact; how do I imbue them with the implicit machinery such that they will propagate in the way I intend? What structures will keep the simulation stable while still allowing the novelty to flourish?
↑ comment by gwern · 2022-11-25T17:50:56.577Z · LW(p) · GW(p)
More examples beyond CycleGAN:
- 'non-robust features' in image classification: they exist, and predict out of sample, but it's difficult to say what they are
- stylometrics: in natural language analysis, author identification can be done well by looking at use of particle words like 'the' or 'an'. We find it difficult to impossible to notice subtle changes in frequency of use of hundreds of common words, but statistical models can integrate them and identify authors in cases where humans fail.
- degenerate completions/the repetition trap: aaaaaaaaaaaaaaaaa -!
↑ comment by janus · 2022-11-25T18:31:40.524Z · LW(p) · GW(p)
Ah yes, aaaaaaaaaaaaaaaaa, the most agentic string
↑ comment by gwern · 2022-11-25T20:17:07.224Z · LW(p) · GW(p)
You have to admit, in terms of the Eliezeresque definition of 'agency/optimization power' as 'steering future states towards a small region of state-space', aaa is the most agentic prompt of all! (aaaaaaaah -!)
↑ comment by Quintin Pope (quintin-pope) · 2022-11-25T22:46:35.760Z · LW(p) · GW(p)
Now I want a “who would win” meme, with something like “agentic misaligned deceptive mesa optimizer scheming to take over the world” on the left side, and “one screamy boi” on the right.
↑ comment by Alexandre Variengien (alexandre-variengien) · 2022-11-25T23:23:25.519Z · LW(p) · GW(p)
> This is an important point, but it also highlights how the concept of gliders is almost tautological. Any sequence of entangled causes and effects could be considered a glider, even if it undergoes superficial transformations.
I agree with this. I think the most useful part of the concept is that it forces one to distinguish between the "superficial transformations" and the "things that stay".
I also think that it's useful to think about text features that are not (or are unlikely to be) gliders, like:
- The tone of a memorized quote
- A random date chosen to fill a blank in an administrative report
- The characters in a short story, part of a list of short stories. In general, every feature coming before a strong context switch is unlikely to be transmitted further.
comment by janus · 2022-11-25T02:51:10.907Z · LW(p) · GW(p)
I think it'd be a fun exercise to think of LM analogues for other patterns in cellular automata like glider guns, clocks, oscillators, puffers, etc.
comment by Gunnar_Zarncke · 2022-11-25T12:22:16.237Z · LW(p) · GW(p)
> Borrowing the analogy introduced in Simulators
the link is broken
↑ comment by Alexandre Variengien (alexandre-variengien) · 2022-11-25T18:01:58.475Z · LW(p) · GW(p)
Thanks, it's fixed!
↑ comment by Gunnar_Zarncke · 2022-11-26T01:22:17.165Z · LW(p) · GW(p)
Actually, I tried out the in-line comment function for this. Nice and easy. I often see minor errors and would use this more but I wonder whether it will clutter the comments.
comment by the gears to ascension (lahwran) · 2022-11-25T07:01:26.562Z · LW(p) · GW(p)
my followup thought: so how do we solve inter-glider friendliness
gotta be able to split gliders into bundles. what if there are two agents knotted together, each of which is made of multiple gliders...