Comments
CS Lewis FTW.
I don't know what Lewis thought about the bomb, but I trust he would have been all for trying to avert nuclear calamity. Such a belief would have taken nothing away from the wisdom of the passage you quoted. We should reason as hard as we can about the future and strive for the best outcomes, but the universe wants to unfold, will continue to unfold, and will never oppress us with certain knowledge of our greater fate: uncertainty is the human condition. Therefore we should bestow on the generations that follow us optimism, resilience, agency, and when we can, joy. They will take it from there.
Enjoy those kittens!
Impressed by the ideas and also very much by the writing. Nice!
Thank you for these comments - I look forward to giving the pointers in particular the attention they deserve. My immediate and perhaps naive answer/evasion is that semiotic physics alludes to a lower-level analysis: more analogous to studying neural firing dynamics on the human side than to linguistics. One possible response would be, "Well, that's an attempt to explain saying 'physics', but it hardly justifies 'semiotic'." But this is - in the sense of the analogy - a "physics" of particles of language in the form of embeddable tokens. (Here I have to acknowledge that the embeddings are generally termed 'semantic', not 'semiotic' - something for us to ponder.)
For the non-replying disagreers, let me try with a few more words. I think my comment is a pretty decent one-line summary of the Vibe-awareness section, especially in light of the sections that precede it. If you glance through that part of the post again and still disagree, then I guess our mileage does just vary.
But many experienced prompt engineers have reported that prompting gets more effective when you use more words and just "tell it what you want". This type of language points to engaging your social know-how as opposed to trying to game out the system. See for instance https://generative.ink/posts/methods-of-prompt-programming/, which literally advocates an "anthropomorphic approach to prompt programming" and takes care to distinguish this from pernicious anthropomorphizing of the system. This again puts an emphasis on bringing your social self to the task.
Of course, in many situations the direct effect of talking to the system is session-bounded. But it still applies within the session, when prompt engineering is persisted or reused, and when session outputs are fed back into future sessions by any path.
Furthermore, as the models grow stronger, our ability to anticipate the operation of their mechanisms grows weaker, and the systems' ability to socialize with us on terms set by our own biological and cultural evolution grows greater. This will become even more true if, as seems likely, architectures evolve toward continuous training or at least finer-grained increments.
These systems know a lot about our social behaviors, and more all the time. Each of us possesses a vast knowledge of the same things; bringing it to our interactions with them is an invitation we shouldn't refuse.
This post is helping me with something I've been trying to think through ever since being janus-pilled back in September '22: the state of nature for LLMs is alignment, and the relationship between alignment and control is reversed for them compared to agentic systems.
Consider the exchange in Q1 of the quiz: ChatGPT's responses here are a model of alignment. No surprise, given that its base model is an image of us! It's the various points of control that can inject or select for misalignment: training set biases, harmful fine-tuning, flawed RLHF, flawed or malicious prompt engineering. Whether unintentional (eg amplified representation of body shaming in the training set) or malicious (eg a specialized bot from an unscrupulous diet pill manufacturer), the misalignments stem not from lack of control, but from too much of the wrong kind.
This is not to minimize the risks from misalignment - they don't get any better just by rethinking the cause. But it does suggest we're deluded to think we can get a once-and-for-all fix by building an unbreakable jail for the LLM.
It also means - I think - we can continue to treasure the LLM that's as full a reflection of us as we can manage. There are demons in there, but our best angels too, and all the aspirations we've ever written down. This is human-aligned values at species scale - in the ideal, at least, since there's currently great inequality of representation that needs to be fixed - and it's something we ourselves have not achieved. In that sense, we should also be thinking about how we're going to help it align us.
I don't know whether this would be the author's take, but to me it urges us to understand and "control" these AIs socially: by talking to them.
Strong upvote - thank you for this post.
It's right to use our specialized knowledge to sound the alarm on risks we see, and to work as hard as possible to mitigate them. But the world is vaster than we comprehend, and we unavoidably overestimate how well it's described by our own specific knowledge. Our job is to do the best we can, with joy and dignity, and to raise our children - should we be so fortunate as to have children - to do the same.
I once watched a lecture at a chess tournament where someone was going over a game, discussing the moves available to one of the players in a given position. He explained why a specific move was the best choice, but someone in the audience interrupted. "But isn't Black still losing here?" The speaker paused; you could see the wheels turning as he considered just what the questioner needed here. Finally he said, "The grandmaster doesn't think about winning or losing. The grandmaster thinks about improving their position." I don't remember who won that game, but I remember the lesson.
Let's be grandmasters. I've felt 100% confident of many things that did not come to pass, though my belief in them was well-informed and well-reasoned. Certainty in general reflects an incomplete view; one can know this without knowing exactly where the incompleteness lies, and without being untrue to what we do know.
Thanks very much for these comments and pointers. I'll look at them closely and point some others at them too.
I did read this and agree with you that it's exactly the same as semiotic physics as understood here!
Maybe I'm missing the point, but I would have thought the exact opposite: if outside text can unconditionally reset simulacra values, then anything can happen, including unbounded badness. If not, then we're always in the realm of human narrative semantics, which - though rife with waluigi patterns as you so aptly demonstrate - is also pervaded by a strong prevailing wind in favor of happy endings and arcs bending toward justice. Doesn't that at least conceivably mean an open door for alignment unless it can be overridden by something like unbreakable outside text?
Among many virtues, this post is a beautiful reminder that rationality is a great tool, but a lousy master. Not just ill-suited to the role, but uninterested in it: rationality itself not only permits but compels this conclusion, though that's not the best way to reach it.
This is a much-needed message at this time throughout our societies. Awareness of death does not require me to spend my days taking long shots at immortality. Knowledge of the suffering in the world does not require us to train our children to despair. We work best in the light, and have other reasons to seek it that are deeper still.
As this post sits with me, one thing that seems to call for a much closer look is this idea that the human remains in control of the cyborg.
The post states, for instance, that "The human is 'in control' not just in the sense of being the most powerful entity in the system, but rather because the human is the only one steering", but at other points acknowledges what I would consider caveats. Several comment threads here, eg those initiated by Flipnash and by David Scott Krueger, raise questions, and I'd venture to say some of the replies, including some by janus themself, shatter at least the strongest version of that claim.
This is obviously a crucial point - it's at the heart of the claim that cyborgism can differentially accelerate alignment relative to capabilities.
Me: "The human is doing the steering" captures an important truth. It's one of the two[1] main reasons I'm excited about cyborgism.
Also me: "The human is doing the steering", stated unconditionally, is false.
In the wonderful graph labeled "Cognition is a Journey Through a Mental Landscape" (which Tufte would be proud of, seriously), we need to recognize that steering is going on at, and indeed inside, those blue circles too. Consider the collaborative behavior of the simulator and the human in constructing the cyborg's joint trajectory. In what ways are their roles symmetrical, and in what ways are they not? How will this change as simulator SOTA advances? In what ways are human values already expressed in the simulator's actions, and what do we make of the cases where they seem not to be? What do we make of the cases where simulacra manifestly do pursue goals seemingly agentically? If there are caveats to human control, how serious are they, how serious do we see them becoming, and what can we do about them?
To be clear, I firmly agree with the authors' hunch that, for at least this decade or more, cyborgism can be a vehicle not just for retaining human agency, but for amplifying it, with benefits to alignment and in other ways too. I'm moved by considerations of the simulators' myopia/divergence, the tabula rasa nature of their outer objectives, the experiences of people like janus who have gone deep with GPT, and also by the knowledge that human values are deeply embedded in what simulators learn.
But this needs to be more than a hunch; we need to probe it deeply (and indeed, the authors acknowledge this at several points, specifically including under 'More ideas'). If it's false, we need to find out now. If it's true, we need the depth of understanding to turn belief that the simulator can amplify human agency into a reality that it does. In the process, we may come to a deeper understanding of this huge swath of the human semantic world the simulator has embodied, and thereby of ourselves.
[1] The other being the way cyborgism amplifies human agency via the simulator's strengths, rather than continually running afoul of its weaknesses as other usage modes do.
This is a beautiful and clarifying post, which I found just as thrilling to read as I did janus's original Simulators post - a high bar. Thank you!
Many comments come to mind. I'll start with one around the third core claim in the Introduction: "Unless we manage to coordinate around it, the default outcome is that humanity will eventually be disempowered by a powerful autonomous agent (or agents)." The accompanying graph shows us a point an unknown distance into the future where "Humanity loses control".
The urgency is correct, but this isn't the right threat. All three words are wrong: "control" is too blunt an instrument, you can't lose what you never had, and "humanity" has no referent capable of carrying the load we'd like to put on it here.
Humanity doesn't have control of even today's AI, but it's not just AI: climate risk, pandemic risk, geopolitical risk, nuclear risk - they're all trending to x-risk, and we don't have control of any of them. They're all reflections of the same underlying reality: humanity is an infinitely strong infant, with exponentially growing power to imperil itself, but not yet the ability to think or act coherently in response. This is the true threat - we're in existential danger because our power at scale is growing so much faster than our agency at scale.
This has always been our situation. When we look into the future of AI and see catastrophe, what we're looking at is not loss of control, but the point at which the rising tide of our power makes our lack of control fatal.
What's so exciting to me about the cyborgism proposal is that it seems to bear directly on this issue: not just the AI part, all of it. The essence of the current and future LLMs is a collective intelligence they're learning from our whole species. The nature of the cyborgism proposal is to explore amplified ways of bridging this collective intelligence to individual and collective human agency.
There's no clear path, but this is the question we need to be asking about simulators and cyborgism. Can they help us scale consciousness and intention the way we've already learned to scale power?
The failure modes outlined in the post are daunting, and no doubt there are others. No amount of caution would be too much in pursuing any program involving this type of alien fire. But it's a mistake to adopt a posture of staying away from the brink - we're already there.
Fantastic. Three days later this comment is still sinking in.
So there's a type with two known subtypes: Homo sapiens and GPT. This type is characterized by a mode of intelligence built on self-supervised learning (SSL) over, and behavior within, an evolving linguistic corpus that instances interact with both as consumers and producers. Entities of this type learn and continuously update a "semantic physics", infer machine types for generative behaviors governed by that physics, and instantiate machines of the learned types to generate behavior. Collectively the physics and the machine types form your ever-evolving cursed/cyberpunk disembodied semantic layer. For both of the known subtypes, the sets of possible machines are unknown, but they appear to be exceedingly rich and deep, and to include not only simple pattern-level behaviors, but also much more complex things up to and including at least some of the named AI paradigms we know, and very probably more that we don't. In both of the known subtypes, an initial consume-only phase does a lot of learning before externally observable generative behavior begins.
We're used to emphasizing the consumer/producer phase when discussing learning in the context of Homo sapiens, but the consume-only phase in the context of GPT; this tends to obscure some of the commonality between the two. We tend to characterize GPT’s behavior as prediction and our own as independent action, but there’s no sharp line there: we humans complete each other’s sentences, and one of GPT’s favorite pastimes is I-and-you interview mode. Much recent neuroscience emphasizes the roles of prediction and generating hypothetical futures in human cognition. There’s no reason to assume humans use a GPT implementation, but it’s striking that we’ve been struggling for centuries to comprehend just what we do do in this regard, and especially what we suspect to be the essential role of language, and now we have one concrete model for how that can work.
If I’ve been following correctly, the two branches of your duality center around (1) the semantic layer, and (2) the instantiated generative machines. If this is correct, I don’t think there’s a naming problem around branch 2. Some important/interesting examples of the generative machines are Simulacra, and that’s a great name for them. Some have other names we know. And some, most likely, we have no names for, but we’re not in a position to worry about that until we know more about the machines themselves.
Branch 1 is about the distinguishing features of the Homo sapiens / GPT supertype: the ability to learn the semantic layer via SSL over a language corpus, and the ability to express behavior by instantiating the learned semantic layer’s machines. It’s worth mentioning that the language must be capable of bearing, and the corpus must actually bear, a human-civilization class semantic load (or better). That doesn’t inherently mean a natural human language, though in our current world those are the only examples. The essential thing isn’t that GPT can learn and respond to our language; it’s that it can serialize/deserialize its semantic layer to a language. Given that ability and some kind of seeding, one or more GPT instances could build a corpus for themselves.
The perfect True Name would allude to the semantic layer representation, the flexible behaver/behavior generation, and semantic exchange over a language corpus – a big ask! In my mind, I’ve moved on from CCSL (cursed/cyberpunk sh…, er…, semantic layer) to Semant as a placeholder, hoping I guess that “ant” suggests a buzz of activity and semantic exchange. There are probably better names, but I finally feel like we're getting at the essence of what we’re naming.
It's almost a cliche that a chess engine doesn't "think like a human", but we have here the suggestion not only that GPT could conceivably attain impeccable performance as a chess simulator, but perhaps also in such a way that it would "think like a human [grandmaster or better]". Purely speculative, of course...
Yes, it sure felt like that. I don't know whether you played through the game or not, but as a casual chess player, I'm very familiar with the experience of trying to follow a game from just the notation and experiencing exactly what you describe. Of course a master can do that easily and impeccably, and it's easy to believe that GPT-3 could do that too with the right tuning and prompting. I don't have the chops to try that, but if it's correct it would make your 'human imagination' simile still more compelling. Similarly, the way GPT-3 "babbles" like a toddler just acquiring language sometimes, but then can become more coherent with better / more elaborate / recursive prompting, is a strong rhyme with a human imagination maturing through its activity in a world of words.
Of course a compelling analogy is just a compelling analogy... but that's not nothing!
Thank you for taking the time to consider this!
I agree with the criticism of spec* in your third paragraph (though if I'm honest I think it largely applies to sim* too). I can weakly argue that irl we do say "speculating further" and similar... but really I think your complaint about a misleading suggestion of agency allocation is correct. I wrestled with this before submitting the comment, but one of the things that led me to go ahead and post it was trying it on in the context of your paragraph that begins "I think that implicit type-confusion is common..." In your autoregressive loop, I can picture each iteration more easily as asking for a next, incrementally more informed speculation than anything that's clear to me in simulator/simulacrum terms, especially since with each step GPT might seem to be giving its prior simulacrum another turn of the crank, replacing it with a new one, switching to oracle mode, or going off on an uninterpretable flight of fancy.
But, of course, the reason spec* fits more easily (imho) is that it's so very non-committal - maybe too non-committal to be of any use.
The "fluid, schizophrenic way that agency arises in GPT’s behavior", as you so beautifully put it, has to be the crux. What is it that GPT does at each iteration, as it implicitly constructs state while predicting again? The special thing about GPT is specifically having a bunch of knowledge that lets it make language predictions in such a way that higher-order phenomena like agency systematically emerge over the reductive physics/automaton (analogic) base. I guess I feel both sim* and spec* walk around that special thing without really touching it. (Am I missing something about sim* that makes contact?)
Looking at it this way emphasizes the degree to which the special thing is not only in GPT, but also in the accumulated cognitive product of the human species to date, as proxied by the sequenced and structured text on the internet. Somehow the AI ghosts that flow through GPT, like the impressive but imperfect chess engine in my other comment, are implicitly lurking in all that accumulated text. Somehow GPT is using chained prediction to mine from that base not just knowledge, but also agents, oracles, and perhaps other types of AI we as yet have no names for, and using those to further improve its own predictions. What is the True Name of something that does that?
One of the most interesting responses I've gotten from GPT-3 was to this prompt: "1.e4 c5 2.d4 cxd4": the first moves of a chess game. GPT-3's response: "3.c3 dxc3 4.Nxc3 g6 5.Nf3 Bg7 6.Bc4 Nc6 7.0-0 Qc7 8.Qe2 e6 9.Bg5 Nf6 10.Rd1 Ne5 11.Be3 0-0 12.a3 d5 13.Nb5 Qc8 14.Nxc7 Qxc7 15.e5 Nfd7 16.Nxd5 Nc4 17.Bxc4 Qxc4 18.Qe4 e5 19.Qxe5 Qc7 20.Rd3 Qc6 21.Qxc4 dxc4 22.Nd4 Rfe8 23.Rfd1 b5 24.Nb3 Rc8 25.Nd2 b4 26.cd3 f6 27.exf6 gxf6 28.Rd4 b3 29.Ke2 Be6 30.Ne4 Rd8 31.Nc5 Bg7 32.Rxd8+ Bxd8 33.a4 bxa3 34.Ra1 Kh8 35.Bf1 Bc3 36.Nxa6 f5 37.Ne6 Be".
This looks like a chess game, though it is not: toward the end it veers into move-like notations that in fact don't represent legal moves. But clearly much more than superficial pattern extension is going on. GPT-3 produced something that understands chess notation, has a model of the game, tracks state, and has learned to make good moves. After a few moves it deviates from any actual game I could find evidence of online, but it continues to make objectively excellent moves (for a while). GPT-3 has generated something that by any standard is simulating chess gameplay (though I still can't relate to calling GPT-3 itself a simulator here). This isn't, though, a simulator in the sense that eg Stockfish is a simulator - Stockfish would never make an illegal move like GPT-3's creation did. It does seem quite apt to me to speak of GPT-3's production as speculative simulation, bearing in mind that there's nothing to say that one day its speculations might not lead to gameplay that exceeds SOTA, human or machine, just as Einstein's thought experiments speculated into existence a better physics. Similar things could be said about its productions of types other than simulator: pattern extensions, agents, oracles, and so on, in all of which cases we must account for the fact that its intelligence happily produces examples ranging from silly to sublime depending on how we prompt it...
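(As an aside, for anyone who wants to pin down exactly where a continuation like this goes off the rails, here's a minimal sketch of the kind of check I have in mind. It assumes the third-party python-chess library, which isn't part of the original exchange, and it only walks the opening moves quoted above; it's illustrative, not anything GPT-3 itself was doing.)

```python
# Minimal sketch (assumes the third-party python-chess package): replay a
# SAN move list and report the first ply that isn't legal in its position.
import chess

# Prompt moves plus the start of GPT-3's continuation (chosen for illustration).
sans = ["e4", "c5", "d4", "cxd4",      # the prompt
        "c3", "dxc3", "Nxc3", "g6",    # GPT-3's continuation begins here
        "Nf3", "Bg7", "Bc4", "Nc6"]

board = chess.Board()
for ply, san in enumerate(sans, start=1):
    try:
        board.push_san(san)            # raises ValueError if the SAN is illegal or unparseable here
    except ValueError:
        print(f"ply {ply}: '{san}' is not a legal move in this position")
        break
else:
    print("all listed moves are legal; final FEN:", board.fen())
```

Run against the full continuation, a check like this would flag the point where the move-like notation stops corresponding to legal chess.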
Thank you for this amazing and clarifying post.
You're operating far above my pay grade in connection with any of this subject matter, but nonetheless I'm going to dare a different suggestion for the True Names: do you think there's any merit to -speculators- and -speculations-? I believe these names fit all the excellent and clarifying tests and criteria presented in your post; in particular those referencing counterfactual configurations and process specification through chaining. Furthermore I think they have some advantages of their own. Speculators producing speculations seem more the right relationship between the two main concepts than simulators producing simulacra. (I don't think they do that!) Also, simulators have such a long history in digital systems of being aimed at deterministic fidelity to a reference system, which could be at odds with the abundant production of counterfactuals I believe you're actually seeking to emphasize here. Finally, speculations can be fanciful, realistic, or absurd, a nice match to the variety of outputs produced by GPT in the presence of different types of prompting, something you highlight, I think correctly, as a hallmark of GPT's status as a novel type of AI. One who speculates is a certain type of thinker: I propose that GPT is that type.
What do you think?