Chinese room AI to survive the inescapable end of compute governance

post by rotatingpaguro · 2025-02-02T02:42:03.627Z · LW · GW · 0 comments

Contents

  Compute governance, and the need to go beyond
  Naming of the idea: the Chinese room
  For enforceability, the allowed AI paradigm should robustly be a dead-end
  For economic incentives, the allowed AI paradigm should be competitive below human level
  Reasons for hope, reasons for despair
  First steps?
    Look at existing dead ends (forest chatbot)
    Believe in Yann LeCun
    Try to implement algorithm detectors with the current stack
  Acknowledgements

This post looks at the problem of how to make a cap on AI intelligence enforceable, using something different from current ideas in compute governance (mandated limitations to computational power).

The idea I present is finding an AI technique/architecture/paradigm which is a "tuned dead end": it can't be scaled up arbitrarily, for fundamental reasons that cannot be overcome with incremental adjustments, and in practice it tops out short of human intelligence, at a level where it is still economically useful. Then, above some compute threshold, a regulatory regime would enforce the use of this architecture for general AIs.

Compute governance, and the need to go beyond

Enforced limits on AI intelligence are useful under the expectation that making AIs eventually more intelligent than humans is generally dangerous, and that AI developers as a whole cannot be trusted to be cautious enough in increasing the intelligence level of their creations.

Concretely putting limits on intelligence is difficult because intelligence is not easy to quantify a priori, before having access to a functioning system to analyse empirically. To overcome this obstacle, compute governance starts from the very plausible expectation that increasing general AI capabilities eventually requires increasing the computational power used to create and run the AI system; a rough limit on AI intelligence can therefore be imposed by limiting computational power, which is easy to quantify.

My idea tries to go one step further and limit AI intelligence more intrinsically. Although I will most likely, in hindsight, consider my specific idea bonkers, it is generally useful to research strategies beyond compute governance, because compute governance becomes harder to enforce day after day due to hardware and algorithmic progress: on the current trajectory, human-level AIs will at some point in the future—which I guess might even be just 10 years away—run on small and cheap devices, which are less monitorable than a few large datacenters, making enforcement of compute caps very difficult.

Naming of the idea: the Chinese room

The "Chinese room" is a conceptual experiment about the consciousness of intelligent computers.

Consider an English-speaking person hidden from external sight in a room, tasked with conversing via letters with someone outside the room, in Chinese. The person in the room does not understand Chinese, but they are provided with access to the Chinese section of the library of Babel, containing all possible Chinese conversations. (This is not the original formulation, I’m taking some liberty.) So for every text message they receive, they can use the library cataloging system to find an appropriate response and continuation to the conversation, and copy it to reply.

The point of this construction is to ask whether the person and the library, hidden together in the room and considered as a single entity, should be considered a conscious Chinese-speaking entity, since from the outside they are indistinguishable from an actual Chinese speaker hidden in the room. Even though the person in the room does not understand anything of what they read and write, the correspondent, from their own point of view, is having an exchange with something that understands Chinese, and converses with it as they would with a Chinese-speaking person.

The English speaker mindlessly following the rules of a cataloging system to compose replies is an allegory for a computer running a program. Very recently, we got computer programs that can converse in any language they are trained on, so this thought experiment is outdated: we can directly wonder whether chatbots are conscious, have personhood status, or have some other human-like property.

Anyhow, I take inspiration from this experiment just to name my idea. An angle of analysis of the Chinese room is that it is a supremely inefficient way—infeasible, in fact—to implement a Chinese-speaking person. Then the argument would go that this makes it too distant from realistic intelligent entities to say something meaningful about consciousness... but I only care that the Chinese room is intrinsically inefficient, due to relying on a list of all possible meaningful sequences of characters. This kind of insurmountable explosive combinatorial barrier is what I would like to engineer into an AI system in a controlled way.
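
As a back-of-the-envelope illustration of that barrier (the vocabulary size and conversation length below are my own placeholder numbers, not from the post), consider how fast a lookup table over all possible conversations grows:

```python
# Back-of-the-envelope sketch of the Chinese room's combinatorial barrier.
# vocab_size and conversation_length are illustrative assumptions.

vocab_size = 5_000          # assumed number of distinct tokens
conversation_length = 50    # assumed length of a short exchange, in tokens

table_entries = vocab_size ** conversation_length
print(f"entries needed: about 10^{len(str(table_entries)) - 1}")
# Prints "entries needed: about 10^184", far beyond the ~10^80 atoms
# in the observable universe: the room cannot actually be built.
```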

For enforceability, the allowed AI paradigm should robustly be a dead-end

In this section I argue that, to make it realistically possible to check that everyone is using only the allowed limited-intelligence AI technique, that technique should be something which is impossible to scale up in intelligence through incremental improvement.

The point of compute governance is that it's relatively easy to measure how many chips anyone has and is running, without meddling in their business so much as to impede it.

Putting rules at the level of which algorithms they are running on the chips is more complicated.

As an illustration, consider the extreme hypothetical of picking a current frontier deep learning LLM chatbot and proclaiming that it is the only generalist AI system ever allowed to run. Enforcing this effectively would require, I think, forcing hardware companies to print ASICs hardcoding that specific AI system, and forbidding the production of any other general purpose computing device beyond a certain capacity. Independently of that, such a norm would completely halt progress in AI research, even at levels and along directions which do not pose any existential risk.

So, if we want to go there anyway, making the regulations feasible requires some way to check that the rules are being respected, while leaving people free to buy general purpose chips of high power, and leaving AI developers free to try to squeeze as much as they can out of improvements to their AI systems.

Even if at this point I can't envision in detail how such a thing could work, reasoning in the abstract I think it would somehow be possible if there were an AI paradigm which was a dead end, yet worked well enough to be broadly economically useful.

Imagine if next year it turned out that, as some people are claiming right now (end of 2024), Deep Learning has hit a wall, and current AI systems can't be substantially improved: then all AI development effort would go towards optimizations, refinements, and applications. There are probably many things that could be done with current AI tech and compute without substantially improving the general intelligence of the systems. It would be a good outcome, and for a while there would be no fear of a system generally surpassing human intelligence, with all the risks that implies.

In reality I expect, and most people should at least consider plausible, that AI progress is not going to stall for long in this way. Such a slowdown would have to be engineered and enforced. That would look like finding an alternative way to make AIs, different from DL, that reaches current AI capability levels but can't be scaled up without throwing everything away and starting from scratch.

As a fairly abstract argument for how this would allow both enforcement and enough free experimentation, consider the work of AI developers as an optimization process. The AI developers are trying to make some general ability measure of their systems go up (even if nobody can write down what that measure is, they know it when they see it). A failed AI paradigm is like a local maximum of the target measure in the space of possible AI designs: incremental improvements won't get the developers away from the local maximum, so the width of the maximum defines a playpen where free AI experimentation does not run the risk of arbitrarily increasing AI intelligence. The paradigm would come with its own idiosyncrasies, much like DL calls for hardware optimized to do matrix multiplications at a certain numerical precision. So all AI systems produced within this paradigm would likely share some broad properties that make them distinguishable from other ways of implementing AI, just as it would probably be possible to guess the kind of task a GPU is running by looking at general usage patterns of its memory and computational units.
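
To make the local-maximum picture concrete, here is a toy sketch of my own (nothing in it is specific to real AI development): a one-dimensional "capability landscape" with a broad local peak, where purely incremental improvement wanders freely within the basin but never reaches the much higher peak further away.

```python
import numpy as np

def capability(x):
    """Toy capability landscape: broad local peak near x=0, higher peak near x=10."""
    return np.exp(-x**2 / 4.0) + 5.0 * np.exp(-(x - 10.0)**2 / 0.5)

def incremental_improvement(x, step=0.1, iters=20_000, seed=0):
    """Hill climbing that only ever accepts small improving moves."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)
        if capability(candidate) > capability(x):
            x = candidate
    return x

x_final = incremental_improvement(x=-1.5)
print(f"ends near x = {x_final:.2f}")                  # stays around 0: the "playpen"
print(f"capability reached: {capability(x_final):.3f}")
print(f"unreached global peak: {capability(10.0):.3f}")
```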

As a best hope, maybe the hardware specialized for the allowed AI paradigm wouldn't be very useful for DL or other ways of doing AI, producing some amount of hardware lock-in as a moat.

If limits monitored in such a hands-off way were put on AI systems within a paradigm that allows continuous improvement beyond those limits, there would always be some way to cheat and bypass the proxies used to measure whether the thresholds have been crossed. Try to picture doing this with DL, in a future where relatively small computers in the hands of individuals can train a 2024-level frontier AI system: you are a regulator who wants to plant a too-smart-AI kill-chip in every GPU; the chip, to decide whether to trigger, can only access some broad diagnostics about the GPU, for privacy reasons; and the people programming the AI software get as many tries as they like at fooling the kill-chip. This looks almost impossible.

Yudkowsky worries about a "removing the limits in the for loops" scenario, where someone takes an AI training procedure or AI system that has been greenlighted for not crossing some safe capability threshold, and simply removes the limiters. This should not be possible by design: there should be no limiters to remove at all; the system or procedure should be intrinsically limited.

For economic incentives, the allowed AI paradigm should be competitive below human level

Compute governance is currently not a done deal. It has sheepishly peeped out in US export restrictions on GPUs, the Biden Executive Order on AI [LW · GW], and the vetoed California SB 1047 regulation [LW · GW]. So, setting aside the hardware changes required for reliably enforced compute governance, the level of government involvement is still not up to the task. I take this as evidence that government action is a pain point. Another piece of evidence is that the field of AI seems to be currently locked in a race driven by profit. Thus I would like my stricter AI system governance proposal not to require more interventionism than a pure compute governance baseline.

I think a way this could hold is if there was enough economic incentive to switch to the dead-end paradigm once it was clear that the government was going to restrict the general capability level. From the point of view of a developer setting their course, switching immediately to the dead-end paradigm should look like the more profitable option, given the belief that the government is determined to somehow stop them from scaling up their systems at some point anyway.

To make this the case, the new paradigm should, when properly studied and optimized, lead to more efficient AI systems than DL below the threshold where it stops scaling: the alternative paradigm should reach the same level of performance with less compute. For example, imagine the new paradigm used statistical models with a training procedure close to kosher Bayesian inference, thus having a near-guarantee of squeezing all the information out of the training data (within the capped intelligence of the model).
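
To make the "kosher Bayesian inference" example slightly more concrete, here is a minimal sketch of a conjugate model whose posterior is exact and computed in closed form; the model choice (linear regression with a Gaussian prior and known noise) and all numbers are my illustrative assumptions, not a proposal from the post.

```python
import numpy as np

# Bayesian linear regression with a Gaussian prior and known noise variance:
# the posterior over the weights is available in closed form, so the model
# extracts everything its (deliberately limited) hypothesis class can extract
# from the data, with no iterative training loop to scale up.

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
sigma = 0.5                              # known observation noise (assumed)
y = X @ true_w + sigma * rng.normal(size=n)

tau = 1.0                                # prior: w ~ N(0, tau^2 I)
posterior_precision = X.T @ X / sigma**2 + np.eye(d) / tau**2
posterior_cov = np.linalg.inv(posterior_precision)
posterior_mean = posterior_cov @ (X.T @ y) / sigma**2

print("true weights:    ", np.round(true_w, 2))
print("posterior mean:  ", np.round(posterior_mean, 2))
print("posterior stddev:", np.round(np.sqrt(np.diag(posterior_cov)), 2))
```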

A specific aspect that the new paradigm should have to make it competitive is the possibility of scaling "horizontally", i.e., the possibility of spending arbitrary amounts of compute to make arbitrarily large systems that learn arbitrarily many facts and specific abilities. This should be possible even if the system can't scale "vertically", i.e., increase its general ability to put the pieces together by itself. Without horizontal scaling, there would always exist some specific use cases that some DL system can handle but no system of the alternative kind can. (This aspect plays into the "Chinese room" name, as the Chinese room works with a big store of knowledge rather than with smart algorithms.)

Reasons for hope, reasons for despair

Is there such a paradigm at all to be found? I have some indirect evidence. So far, it seems to me that general nonparametric regression methods (e.g., supervised ML methods and autoregressive LLMs) scale at best as compute ∝ n², i.e., they use an amount of compute proportional to the square of the number n of training examples. I think the cases where this law does not hold are either inefficient implementations (i.e., exponent > 2) that can be replaced by equivalent ones with the squared scaling, or special cases where additional restrictions allow better scaling (i.e., exponent < 2), or algorithms where the optimal exponent is < 2 but only because they are not able to use additional compute to squeeze more out of the data, and so they are eventually surpassed by other algorithms.

My pieces of evidence for the observation «compute ∝ n²» are:

Despite sharing this scaling behavior, the various regression methods display qualitatively different properties, and in particular Deep Learning is the only paradigm that scaled to the frontier AI systems we have today. So dead-end methods that make in some sense statistically efficient use of the data, but that are fundamentally limited in what patterns they search for in the data, do exist.
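
As an aside, here is one place the quadratic shows up for autoregressive transformer LLMs specifically; the two approximations used (training compute C ≈ 6ND, and compute-optimal parameter count N growing in proportion to token count D) are standard rules of thumb that I am adding here, not claims from the post.

```latex
\begin{align*}
C &\approx 6\,N\,D && \text{training FLOPs for $N$ parameters and $D$ tokens}\\
N &\propto D       && \text{compute-optimal (Chinchilla-style) allocation}\\
\Rightarrow\quad C &\propto D^{2} && \text{compute grows as the square of the training data}
\end{align*}
```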

Can the two properties I requested in the sections above be satisfied?

For enforceability, I required robustness to incremental improvement. Assuming that the people who worked on the dead ends of the past and present tried their best to push past them, such robustness does exist in the wild.

For economic reasons, I required efficiency compared to DL before scaling stops. The example I can think of is statistical models used in data analysis: DL, or NNs more generally, are typically not a good tool for that job, but the best methods do not scale arbitrarily, as DL does, in the breadth and complexity of the patterns they recognize in the data.

Now that I’ve argued that a paradigm exists, the next question is: can it be found? Even though I managed to list some existing cases with qualitative properties I want, I'm asking for a rather specific and extreme version of that. This kind of research would be a first that I know of; but it’s a first probably because nobody ever had reason to deliberately search for an “inferior” technique, rather than because it is impossible: this is reason for optimism.

Ok then, even if the paradigm exists and can be found, is it enforceable (without excessive dystopia)? Turing machines gonna be Turing machines. Maybe I am wildly overestimating the difficulty an AI developer would have in bypassing any superficial check of algorithm type that is meant to enforce the use of the specific AI architecture. For example, someone could run NNs on CPUs instead of specialized hardware to bypass the checks; even if there was an efficiency penalty, the ability to reach a higher intelligence level may be worth it. So my reason for hope lies mostly in non-specialized hardware not being good enough to be convenient to anyone, at least for some decades.

Maybe it is too early to look at this problem, before compute governance has actually happened and taken shape. On the other hand, based on my AI safety reading diet, the absence of a story about how compute governance is going to survive technological progress looks to me like a significant hole. Compute governance is up there as one of the top practical solutions to potential danger from very intelligent AIs; for example, it seems to be the main goal of MIRI right now. But the edge it relies on, compute being centralized and bottlenecked, won't last. I haven't made any numerical estimates; my gut says we have 10-20 years before it stops working.

Another general con is that regulation at this level of strictness is not politically palatable due to the general libertarian attitude of Silicon Valley. I note this problem, but I don’t offer further commentary because political strategy is not my forte.

First steps?

I really don't know what peculiar AI technique would be a “Chinese room AI” as I want it. However, I see paths to start playing with the concept and see what happens. Below I list some first ideas to start the research.

Look at existing dead ends (forest chatbot)

Measure how existing dead ends fail to scale. For example, I could make an LLM with some standard regression method based on decision trees (e.g., XGBoost), and compare it to GPTs. What maximum level of textual sense-making would the tree-based LLM reach? Given a tree model with a certain size/compute, at what size/compute would a GPT match its performance? What are the scaling laws of the tree-based LLM?
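
A minimal sketch of what this experiment could look like, assuming a character-level setup with a fixed context window; the corpus.txt file and all hyperparameters are placeholders of mine, not tested choices.

```python
import numpy as np
import xgboost as xgb

# "Forest chatbot": gradient-boosted trees predicting the next character
# from a fixed window of previous character ids.

text = open("corpus.txt", encoding="utf-8").read()       # placeholder corpus
vocab = sorted(set(text))
char_to_id = {c: i for i, c in enumerate(vocab)}
ids = np.array([char_to_id[c] for c in text])

k = 16                                                    # context window length
X = np.stack([ids[i : i + k] for i in range(len(ids) - k)])
classes, y = np.unique(ids[k:], return_inverse=True)      # contiguous class labels

model = xgb.XGBClassifier(n_estimators=300, max_depth=6, tree_method="hist")
model.fit(X, y)

def generate(prompt, n_chars=200, seed=0):
    """Sample a continuation one character at a time from the tree model."""
    rng = np.random.default_rng(seed)
    context = [char_to_id.get(c, 0) for c in prompt][-k:]
    context = [0] * (k - len(context)) + context
    out = list(prompt)
    for _ in range(n_chars):
        probs = model.predict_proba(np.array([context]))[0]
        label = rng.choice(len(classes), p=probs / probs.sum())
        char_id = int(classes[label])
        out.append(vocab[char_id])
        context = context[1:] + [char_id]
    return "".join(out)

print(generate("Once upon a time"))
```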

(My honest expectation is that such an LLM would be so poor that not much could be inferred from it in general.)

Believe in Yann LeCun

If Yann LeCun is right that LLMs are not gonna get us to AGI, maybe we can just say “only pure LLMs allowed”. Or to make a more precise hypothesis, maybe LLMs without search scaffolding are limited; so speaking of OpenAI, we would allow the gpt-x series, but ban the o-x series (h/t Christopher Ackermann).

For OpenAI it's pretty late for that, isn't it? Let's set that aside.

Transformer LLMs ~are efficient universal Turing machines (see arXiv:2411.01992, "Ask, and it shall be given: Turing completeness of prompting", which I have not read and am taking at face value). So maybe LLMs are not a dead end after all, because it is possible to train anything into them. However, to do this training it may be necessary to have a more powerful AI of another sort to produce the data [LW(p) · GW(p)], so limits on that more powerful AI could indirectly work to limit current LLMs.

Try to implement algorithm detectors with the current stack

As a pilot study of enforceability, try to write a program that detects which AI algorithm is running on a GPU (h/t Drew Stelly).

A concrete version could look like this: given some pretty limited pre-specified inspection access to a GPU, the blue team has to write a program that gathers as much information as possible on what's being run on it, while the red team has complete access to the GPU and decides what to run. The process is iterative, where each iteration has red-blue-red turns: the red team decides what to do with the GPU, the blue team commits to a program to monitor the usage, and then the red team is allowed a final turn to try to defeat the blue team's program (with open-source access to it). A judge tries to set the affordances given to the blue team to balance two goals: 1) making the red team fail (which pushes toward more affordances), and 2) preventing the blue team from reading arbitrary data off the GPU (which pushes toward fewer affordances).
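
As a toy starting point for the blue team's side, here is a sketch that polls only coarse NVML telemetry (utilization, memory, power) through the pynvml bindings and applies a crude placeholder heuristic. The specific thresholds, and the assumption that this coarse telemetry is the allowed affordance, are mine rather than anything established.

```python
import time
import numpy as np
import pynvml  # NVIDIA NVML Python bindings (assumed installed)

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def sample_telemetry(seconds=10.0, hz=10):
    """Poll coarse GPU counters; one row of features per sample."""
    rows = []
    for _ in range(int(seconds * hz)):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)   # % compute / % memory
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # bytes used / total
        power = pynvml.nvmlDeviceGetPowerUsage(handle)        # milliwatts
        rows.append([util.gpu, util.memory, mem.used / mem.total, power / 1000.0])
        time.sleep(1.0 / hz)
    return np.array(rows)

def crude_guess(trace):
    """Placeholder heuristic: steady, near-saturated compute plus heavy memory
    traffic is tagged as dense matmul-style training; everything else is unknown."""
    gpu_util, mem_util = trace[:, 0], trace[:, 1]
    if gpu_util.mean() > 90 and gpu_util.std() < 5 and mem_util.mean() > 30:
        return "looks like dense matmul-heavy training"
    return "unknown workload"

trace = sample_telemetry()
print(crude_guess(trace))
pynvml.nvmlShutdown()
```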

Acknowledgements

I wrote this post as a final project for the BlueDot AI safety course. The idea had been fruitlessly floating in my mind for a while; I thank them for being an effective commitment device that made me sit down and work it out. I also thank my "classmates" and tutor for the genuinely helpful and concrete feedback.
