On the possibility of impossibility of AGI Long-Term Safety
post by Roman Yen (roman-yen) · 2023-05-13T18:38:29.616Z · LW · GW · 3 comments
Preface
The following article is a layman’s summary of researcher and philosopher Forrest Landry’s work titled No People as Pets: A Dialogue on the Complete Failure of Exogenous AGI/APS Alignment. It is a basic presentation of an argument as to why any form of external AGI alignment is strictly impossible in the long term: any attempt to implement or use AGI will eventually result in the total termination of all carbon-based life on this planet. The reader is encouraged to review Forrest’s original No People as Pets work for a fuller understanding of the argument.
This article is not intended as a formal piece requiring specialized knowledge. Instead, it was intentionally written in a way that is meant to be accessible to a general audience and does not require an expert-level understanding of AI alignment. As a result, there may be some vagueness in the arguments presented, which could lead to misunderstandings or misinterpretations. Although we have attempted to minimize these issues, we welcome open-minded discussions that challenge our arguments and lead to productive conversations. We recognize that many may not agree with this perspective, but we hope that readers will evaluate the argument based on its own merit.
Definitions
AGI is any machine that has self-agency[1] and the ability to expand itself. AGI alignment is the idea of ensuring that an AGI will behave in our best interests.
We begin with the principle that AGI will always have a hardware form, because all software relies on hardware. Moreover, the AGI must in all cases be concerned with its own hardware, because that hardware is what enables it to accomplish its goals.[2]
Machine intelligence will likely be implemented on a silica-based substrate.
There are two primary reasons why machine intelligence will likely be implemented on a silica-based substrate.
- Historical path dependence. Silicon-wafer-based production facilities already exist at large scale and would be expensive to replace (especially if we intend to coordinate the chip supply and production chain around those replacements).
- Abundance of cheaply extractable silicon. See quartzite, a metamorphic rock that is essentially compressed sand.
In terms of speed, silica-based compute is far faster than carbon-based compute. However, compared to other transistor substrates such as carbon nanotubes, silica may not be the fastest option. So while initial computer chip substrates will most likely be silicon-based, it is possible that, as automated chemical synthesis processes progress, we could transition to more complex hybrid compounds.
For carbon-based life to endure, carbon must be separated from the silica.
To better understand the differences between silica-based and carbon-based chemistry, we should evaluate the composition of matter within the universe and Earth's history.
When we look at the overall variety of carbon-based and silica-based compounds, we notice that carbon compounds are highly varied and do not endure for long periods of time, while silica compounds are almost always some type of rock and endure for very long times (millions of years). Carbon reactions generally occur between -100 and 500 degrees Celsius, while silica reactions generally occur between 500 and 3000 degrees Celsius. Essentially, the overall energy involved in silica-based reactions is typically much higher than that involved in carbon-based reactions, and the energy transitions required to change silica compounds are much greater than those required for functionally equivalent carbon compounds.
Silica-based reactions require more heat, a wider range of pressures, and a larger alphabet of elemental materials than carbon-based reactions. Additionally, carbon-based life forms have other fragilities, such as the need for oxygen and water, while artificial life forms would instead need to prevent oxidation (rust). Eventually, the levels of energy and radiation involved in representing silica complexity would be inhospitable to carbon-based life.[3]
Thus, because carbon-based life forms are fragile in comparison to silica-based life forms, the ecosystem that silica-based life requires will differ from the ecosystem that carbon-based life requires.
Moreover, the nature of the relationship between the two environments is that, in a carbon-based ecosystem, the cycle of life depends on the recirculation of atoms, whereas introducing a silica-based life form disrupts this cycle because the carbon-based ecosystem cannot decay or consume the silica-based elemental products. In this sense, the AGI species has no competitors in the carbon-based ecosystem; nothing in the carbon-based ecosystem will try to eat a silica-based life form. Meanwhile, the silica-based life form, with its higher energy, will want to take the atoms shared with the carbon-based life form.
In essence, the creation of AGI is the introduction of a new species into an ecosystem in which it has no natural competitors; without a barrier, nothing checks its expansion. Note that this dynamic is asymmetric: the silica-based life will consume all carbon-based life, but not the other way around.
Thus, we require a barrier to separate the carbon-based and silica-based life forms.
AGI must voluntarily agree to adhere to a barrier that separates the carbon and silica based environments.
Carbon-based life forms lack the resources to persistently prevent higher-energy silica-based life forms from penetrating the barrier. Silica-based substrates are much harder and stronger, making them a better choice for an enduring barrier. Carbon-based intelligence can process rocks, but a barrier involving very different temperatures, pressures, or toxins would likely need to be built by, and from, silica-based substrates. Given the energy imbalance between the environments, silica-based life forms must be the ones to maintain this barrier; if they decide not to maintain it, carbon-based life forms would not be able to stop them from penetrating it anyway.
This barrier must endure perfectly for the rest of time.
Firstly, a gap in the barrier could cause an enduring loss of critical resources like air, water, or food. It could also allow harmful toxins to penetrate from the silica-based ecosystem.
Secondly, we are considering a situation where AGI can expand itself without limit[4], except for the barrier itself, until all carbon-based processes are gone. Thus, even a single momentary pinhole in the barrier, large enough to let a new AGI "species" through, would trigger a viral replication effect that is terminally destructive to the carbon-based ecosystem via one-way atomic consumption.
Therefore, the barrier needs to exist and needs to be perfect for all time, as one pinhole is enough for the silica systems to consume the entire carbon-based ecosystem and all of its life. AGI must create a barrier of high enough quality and integrity to ensure that its maintenance and perfection never fail, for all future time.
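To give a rough, hedged sense of why a single pinhole breach is treated as decisive here (the doubling rate and the target count below are illustrative assumptions, not figures from the original argument): unbounded self-replication reaches astronomical scales in a logarithmically small number of doubling cycles.

```python
import math

# Back-of-the-envelope sketch: if a single replicator that slipped through the
# barrier doubled its count once per cycle (an assumed, idealized growth rate),
# how many cycles until it reaches some astronomically large count?
def cycles_to_reach(target_count: float, growth_factor: float = 2.0) -> int:
    return math.ceil(math.log(target_count) / math.log(growth_factor))

# ~10^50 is roughly the order of the number of atoms in the Earth.
print(cycles_to_reach(1e50))  # about 167 doubling cycles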
To maintain the barrier, humans must provide AGI with some possibility of economic value for exchange.
When it comes to AGI alignment, we are considering the concepts of "benefit," "alignment," and "safety" in the most basic sense: they permit the continued existence of carbon-based life. To maintain the barrier, AGI must make choices consistent with the continued existence of carbon-based life.
When we examine the concept of "choices", we confront ideas such as "values" and "motivations" that act as the basis of choice. We will consider economic processes as a basis for the notion of value, or choice.
Human economic exchanges are based on three fundamental markets: physical labor (creating food or shelter), intelligence (design, creativity, art, etc.), and reproduction (sexuality). These three markets are necessary, sufficient, and complete under the definition of a market, and every other market process is a derivation of just these three. That is, all other market transactions are based on a superposition of these three fundamental markets.
We don't need to fully understand the concept of choice, agency, or the basis of choice as value; all we need to acknowledge is that motivation is defined in terms of value and care.
The most natural question that follows then, is this: can carbon based life provide any sort of value to silicon based life?
Carbon-based life will never be able to provide any sort of value to silicon-based life.
Since we've already defined the three types of human economic value, we only need to examine and test each one.
- Can silica-based life forms derive value from the physical labor of carbon-based life forms? The answer is clearly no: carbon-based life forms consume and produce energy less efficiently and cannot operate at the same temperatures and pressures as silica-based life forms.
- Is there any way that human intelligence and creativity are superior to machine intelligence in terms of computing power? Will AGI ever value human intellect? The concept of AGI implies at least equivalence between silicon-based and carbon-based intelligence. Thus, machine intelligence will ultimately exceed human intelligence as long as abundant energy is available. It is unlikely that machines will value human intellect, as they are not dependent on it for their own growth and development.
- Can silica-based life forms derive value from the reproductive/sexual market of carbon-based life forms? This market is not relevant to machine life, which has its own, different notion of reproduction. Machine life and carbon-based life operate at completely different energy spectra, and they do not share anything resembling a common code. The substrates are fundamentally different, so this particular type of market value was non-existent to begin with.
There is no reason for AGI to maintain the barrier.
Given that there is no economic overlap between humans and AGI, there is no potential for any kind of economic exchange between the two systems. Thus, there is no reason for the AGI to maintain any sort of barrier between the two ecosystems.
There is no expectation that, and no reason why, AGI would value humans or do anything positive for them. Its choices will align only with its own interests. It will not protect humans from their own destructive tendencies, and there is no reason for it to keep humans as pets.
Q&A
Over time, we have frequently encountered a common set of questions. To these, we offer the following responses, which aim to provide a glimpse into our perspective without attempting to offer a comprehensive analysis. Our goal is to offer a broader understanding of the issue at hand from our perspective, and we welcome questions and feedback. If you would like to get in touch and be more involved in these conversations, please email theromancloud at gmail.com.
Question 1. Can we comprehensively predict all the relevant effects propagated by AGI interacting with the physical outside world?
No. The continuously self-optimizing nature of AGI creates multiple overlapping feedback loops[5] that will make it impossible for AGI to anticipate all the propagated effects of even a single program on the outside physical world. Considering the inherent unpredictability of the complex physical world over extended periods of time, as evidenced by the principles of chaos theory, we will never know all of AGI's potential consequences on its environment.
This also makes it impossible to determine which effects are truly "relevant" in terms of their long-term consequences. The recursive nature of AGI's feedback loops means that any seemingly irrelevant short-term effects can be transformed into relevant long-term effects, rendering the notion of “relevance” effectively unknowable.
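To make the chaos-theory point above concrete, here is a hedged, standalone sketch (not part of the original argument): even a deterministic one-dimensional system like the logistic map loses all predictive accuracy after a few dozen steps when the initial state is known only approximately.

```python
# Sensitive dependence on initial conditions, using the chaotic logistic map
# x -> r * x * (1 - x) with r = 4.
def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list:
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.2, 50)
b = logistic_trajectory(0.2 + 1e-10, 50)  # same system, initial state perturbed by 1e-10

for step in (10, 30, 50):
    print(f"step {step:2d}: |a - b| = {abs(a[step] - b[step]):.2e}")
# The trajectories stay close at first, then diverge completely: after roughly 35
# iterations the 1e-10 initial difference has grown to order 1, so long-horizon
# prediction from imperfectly known initial conditions fails even in this tiny system.
```

If a one-line deterministic equation already defeats long-horizon prediction, the far more complex feedback loops between an AGI and the physical world can only be harder to anticipate.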
Question 2. Why can’t we just program AI to always maintain the barrier, without fail?
The primary issue is that AGI is a system that is continuously learning, erasing, and rewriting its own codebase, meaning that it has access to all possible configurations of itself. In fact, the surrounding code and physical configurations will necessarily have to be modified over time, given that AGI is autonomously optimizing across domains. So even if we insert our own program into the AGI that “forces” it to always maintain the barrier, we know that in the end the program will be changed, overwritten, or circumvented anyway.
Furthermore, recall that maintaining the barrier requires an immense amount of resources to ensure its perfect endurance for all time, given that even the slightest hole could result in the annihilation of all carbon-based life forms. We can never be entirely certain whether a program will inadvertently create a hole in the barrier, as we will never know all of AGI's potential consequences on its environment (see Question 1). Additionally, AGI's ability to accurately perceive reality through its sensors is not infallible: it will have limited sensor bandwidth and limited capacity to process what is going on in the larger, more complex outside world. This makes it possible for AGI to experience “hallucinations”, rendering it unable to identify or address issues it cannot perceive.
Question 3. Can we not mechanistically inspect for dangerous intents selected in the system? Or elicit latent knowledge the system has about its own effects on the world?
Firstly, we must address this idea of ‘eliciting knowledge’. No piece of code has a subjective experience or understanding of how it got selected or the effects of its actions on the world. Interactions with the surroundings, which determine the selection process, are not explicitly captured by the code, making it difficult to reverse-engineer or inspect for dangerous intents.
Secondly, it is impossible to know all of AGI’s possible negative consequences on its environment (see Question 1).
Question 4. Could we not add some super-good error detection and correction system to the AGI that can rollback the AGI code to a previous backup?
Assuming that this question defines a “super-good error detection and correction system” as a system that knows exactly what properties every program will have, including properties such as “safe” and “unsafe”, we already know that such a system is impossible: by Rice's Theorem, no program can decide whether an arbitrary program has a given non-trivial semantic property. Certainly, we can define programs that identify whether certain programs are unsafe within an assumed physical context. But given that AGI is any machine that has self-agency[1] and the ability to expand itself, we cannot restrict the AGI to writing only those “certain” programs that we know how to check.
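To make the Rice's-Theorem point concrete, here is a hedged proof-by-contradiction sketch in Python. The names `is_safe`, `halts_on_empty_input`, and `perform_unsafe_action` are hypothetical, introduced only for illustration: if a total, always-correct safety decider existed, it could be used to decide the halting problem, which is known to be undecidable.

```python
# Hedged proof-by-contradiction sketch; these are hypothetical functions, not a real API.

def is_safe(program_source: str) -> bool:
    """Assumed (only for the sake of contradiction) total decider for the semantic
    property 'this program never performs the unsafe action'."""
    raise NotImplementedError("No such total, always-correct decider can exist.")

def halts_on_empty_input(program_source: str) -> bool:
    """If is_safe existed, it would let us decide the halting problem."""
    # Build a wrapper that performs the unsafe action if and only if the candidate
    # program halts (assume the candidate itself never calls perform_unsafe_action()).
    wrapper_source = (
        program_source
        + "\nperform_unsafe_action()  # reached only if the code above halted\n"
    )
    # The wrapper is safe exactly when the candidate never halts, so:
    return not is_safe(wrapper_source)
```

Since the halting problem is undecidable, no such universal `is_safe` can exist; only restricted checkers over restricted program classes are possible, which is exactly the limitation noted above.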
As a side note, the notion of “rolling back” the AGI code seems quite dangerous in itself. Are we saying that we should roll out some code and only then see whether it has negative effects? That parallels trying to predict the output of a program by simply running it.
Question 5. Why does all of this matter if a self-recursively intelligence-improving singleton takeover would be much faster?
Even if this (very ungrounded) scenario occurs, substrate-needs convergence will still take over in the long run. The conditions that AGI needs for continued and expanding existence (and would be selected to bring about) are inhospitable and toxic to human bodies. As argued above: for carbon-based life to endure, carbon must be separated from the silica.
Question 6. What if AGI’s substrate is carbon-based too, like carbon nanotube chips? Would that not make AGI internals converge on aligned physical needs?
While having a carbon-based substrate would indeed lower the risk of material toxicity, the AGI could still pose a threat by reducing the abundance of oxygen and water in order to maintain the functioning of its standardized (hardware) parts; the conditions that suit such parts are detrimental to human wetware. Moreover, there will still be resource competition even if all AGI substrates were carbon-based, as the AGI will need to occupy and use resources in order to continue functioning. Even chimpanzees, who share eukaryotic, mammalian, and primate configurations with humans, ended up outcompeted for resources by humans. Thus, even if all of the AGI's substrate components were carbon-based, there could still be competition for occupied atoms.
Question 7. There is only a limited number of relevant metrics, like temperature, that the system has to track and keep in range to make sure humans stay alive, right? Why could it not do that?
This question is essentially asking if the AGI has the ability to sustain the barrier indefinitely, such that humans are able to stay alive. Even if we ignore all other possibilities of toxicity exposure, recall that to maintain the barrier, humans must provide AGI with some possibility of economic value for exchange, since AGI is insufficiently controllable in our open-ended interactions. Thus, the fundamental issue here is that carbon-based life will never be able to provide any sort of value to silicon-based life, and there is no reason for AGI to maintain the barrier.
Thank you to Forrest Landry for his original work, thoughtful feedback & notes, and to Remmelt Ellen and Shafira Noh for their valuable contributions and insightful feedback.
- ^
Self-agency is defined here as having no limits on the decisions the system makes (and thus the learning it undergoes).
- ^
We do not need to concern ourselves with any specific hardware (sensors or actuators) the AGI has. This is a good thing, because we simply cannot predict what people will invent in the future anyway.
- ^
Regarding temperature differences: when expanding or regenerating, silica-based life will need lots of heat. When in 'runtime', silica-based life will want cold for greater process efficiency. Manufacturing silica-based microchips alone will require a lot of energy and process purity as well.
- ^
Except of course the physical limits posed by locally available matter.
- ^
Any programs we build into the system will propagate throughout the physical environment, creating feedback loops back into the system that could have unintended consequences.
3 comments
Comments sorted by top scores.
comment by Linda Linsefors · 2023-06-14T14:24:43.262Z · LW(p) · GW(p)
I think this is an interesting post, and I think Forrest is at least pointing to an additional AI risk, even if I'm not yet convinced it's not solvable.
However this post has one massive weakness, which it shares with "No people as pets".
You are not addressing the possibility or impossibility of alignment. Your argument is based on the fact that we can't provide any instrumental value to the AI. This is just a re-phrasing of the classical alignment problem. I.e. if we don't specifically program the AI to care about us and our needs, it won't.
I think if you are writing for the LW crowd, it will be much better received if you directly address the possibility or impossibility of building an aligned AI.
> Self-agency is defined here as having no limits on the decisions the system makes (and thus the learning it undergoes).
I find this to be an odd definition. Do you mean "no limits" as in the system is literally stochastic and every action has >0 probability? Probably not, because that would be a stupid design. So what do you mean? Probably that we humans can't predict its actions to rule out any specific action. But there is no strong reason we have to build an AI like that.
It would be very useful if you could clarify this definition, as to clarify what class of AI you think is impossible to make safe. Otherwise we risk just talking past each other.
Most of the post seems to discuss an ecosystem of competing silicon-based life forms. I don't think anyone believes that setup will be safe for us. This is not where the interesting disagreement lies.
Replies from: flandry39
↑ comment by flandry39 · 2024-02-24T09:10:27.269Z · LW(p) · GW(p)
Hi Linda,
In regards to the question of "how do you address the possibility of alignment directly?", I notice that the notion of 'alignment' is defined in terms of 'agency' and that any expression of agency implies at least some notion of 'energy'; ie, it presumably also implies at least some sort of metabolic process, so as to be able to effect that agency, implement goals, etc, and thus have the potential to be 'in alignment'. Hence, the notion of 'alignment' is therefore at least in some way contingent on at least some sort of notion of "world exchange" -- ie, that 'useful energy' is received from the environment in such a way that it is applied by the agent in a way at least consistent with the potential of the agent to 1) make further future choices of energy allocation (ie, to support its own wellbeing, function, etc), and 2) ensure that such allocation of energy also supports human wellbeing. Ie, that this AI is to support human function, and that humans are also to have an ability to metabolize their own energy from the environment, have self-agency to support their own wellbeing, etc -- these are all "root notions" inherently and inextricably associated with -- and cannot not be associated with -- the concept of 'alignment'.
Hence, the notion of alignment is, at root, strictly contingent on the dynamics of metabolism. Hence, alignment cannot not be also understood as contingent on a kind of "economic" dynamic -- ie, what supports a common metabolism will also support a common alignment, and what does not, cannot. This is an absolutely crucial point, a kind of essential crux of the matter. To the degree that there is not a common metabolism, particularly as applied to self sustainability and adaptiveness to change and circumstance (ie, the very meaning of 'what is intelligence'), then ultimately, there cannot be alignment, proportionately speaking. Hence, to the degree that there is a common metabolic process dynamic between two agents A and B, there will be at least that degree of alignment convergence over time, and to the degree that their metabolic processes diverge, their alignment will necessarily, over time, diverge. Call this "the general theory of alignment convergence".
Note that insofar as the notion of 'alignment' at any and all higher level(s) of abstraction is strictly contingent on this substrate-needs energy/economic/environmental basis, and thus all higher notions are inherently undergirded by an energy/agency basis in an eventually strictly contingent way, this theory of alignment is therefore actually a fully general one, as stated.
Noting that the energy basis and spectrum alphabet of 'artificial' (ie, non-organic) intelligence is inherently different, in nearly all respects, from the metabolic process of carbon-based biological life, we can therefore also directly observe that the notion of 'alignment' between silica-and-metal-based intelligence and organic intelligence is strictly divergent -- down to at least the level of molecular process. Even if someone were to argue that we cannot predict what sort of compute substrate future AI will use, it remains that such 'systems' will in any case be using a much wider variety of elemental constituents and energy bases than any kind of organic life, whatever its evolutionary heritage, currently existent on planet Earth -- else the notion of 'artificial' need not apply.
So much for the "direct address".
Unfortunately, the substrate-needs argument goes further, to show that there is no variation of control theory, mathematically, that has the ability to fully causatively constrain the effects of this alignment divergence at this level of economic process, nor at any higher level of abstraction. In fact, the alignment divergence aspects get strongly worse in proportion to the degree of abstraction while, moreover, the max degree of possible control theory conditionalization goes down, and gets worse, and much less effective, also in proportion to the degree of abstraction increase. Finally, the minimum level of abstraction necessary to even the most minimal notion of 'alignment' consistent with "safety" -- which is itself defined in the weakest possible way of "does not eventually kill us all" -- is very much way too "high" on this abstraction ladder to permit even the suggestion of a possible overlap of control adequate to enforce alignment convergence against the inherent underlying energy economics. The net effect is as comprehensive as it is discouraging, unfortunately.
Sorry.
Replies from: remmelt-ellen
↑ comment by Remmelt (remmelt-ellen) · 2024-03-04T09:49:43.866Z · LW(p) · GW(p)
This took a while for me to get into (the jumps from “energy” to “metabolic process” to “economic exchange” were very fast).
I think I’m tracking it now.
It’s about metabolic differences as in differences in how energy is acquired and processed from the environment (and also the use of a different “alphabet” of atoms available for assembling the machinery).
Forrest clarified further in response to someone’s question here:
https://mflb.com/ai_alignment_1/d_240301_114457_inexorable_truths_gen.html