Towards a Formalisation of Returns on Cognitive Reinvestment (Part 1)
post by DragonGod · 2022-06-04T18:42:24.612Z · LW · GW · 11 comments
Disclaimers
This is a rough draft of the first part of the nth post of what I hope to turn into a proper sequence investigating AI takeoff dynamics. It's not entirely self-contained material. There's a lot of preceding and subsequent context that I have not written.
If you don't immediately understand the problem I'm trying to solve, why it's important, or why I chose the approach I did, that may be why. I will try to briefly explain it though, and do what I can to contextualise it.
I've written some tangible material on one aspect of the problem, and I'm sharing it here for feedback and validation.
Some Prior Contextualising Work
My vision for where this fits into my broader attempt to investigate takeoff dynamics is as part of a thesis about the "hardness" of intelligence, and especially the hardness of intelligence with respect to itself (reflexive hardness?). I haven't written up my thoughts on the hardness of intelligence in a version that I fully endorse, but here's an even rougher draft that sketches out what the concept means and how it might affect takeoff dynamics [LW · GW].
Do note that I don't fully endorse that draft, and my notation and formalisms in it are strictly superseded by any notations and formalisms I use here. A polished up and refined version of the ideas in that draft will probably serve as the introductory post in my sequence on investigating takeoff dynamics (once I get around to stitching together all my disparate thoughts on the topic).
In the broader literature, this post can be viewed as an attempt to formalise how to measure "returns on cognitive reinvestment" as outlined by Yudkowsky in "Intelligence Explosion Microeconomics".
A significant (and important) way in which I disagree with Yudkowsky's framing of "returns on cognitive reinvestment" is that I'm thinking almost entirely about architectural and algorithmic improvements, not improvements from access to more computational resources (FLOPs, training data, hardware, etc.).
I have some other disagreements with some of Yud's framing and approach, but they won't be addressed in this post.
Introduction
I would like to describe how to measure returns on cognitive reinvestment (RCR) in a more structured manner. The aim is to define (or specify how one might define) a function that captures the concept. I am most interested in the shape of that function (is it sublinear, linear or superlinear? Does the shape change across different intervals on the capability curve?) and its complexity class (logarithmic, sublinear polynomial, superlinear polynomial, exponential, superexponential, etc.?). An exact specification of the function would be informative, but it's not strictly necessary.
Specifically, I want to define upper bounds on RCR. My interest in upper bounds is for reasoning better about takeoff dynamics. An upper bound on RCR constrains takeoff dynamics and gives us some form of safety assurances. E.g. we might be able to make predictions like:
- Given that it took time $t_1$ to transition from 2022 SOTA to near-human generality, it will take at least time $f(t_1)$ to transition from near-human generality to par-human generality (for some function $f$ implied by the upper bound on RCR).
- Given that it took time $t_2$ to transition from the near-human domain to the par-human domain, it will take at least time $f(t_2)$ to transition from the par-human domain to a strongly superhuman domain.
And retrodictions like:
- Given that it took time $t_3$ to get from sub-human to beginner human on this battery of cognitive tasks, we can expect that it would take at least time $f(t_3)$ to transition from beginner human to average human.
- Given that it took time $t_4$ to get from beginner human to average human on this battery of cognitive tasks, we can expect that it would take at least time $f(t_4)$ to transition from average human to expert human.
The accuracy of the retrodictions could inform how confident we are in the predictions. And if the model's retrodictions/predictions are sufficiently accurate, we could use its further out predictions as a form of safety guarantee.
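To make the flavour of these predictions and retrodictions concrete, here's a minimal Python sketch. Everything in it is a placeholder: the milestone transition times are made up, and the "transition times shrink by at most a constant factor" rule is just a stand-in for whatever upper bound on RCR the formalism eventually yields.

```python
import numpy as np

# Hypothetical transition times (in years) between successive capability
# milestones, e.g. sub-human -> beginner -> average -> expert human.
observed_transitions = np.array([6.0, 4.0, 3.0])

# Assumed toy model: successive transition times shrink by at most a constant
# factor r (i.e. t_{k+1} >= r * t_k), with r estimated from the history.
ratios = observed_transitions[1:] / observed_transitions[:-1]
r_lower = ratios.min()  # most pessimistic (fastest) observed shrinkage

# Implied lower bound on how long the next transition takes.
print(f"next transition takes at least ~{r_lower * observed_transitions[-1]:.1f} years")
```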
An interest in determining what safety guarantees we do in fact have on AI takeoff dynamics is my sole motivation for this line of inquiry.
(The course(s) of action I would pursue in a world in which superhuman AI was 8, 20 or 50 years away are pretty different. To better plan what to do with my life, I want to have a better handle on how AI capability development will evolve).
Some Necessary Apparatus
Some apparatus that I'll be using in this piece:
- A function for measuring RCR.
- I'll denote it as $R$.
- I'll specify desiderata for $R$ and demonstrate how we might measure/calculate it later.
- Agents will be denoted as $\alpha$ (with subscripts to distinguish particular agents, e.g. $\alpha_0$).
- The set of all agents will be denoted $A$.
- A measure of intelligence or cognitive capability (I'll freely equivocate between the two).
- I'll denote it as $\iota$.
- $\iota(\alpha)$ gives the intelligence of $\alpha$. If $\iota(\alpha) = c$, then I could write $\alpha_c$ to specify $\alpha$ alongside its cognitive capability.
- If two agents $\alpha$ and $\beta$ are equivalently capable/intelligent then we'll represent it as $\alpha \equiv \beta$ (i.e. $\iota(\alpha) = \iota(\beta)$).
- Characteristics of $\iota$:
- If you have two pairs of agents $(\alpha_a, \alpha_b)$ and $(\alpha_c, \alpha_d)$ with $\iota(\alpha_b) - \iota(\alpha_a) = \iota(\alpha_d) - \iota(\alpha_c)$, then the difference in intelligence between $\alpha_a$ and $\alpha_b$ and between $\alpha_c$ and $\alpha_d$ is the same ($\iota$ is an interval scale).
Other apparatus will be introduced as and when needed.
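As a sanity check on this apparatus, here's a toy encoding in Python. The `Agent` class, the attribute name `iota`, and the numbers are all assumptions made for illustration; nothing here says how $\iota$ would actually be measured.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    name: str
    iota: float  # cognitive capability / intelligence (interval scale)

def equivalent(a: Agent, b: Agent) -> bool:
    """Two agents are equivalently capable iff their iota values coincide."""
    return a.iota == b.iota

def capability_gap(a: Agent, b: Agent) -> float:
    """Difference in capability; equal gaps mean equal differences in intelligence."""
    return b.iota - a.iota

# Two pairs with equal iota-gaps have the same difference in intelligence.
a1, a2 = Agent("alpha_1", 1.0), Agent("alpha_2", 2.0)
a3, a4 = Agent("alpha_3", 5.0), Agent("alpha_4", 6.0)
assert capability_gap(a1, a2) == capability_gap(a3, a4)
assert not equivalent(a1, a2)
```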
Some Needed Terminology and Yet More Apparatus
A sufficiently intelligent agent would be able to instantiate another agent:
- The new agent would be a "child".
- The original agent would be a "parent".
- The process of instantiating the new agent is "procreation".
Let us consider the most capable child that a parent can create using all their resources within a given time frame to be the "successor" of that parent.
For a given agent $\alpha$, I will denote its successor thus: $S(\alpha)$.
A successor can of course create its own successor, and the "growth rate" of cognitive capabilities across generations of agents is what we're trying to determine.
To allow a sensible comparison across generations of agents, we'd fix the given time frame in which a parent has to instantiate their successor. That is, the length of time defining a generation is held constant.
Let the length of a generation be $T$.
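Here's a small sketch of the successor definition (the toy `Agent` class from the previous sketch is redefined so the snippet stands alone, and `children_within_T` stands in for whatever children the parent actually managed to instantiate within one generation of length $T$):

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Agent:          # as in the earlier sketch
    name: str
    iota: float

def successor(parent: Agent, children_within_T: Sequence[Agent]) -> Agent:
    """The successor is the most capable child created within one generation.

    The parent is included as a candidate, anticipating the self-succession
    assumption below: if no child beats the parent, the parent succeeds itself.
    """
    return max([parent, *children_within_T], key=lambda a: a.iota)

parent = Agent("alpha", 1.0)
print(successor(parent, [Agent("alpha.c0", 0.8), Agent("alpha.c1", 1.3)]).name)  # alpha.c1
```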
Some Assumptions and Lemmas
Self-Modification Assumption (SMA)
Self-modification is a particular case of procreation in which the child and the parent are the same agent. The cases of self-modification that are most interesting to us are those in which the parent succeeds itself (this particular case is what arises in "recursive self-improvement" [LW · GW]).
So for subsequent considerations of self-modification here, we'll be considering self-modification as succession.
In those cases, it can be treated analogously to other forms of self-modification without loss of generality.
Note that we'll be considering agents that undergo significant self-modification to be distinct from the original agent.
Given an arbitrary agent $\alpha$, if $\alpha$ undergoes significant self-modification within a generation to succeed itself, the new agent will be represented as $S(\alpha)$, distinct from $\alpha$.
I will not be justifying this assumption; just take it as axiomatic.
Self Succession Assumption (SSA)
Suppose that we permit agents to take self-modifications of "no significant action" within a generation; then the original agent (modulo whatever resources it had acquired) would become its own successor.
We'll grant this allowance and refer to the cases where an agent succeeds itself without significant self-modification as "self succession". Whenever self succession occurs, we'll represent the successor using the same symbol as the original agent.
Given an arbitrary agent $\alpha$, if $\alpha$ succeeds itself during a generation, the resulting successor will be represented as $\alpha$ (i.e. $S(\alpha) = \alpha$).
We'll refer to cases where the agent does not succeed itself (including cases of significant self-modification) as "distinct succession".
The notation used to refer to the successor will allow us to distinguish self succession from distinct succession.
A case where self succession will prove useful is when the agent was not able to create a more capable child within a generation. By allowing the agent to succeed itself by default (while acquiring new resources), we can permit the agent to "roll over" the creation of a distinct successor across generations. This will enable us to more accurately measure RCR even if the returns diminish over time such that distinct successors do not emerge in some generations.
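Here's how self succession slots into a toy generation step, continuing the earlier sketches. The `procreate` function and its noise parameters are made-up placeholders, not a model of real procreation:

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Agent:          # as in the earlier sketches
    name: str
    iota: float

def procreate(parent: Agent, rng: random.Random, n_children: int = 3) -> List[Agent]:
    """Hypothetical procreation: children's capabilities vary noisily around the parent's."""
    return [Agent(f"{parent.name}.c{i}", parent.iota + rng.gauss(0.0, 0.5))
            for i in range(n_children)]

def step_generation(parent: Agent, rng: random.Random) -> Agent:
    """One generation: the best strictly-better child succeeds, else self succession."""
    best_child = max(procreate(parent, rng), key=lambda a: a.iota)
    return best_child if best_child.iota > parent.iota else parent

rng = random.Random(0)
agent = Agent("alpha_0", 1.0)
for _ in range(5):
    agent = step_generation(agent, rng)
print(agent.name, round(agent.iota, 2))
```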
SSA has many ramifications for our considerations of $R$ and for the measurement of RCR. These ramifications will be considered at more length in a subsequent post.
Successor Existence Lemma (SEL)
That is: "for every agent, there exists a successor to that agent".
This follows trivially from SMA. Via SMA, the agent can succeed itself. If a self-modification of "no action" is taken during a generation then the resulting agent (the original agent) becomes its successor (assuming the agent does not create any more capable children during that generation).
I will refer to cases where the agent becomes its own successor without taking significant self-modification actions as "self succeeding".
Successor Parity Lemma (SPL)
That is: "The successor of every agent is at least as intelligent as the original agent".
This follows trivially from SEL:
That is:
- In cases where the successor is the original agent, then they are both equivalently capable.
- In cases where the successor is not the original agent, then the successor is more capable
- This follows from the definition of the successor as the most capable child an agent can create within a generation.
Successor Superiority Assumption (SSA)
That is: "there exists an agent whose successor is strictly more intelligent than itself".
This assumption is not as inherently obvious as SMA, so it does need justification. It's not necessarily the case that agents are able to instantiate agents smarter than themselves.
However, the entire concept of AI takeoff dynamics (the sole reason I decided to investigate this topic) rests on the implicit assumption that we will eventually be able to create par-human (and later superhuman) AI. Perhaps we will not. But as I'm situating my investigation within the context of AI takeoff dynamics, I feel confident making this implicit assumption explicit.
Note: I'm not saying that all agents would have successors smarter than themselves, just that there is at least one such agent. (Even if there is only one such agent, the assumption is satisfied.)
I'll refer to those agents who have successors smarter than them as "SSA-satisfying agents" or "SSA-S agents".
$$A_{SSA} = \{\alpha \in A : \iota(S(\alpha)) > \iota(\alpha)\}$$
That is: we're using $A_{SSA}$ to represent the set of all agents whose successor is more capable than them.
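A toy illustration of this set, under an assumed successor function (the capability thresholds below are arbitrary):

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Agent:          # as in the earlier sketches
    name: str
    iota: float

def ssa_satisfying(agents: Iterable[Agent],
                   successor: Callable[[Agent], Agent]) -> List[Agent]:
    """The agents whose successor is strictly more capable than they are."""
    return [a for a in agents if successor(a).iota > a.iota]

def toy_successor(a: Agent) -> Agent:
    """Assumed for illustration: only mid-range agents can improve on themselves."""
    return Agent(f"S({a.name})", a.iota + 0.5) if 1.0 <= a.iota < 3.0 else a

pool = [Agent(f"alpha_{i}", float(i)) for i in range(5)]
print([a.name for a in ssa_satisfying(pool, toy_successor)])  # ['alpha_1', 'alpha_2']
```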
Aligned Children Assumption (ACA)
"All children are fully aligned with the values and interests of their parents."
This is not necessarily a realistic assumption. Nonetheless, I am choosing to make it.
My reason for this is that I'm (most) interested in upper bounds on RCR, and if all agents have aligned children within a generation, they can use said aligned children to build even more capable children (the most capable of which becomes the successor).
I guess this can be thought of as a best-case analysis of RCR (what's RCR like under the most favourable assumptions?). Analyses trying to demonstrate a lower bound to RCR or to measure it more accurately should not make this assumption.
Genealogy of Agents
I will refer to a line of succession involving an agent as a "lineage". I'll attempt to specify that more rigorously below.
For any two given agents $\alpha, \beta \in A$, let $\alpha \to \beta$ denote "$\beta = S(\alpha)$" ($\beta$ directly succeeds $\alpha$), and let $\alpha \twoheadrightarrow \beta$ denote that $\beta$ can be reached from $\alpha$ via a chain of successions.
For a given agent $\alpha$, let its lineage be $L_\alpha = \{\beta \in A : \alpha \twoheadrightarrow \beta \lor \beta \twoheadrightarrow \alpha\} \cup \{\alpha\}$ (the ancestors and descendants of $\alpha$ under succession, together with $\alpha$ itself).
I would like to define two more concepts related to a given lineage: "head" and "tail" (as for why I chose those names, you might have noticed that a lineage can be modelled as a linked list.)
Head
For a given lineage , I'll denote the "head" (think root ancestor) as .
You could read this as "the head of the lineage containing ".
Tail
For a given lineage , I'll denote the "tail" (think final descendant) as .
You could read this as "the tail of the lineage containing ".
Which Lineage?
Our core investigation is the nature of the change in cognitive capabilities across agent lineages (explicitly for the purpose of reasoning better about takeoff dynamics). To a first approximation, we might pick a reference lineage to examine.
It seems that a natural method of inquiry is to pick a "head" and then investigate how cognitive capabilities change across its descendants with each generation.
Because of our interest in takeoff dynamics, I suppose that our initial starting agent must be a member of $A_{SSA}$. This is because if it weren't, its successor would be itself, and its lineage would only contain the original agent (the demonstration of this is left as an exercise for the reader).
One might even take a stronger position. We might insist that our starting agent be the least intelligent agent capable of creating a more intelligent successor. The reasons for this might be:
- Starting with less capable agents allows you to capture more of the "curve".
- Starting with the least capable agent will allow you to capture the entirety of the curve.
- An underlying assumption behind this view is something like: "it doesn't matter where you start your lineage from; in the limit, the intelligence of successors would converge to the same value (e.g. if they run into fundamental limits [thermodynamics, computational intractability, computational complexity, etc.])".
- I did not specify this assumption earlier, because I am not convinced that it is true, and I want to make as few assumptions as possible.
- A way in which it might easily be false is that some genealogies might get trapped in local optima.
- One seeks to be conservative in general.
- E.g. one might suspect that the first SSA-satisfying agent we create is the simplest SSA-satisfying agent possible.
- One might think the starting point doesn't matter and RCR will show similar behaviour for most SSA-S agents.
- Intuitively, the more capable SSA-S agents are closer to the peak/plateau of capability.
- Agents that do not satisfy SSA are not able to create more capable successors. These agents will probably cluster in two groups:
- 1. Those too stupid to create successors.
- 2. Those too smart to create successors (they have plateaued in their lineage or have run against some fundamental limits).
- Thus, there's some concern that picking significantly more capable agents in which to root our lineage could be distortionary.
I am not fully convinced by all of the above reasons, but we do need to pick a particular member of $A$ for $\alpha_0$, and the only choice for which there seem to be reasonable arguments is the least capable member of the set.
Thus, one potential definition of $\alpha_0$ might be:
$$\alpha_0 = \arg\min_{\alpha \in A_{SSA}} \iota(\alpha)$$
There are other ways we could potentially define $\alpha_0$, but I think I'll tentatively accept the above for now.
An extra constraint on $\alpha_0$ that I find interesting is insisting that $\alpha_0$ has a lineage which contains the global optimum.
My reason for adding this extra constraint is again that I am most interested in an upper bound on RCR.
One way to formalise the above constraint is:
$$\forall \beta \in A:\ \iota(\mathrm{tail}(L_{\alpha_0})) \geq \iota(\beta)$$
That is: our chosen "head" has a "tail" that is at least as capable as every other agent.
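Putting the two choices together, here's a sketch of how one might compute $\alpha_0$ and check the extra constraint over a hypothetical finite pool of agents; the pool, the SSA predicate, and the lineage-tail function are all stand-ins:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class Agent:          # as in the earlier sketches
    name: str
    iota: float

def pick_alpha_0(pool: Sequence[Agent],
                 satisfies_ssa: Callable[[Agent], bool]) -> Agent:
    """The least capable agent whose successor is strictly more capable than it."""
    return min((a for a in pool if satisfies_ssa(a)), key=lambda a: a.iota)

def lineage_contains_global_optimum(alpha_0: Agent,
                                    lineage_tail: Callable[[Agent], Agent],
                                    pool: Sequence[Agent]) -> bool:
    """Extra constraint: the tail of alpha_0's lineage is at least as capable as everyone."""
    return all(lineage_tail(alpha_0).iota >= a.iota for a in pool)

pool = [Agent(f"alpha_{i}", float(i)) for i in range(4)]
alpha_0 = pick_alpha_0(pool, satisfies_ssa=lambda a: a.iota >= 1.0)
print(alpha_0.name)  # alpha_1
```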
Some Next Steps
Some things I might like to do for my next part(s), or in future rewrites of this part.
Areas of Improvement
Places where I can improve my draft of this part:
- Clean up the notation (my current notation for referring to arbitrary agents (e.g. $\alpha$) and their successors ($S(\alpha)$) may be causing unneeded confusion).
- Alternatively, defend my current notation (one reason the subscript of the agent's symbol is constrained to the Reals instead of the Naturals is that we might e.g. consider all "heads" to be denoted with subscripts between 0 and 1, and the rest of the positive Reals could be used to denote their descendants).
- It's possible that agent space might be larger than the Reals, but I don't think it particularly matters. I expect the Reals to be large enough to adequately address the kind of agents I would be interested in talking about.
- Other missing pieces from my notation:
- I don't have notation for referring to non-successor children created within a generation.
- This will be needed to talk about children assisting their parents.
- I don't have notation for referring to groups of agents (agencies).
- This would be needed for generalising to agencies.
- I don't have notation for indicating which generation a particular agent in a given lineage belongs to.
- I might also want to specify the generation length if I ever tried to specify lineages and the position of an agent within a lineage.
- Consider some special cases of agents
- An agent whose ancestor is itself.
- An agent whose successor is itself.
- An agent that is both ancestor and successor to itself
- Consider some special cases of lineages:
- A "trivial lineage": only one member
- A vacuous lineage: no members
- Model lineages graphically
- My current idea is to use linked lists for direct successor chains.
- If we consider arbitrary children, we could use trees to model them.
- Generalise to agencies.
- Something somewhat pointing in this direction is the ACA which permits children to assist their parent within a generation.
- We may also want to consider other kinds of agencies that don't share common ancestors.
There'll be a Sequel Right?
Stuff I'd like to cover in sequels to this post:
- Highlight some considerations relevant to choosing a value for $T$ (the generation length).
- Consider their pros and cons.
- Recommendations for how to pick a value from empirical results.
- Specify desiderata for $R$.
- Defend the desiderata.
- Highlight various approaches to measuring/calculating $R$.
- Consider their pros and cons
- Pick the most compelling approach.
- Formalise a definition of $R$.
- Using the apparatus I have highlighted.
- Specify what I meant by "shape" of the function.
- Specify what I meant by "complexity class" of the function.
- Formulate several theses for RCR in terms of $R$:
- Proportionality thesis
- Some foom scenarios
- Some fizzle scenarios
Even Further Future Directions
Some stuff I might like to do (much) later on. I would like to eventually bridge this theoretical framework to empirical work with neural networks. I'll briefly describe two approaches to doing that which I'm interested in.
Estimating RCR From ML History
We could try to estimate the nature and/or behaviour of RCR across particular ML architectures by e.g. looking at progress across assorted performance benchmarks (and perhaps the computational resources required to reach each benchmark) and comparing across various architectural and algorithmic lineage(s) for ML models. We'd probably need to compile a comprehensive genealogy of ML architectures and algorithms in pursuit of this approach.
This estimation may be necessary, because we may be unable to measure RCR across an agent's genealogy before it is too late (if e.g. the design of more capable successors is something that agents can only do after crossing the human barrier).
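As a gesture at what that estimation might look like, here's a sketch that fits a simple decay model to per-generation benchmark gains within one (entirely fictional) architectural lineage; neither the data nor the log-linear fit is a serious proposal:

```python
import numpy as np

# Hypothetical lineage of architectures, one benchmark score per generation.
generations = np.arange(5)                    # e.g. arch v0 ... v4
scores = np.array([55.0, 63.0, 69.0, 73.0, 76.0])

gains = np.diff(scores)                       # per-generation improvement
# Fit gain_k ~ a * exp(b * k) to see whether gains are shrinking (b < 0).
b, log_a = np.polyfit(generations[1:], np.log(gains), deg=1)
print(f"initial gain ~{np.exp(log_a):.1f}, per-generation decay rate b ~{b:.2f}")
```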
Directly Measuring RCR in the Subhuman to Near Human Ranges
I am not fully convinced of the assumption behind that danger, though. There is no complete map/full description of the human brain. No human has the equivalent of their "source code" or "model weights" with which to start designing a successor. It seems plausible that we could equip sufficiently subhuman (generality) agents with detailed descriptions/models of their own architectures, and some inbuilt heuristics/algorithms for how they might vary those designs to come up with new ones. We could select a few of the best candidate designs, train all of them to a similar extent, and evaluate them (the same computational resources should be expended in both training and inference). We could repeat the experiment iteratively, across many generations of agents.
We could probably extrapolate the lineages pretty far (we might be able to reach the near-human domain without the experiment becoming too risky). Though there's a point in the capability curve at which we would want to stop such experiments. And I wouldn't be surprised if it turned out that the agents could reach superhuman ability in designing successors (able to improve their architectures faster than humans can), without reaching human generality across the full range of cognitive tasks.
(It may be wise not to test those assumptions if we did decide to run such an experiment).
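For concreteness, the experiment loop I have in mind looks roughly like the sketch below; every function it takes (`propose_variants`, `train_and_evaluate`) is a placeholder for machinery that doesn't exist yet, and the capability cut-off is the safety stop mentioned above:

```python
from typing import Callable, List

def measure_rcr_trajectory(initial_design: dict,
                           propose_variants: Callable[[dict], List[dict]],
                           train_and_evaluate: Callable[[dict], float],
                           n_generations: int,
                           stop_above: float) -> List[float]:
    """Run the iterative design loop, stopping past a pre-set capability ceiling."""
    design, trajectory = initial_design, []
    for _ in range(n_generations):
        # Current agent proposes variants of its own architecture.
        candidates = [design] + propose_variants(design)
        # Train every candidate under the same compute budget, keep the best.
        scored = [(train_and_evaluate(c), c) for c in candidates]
        best_score, design = max(scored, key=lambda pair: pair[0])
        trajectory.append(best_score)
        if best_score >= stop_above:      # safety cut-off on the capability curve
            break
    return trajectory
```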
Closing Remarks
Such empirical projects are far beyond the scope of this series (and my current research abilities). However, it's something I might try to attempt in a few years after upskilling some more in AI/ML.
11 comments
Comments sorted by top scores.
comment by Nicholas / Heather Kross (NicholasKross) · 2023-02-04T21:10:21.626Z · LW(p) · GW(p)
For the record, I do think this is something worth mathematically formalizing. Perhaps someday you should come back to this, or restart this, or even "dump" your notes/thinking on this in an unedited form.
comment by DragonGod · 2023-02-04T23:17:39.604Z · LW(p) · GW(p)
This is a terrible framework/approach to it. Very terrible, I don't often link to this post when I link to alignment stuff I wrote up. I think I was off base. Genealogy/lineage is not the right meta-approach/framework. A lot of premature rigour to it that is now useless.
I now have different intuitions about how to approach it and have some sketches (on my shortform the rough thoughts about formalising optimisation) laying some groundwork for it, but I doubt I'll complete that groundwork anytime soon.
Formalising returns in cognitive reinvestment is not a current research priority for me, but the groundwork does factor through research I see as highly promising for targeting the hard problems of alignment, and once the groundwork is complete, this part would be pretty easy.
It's also important for formalising my thinking/arguments re: takeoff dynamics (which aren't relevant to the hard problems, but are very important for governance/strategy).
comment by Nicholas / Heather Kross (NicholasKross) · 2023-02-05T01:17:06.311Z · LW(p) · GW(p)
Good to be improving your thinking on this and targeting the harder subproblems!
comment by DragonGod · 2022-06-05T12:20:28.032Z · LW(p) · GW(p)
It's crushing my motivation to see no engagement with this post.
I'd like to continue posting my thinking on takeoff dynamics here, but it's really demotivating when no one engages with it.
comment by gwern · 2022-06-05T16:27:07.572Z · LW(p) · GW(p)
I don't see anything to engage with here. It's all setup and definitions and throat-clearing so far; of course one could argue with them, but that's true of every formalization of everything, they're always unrealistic and simplifying, that's the point of having them. Perhaps it leads to some interesting conclusion one doesn't want to believe, at which point one could go back and ponder the premises to think about what to reject or learn about the bare formalization itself, but as it is...
comment by DragonGod · 2022-06-05T23:55:03.570Z · LW(p) · GW(p)
That's fair. I'll update on this for the future.
I do think/hope sequels to this would have more content to engage with.
Thanks for the reply.
P.S: I sent you a follow request on Twitter. My UN is "CineraVerinia".
I would be grateful if you accepted it.
comment by delton137 · 2022-06-05T12:56:06.039Z · LW(p) · GW(p)
The thing you are trying to study ("returns on cognitive reinvestment") is probably one of the hardest things in the world to understand scientifically. It requires understanding both the capabilities of specific self-modifying agents and the complexity of the world. It depends what problem you are focusing on too -- the shape of the curve may be very different for chess vs something like curing disease. Why? Because chess I can simulate on a computer, so throwing more compute at it leads to some returns. I can't simulate human biology in a computer - we have to actually have people in labs doing complicated experiments just to understand one tiny bit of human biology.. so having more compute / cognitive power in any given agent isn't necessarily going to speed things along.. you also need a way of manipulating things in labs (either humans or robots doing lots of experiments). Maybe in the future an AI could read massive numbers of scientific papers and synthesize them into new insights, but precisely what sort of "cognitive engine" is required to do that is also very controversial (could GPT-N do it?).
Are you familiar with the debate about Bloom et al and whether ideas are getting harder to find? (https://guzey.com/economics/bloom/ , https://www.cold-takes.com/why-it-matters-if-ideas-get-harder-to-find/). That's relevant to predicting take-off.
The other post I always point people to is this one by Chollet.
I don't necessarily agree with it but I found it stimulating and helpful for understanding some of the complexities here.
So basically, this is a really complex thing.. throwing some definitions and math at it isn't going to be very useful, I'm sorry to say. Throwing math and definitions at stuff is easy. Modeling data by fitting functions is easy. Neither is very useful in terms of actually being able to predict in novel situations (ie extrapolation / generalization), which is what we need to predict AI take-off dynamics. Actually understanding things mechanistically and coming up with explanatory theories that can withstand criticism and repeated experimental tests is very hard. That's why typically people break hard questions/problems down into easier sub-questions/problems.
comment by DragonGod · 2022-06-05T13:44:11.485Z · LW(p) · GW(p)
So basically, this is a really complex thing.. throwing some definitions and math at it isn't going to be very useful, I'm sorry to say. Throwing math and definitions at stuff is easy. Modeling data by fitting functions is easy. Neither is very useful in terms of actually being able to predict in novel situations (ie extrapolation / generalization), which is what we need to predict AI take-off dynamics.
I disagree. The theoretical framework is a first step to allow us to reason more clearly about the topic. I expect to eventually bridge the gap between the theoretical and the empirical eventually. In fact, I just added some concrete empirical research directions I think could be pursued later on:
Even Further Future Directions
Some stuff I might like to do (much) later on. I would like to eventually bridge this theoretical framework to empirical work with neural networks. I'll describe in brief two approaches to do that I'm interested in.
Estimating RCR From ML History
We could try to estimate the nature and/or behaviour of RCR across particular ML architectures by e.g. looking at progress across assorted performance benchmarks (and perhaps the computational resources [data, flops, parameter size, etc.] required to reach each benchmark) and comparing across various architectural and algorithmic lineage(s) for ML models. We'd probably need to compile a comprehensive genealogy of ML architectures and algorithms in pursuit of this approach.
This estimation may be necessary, because we may be unable to measure RCR across an agent's genealogy before it is too late (if e.g. the design of more capable successors is something that agents can only do after crossing the human barrier).
Directly Measuring RCR in the Subhuman to Near Human Ranges
I am not fully convinced in the assumption behind that danger though. There is no complete map/full description of the human brain. No human has the equivalent of their "source code" or "model weights" with which to start designing a successor. It seems plausible that we could equip sufficiently subhuman (generality) agents with detailed descriptions/models of their own architectures, and some inbuilt heuristics/algorithms for how they might vary those designs to come up with new ones. We could select a few of the best candidate designs, train all of them to a similar extent and evaluate. We could repeat the experiment iteratively, across many generations of agents.
We could probably extrapolate the lineages pretty far (we might be able to reach the near-human domain without the experiment becoming too risky). Though there's a point in the capability curve at which we would want to stop such experiments. And I wouldn't be surprised if it turned out that the agents could reach superhuman ability in designing successors (able to improve their architectures faster than humans can), without reaching human generality across the full range of cognitive tasks.
(It may be wise not to test those assumptions if we did decide to run such an experiment).
Conclusions
Such empirical projects are far beyond the scope of this series (and my current research abilities). However, it's something I might try to attempt in a few years after upskilling some more in AI/ML.
Recall that I called this "a rough draft of the first draft of one part of the nth post of what I hope to one day turn into a proper sequence". There's a lot of surrounding context that I haven't gotten around to writing yet. And I do have a coherent narrative of where this all fits together in my broader project to investigate takeoff dynamics.
The formalisations aren't useless; they serve to refine and sharpen thinking. Making things formal forces you to make explicit some things you'd left implicit.
comment by Tom Davidson · 2022-06-13T03:51:01.118Z · LW(p) · GW(p)
Glad you added these empirical research directions! If I were you I'd prioritize these over the theoretical framework.