Crunch Dynamics and Control Loss
post by marc/er · 2023-05-24T13:03:04.583Z · LW · GW · 2 comments
TLDR
- Crunch time is probably not a definite slice of time, but rather arrives gradually
- A ‘Crunch Plan’ might be a useful thing to include in alignment-related theories of change going forward
- AGI is a bad deadline for crunch time, and should be replaced with control loss, which also could be gradual
- Crunchiness of time and control loss are both very difficult to estimate; while they should be modeled, this might be best done in a highly abstract way
- Tool development vs direct research might be best considered as a function of the crunchiness of time
Introduction
The notion of ‘crunch time’ permeates the design of many alignment agendas, but it is often spoken of as though there is a discernible block of time that is either “crunchy” or not. It is unlikely that this state is binary; it seems far more probable that time will get gradually crunchier as we approach control loss. Beyond the notion that during crunch time we should all “put our heads down” and get to work, leveraging the increased efficiency granted by whatever useful-but-not-existentially-dangerous artificial intelligence tools are available at the time, there seem to be other unconsidered factors.
This post [LW · GW] makes the distinction between ‘crunch time’ and ‘final crunch time’, but again implies that there is some line in the sand separating the two by making the distinction in the first place. I disagree with this framing. Here is a graph detailing a naive model of crunchiness vs time that I generally agree with:
Crunchiness of course probably doesn’t look like a smooth exponential, but this should communicate the point.
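To make the "gradually crunchier" picture concrete, here is a toy sketch of such a curve. Everything here is invented for illustration (the logistic form, the `t_loss` horizon, the `steepness` parameter); the only point it carries is that crunchiness rises smoothly rather than flipping from "not crunchy" to "crunchy":

```python
import math

def crunchiness(t: float, t_loss: float = 20.0, steepness: float = 0.4) -> float:
    """Toy crunchiness curve on (0, 1): a logistic ramp that approaches 1
    as t nears an assumed control-loss time t_loss (time units arbitrary)."""
    return 1.0 / (1.0 + math.exp(-steepness * (t - t_loss / 2)))

# Crunchiness climbs smoothly toward control loss rather than jumping:
samples = [round(crunchiness(t), 2) for t in (0, 5, 10, 15, 20)]
```

A sharper or shallower `steepness` would model fast- or slow-takeoff intuitions respectively; the shape, not the numbers, is the claim.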
Because the future is challenging to forecast, and timelines as a result vary greatly even between highly specialized individuals, we will likely never develop a quantifiable metric for crunchiness. Even if we did have a globally unified timeline etched into the thinking of all humans, I assume risk estimates would still range from 0-100%. This could stem from differences of opinion [LW · GW] regarding takeoff [LW · GW], or even from factors as minor as an inherently higher risk tolerance.
Crunch Plans
Jacques Thibodeau [LW · GW] speaks here about the value of using the time we currently have to prepare for crunch time by developing tools to accelerate future alignment work. This raises the question: “How do you relate crunchiness to your work ethic?”. Let’s say we are here on our earlier crunchiness graph:
We could do something like compute the fraction of the complete crunchiness scale that is currently exhausted, like this:
And then allocate a portion of researchers equivalent to the green fraction of crunch-space to tool development, and a portion equivalent to the red fraction to direct research. Under this model, as research time becomes more valuable and a wider array of tools becomes available, you would allocate more resources to exploiting those tools rather than producing new ones. This seems intuitive, but it is unlikely to be the optimal form of resource allocation: it might be better to use the added value of “augmented humans” (not necessarily in the BCI sense, just via the introduction of powerful cognitive technologies into the alignment workflow [LW · GW]) to produce tools more efficiently, rather than applying them to direct research while a smaller fraction of research prowess is allocated toward direct alignment work. And how do you even quantify research capacity anyway? ‘Number of people’ is obviously a poor proxy, as not all people output research of the same quality and quantity, and quantifying research quality in alignment is really hard irrespective of that.
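The naive fraction-based rule described above can be sketched in a few lines. This is a minimal illustration of the rule being criticized, not a recommendation; the crunchiness estimate, the linear split, and the function name are all assumptions of this sketch:

```python
def allocate_researchers(total: int, crunchiness: float) -> tuple[int, int]:
    """Naive split of a research pool by the fraction of the crunchiness
    scale already exhausted: that fraction goes to direct research (where
    time is now most valuable), the remainder to tool development."""
    if not 0.0 <= crunchiness <= 1.0:
        raise ValueError("crunchiness must lie in [0, 1]")
    direct = round(total * crunchiness)
    tools = total - direct
    return direct, tools

# At an estimated 30% crunchiness, 100 researchers split 30 direct / 70 tools:
direct, tools = allocate_researchers(100, 0.3)
```

As the post notes, a linear split like this ignores comparative advantage and the compounding value of tools, which is exactly why it is unlikely to be optimal.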
There are infinitely many potential configurations of resource distribution, and there is no reason they should contain just this split between tool development and direct research. Also, as an independent researcher, you do not have access to a giant pool of people with diverse skill sets to assign to various tasks. If your productivity is multiplied by, say, 1.5x at the point marked on the graph above, but you are 3x better at direct research than at tool development, you would divide your labor differently than an organization with access to skills across a broader domain would.
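For the single-researcher case, the decision reduces to comparing the marginal value of an hour spent each way. A toy calculation using the numbers above (the 1.5x and 3x multipliers come from the example in the text; the "leverage" framing is an assumption of this sketch):

```python
def hour_value(skill: float, multiplier: float) -> float:
    """Value of one hour of work: base skill scaled by the current
    tool-granted productivity multiplier."""
    return skill * multiplier

# An independent researcher who is 3x better at direct research than at
# tool-building, with a 1.5x productivity multiplier from existing tools:
direct_value = hour_value(3.0, 1.5)  # an hour of direct alignment research
tool_value = hour_value(1.0, 1.5)    # an hour of building new tools
# Tool-building only wins if its future leverage exceeds this ratio:
leverage_needed = direct_value / tool_value
```

On these invented numbers, tool-building must eventually pay back more than 3x its cost in accelerated direct research to be worth this researcher's time, whereas a balanced organization would face a much lower threshold.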
Another point to consider is that there may be a ceiling on tool capability relative to AI capabilities. Building robust models of what to do in a very complicated and uncertain situation, relative to something you cannot know, is likely better done in a still-rigorous but more abstract way. To provide a concrete example: relative to today, where we have powerful language models and accompanying ecosystems (e.g. TaskMatrix.AI or plugins for OpenAI's GPT models), the scope for potential alignment-assisting tools would previously have been much narrower.
Different teams/people will have different skill spreads, but that doesn’t change the fact that a ‘Crunch Plan’ could be invaluable and is something that should probably be included in alignment-focused theories of change.
A Crunch Plan would describe:
- An organization/individual's current views on crunchiness, as derived from whatever metrics they used to arrive at that conclusion (although some of this information could be potentially exfohazardous [LW · GW], so handle with care)
- How said organization/individual intends to direct their research in accordance with changes in this estimation
AGI is a Bad Deadline for Crunch Time
Setting aside the fact that AGI is like the seventh most ambiguous term ever devised (because my god has that dead horse been beaten), it is important to acknowledge that the point at which crunch time ends (and alignment research as a whole) is not AGI deployment, but control loss. Control loss could occur before, during, or after the first deployment of general intelligence, and just like crunch time, control loss is probably gradual, at least in slow-takeoff worlds.
This has an interesting shared dynamic with crunchiness, as dramatic increases in control loss could offset increases in alignment efficiency, or vice versa. Both control loss and crunchiness are really complex phenomena on their own, and I doubt I would conjure up anything that useful by rambling about their relationship beyond this. I will say, though, that I think a graph of degree of control would look very weird. Whilst we may be closer time-wise to theoretical control loss than we were ten years ago, do we really have any less control (on an existential level) due to today's artificial intelligence systems than we did at any earlier point? The control loss gradient could be steep [LW · GW], but I would not expect it to have started yet. It's difficult to say what the Pareto frontier might look like in this context, but it seems intuitive to me that there will be multiple close-to-optimal points in time at which our capacity to make impactful alignment progress is at its zenith, due to the counter-play between control loss and crunchiness.
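The offsetting dynamic described here can be caricatured in a few lines. Both curves are invented placeholders (an exponential for tool-driven efficiency, a delayed logistic for eroding control); the only claim is structural: capacity for impactful work can rise for a long time and then collapse once control loss outpaces efficiency gains:

```python
import math

def efficiency(t: float) -> float:
    """Assumed: alignment efficiency compounds as tools improve (toy exponential)."""
    return math.exp(0.2 * t)

def control(t: float) -> float:
    """Assumed: degree of control stays near 1 for a while, then erodes (toy logistic)."""
    return 1.0 / (1.0 + math.exp(0.8 * (t - 15.0)))

def progress_capacity(t: float) -> float:
    """Capacity for impactful alignment work: efficiency gains scaled by remaining control."""
    return efficiency(t) * control(t)

# Capacity rises while tools improve, then collapses as control is lost:
curve = [progress_capacity(t) for t in range(25)]
```

Different placeholder curves would move or multiply the peaks, which is the abstract version of the "multiple close-to-optimal points in time" intuition above.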
comment by simon · 2023-05-24T15:55:15.022Z · LW(p) · GW(p)
I think you are talking about crunchiness and crunch time from the perspective of humanity as a whole, rather than an individual AI project, but the post could be clearer on this point, particularly since the actual "Crunch Plans" are for specific actual teams.