# The Planning Problem

post by Zachary Robertson (zachary-robertson) · 2019-08-04T18:58:55.186Z · score: 16 (8 votes) · LW · GW · 3 comments## Contents

A First Example A formal statement Examples a) b) General Strategy Redux Outlook None 3 comments

(Originally posted on my blog here. Not sure how mathematical this place is, but it was seamless to crosspost so I'll continue doing that and see what happens!)

It's fair to say that we spend a lot of our lives floating from task to task. Some are work and some are not. Nevertheless, they are tasks. Things that we want to do. Some are complicated are require planning. However, since planning is such an integral step in completing more complicated tasks it seems as though we might want to "meta-plan" or plan our planning. I became fascinated by the concept after I realized it should be possible to frame the whole thing as a sort of optimization problem. In practice, the problem remains complicated by the fact that optimization occurs in a necessarily online fashion. However, any reasonable method of planning is especially well behaved in the formalism I'll present. Moreover, even when the method is faulty due to human bias, as long as it is consistent we still end up being to prove that meta-planning, in a certain sense, is rational.

One particularly recursive aspect of the problem is that to carry out a task to completion we must already have a plan that we execute. The plan could be written in an ‘online’ manner, which is done in improvisation. Still, the line between planning and doing at this point becomes very blurry and quickly leads to a chicken/egg problem. Which came first, knowledge of a plan or how to execute it? I’ll bound the scope of this using a few concrete examples and then we’ll jump off into abstraction to step around the recursion.

# A First Example

For me, the most natural first example would be this article. I don’t want to spend too much time planning out the essay. On the other hand, I don’t want to plan so little that the organization and flow are poor. Usually, I spend a bit too much time in the planning phase so this article is an experiment in trying to do something a bit more optimal. What do I mean by more optimal? The general idea is that you are carrying out some task, such as writing an article and you can spend some amount of time thinking and planning out a strategy to achieve your goal. After that, you carry out the task or write the article. Imagine a task like the traveling salesman. If there are enough cities and enough distance between them most people would agree that having a clear itinerary is going to be more effective than just visiting cities at random. In this example, it's clear that there must be a trade-off given that the general algorithm for planning would be complete!

The intuition seems to be that there is a strategy that can balance the objectives of planning and executing optimally. However, over the years I’ve caught myself making the error of measuring the effectiveness of having a plan or strategy in terms of how much time it saves me in carrying out the task. Yet, I will commonly ignore the time spent on creating a strategy in the first place. For example, while I was writing this article I spent a significant amount of time crafting a uniqueness proof for the optimal strategy I’ll propose in the next section. Was that optimal? I had a picture of the proof done in under an hour. If you’d asked me right after I’d proved the theorem whether or not it was a good use of time I’d say "Yes, but it wasn’t necessary for the article." So it’s possible that deviations from this supposed optimal strategy might be part some other strategy for the optimization of a different quantity. However, the case remains that making the proof was not an optimal use of my time if I claim the purpose was for the completion of the article.

It's worth pointing out that people are notoriously bad at estimating how much time things will take (see planning fallacy). Moreover, we’re poor at estimating the amount of time we spend on tasks with variable time requirements. Thus, when we decide to create a plan we often neglect to estimate the cost of taking such an action and we often fail to measure whether or not the plan was successful. It’s a sort of recursive issue since usually, we are trying to use the plan to estimate the cost of taking another action. In some sense what I’m trying to do to facilitate analysis is contain the infinite-regress to a second-order meta-plan.

# A formal statement

Given that some vague notions are surrounding the ideas of planning, executing, and optimality let's put things on a more formal footing before going into detail. Assume that there is an oracle that could tell you how long a task would take without a dedicated plan. Say that for the task of writing this article it returned seven hours. If I think that I planning could help reduce this time I can take a bet with this oracle. I would place a bet that the time it takes to make a plan plus the time it takes to execute the resulting plan is less than the time returned by oracle. If not, I lose the bet. For example, I might spend 6 hours creating a detailed plan and then spend two hours executing for a total of eight hours. In this case, I lose. Even though I was able to significantly reduce the execution time, I spent way too much time planning to make the strategy effective and ultimately spent more time on the task than if I had just started executing from the start.

Same as the above, but with the level of formality necessary to make a rigorous argument. Let's say you have to complete some task . Suppose also that you have some sort of initial policy that delineates all the steps you need to take to complete . Moreover, you also have an oracle that will produce an estimate of you how long it will take to execute . Naturally, we’ll place a restriction on so that it’s bounded and always equal to or greater than zero. Now, say you can invest -time and plan in the sense that it’s always the case that . Thus, with a slight abuse of notation we can write the result of planning for -time as . Naturally, our goal is to minimize the total amount of time it takes to execute the planning and the resulting policy.

In the case where everything is nice and is convex. We know that it must be the case that . Thus, it should follow that exists and that . The point is that condition for the minima is easy to state and is equivalent to when . Thus, as long as the benefit rate of planning is faster than the rate at which time progresses there will be an optimal planning time .

# Examples

## a)

Let’s consider a semi-realistic example of a planning function. The most obvious requirement to try and model is that we should see diminishing returns on planning as . Thus, a crude model would be to model our planning function so that the reduction in execution time is proportional to the size of the current execution time.

We’re using to model the amount of time a task would take with zero planning and to model the rate of reduction. We can solve for the optimal using our discussion above since is convex and bounded. We end up with,

Thus, the planning to executing ratio ranges from zero to one and depends solely on . For we’ll have which means that planning is an ineffective option.

It's worth noting a connection between this simplistic model and the simplistic model we usually use to make plans. Usually, we make plans by creating lists of sub-tasks or steps that need to be completed. Sometimes, we iterate this process if the sub-tasks are difficult to complete. In an idealized model, we'd model the creation of a task list using a branching process and then claim that the execution time of a task is reduced proportionally to the number of leaves in the resulting tree.

## b)

There is an important quantity here that can be used to make a practical algorithm. From the first example, we showed that the task reduction quantity governs the effectiveness of planning. The requirement that is the same as demanding that the rate of reduction from planning be greater than one.

For many common tasks, it’s immediately obvious whether or not a plan should be constructed. Assume that this decision rule is a statistic of . It follows that we can estimate the reduction in total time by looking at the current plan for execution at each time step. Once this estimate of the total time reaches a minimum we should stop planning and start executing.

The pathological aspect of this algorithm is that there might be a critical threshold that needs to be reached before planning is effective. Consider that we estimate , but the true model for has a varying rate equation. However, even in the worst case, we’d only double the amount of time taken to complete the task. The reasoning is that for planning to be an effective strategy the minimum must be reached sometime before the estimated time for execution of the initial plan. If this assumption is violated our original hypothesis would be as well which means that are no guarantees about when or if a minimum might be reached.

It would seem as though we could layer more complexity onto these simple models to make them more effective. For instance, we can formalize what it means to determine whether or not our guess is correct by using statistical estimators. It also seems as though the convexity requirement on is a bit strong for our purposes since planning is a somewhat complicated and somewhat arbitrary/random affair. Finally, the fact that we can create plans that have sub-plans means that we can recursively call our meta-plan for making plans. We'll come back to this last point in a bit.

# General Strategy

As noted in the previous section, we’ve restricted so severely that we might be concerned it would tell us nothing about whether or not it’s a good idea in general to try and optimize . This is because we have no guarantee that our optima at is better than for . Now, lucky for us most of this has already been studied in the context of optimal persistent policy theory. The theory deals with optimal selection problems where the -th observation costs . Thus, we can use this tool and assume we can construct a sequence of plans and incur unit cost for each plan we create.

Let denote the expected time of planning plus execution for continuing to compress after the observation of . It follows that defines a stopping criteria for if we should expect to do worse unless we stop planning and start executing. In practice, for each we observe/estimate and pay a cost to do so. To accommodate error in estimation we introduce a transition function to map . It’s given as the probability of going to policy starting from and will be written as . All of our policies live in the decision space . The decision to continue planning incurs unit cost. If our was optimal for each we’d have,

**Theorem 3.1.** Suppose that is bounded on . Then equation has
at least one solution that satisfies
.

*Proof.* The trick is to define a sequence of partial solutions using recursion.
Define truncations of as,

This is saying that the expected execution will go down as we continue to refine our . It also follows that there is a limit to how far we can push this refinement process as we must have,

The intuition here is that we can only do as good as the best plan available. It follows from the Lebesgue Convergence Theorem that,

Okay, now we need to show that our solution is unique. The idea of the proof is that if you had more than a single solution, say and , there will have to be regions where one policy will go for continuing and the other won’t. If we consider a region where this means that the advantage of continuing will be greater if you use . On the other hand, over the region where the value of is greater than there will also have to be a maximum difference between the value functions. However, this would mean that the advantage function of will need to be greater than which is a contradiction.

Given the sketch, it sounds like we’ll have to demand that be compact. We’ll show after the proof that it’s possible to relax this condition to an assertion that it’s always possible to transition to a certain set of policies.

**Theorem 3.2.** Let be a compact topological space and for let
for all open sets in . Then there’s at most a
single solution such that is continuous and
.

*Proof.* For a function that measures execution time we define the
advantage as . Think of
this as a measure of the change in expected execution time. If it’s
positive the current state has an advantage over the next expected
state. We define things this way so that if is a solution of
then it follows that . To see this note
that subtracting from the left-hand side of needs to give
zero. Now, imagine we have two solutions and for .
Let,

We’ll show that . Define then for we have and for all ,

From this we conclude that . Since and we conclude that we have to have or else won’t be satisfied. We’re about to run into a problem. We know from the compactness of that that there exists such that,

This is a contradiction so it must be the case that . By symmetry of and we can also eliminate .

**Corollary 3.3.** Let be a measurable space where there exists a non-empty
such that and
for all . Then we have a single solution that is
measurable and bounded.

We can use the same argument as above. However, because we lost compactness we must look at the infimum which will have a value of say . We fix and then we can get a point inside of the set we took an infimum over. Now when we integrate over we can extract and everything will work out.

# Redux

Does this information help illuminate anything? Returning to the article example given at the beginning, it would seem as though I spent too much time planning the mathematical portion of the writing. I'm about done with the article and I see that the mathematical portion grew in scope way beyond what was intended in the original plan. This deviation was significant due to me not realizing that the article would need to be formatted. I think that if I had realized how much time it would take to format the article with late it would've become readily apparent that I should've been more sparing in the amount of math I used. Overall, I spent about 14 hours on this article while my original estimate was about 7. I spent several hours making mathematical proofs when I really should've been reading up on how to format an article with latex correctly. I also think that I could've spent a bit more time proof-reading everything before submitting.

What ultimately happened was that while I created an initial plan for the article, the unexpected addition of mathematical arguments broke all my estimates. I should've realized that formatting latex would take a long time since it was my first attempt. It would seem as though the idea of clean separation between planning and execution is an abstraction. The reality is that we all switch between planning and executing whenever we engage with sufficiently complex tasks. I only planned the structure of the article for about an hour because I was overly optimistic about how long it would take to finish the whole thing. I fell prey to the planning fallacy.

Does this mean this idea failed? Not exactly. Recall the idealized model of planning. When we make plans we oftentimes will create tasks that themselves require further planning. It seems perfectly reasonable that we could simply "call" our meta-plan on these sub-tasks to reduce the effect of human bias. I was overly optimistic in thinking about how long it would take to format the math. This problem could've been heavily mitigated if I had a plan that made the task of making the mathematical section a sub-plan. If I had done this, I would've had a chance to look at the bigger picture and most likely would've proceeded differently.

# Outlook

Ultimately, I think that the idea was a success because it was developed to enough generality to be useful in getting a concept handle on some pretty big questions surrounding the idea of meta-planning. First, what I am currently engaging in is the construction of a plan that when executed can manage the planning of plans. While I showed that for a broad class of tasks meta-planning is rational, for a much broader class of problems it isn't. Not all problems are compressible. This means that oftentimes the most effective plan will be to have no real plan at all. You can't plan for this either. Pursuing this line of reasoning is difficult as you quickly reach the limit of rationality due to the undecidability of the various questions involved. Perhaps this is a reason for less than perfect rationality in humans. Second, we're overly optimistic about how much time our plans will take to execute. However, when we realize that our plans were too optimistic we are more than capable of updating them because we naturally make hierarchical or tree-like plans. Using these three observations I conclude that the current model I've developed is sufficient to fine-tune itself for the plans that any systematic method could hope to be capable of tackling. There is no need to continue worrying about abstract details at the meta-level. I need to practice using what I came up with.

## 3 comments

Comments sorted by top scores.

Specific commentary:

W(x) - V(x) = max(W(x) - V(x)) > 0

I didn't follow this step (locally). Max(a) = a, since max can only return a quantity it has been given, and so it requires 2 inputs to have the option of not returning its first input.

Globally, I didn't follow the math, and probably have a lot more to learn in this space.

More general commentary:

Section 1. (of this comment.)

we naturally make hierarchical or tree-like plans.

Did this plan need to be a tree? Or could it have been done as follows below [1]:

(This is intended to address the 'Math took longer than expected' part)

1) write all of the article except the math (with a placeholder called "Math Section" or "Formalization).

2) The "Math Section" placeholder could be expanded to a description of what needed to be there.

(A very short version might be: "Let’s consider a semi-realistic example of a planning function.")

[4]

3) The (written parts of the) article could be formatted.

4) The Math Section could be written.

5) The math Section could be formatted.

This breaks down 'how long does it take to write this article' into three parts: a) 1-2. b) 3. c) 4-5. While 5 might be treated as its own (large) step, that is distinction is the benefit of hindsight.

Hindsight which offers further variations - formatting the math section could be viewed as a different process from formatting the rest of the article: Learning how to format with Latex.

Section 2. (of this comment.)

In the spirit of "plans are worthless, but planning is everything":

1) Outline article.

a) Write out what each section should contain.

b) Get an idea for what each section should look like. For everything, it means creating sections - a header and a sentence each or something. For math, this additionally means writing one equation in Latex. [5]

Section 3 - Overview

The plan in section 1 was supposed to illustrate that 'how long will it take to write this article' could be broken up into 2 questions 'how long will it take to write the text' and 'how long will it take to do the math'. (And maybe after that, 'how long will it take to write the conclusion'.) If you correctly predicted the length of the first, but not the second, then splitting those seems like a good fix.

The application to plans in general is that when you discover a new sub-action *A*, the original time estimate *T *should be figured as *T(doing everything except A)* instead of *T(doing everything.)* (This isn't quite perfect in the event there are other un-discovered actions - but LaTex seems like a sub-action of A.)

What ultimately happened was that while I created an initial plan for the article, the unexpected addition of mathematical arguments broke all my estimates. I should've realized that formatting latex would take a long time since it was my first attempt.

The plan in section 2 was supposed address if the main issue was LaTex. The more general rule is - do the thing that you've never done/know the least about how long it will take/will give you the most information (/about how long things will take).

The plan in [4] addresses formatting, among other things, by suggesting loops in place of stages. This can be treated as more general planning/executing advice. It's interesting that similar things did occur in the proof:

Thus, we can use this tool and assume we can construct a sequence of plans {πt}t≥0 and incur unit cost for each plan we create.

(for each sounds like a loop.)

To accommodate error in estimation we introduce a transition function to map πt→πt+1.

Though they were about the amount of time it takes to plan, rather than interleaving planning and execution (such as 'plan the next step', execute it, repeat until done).

I thought there'd be math in this section, but it seems I haven't formalized these models that much because I haven't (explicitly) tried out them out experimentally yet.

Overall, I spent about 14 hours on this article while my original estimate was about 7.

I thought that this was Hofstader's Law, but it doesn't seem to include the specifics I thought it did - namely the scale factor of 2, which seems to be applicable here. (Empirical study of this might be interesting.)

[1] I am aware that the end conclusion might rely on the results from the math section. This analysis is starting with a simpler case. (It is also worth noting that an idea might be expressed in multiple parts, such as *n* posts, rather than one post with *n* sections.)

[3] There are multiple ways of handling this - "A tree of what?" is the real question.

[4] This sounds a bit like a recursive process for article/post generation - a series of descriptions expanded in detail step by step, until complete. This does make more sense to handle as a tree [3], in order to enable getting things done within the desired amount of time (or length of post). While "Formatting the post" could be treated as a separate process from "writing the post", if the post is handled as a set of sections in this manner, as described in steps 1) and 2), and this [4], it might be easier to integrate formatting into writing the post.

[5] Some would break this down further, and have "go get a picture of an equation displayed via Latex" here, then have working out how to use it later. A similar process is 'write the smallest/fastest version of the article that you can', then a) 'decide if you want to make it bigger/spend more time on it' b) 'make the smallest/fastest change that's self-contained'. (Such a process might have suggested doing latex sooner because you thought it wouldn't take long.)

W(x) - V(x) = max(W(x) - V(x)) > 0

The maximum is over the domain. I'm not sure how your example is escaping from the hierarchy paradigm. I do consider the idea of having undetermined sub-tasks.

When we make plans we oftentimes will create tasks that themselves require further planning. It seems perfectly reasonable that we could simply "call" our meta-plan on these sub-tasks to reduce the effect of human bias.

You seem concerned about why I choose to characterize the policy by how well it compresses the task. While it was possible to do a sort of 'interleaving' as you suggest from a technical point of view it makes no difference since compression transitions are assumed to be Markov. This translates to an assumption that planning ability depends only on what you currently have planned.

Practically speaking I should assume that the transitions are Markov and depend on both what has been planned and what has been executed. My argument rests on the idea that in expectation there's no difference between the two strategies since what you plan for will in expectation match what happens.

The moment you start trying to build up a more complicated model it becomes clear that you can't simply account for what has been executed in terms of a scalar. Otherwise what I just said is reinforced. In that case, you need to model how tasks are being created, prioritized, and executed. This is difficult to the point of being useless as a tool to understand what I was interested in.

I think we agree that the only way forward is to simply assume that this 'meta'-policy can be invoked recursively. This is hard. Naively I'd hope for sub-task modularity/independence and additivity of the effectiveness 'meta'-policy.

Hopefully, it's clearer why it's impossible to go further without a good model for how tasks are sub-divided. It's all too easy to run into Zeno-like paradoxes where it's either impossible to plan due to compounding sub-task over-head or you can slice-up a task into infinitesimal dust. This is getting too long for a comment. I'll leave it there.

I should have split things up into multiple comments. Most (if not all) of that should be read as "I think this might be useful in practice* for planning/executing, rather than improving the model".

*Advice which if followed would or could have led to a) doing LaTex sooner, b) changed how math was handled or turned out, or making it less unpredictable w.r.t to time estimates, c) formatting the writing sooner.

Hopefully, it's clearer why it's impossible to go further without a good model for how tasks are sub-divided.

I suggested

1) that if writing the math* was a substantial piece which took longer than expected, then you might find it useful to have a personal model which says "Math* will take longer than I expect"/update future expectations based on the result - how things turned out for this post.

2) If you change the way you divide up tasks that might affect outcomes**, if not predictability.

*Or anything else this applies to.

**Such as how long things take, as well as how they turn out.