# An optimal stopping paradox

post by Yuxi_Liu · 2019-11-12T08:51:03.199Z

Consider an optimal stopping problem: a company grows in value by a constant amount at each time step and has a certain probability of shutting down. You decide when to sell the company.

Since the math is cleaner in continuous time, we work in continuous time. The company's value then increases linearly as $\beta t$, and its survival curve decays exponentially as $e^{-\alpha t}$.
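As a sanity check on this setup, here is a minimal Python sketch, assuming illustrative values α = 0.5 and β = 2.0 and hypothetical helper names `expected_reward` and `simulate_reward`. If you commit to selling at time $T$, you collect $\beta T$ only if the company survives until $T$, so the expected reward is $\beta T e^{-\alpha T}$. A Monte Carlo estimate with exponentially distributed failure times should agree with that closed form.

```python
import math
import random

def expected_reward(T, alpha, beta):
    """Closed form: the reward beta*T is collected only if the company survives to T."""
    return beta * T * math.exp(-alpha * T)

def simulate_reward(T, alpha, beta, n=200_000, seed=0):
    """Monte Carlo estimate: draw an exponential failure time (rate alpha)
    and pay beta*T only if the failure happens after the planned sale time T."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        failure_time = rng.expovariate(alpha)  # survival curve e^(-alpha t)
        if failure_time > T:
            total += beta * T
    return total / n

alpha, beta = 0.5, 2.0  # illustrative values, not from the post
for T in (0.5, 2.0, 5.0):
    print(T, expected_reward(T, alpha, beta), simulate_reward(T, alpha, beta))
```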

Another framing of the paradox: Schrodinger wants to set a new record for the longest-surviving cat, so he puts a cat in a box with an atom that might decay and kill the cat, and waits. When should he open the box?

Since at each moment in time, you face the exact same problem (linearly increasing reward, α-exponentially decaying survival rate), if you decide to wait at t=0, you would decide to wait forever, and thus receive no reward.

There are several possible replies to this paradox, none of which is satisfactory to me:

- "This looks like the St. Petersburg paradox." No, because at time $t=0$, the expectation is $\beta/\alpha^2$. In fact, the payoff can grow faster than $\beta t$, for example as $t^3$, and it would still have a finite expectation.
- Claim that expectation maximization decision theory is flawed. This doesn't stop the procrastination. As long as your decision is purely based on the future, and your rational decision process is constant in time, you either immediately sell the company or never sell the company.
- Try some kind of discounting, like exponential discounting. This doesn't stop the procrastination, since at any time, selling the company gives you 0 extra expected reward, and waiting gives you *some* positive extra expected reward, no matter how much you discount the future.
- Claim that there should be a finite lifetime. You can't wait forever. If there is a finite lifetime, then the same decision analysis would tell you to procrastinate until the very end. This is effectively procrastinating forever: it does not converge to a reasonable finite waiting time as your lifetime goes to infinity.
- Claim that one should stick to past decisions even when they don't make sense from a purely future-looking decision theory. Such a decision theory seems to just sweep time-inconsistency under the rug, and I'm sure it would suffer from serious paradoxes of its own.
- Claim that there is no paradox, and procrastination really is the rational action. I wouldn't call a strategy that guarantees 0 reward rational.
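The finiteness claim in the first reply can be checked numerically. Assuming the quoted $\beta/\alpha^2$ refers to $\int_0^\infty \beta t\, e^{-\alpha t}\,dt$ (our reading; the post does not spell it out), the general formula for a payoff $\beta t^n$ is $\beta\, n!/\alpha^{n+1}$, which stays finite for $t^3$ or any other polynomial. This sketch uses a plain trapezoid rule with illustrative parameters; the function names are hypothetical.

```python
import math

def payoff_integral(n, alpha, beta=1.0, steps=200_000):
    """Trapezoid-rule estimate of the integral of beta * t**n * exp(-alpha*t) over t >= 0.
    The range is truncated at t = 60/alpha, where the tail is negligible for small n."""
    t_max = 60.0 / alpha
    h = t_max / steps

    def f(t):
        return beta * t**n * math.exp(-alpha * t)

    total = 0.5 * (f(0.0) + f(t_max))
    for k in range(1, steps):
        total += f(k * h)
    return total * h

def closed_form(n, alpha, beta=1.0):
    """Exact value beta * n! / alpha**(n + 1)."""
    return beta * math.factorial(n) / alpha ** (n + 1)

alpha, beta = 0.5, 2.0  # illustrative values
for n in (1, 3):
    print(n, payoff_integral(n, alpha, beta), closed_form(n, alpha, beta))
```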

Option 5 at least seems to have some meaning to it. Sticking to it would mean that, for example, one would decide at $t=0$ to choose $T$ to maximize $\beta T e^{-\alpha T}$, and then at $t=T$ actually sell the company, even though doing so is irrational conditional on the company still being alive at $t=T$.
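A minimal sketch of the precommitted choice, assuming α = 0.5, β = 2.0 and a hypothetical grid-search helper: maximizing $\beta T e^{-\alpha T}$ once at $t=0$ gives $T = 1/\alpha$ with expected reward $\beta/(\alpha e)$, since the derivative $\beta e^{-\alpha T}(1 - \alpha T)$ vanishes there.

```python
import math

def expected_reward(T, alpha, beta):
    """Expected reward of a plan to sell at time T: beta*T times survival exp(-alpha*T)."""
    return beta * T * math.exp(-alpha * T)

def best_precommitted_time(alpha, beta, t_max=20.0, steps=200_000):
    """Grid search, done once at t = 0, for the T maximizing expected_reward."""
    best_T, best_val = 0.0, 0.0
    for k in range(steps + 1):
        T = t_max * k / steps
        val = expected_reward(T, alpha, beta)
        if val > best_val:
            best_T, best_val = T, val
    return best_T, best_val

alpha, beta = 0.5, 2.0  # illustrative values
T_star, v_star = best_precommitted_time(alpha, beta)
print(T_star, 1 / alpha)                 # grid optimum vs closed form 1/alpha
print(v_star, beta / (alpha * math.e))   # best expected reward vs beta/(alpha e)
```

The grid search is only there to make the check independent of the calculus; any one-dimensional optimizer would do.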

## 10 comments


You need an exponentially increasing reward for your argument to go through. In particular, this doesn't prove enough:

> Since at each moment in time, you face the exact same problem (linearly increasing reward, α-exponentially decaying survival rate)

The problem isn't exactly the same, because the ratio of the (linear) growth rate to the current value decreases over time. At some point, the value equals $\beta/\alpha$ (that's the right expression, I think?), your marginal value of waiting is 0 (and decreasing), and you sell.
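To illustrate the point about the ratio, here is a sketch with illustrative values α = 0.5, β = 2.0 and a hypothetical `marginal_gain` helper. Conditional on survival to time $t$, waiting a bit longer gains value at rate $\beta$ but risks losing the current value $\beta t$ at rate $\alpha$, so the net expected gain rate is $\beta - \alpha\beta t$: positive early, zero when the value reaches $\beta/\alpha$, and negative afterwards.

```python
def marginal_gain(t, alpha, beta):
    """Instantaneous expected gain from waiting a little longer, given survival to t:
    value grows at rate beta, but the current value beta*t is lost at rate alpha."""
    return beta - alpha * beta * t

alpha, beta = 0.5, 2.0  # illustrative values
for t in (0.5, 1.0, 2.0, 3.0):
    growth_to_value = beta / (beta * t)  # equals 1/t, so it shrinks over time
    print(t, growth_to_value, marginal_gain(t, alpha, beta))
```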

If the ratio of growth rate to current value is constant over time, *then* you're in the same position at each step, but then it's either the St. Petersburg paradox or worthless.
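For contrast, a sketch of the constant-ratio case, assuming (our notation) a value $v_0 e^{\gamma t}$ with growth rate $\gamma$ and the same hazard rate $\alpha$. The expected reward of selling at $T$ is $v_0 e^{(\gamma - \alpha)T}$, so the comparison of $\gamma$ with $\alpha$ is the same at every moment: if $\gamma > \alpha$ waiting always looks better (the St. Petersburg-like case), and if $\gamma < \alpha$ you should sell at once. Function names and parameter values are hypothetical.

```python
import math

def expected_reward_exp(T, v0, gamma, alpha):
    """Value v0*exp(gamma*T), paid only if the company survives to T (prob exp(-alpha*T))."""
    return v0 * math.exp((gamma - alpha) * T)

def verdict(gamma, alpha):
    """The growth-rate-to-value ratio is the constant gamma, so this answer never changes."""
    if gamma > alpha:
        return "wait forever"   # expected reward grows without bound in T
    if gamma < alpha:
        return "sell now"       # expected reward is largest at T = 0
    return "indifferent"        # expected reward is v0 for every T

for gamma in (0.3, 0.5, 0.8):
    print(gamma, verdict(gamma, 0.5), expected_reward_exp(10.0, 1.0, gamma, 0.5))
```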

Let's change the problem a bit: assume you're starting with nonzero capital c, so the formula becomes (bt+c)e^(-at). If c>b/a, the derivative of that formula at t=0 is negative, so you need to stop immediately. That shows the decision to stop doesn't depend only on a and b, but also on current capital. So basically "at each moment in time you face the exact same problem" is wrong. The naive solution is the right one: you should stop when c=b/a, which means t=1/a in the original problem.
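A small sketch of this variant, with hypothetical helper names and illustrative $a = 0.5$, $b = 2$ (assumed $b > 0$). Setting the derivative $(b - a(bt + c))e^{-at}$ to zero gives $t = 1/a - c/b$; when that is negative, i.e. $c > b/a$, the value is already falling at $t=0$ and you stop immediately.

```python
import math

def value(t, a, b, c):
    """Expected sale value (b*t + c) * exp(-a*t) when starting with capital c."""
    return (b * t + c) * math.exp(-a * t)

def derivative_at_zero(a, b, c):
    """d/dt of value(t) at t = 0, which is b - a*c."""
    return b - a * c

def optimal_stop(a, b, c):
    """Root of (b - a*(b*t + c)) * exp(-a*t) = 0 is t = 1/a - c/b; clip at 0
    because a negative root means the value is already falling."""
    return max(0.0, 1.0 / a - c / b)

a, b = 0.5, 2.0  # illustrative values; b/a = 4
for c in (0.0, 2.0, 5.0):
    print(c, derivative_at_zero(a, b, c), optimal_stop(a, b, c))
```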

TLDR: The paradox goes away if you make the price endogenous; it only occurs because you assume a growth of value over time that is inconsistent with the profit flows.

The paradox stems from the fact that you've made inconsistent assumptions: that the value of the company increases linearly over time, and that the company never generates a flow of profits (i.e., the only value comes from the sale). If profits are zero, the equilibrium price is constant at zero, and investors are indifferent between holding the company and selling it at any point in time. More generally, if the company has some potential for profits (which can be modeled as a flow of profits per unit of time, or as a hazard rate of getting an instantaneous lump sum of profits), the equilibrium price will be set so that the marginal investor is indifferent between holding and selling.

I have a tongue-in-cheek resolution to the Schrodinger cat variant: if his goal is to set a new world record, he should open the box immediately after the old world record is surpassed. More seriously, to resolve the paradox, you need to be more explicit about his utility function: how does the value he obtains increase with the amount by which he exceeds the old record? Depending on your choice of utility function, you may or may not have a paradox, and it may or may not be equivalent to the St. Petersburg paradox.

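Differentiating the expected reward over time:

$$\frac{d}{dt}\left(\beta t e^{-\alpha t}\right) = \beta e^{-\alpha t}\left(1 - \alpha t\right)$$

So the best time to sell is when $t = 1/\alpha$.

If you have already waited time $t_0$, then the reward becomes $\beta (t_0 + t) e^{-\alpha t}$. The stopping time becomes the root of

$$\frac{d}{dt}\left(\beta (t_0 + t) e^{-\alpha t}\right) = \beta e^{-\alpha t}\left(1 - \alpha (t_0 + t)\right) = 0,$$

with a solution at $t = 1/\alpha - t_0$. Nothing weird is going on here; a plot of expected value vs. sell time looks like this.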
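The time-consistency claim can be checked with a sketch that re-plans from scratch after surviving to $t_0$ (illustrative values α = 0.5, β = 2.0; `replan` is a hypothetical name). By memorylessness, survival from $t_0$ onward is again $e^{-\alpha t}$, so the planner maximizes $\beta(t_0 + t)e^{-\alpha t}$ over the extra wait $t$. The planned total sale time $t_0 + t$ comes out as $1/\alpha$ whenever $t_0 < 1/\alpha$, and as $t_0$ itself (sell now) otherwise.

```python
import math

def replan(t0, alpha, beta, u_max=20.0, steps=100_000):
    """Having survived to t0, grid-search the extra wait u that maximizes
    beta*(t0 + u)*exp(-alpha*u); survival from t0 on is again exponential."""
    best_u, best_val = 0.0, beta * t0  # u = 0 means selling right now
    for k in range(steps + 1):
        u = u_max * k / steps
        val = beta * (t0 + u) * math.exp(-alpha * u)
        if val > best_val:
            best_u, best_val = u, val
    return best_u

alpha, beta = 0.5, 2.0  # illustrative values; 1/alpha = 2
for t0 in (0.0, 0.5, 1.5, 3.0):
    u = replan(t0, alpha, beta)
    print(t0, t0 + u)  # planned total sale time
```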

Suppose the exponential decay time is 1 day. After 1 second, waiting another second makes sense: it will double your value, and the chance of failure is tiny. After a week, you already have a large pot of value that you are risking, so it is no longer worth waiting.
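The arithmetic behind this example, as a sketch assuming that a decay time of 1 day means a hazard rate $\alpha = 1/\text{day}$, with time measured in seconds. With linearly growing value, the expected value of waiting $\Delta t$ more, divided by the value of selling now, is $\frac{t + \Delta t}{t} e^{-\alpha \Delta t}$; `wait_ratio` is a hypothetical helper.

```python
import math

DAY = 86_400.0     # seconds in a day
WEEK = 7 * DAY     # seconds in a week
alpha = 1.0 / DAY  # assumed hazard rate: decay time of one day

def wait_ratio(t, dt, alpha):
    """Expected value of waiting dt more (given survival to t), relative to
    selling now at a value proportional to t: (t + dt) / t * exp(-alpha * dt)."""
    return (t + dt) / t * math.exp(-alpha * dt)

print(wait_ratio(1.0, 1.0, alpha))   # just under 2: waiting roughly doubles the value
print(wait_ratio(WEEK, 1.0, alpha))  # just under 1: waiting now loses expected value
```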

> looks like this.

That link doesn't work.

Fixed.

> Claim that there should be a finite lifetime. You can't wait forever. If there is a finite lifetime, then the same decision analysis would tell you to procrastinate until the very end. This effectively is procrastinating forever. It does not converge to a reasonable finite waiting time as your lifetime goes to infinity.

If I am a quasi-immortal who will live millions or billions of years, with, apparently, zero discount rates, no risk, and nothing else I am allowed to invest in (no opportunity cost), why shouldn't I make investment decisions which take millions of years to mature (with astronomical loads of utility at the end as a payoff for my patience), and plan over periods that short-lived impatient mayflies like yourself can scarcely comprehend?

If the growth is exponential, I still don't think there's a paradox: sure, you're incentivized to wait forever, but I'm already incentivized to wait forever with my real-life investments. The only thing that stops me from investing my money forever in real life is that sometimes there are things (not included in the toy problem) that I really want to buy with that money.

Reminds me of the thought experiment where you’re in hell and there’s a button that will either condemn you permanently, or, with probability increasing over time, will allow you to escape. Since permanent hell is infinitely bad, any decreased chance of that is infinitely good, so you either wait forever or make an arbitrary unjustifiable decision.

> - Claim that expectation maximization decision theory is flawed. This doesn't stop the procrastination. As long as your decision is purely based on the future, and your rational decision process is constant in time, you either immediately sell the company or never sell the company.

I don't need to maximize the expected value of anything where I know I can get at least what I want. If I precommit to sell at $X or when the risk of failure in the next year goes above P%, that doesn't mean the actor that sells at $X+1 "wins": if we both got what we wanted, we *both win*. Likewise, Schrodinger doesn't need to set the best possible record for cat survival; he just needs to keep one alive for the duration of the previous record +1 interval.