Why sigmoids are so hard to predict

stuart_armstrong

post by Stuart_Armstrong · 2021-03-18T18:21:51.203Z · LW · GW · 7 comments

  General sigmoids
    Differential equations
    The dampening term
    Different assumptions, different dampening
    Estimating the maximum value
7 comments

tl;dr: Predicting the behaviour of a sigmoid-like process is not a matter of fitting the parameters of a logistic curve. Instead, it's a matter of estimating the strength of the dampening term - a term that might actually be invisible in the initial data.

A logistic curve is a function that starts out growing like an exponential, but then diminishing returns kick in, and the curve plateaus:

The logistic curve has a specific equation, $f (x) = \frac{L}{1 + e^{- k (x - x_{0})}}$ ; it has a turning point at $x_{0}$ (where it goes from accelerating growth to decelerating growth) and an (asymptotic) maximum at $L$ .

The broader category of "curves that look like logistic curves" are sigmoids, and they appear all over the place - population growth, virus spreading, and so on.

I've seen a few examples and arguments as to why it's so hard to predict the turning points and maximums of a sigmoid; I wrote one myself.

But all these arguments, including my own, didn't really satisfy me. The fact is, a logistic curve has three free parameters, so three perfect samples will pin it down entirely. So the arguments then turn on how a little bit of noise can make these parameters very uncertain.
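To make the "three perfect samples" claim concrete, here is a minimal Python sketch. It assumes the standard logistic form $f (x) = L / (1 + e^{- k (x - x_{0})})$ and three equally spaced samples; the function names, the sample positions ($x = - 6, - 5, - 4$) and the 1% perturbation are illustrative choices, not taken from the post. It relies on the fact that $1 / f$ is a constant plus a decaying exponential, which gives the three parameters in closed form.

```python
import math


def logistic(x, L=1.0, k=1.0, x0=0.0):
    """Logistic curve with maximum L, rate k and turning point x0."""
    return L / (1.0 + math.exp(-k * (x - x0)))


def fit_logistic_three_points(x1, h, y1, y2, y3):
    """Recover (L, k, x0) from samples at x1, x1 + h, x1 + 2h.

    Uses 1/f(x) = a + B * r**((x - x1) / h), with a = 1/L and r = exp(-k*h).
    Returns None when no logistic with a finite positive maximum fits.
    """
    g1, g2, g3 = 1.0 / y1, 1.0 / y2, 1.0 / y3
    ratio = (g3 - g2) / (g2 - g1)      # equals exp(-k * h) for exact data
    if not 0.0 < ratio < 1.0:
        return None
    k = -math.log(ratio) / h
    B = (g2 - g1) / (ratio - 1.0)      # equals exp(-k * (x1 - x0)) / L
    a = g1 - B                         # equals 1 / L
    if a <= 0.0:
        return None
    L = 1.0 / a
    x0 = x1 + math.log(B * L) / k
    return L, k, x0


if __name__ == "__main__":
    xs = (-6.0, -5.0, -4.0)
    ys = [logistic(x) for x in xs]
    print("exact samples:   ", fit_logistic_three_points(xs[0], 1.0, *ys))
    # Perturb only the last sample by 1% in either direction.
    for factor in (0.99, 1.01):
        fit = fit_logistic_three_points(xs[0], 1.0, ys[0], ys[1], ys[2] * factor)
        print(f"last sample x {factor}:", fit)
```

With exact samples the parameters are recovered up to floating-point error. Perturbing a single early sample by 1% moves the estimate of $L$ substantially, and in one direction leaves no finite positive maximum consistent with the data.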

And those arguments are correct: small uncertainties in the data do imply very different outcomes. I illustrated this by plotting the standard logistic curve with turning point at $0$ and maximum at $1$ , in blue, versus the exponential function $0.51 e^{0.69 x}$ (which has no turning points and no maximum), in purple:

Then all the arguments and demonstrations are more formal ways of saying "hey, these curves look similar, they're hard to tell apart". If the noise is independent, however, then you will eventually be able to distinguish the two curves if you get enough data points.
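As a rough numerical check of how close the two curves are, the sketch below tabulates both functions and measures the largest gap between them. The window $[- 2, 0]$ and the helper names are assumptions made for illustration; the post does not say over which range the curves were compared.

```python
import math


def logistic(x):
    """Standard logistic curve: turning point at 0, maximum at 1."""
    return 1.0 / (1.0 + math.exp(-x))


def exponential(x):
    """The pure exponential quoted in the text."""
    return 0.51 * math.exp(0.69 * x)


def max_gap(lo, hi, samples=201):
    """Largest absolute difference between the two curves on [lo, hi]."""
    step = (hi - lo) / (samples - 1)
    points = (lo + i * step for i in range(samples))
    return max(abs(logistic(x) - exponential(x)) for x in points)


if __name__ == "__main__":
    for x in (-2.0, -1.5, -1.0, -0.5, 0.0, 1.0, 2.0):
        print(f"x={x:5.1f}  logistic={logistic(x):.4f}  exponential={exponential(x):.4f}")
    print("largest gap on [-2, 0]:", round(max_gap(-2.0, 0.0), 4))
```

On the assumed window the curves stay within a few hundredths of each other, while by $x = 2$ the exponential is already more than twice the logistic's maximum.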

General sigmoids

I then started thinking about non-independent noise, and other similar ideas about how these curves would behave in the real world. But that was missing the key question: how do we know that the purple curve is not a sigmoid? It starts off as a pure exponential, but nothing stops me from completing it like this:

The purple curve is not a logistic curve, but it's a perfectly respectable sigmoid. The Wikipedia sigmoid article lists a variety of sigmoid functions such as "Logistic functions", "Arctangent functions", "Gudermannian function", "Error functions", "Generalised logistic functions", "Smoothstep functions", and the wonderfully vague but huge category of "Some algebraic functions".

If we want to predict turning points in sigmoid-like functions, we have to do better than trying to fit the best logistic function. We need to ask where the behaviour of the sigmoids derives from.

Differential equations

The differential equation for the logistic curve is:

$f^{'} (x) = k (1 - f (x) / L) f (x) .$
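As a check, and assuming the standard closed form $f (x) = L / (1 + e^{- k (x - x_{0})})$ for the logistic curve, differentiating gives back this equation:

$f^{'} (x) = \frac{L k e^{- k (x - x_{0})}}{(1 + e^{- k (x - x_{0})})^{2}} = k f (x) \cdot \frac{e^{- k (x - x_{0})}}{1 + e^{- k (x - x_{0})}} = k f (x) (1 - f (x) / L) ,$

where the last step uses $1 - f (x) / L = e^{- k (x - x_{0})} / (1 + e^{- k (x - x_{0})})$ .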

When $f (x)$ is small, this is $f^{'} (x) \approx k f (x)$ . The equation $f^{'} (x) = k f (x)$ has solution $C e^{k x}$ for some constant $C$ .

This describes the initial period of exponential growth. More precisely, if we look in a range on which $1 - f (x) / L$ doesn't vary much, $f (x)$ will be approximately $C e^{k M x}$ where $M = 1 - f (x) / L \approx 1$ .

This shows why it's so hard to predict the maximum and turning point of the logistic curve from early data:

If $f (x_{1})$ and $f (x_{2})$ are small compared to $L$ , then, on the interval $[x_{1}, x_{2}]$ , we are trying to detect the logistic curve shifting from the exponential curve of type $C_{1} e^{k M_{1} x}$ to one of type $C_{2} e^{k M_{2} x}$ , with $M_{1} = 1 - f (x_{1}) / L$ and $M_{2} = 1 - f (x_{2}) / L$ .

So what we're trying to do, from the potentially noisy data, is pick up the difference between growing as an exponential of exponent $k M_{1}$ (which is very close to $k$ ) and of exponent $k M_{2}$ (which is also very close to $k$ ). So we're trying to pick up the difference between two exponentials with very similar exponents, at the very beginning of their ranges.
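To put numbers on how close $k M_{1}$ and $k M_{2}$ are, the following sketch evaluates the local growth rate $k (1 - f (x) / L)$ along the standard logistic curve. The sample points and the function names are hypothetical choices for illustration.

```python
import math


def logistic(x, L=1.0, k=1.0, x0=0.0):
    """Logistic curve with maximum L, rate k and turning point x0."""
    return L / (1.0 + math.exp(-k * (x - x0)))


def effective_exponent(x, L=1.0, k=1.0, x0=0.0):
    """k * M with M = 1 - f(x)/L, i.e. the local growth rate f'(x) / f(x)."""
    return k * (1.0 - logistic(x, L, k, x0) / L)


if __name__ == "__main__":
    for x in (-7.0, -6.0, -5.0, -4.0, -3.0, 0.0):
        print(f"x={x:5.1f}  f(x)={logistic(x):.5f}  k*M={effective_exponent(x):.5f}")
```

In the early range the effective exponent changes by well under one percent between samples, which is the small signal a fit has to extract from noisy data.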

But, as I said, if there was no noise, we could detect that from three data points. So let's look again at the differential equation.

The dampening term

The $f (x)$ factor operates as a growth term, causing the curve to grow exponentially. This means that the growth of something is proportional to its current size. This works well for modelling disease spread in its early phases: the number of people that will be infected is proportional to the number of people already infected, as these go about and meet their friends and acquaintances.

The $(1 - f (x) / L)$ factor acts as a dampening term. When $f (x)$ gets too large, the function can no longer grow as it once did. To go back to the disease example, suppose that everyone has an equal number of friends, selected randomly from the total population $L$ , and that they meet these friends equally often, in equally dangerous situations. Then if $f (x) / L$ is small, you can consider that every friend they meet will be uninfected, so they can pass on the disease, potentially, to all of them.

But now see what happens if $f (x) / L$ is larger. If $f (x) = L / 2$ (half the population is infected), then $1 - f (x) / L = 1 / 2$ . So if someone infected meets their friends, they will find that only half of them are uninfected. So the growth rate, relative to the infected population, is half of what it was initially. And as $f (x)$ grows, $1 - f (x) / L$ shrinks rapidly, so the growth rate falls fast.

Different assumptions, different dampening

In the description above, I assumed:

suppose that everyone has an equal number of friends, selected randomly from the total population $L$ , and that they meet these friends equally often, in equally dangerous situations.

These assumptions are very unrealistic. People have different numbers of friends; these friends are likely to be friends of each other; people spend different amounts of time with different people, and, during a pandemic, people's behaviour changes - and different people's behaviours change differently. There is a time-varying network of connections that cannot be easily reduced to "N infected people are likely to infect kN other people".

So to model a more realistic pandemic - or any sigmoid-like process - we want a dampening term that looks more like $1 - H (f (x))$ , where $H$ is some function that is small when $f (x)$ is small, and tends to $1$ as $f (x)$ grows towards $L$ .

I chose a few random possible $H$s, to see how things would go:

$H (f (x)) = f (x)$ : the standard logistic curve (blue curve, $L = 1$ ).
$H (f (x)) = (f (x) / 0.5)^{2}$ : less initial dampening (purple curve, $L = 0.5$ ).
$H (f (x)) = \sqrt{f (x) / 2}$ : more initial dampening (green curve, $L = 2$ ).
$H (f (x)) = \max [0, 3 f (x) - 1]$ : no initial dampening, which suddenly turns on later (orange curve, $L = 2 / 3$ ).
$H (f (x)) = 0$ : no dampening, full exponential (red curve, infinite $L$ ).

For all these $H$s, I made the assumption that $f (- 5) = f^{'} (- 5) = 1 / 1000$ : the values of $f$ and its derivative were $1 / 1000$ at $x = - 5$ (hence they behave very similarly around that point). Then the plots of these curves are all over the place:
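The curves can be reproduced approximately with a short numerical integration. This is a sketch under stated assumptions rather than the method used for the original plots: it integrates $f^{'} (x) = k (1 - H (f (x))) f (x)$ with a hand-written classical Runge-Kutta (RK4) scheme, and it picks $k = 1 / (1 - H (1 / 1000))$ for each curve so that $f^{'} (- 5) = f (- 5)$ . The names `solve`, `DAMPENINGS` and the step size are illustrative.

```python
import math

X_START = -5.0
F_START = 1.0 / 1000.0  # f(-5) = f'(-5) = 1/1000, as in the text

# The five dampening functions H listed above, keyed by curve colour.
DAMPENINGS = {
    "blue": lambda f: f,                              # logistic, L = 1
    "purple": lambda f: (f / 0.5) ** 2,               # L = 0.5
    "green": lambda f: math.sqrt(f / 2.0),            # L = 2
    "orange": lambda f: max(0.0, 3.0 * f - 1.0),      # L = 2/3
    "red": lambda f: 0.0,                             # pure exponential
}


def solve(H, x_end, step=0.01):
    """Integrate f' = k (1 - H(f)) f from x = -5 to x_end with classical RK4."""
    k = 1.0 / (1.0 - H(F_START))  # assumption: chosen so that f'(-5) = f(-5)

    def slope(f):
        return k * (1.0 - H(f)) * f

    n = max(1, round((x_end - X_START) / step))
    h = (x_end - X_START) / n
    f = F_START
    for _ in range(n):
        k1 = slope(f)
        k2 = slope(f + 0.5 * h * k1)
        k3 = slope(f + 0.5 * h * k2)
        k4 = slope(f + h * k3)
        f += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return f


if __name__ == "__main__":
    for x_end in (-3.0, 0.0, 5.0, 30.0):
        values = {name: solve(H, x_end) for name, H in DAMPENINGS.items()}
        print(x_end, {name: f"{v:.4g}" for name, v in values.items()})
```

Shortly after $x = - 5$ all five curves are within a few percent of each other; far to the right each dampened curve settles at its own maximum while the undampened one keeps growing.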

This is the core reason why the turning point and the maximums are so hard to predict from early data: we're not only trying to figure out the parameters of a logistic curve, but the functional form of the dampening function - a dampening function whose effect is insignificant in the early data.

In some cases, the dampening is impossible to distinguish in the initial data - for instance, the no-initial-dampening curve versus the curve with no dampening at all. If the initial dampening and the later dampening are due to different factors, then early data tells us nothing about the maximum and turning point of the curve.

For example, the technology adoption life cycle divides users into the categories of "innovators", "early adopters", "early majority", "late majority" and "laggards":

The turning point is the transition from early to late majority; the maximum value is the sum of all the categories (including laggards).

Early data, though, comes entirely from innovators and early adopters. If these populations are actually distinct in how they behave and how innovations spread among them, then the early data will tell us nothing about the turning point or the maximum value.

Estimating the maximum value

Sometimes it is feasible to infer $L$ , the maximum value. To do that, we don't need to figure out what $H$ is; we just need to figure out what would cause the dampening term $1 - H (f (x))$ to fall towards $0$ .

A disease cannot infect more people than the total population, so we know that $L$ is less than or equal to the total population $Ω$ . For diseases, $L$ can be inferred from $R_{0}$ , the number of people each infected person infects. In ideal situations, this would give 'herd immunity' once this many people are infected:

$L = Ω (1 - \frac{1}{R_{0}}) .$

That formula assumes a homogeneous population that mixes randomly with each other, but you can get improved $L$ estimates by looking at similar diseases, estimating the network of spread, and so on.
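A small worked example of the formula, as a sketch: the population size and $R_{0}$ values below are hypothetical, and returning $0$ when $R_{0} \leq 1$ (where the formula would give a non-positive value) is a choice made for this illustration.

```python
def herd_immunity_level(total_population, r0):
    """Idealised estimate L = Omega * (1 - 1/R0) from the text."""
    if r0 <= 0:
        raise ValueError("R0 must be positive")
    # Assumption of this sketch: clamp at zero when R0 <= 1.
    return max(0.0, total_population * (1.0 - 1.0 / r0))


if __name__ == "__main__":
    population = 1_000_000  # hypothetical total population
    for r0 in (1.5, 3.0, 6.0):
        level = herd_immunity_level(population, r0)
        print(f"R0={r0}: L is about {level:,.0f} people")
```

Under the homogeneous-mixing assumption, doubling $R_{0}$ from $3$ to $6$ moves the estimate from two thirds to five sixths of the population.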

So the maximum value of a sigmoid can often be inferred based on 'meta' considerations, even if the turning point is extremely hard to predict.

If we replot all the above sigmoids with the same value for $L$ (in this instance, $L = 1$ ), we get the following graph^[1] (again, setting $f (- 5) = f^{'} (- 5) = 1 / 1000$ for all curves):

It seems that the best approach when $L$ is known is to first plot a pure exponential (in dotted red in that figure), then try and guess by how much the turning point will lag the pure exponential - maybe by making informed guesses as to the form of the dampening term $1 - H (f (x))$ .
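One way to quantify that lag, sketched here under assumptions of my own: integrate each $L = 1$ curve numerically, take the turning point to be where $f^{'}$ is largest, and define the lag as how much later the curve reaches that height than the undampened exponential $\frac{1}{1000} e^{x + 5}$ does. The RK4 integrator, the choice $k = 1 / (1 - H (1 / 1000))$ , and this definition of "lag" are illustrative and not taken from the post.

```python
import math

X_START = -5.0
F_START = 1.0 / 1000.0  # f(-5) = f'(-5) = 1/1000, as in the text

# Dampening functions for the L = 1 replot, keyed by curve colour.
DAMPENINGS = {
    "blue": lambda f: f,
    "purple": lambda f: f ** 2,
    "green": lambda f: math.sqrt(f),
    "orange": lambda f: max(0.0, 2.0 * f - 1.0),
}


def turning_point(H, x_end=12.0, step=0.01):
    """Return (x, f) at the grid point where f' is largest, using RK4."""
    k = 1.0 / (1.0 - H(F_START))  # assumption: makes f'(-5) equal f(-5)

    def slope(f):
        return k * (1.0 - H(f)) * f

    n = round((x_end - X_START) / step)
    f = F_START
    best = (slope(f), X_START, f)
    for i in range(1, n + 1):
        k1 = slope(f)
        k2 = slope(f + 0.5 * step * k1)
        k3 = slope(f + 0.5 * step * k2)
        k4 = slope(f + step * k3)
        f += step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        best = max(best, (slope(f), X_START + i * step, f))
    return best[1], best[2]


def exponential_lag(x_turn, f_turn):
    """Delay in reaching f_turn relative to the undampened exponential."""
    x_exponential = X_START + math.log(f_turn / F_START)
    return x_turn - x_exponential


if __name__ == "__main__":
    for name, H in DAMPENINGS.items():
        x_turn, f_turn = turning_point(H)
        lag = exponential_lag(x_turn, f_turn)
        print(f"{name:6s} turning point x={x_turn:.2f}, f={f_turn:.3f}, lag={lag:.2f}")
```

In this sketch the orange curve shows essentially no lag, because its dampening has not switched on before its turning point, while the logistic curve lags by roughly $0.7$ units of $x$ .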

[1] Respectively setting $H (f (x)) = f (x)$ (blue), $H (f (x)) = f (x)^{2}$ (purple), $H (f (x)) = \sqrt{f (x)}$ (green), and $H (f (x)) = \max [0, 2 f (x) - 1]$ (orange).

7 comments

Comments sorted by top scores.

comment by gwern · 2021-09-18T16:04:32.510Z · LW(p) · GW(p)

Paper version: https://arxiv.org/abs/2109.08065

comment by supposedlyfun · 2021-03-19T07:58:41.738Z · LW(p) · GW(p)

The graphs here really helped me to understand the points you were making.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-03-19T11:23:55.348Z · LW(p) · GW(p)

Thanks for this!

It seems like some trends (GDP, for example) are on a superexponential trajectory. I'd be very interested to see this analysis done for that stuff too. Do you think the qualitative results would be the same -- it's hard to tell, best strategy is to try to guess the form of the dampening term?

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2021-03-19T12:00:22.603Z · LW(p) · GW(p)

Possibly. What would be the equivalent of a dampening term for a superexponential? A further growth term?

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-03-19T13:31:18.334Z · LW(p) · GW(p)

I don't know, that's one of the things I'm interested in. I guess the situation is something like: There are a bunch of positive feedback loops and a bunch of negative feedback loops. For most of human history, the positives have outweighed the negatives, and the result has been a more or less steady straight line on a log-log plot. Though the slope of the line changes from period to period, presumably because at some times the positive feedback loops are a lot stronger than the negative and at other times only a little.

We know that eventually growth will be limited by the lightspeed expansion of a sphere. Before that, growth might be limited to e.g. a one-month doubling time because that's about as fast as grass can reproduce, or maybe a one-hour doubling time because that's about as fast as microorganisms can reproduce? Idk. Maybe nanotech could double even faster than that.

The question is whether there's any way to look at our history so far, our trajectory, and say "Aha! We seem to be past the inflection point!" or something like that. By analogy to the exponentials case you've laid out, my guess is the answer is "no," but I'm hopeful.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2021-03-19T15:55:12.821Z · LW(p) · GW(p)

"Aha! We seem to be past the inflection point!"

It's generally possible to see where the inflection point is, when we're past it.

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-03-19T16:13:42.229Z · LW(p) · GW(p)

Ah, right, of course. Well, what about when the trend is noisy though? With periods of slower and faster growth?

What about "Aha! We are clearly nowhere near the inflection point!"?
