The Generalized Product Rule

aysajan

post by aysajan · 2021-06-08T16:39:05.029Z · LW · GW · 7 comments

Imagine we have a company with investment projects A, B, C,... For instance, A might be a new high-speed Internet service, B might be a new advanced computer, C might be new inventory management software, etc. We are interested in calculating the total return from these investments at the company. This calculation could be fairly complicated since returns are context-dependent - e.g., new computer B might have a higher return in the context of new Internet service A than it would without the new Internet service. But let’s assume that the returns satisfy a few reasonable properties.

The total return can be calculated from the return of each individual project given projects before it - e.g., the return of Internet service alone, the computer given the Internet service, the software given the computer and internet, etc.
If the return of one project increases (given projects before it) while everything else stays the same, then the total return increases. For instance, if the Internet service gets cheaper, then the return of project A should increase with everything else the same. As a result, the overall return should increase.
We can group projects into subprojects without changing the overall return. For instance, we could think of the Internet service and computer as a single project, or we could think of the computer and software as a single project, and either way the total return should stay the same.

Surprisingly, given just these three properties, we can conclude that returns obey a “product rule” similar to the product rule in probability theory:

$w [R (A, B)] = w [R (A)] w [R (B | A)]$

where $R$ denotes a return and $w$ is some transformation of returns (e.g., it could be log-return, return-squared, etc.)

This is essentially the first step in Cox’s Theorem, a theorem used (most notably by Jaynes) to ground the logicalist interpretation of probability. But as this post will illustrate, core ideas of Cox’s Theorem apply to many real-world systems which we don’t usually think of as “probability theory”.

Let’s unpack those assumptions a bit more for our investment return example by defining explicit variables on projects and returns. The three key properties are:

Return $R (A, B)$ of A and B together can be computed from the return $R (A)$ of A alone and the return $R (B | A)$ of B given A is done. For instance, the return on new high-speed Internet service and new computer together can be calculated from the return on the Internet service alone and the return on the computer given the Internet. Formally, $R (A, B) = F [R (A), R (B | A)]$ for some function $F$ .
If the return $R (A)$ goes up without changing $R (B | A)$ , then the total return $R (A, B)$ of A and B together increases, and the same conclusion holds for $R (B | A)$ increasing with $R (A)$ unchanged. For instance, if the cost of high-speed Internet service goes down, then the return $R (A)$ presumably increases without changing the return $R (B | A)$ of the new computer given the Internet service, and this should increase the overall return $R (A, B)$ . Formally, $F$ is increasing in both arguments.
We can group projects A and B, or B and C into subprojects without changing the overall return $R (A, B, C)$ of all three. For instance, if we want to compute total return $R (A, B, C)$ on new Internet service, computer and software, we could group together the internet and computer as one hardware-and-network project, then compute $R (A, B, C)$ from $R (A, B)$ (hardware-and-network return) along with $R (C | A, B)$ (return on the software given the hardware-and-network). Alternatively, we could instead group the computer and software as one hardware-and-software project with return $R (B, C | A)$ , and we should still get the same answer for the return of all three projects together. Formally, $R (A, B, C) = F [R (A, B), R (C | A, B)] = F [R (A), R (B, C | A)]$ .

The third rule implies that $F$ is associative. The key fact we rely on here is that, under mild regularity conditions such as continuity, every one-dimensional (real-valued), increasing and associative function is either multiplication or some transformation of multiplication (e.g., addition is multiplication after a log transformation).

Thus we get a product rule:

$w [R (A, B)] = w [R (A)] w [R (B | A)]$

where $w$ is some transformation (reversible) of $R$ .
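As a minimal sketch of this claim, assume (hypothetically; the post does not fix a particular $F$) that returns compound, so that $F (x, y) = x + y + x y$ . This $F$ is increasing for returns above -100% and is associative, and the invertible transformation $w (r) = 1 + r$ turns it into ordinary multiplication. The names `F`, `w` and the sample numbers below are illustrative assumptions, not taken from the post.

```python
import math

def F(x, y):
    """Hypothetical combination rule: compound two returns (an assumption)."""
    return x + y + x * y

def w(r):
    """Invertible transformation that turns F into multiplication (gross return)."""
    return 1.0 + r

# Illustrative returns: R(A), R(B | A), R(C | A, B)
r_a, r_b_given_a, r_c_given_ab = 0.10, 0.05, 0.02

# Associativity: grouping (A, B) first or (B, C) first gives the same total.
left = F(F(r_a, r_b_given_a), r_c_given_ab)
right = F(r_a, F(r_b_given_a, r_c_given_ab))
assert math.isclose(left, right)

# Product rule: w[R(A, B)] = w[R(A)] * w[R(B | A)]
r_ab = F(r_a, r_b_given_a)
assert math.isclose(w(r_ab), w(r_a) * w(r_b_given_a))
```

If returns simply added instead, the same check would go through with `F(x, y) = x + y` and `w = math.exp`.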

More generally, to derive the product rule, we need some objects of interest like A, B, C,..., which serve as inputs. We also need some kind of real-valued measurement $R$ of those objects. Then the core requirements for the product rule are:

$R (A, B)$ is a function of $R (A)$ and $R (B | A)$ :

$R (A, B) = F [R (A), R (B | A)]$

for some $F$ .

$F$ is increasing with respect to both arguments: if

$R (A^{'}) > R (A)$ ,

$R (B | A^{'}) = R (B | A)$ ,

then

$R (A^{'}, B) > R (A, B)$ .

Or, alternatively, if

$R (B^{'} | A) > R (B | A)$

with $R (A)$ unchanged, then

$R (A, B^{'}) > R (A, B)$ .

We can group objects together without changing the value of measurement $R$ :

$R (A, B, C) = F [R (A, B), R (C | A, B)] = F [R (A), R (B, C | A)]$

(Note that for the last assumption, we allow systems in which objects need to be kept in the same order - i.e., A before B before C. This is actually more general than the requirement for the product rule in probability theory, in which the objects are boolean logic variables, so “A and B” = “B and A”. If reordering is allowed, then our generalized-product-rule becomes generalized-Bayes-rule.)

The third assumption implies that $F$ is associative. The second implies that it’s increasing. The first implies that it’s one-dimensional. So, we get the generalized-product-rule.
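To spell out why the third assumption gives associativity, here is a short derivation. It assumes that the first assumption also holds conditionally on A, i.e. $R (B, C | A) = F [R (B | A), R (C | A, B)]$ , which the post uses implicitly but does not state.

```latex
\begin{aligned}
R(A, B, C) &= F\big[R(A, B),\, R(C \mid A, B)\big]
            = F\big[F[R(A), R(B \mid A)],\, R(C \mid A, B)\big], \\
R(A, B, C) &= F\big[R(A),\, R(B, C \mid A)\big]
            = F\big[R(A),\, F[R(B \mid A), R(C \mid A, B)]\big].
\end{aligned}
```

Writing $x = R (A)$ , $y = R (B | A)$ and $z = R (C | A, B)$ , equating the two lines gives $F [F (x, y), z] = F [x, F (y, z)]$ , which is associativity on whatever range of values $x$ , $y$ and $z$ can take.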

What does this look like in the context of other real-world systems?

Example 1: Suppose I have an investment portfolio with stock A and bond B, and I want to calculate the standard deviation of portfolio return $R (A, B)$ as a proxy for risk measurement. This calculation is not trivial due to potential correlation of returns between stocks and bonds. For instance, the risk (measured in standard deviation) of investing in stocks alone is usually higher than the risk of investing in a portfolio with stocks and bonds. Let’s assume the risks exhibit three properties:

Portfolio risk $R (A, B)$ of stock A and bond B can be calculated from risk $R (A)$ of stock alone and incremental risk $R (B | A)$ (positive or negative) of adding bond B given stock A already in the portfolio.
If the risk $R (A)$ of the stock rises without changing the incremental risk $R (B | A)$ , then the portfolio risk $R (A, B)$ rises.
Let’s consider adding another asset C, an 8-week T-bill (a type of cash equivalent), to the investment portfolio. If we’re computing the new portfolio risk of stock A, bond B, and T-bill C, then we could group the stock and bond together as one sub-portfolio, and compute $R (A, B, C)$ from $R (A, B)$ (non-cash-asset) along with $R (C | A, B)$ (incremental risk of adding the T-bill given the non-cash-asset). Alternatively, we could instead group the bond and T-bill into one portfolio (non-equity-asset) with risk $R (B, C | A)$ , and we still get the same risk for all three assets together.

As a result, we can apply the product rule to investment risks:

$w [R (A, B)] = w [R (A)] w [R (B | A)]$ ,

where $w$ is some transformation of incremental risk (e.g., exponentiated assuming that those incremental risks add).
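The additive case can be sketched numerically. The snippet below assumes that incremental risk is defined as the change in standard deviation when an asset is added, and it uses made-up volatilities and correlations; with that definition the first and third properties hold by construction, so the sketch only illustrates the bookkeeping and the choice $w = \exp$ . All names and numbers are hypothetical.

```python
import math

# Hypothetical dollar volatilities of each position (illustrative numbers only).
vol = {"A": 20.0, "B": 2.5, "C": 0.2}   # stock, bond, T-bill
# Hypothetical correlations between the assets' returns.
corr = {("A", "B"): -0.3, ("A", "C"): 0.0, ("B", "C"): 0.1}

def rho(i, j):
    """Correlation lookup that is symmetric in its arguments."""
    if i == j:
        return 1.0
    return corr[(i, j)] if (i, j) in corr else corr[(j, i)]

def risk(assets):
    """Standard deviation of the combined position in `assets`."""
    variance = sum(vol[i] * vol[j] * rho(i, j) for i in assets for j in assets)
    return math.sqrt(variance)

def incremental_risk(new, held):
    """Assumed definition of R(new | held): the change in risk from adding `new`."""
    return risk(held + new) - risk(held)

r_a = risk(("A",))                                   # R(A)
r_b_given_a = incremental_risk(("B",), ("A",))       # R(B | A), negative here
r_c_given_ab = incremental_risk(("C",), ("A", "B"))  # R(C | A, B)
r_bc_given_a = incremental_risk(("B", "C"), ("A",))  # R(B, C | A)

# Both groupings recover the same three-asset risk R(A, B, C).
r_abc = risk(("A", "B", "C"))
assert math.isclose(risk(("A", "B")) + r_c_given_ab, r_abc)
assert math.isclose(r_a + r_bc_given_a, r_abc)

# Product rule with w = exp: w[R(A, B)] = w[R(A)] * w[R(B | A)]
w = math.exp
assert math.isclose(w(risk(("A", "B"))), w(r_a) * w(r_b_given_a))
```

The negative stock-bond correlation makes the bond's incremental risk negative in this sketch, matching the remark that $R (B | A)$ can be positive or negative.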

Example 2: Let’s look at a different system in which we’re interested in calculating the contribution in points made by basketball players A, B, C,... in a game relative to total points made by the team. For instance, $R (A)$ could be 30%, meaning Stephen Curry contributes 30% of the total team points in a game, $R (B)$ could be 25%, meaning Klay Thompson contributes 25% of the total points made, etc. Again, we assume three properties:

The points contribution $R (A, B)$ of Stephen Curry and Klay Thompson together on the court can be calculated from the contribution $R (A)$ of Stephen Curry alone and incremental contribution $R (B | A)$ of Klay Thompson given Stephen Curry on the court.
If the contribution $R (A)$ of Stephen Curry increases without changing the incremental contribution $R (B | A)$ , then the overall contribution $R (A, B)$ to the team increases as well.
We can group Stephen Curry and Klay Thompson together as one “splash” player, and compute $R (A, B, C)$ from $R (A, B)$ (contribution of the “splash”) along with $R (C | A, B)$ (incremental contribution of Draymond Green given the “splash”). Alternatively, we could group Klay Thompson and Draymond Green as one big-man player with the contribution $R (B, C | A)$ , and we will get the same total contribution for all three players together on the court.

Thus, we can apply the product rule to basketball players’ points contributions:

$w [R (A, B)] = w [R (A)] w [R (B | A)]$

where $w$ is some transformation of player contribution.

Example 3: Let’s consider a modified version of the classic traveling salesman problem in theoretical computer science and operations research. We’re interested in finding the shortest travel time from an origin to cities A, B, C, …. Presumably the shortest travel time satisfies three assumptions:

The shortest time $R (A, B)$ of visiting cities A and B exactly once from origin can be computed from $R (A)$ of visiting city A and added time $R (B | A)$ of visiting city B given we already visited city A.
If the shortest time $R (A)$ of visiting city A increases without changing the additional travel time $R (B | A)$ , then the total traveling time $R (A, B)$ of visiting both city A and B increases.
Let's add another city C to our travel plan. We can group city A and B together as one region, and compute $R (A, B, C)$ of shortest travel time to visit A, B, and C exactly once from $R (A, B)$ of visiting the region with cities A and B along with $R (C | A, B)$ (additional time it adds to visit city C along with city A and B to the total trip time). We could also instead group city B and C together with shortest travel time $R (B, C | A)$ , and we will get the same answer for visiting every city exactly once in our trip.

With these three assumptions, we can apply the generalized product rule to the shortest travel time problem:

$w [R (A, B)] = w [R (A)] w [R (B | A)]$ ,

where $w$ is some transformation (reversible) of shortest travel time (e.g., exponentiated shortest travel time).
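A small sketch of this example, under a simplifying assumption that is not in the post: the visiting order is fixed as origin, A, B, C, so the travel time is just the sum of consecutive legs and no search over routes is needed. The leg times and the helper `time_for` are hypothetical.

```python
import math

# Hypothetical travel times (hours) between consecutive stops on a fixed route.
leg = {("origin", "A"): 2.0, ("A", "B"): 1.5, ("B", "C"): 3.0}

def time_for(route, start="origin"):
    """Total time to visit the stops in `route` in order, starting from `start`."""
    total, here = 0.0, start
    for stop in route:
        total += leg[(here, stop)]
        here = stop
    return total

r_a = time_for(["A"])                           # R(A)
r_b_given_a = time_for(["B"], start="A")        # R(B | A)
r_c_given_ab = time_for(["C"], start="B")       # R(C | A, B)
r_bc_given_a = time_for(["B", "C"], start="A")  # R(B, C | A)

r_ab = time_for(["A", "B"])                     # R(A, B)
r_abc = time_for(["A", "B", "C"])               # R(A, B, C)

# Grouping (A, B) or (B, C) gives the same total time.
assert math.isclose(r_abc, r_ab + r_c_given_ab)
assert math.isclose(r_abc, r_a + r_bc_given_a)

# Product rule with w = exp: w[R(A, B)] = w[R(A)] * w[R(B | A)]
w = math.exp
assert math.isclose(w(r_ab), w(r_a) * w(r_b_given_a))
```

Note that `r_b_given_a` is the time from A to B, not from the origin to B, so the order of the cities matters here even though the product rule still holds.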

Summary

The product rule in probability, $p (A B) = p (A) p (B | A)$ , states that the probability $p (A B)$ of both A and B being true can be calculated from the probability $p (A)$ of A being true alone and the probability $p (B | A)$ of B being true given that A is true. The conditions behind the product rule suggest a way to extend it to things that are not restricted to Boolean propositions. In particular, this post suggests using the product rule for real-valued measurements of objects A, B, C,... that satisfy a few fairly reasonable properties, and proposes the generalized form $w [R (A, B)] = w [R (A)] w [R (B | A)]$ , where $R$ is some kind of real-valued measurement and $w$ is some invertible transformation of $R$ . For instance, in the company investment example, $R$ represents the project return and $w$ can be log return.

7 comments

Comments sorted by top scores.

comment by JenniferRM · 2021-06-10T05:06:30.746Z · LW(p) · GW(p)

Kant thought that space being Euclidean was a priori logically necessary, hence determinable from pure thought, hence true without need for empirical fact checking... and in the end this turned out to be wrong. Einstein had the last laugh (so far).

I have wondered now and again whether it might be that Cox's Postulates are similar to Euclid's Postulates and might have similar subtle exceptional discrepancies with physical reality in practice.

It is hard to form hypotheses here, partly for a lack of vivid theoretical alternatives. I know of two claims floating around in the literature that hint at substantive alternatives to Bayes.

One approach involves abandoning at least one of Aristotle's three laws of thought (excluded middle, non-contradiction, and identity) and postulating, essentially, that reality itself might be ontologically ambiguous. If I had to pick one to drop, I think I'd drop excluded middle. Probably? Constructivist/intuitionist logic throws that one out often, and automated proof systems often leave it out by default. Under the keywords "fuzzy logic" there were attacks on these laws that directly reference Jaynes. So this is maybe one way to find a crack in the universe out of which we might wiggle.

The only other approach I know of in the literature is (for me) centrally based on later chapters in Scott Aaronson's "Quantum Computing Since Democritus" (try clicking the link and then do ^f bayes) where, via hints and aspersions, Aaronson suggests that quantum mechanics can be thought of as Bayesian... except with complex numbers for the probabilities, and thus (maybe?) Bayesianism is essentially a potentially empirically false religion? Aaronson doesn't just say this directly and at length. And his mere hints would be the place I left this summary... except that while hunting for evidence I ran across a link to what might be a larger and more direct attack on the physical reality of Bayesianism? (Looking at it: using axioms no less! With "the fifth axiom" having variations, just like Euclid?!)

So that arxiv paper by Lucien Hardy (that I missed earlier! (that was written in 2008?!?)) might just have risen to the top of my philosophy reading stack? Neat! <3

Maybe it is worth adding a third approach that I don't think really counts... When the number of variables in a belief net goes up, simply performing inference becomes very hard to compute, with the algorithms ending up NP-hard under relatively general assumptions. This "doesn't count as a real deep philosophically satisfying alternative to Bayes" for me because it seems like the practical upshot would just be that we need more CPU, and more causal isolation for the systems we care about (so their operation is more tractable to reason about). Like... the practical impossibility of applying Bayes in general to large systems would almost help FIGHT the other "possible true/deep alternatives" to Bayes, because it creates an alternative explanation for any subjective experience of sorta feeling like you had probabilities figured out, and then your probabilities came out very wrong. Like: maybe there were too many variables, and the NP-hardness just caught up with you? Would you really need to question the "laws of thought" themselves to justify your feeling of having been in the physical world and then ended up "surprisingly surprised"? Seriously? Seriously?

Anyway. I was wondering if you, having recently looked at the pillars of pure thinking themselves, had thoughts about any cracks, or perhaps any even deeper foundations, that they might have :-)

Replies from: paragonal, aysajan

↑ comment by paragonal · 2021-06-11T18:33:05.475Z · LW(p) · GW(p)

You might also be interested in "General Bayesian Theories and the Emergence of the Exclusivity Principle" by Chiribella et al., which claims that quantum theory is the most general theory which satisfies Bayesian consistency conditions.

By now, there are actually quite a few attempts to reconstruct quantum theory from more "reasonable" axioms besides Hardy's. You can track the references in the paper above to find some more of them.

↑ comment by aysajan · 2021-06-11T01:25:42.272Z · LW(p) · GW(p)

Thank you for your well-thought-out comment. One of the desiderata used to derive the original product rule is to use real numbers to represent the degrees of plausibility. So, it will be very interesting to see if the result still holds if we relax it to allow complex numbers.

comment by Jsevillamol · 2021-07-02T11:52:37.015Z · LW(p) · GW(p)

I really like this article.

It has helped me appreciate how product rules (or additivity, if we apply a log transform) arise in many contexts. One thing I hadn't appreciated when studying Cox's theorem is that you do not need to respect "commutativity" to get a product rule (though obviously this restricts how you can group information). This was made very clear to me in example 3.

One thing that confused me in the first reading was that I misunderstood you as referring to the third requirement as associativity of . Rereading this is not the case; you just say that the third requirement implies that F is associative. But I wish you had spelled out the implication, ie saying that $F (F (R (A), R (B | A)), R (C | A, B)) = F (R (A), F (R (B | A), R (C | A, B)))$ .

comment by Pattern · 2021-06-11T01:17:42.221Z · LW(p) · GW(p)

Is this just 'expected value follows some of the same rules as probability' or is there more to it?

Replies from: aysajan

↑ comment by aysajan · 2021-06-11T01:27:11.525Z · LW(p) · GW(p)

It can be any real-valued measurement of objects, as long as we can reasonably assume the three assumptions are satisfied.

comment by M. Y. Zuo · 2021-06-16T17:16:42.684Z · LW(p) · GW(p)

“We can group projects into subprojects without changing the overall return”

What if this were not true? Would that make the problem intractable?
