Non-Monotonic Infra-Bayesian Physicalism

post by Marcus Ogren · 2025-04-02T12:14:19.783Z · LW · GW · 0 comments

Contents

  Why Physicalism?
  Γ and 2^Γ: Formalizing Instantiation
  Definitions
  Bridge Transforms and Monotonicity
  Properties of computationalist hypotheses
  Further reading and future research

Infra-Bayesian physicalism (IBP) is a mathematical formalization of computationalist metaphysics: the view that whether something exists is a matter of whether a particular computation is instantiated anywhere in the universe in any way. In this post, we cover the basics of IBP, both rigorously and informally; discuss its relevance to agent foundations and the alignment problem; and present new definitions that remove the biggest limitation of the previous formulation of IBP: the monotonicity requirement. You don't need to have read the original post on IBP [LW · GW] to understand this post, though prior familiarity with infra-Bayesianism is useful for understanding the technical parts.

Why Physicalism?

A physicalist agent - that is, an agent that uses IBP - does not regard itself as inherently special. This is in contrast to a Cartesian agent. A precise definition of an IBP agent is highly technical and can be found in the original IBP post, but here are some ways in which physicalist agents and Cartesian agents differ:

IBP naturally solves many problems in agent foundations:

While it's important to solve these problems for their own sake, perhaps a bigger point is this: Based on priors, it seems unlikely that an arbitrary solution to the problem of privilege would also address all these other problems incidentally. The fact that IBP does address these problems[4] is evidence that IBP is getting at some "deeper truth" in agent foundations, and (more speculatively) that agentic behaviors arising in AI systems (LLMs or otherwise) may eventually converge to behaving similarly to physicalist agents. Moreover, IBP provides the necessary framework for Physicalist Superimitation [LW · GW], a proposed solution to the alignment problem.

Γ and 2^Γ: Formalizing Instantiation

A physicalist agent cares about which computations are instantiated in the universe. This raises the question: What, exactly, does it mean for a computation to be instantiated? In this section, we provide a rigorous formalism that also allows for possibilities beyond the binary of "instantiated" and "not instantiated".

We denote the space of "computational universes" by Γ. We think of an element of Γ as specifying the result of every computation.[5] For example:

However, we don't need to worry about all computational facts, so we take Γ to be a finite set.[6] Mathematically, that's all there is to it: Γ is a finite set whose elements encode conceivable results for all the computations we care about. (For most elements of Γ, some of these encoded results will be incorrect.) In toy examples (such as the ones in the post on IBP and quantum mechanics [LW · GW]), Γ may consist solely of how the source code of the physicalist agent in question responds to various inputs.
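To make this concrete, here's a toy sketch in Python. The encoding is mine and purely illustrative: Γ is just the set of all ways of assigning a purported result to each computation we care about.

```python
# Toy model (names and encoding are illustrative, not standard IBP notation):
# we track two computations and the candidate results each could have.
from itertools import product

# Candidate results for each computation we care about.
candidate_results = {
    "7*9": [63, 64],          # purported answers to "What's 7*9?"
    "halts(P)": [True, False],  # purported answer to "Does program P halt?"
}

# A computational universe y assigns one purported result to every
# computation; Gamma is the (finite) set of all such assignments.
names = sorted(candidate_results)
Gamma = [dict(zip(names, results))
         for results in product(*(candidate_results[n] for n in names))]

print(len(Gamma))  # 2 * 2 = 4 computational universes
```

Most of these four elements encode at least one incorrect result (e.g. the ones claiming 7*9 = 64), exactly as described above.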

Now we consider the set 2^Γ: the set of all possible subsets of Γ. Basically, we view an element α ∈ 2^Γ as the set of all computational universes that are consistent with a particular history of the physical universe; we'll explain this connection more rigorously when we define bridge transforms. Here we discuss the interpretation of a given element of 2^Γ.

Suppose that for all y ∈ α, y says that 7 × 9 = 63. This means that only a single result of the computation "What's 7 × 9?" is consistent with α, so we say that this computation is instantiated according to α. If some y ∈ α said that 7 × 9 = 63 and another y′ ∈ α said that 7 × 9 = 64, it would mean that nothing in the physical universe could depend on which of these (if either) was correct. It would require that nobody put "7 × 9" into a functional calculator, that nobody calculated 7 × 9 in their head, that 63 items were never arranged in a 7 by 9 grid, etc. Similarly, if every y in α gives exactly the same precise mathematical answer to "How would <gigantic mess of code that precisely describes all of your neurons and their interactions> respond to <precise mathematical description of the sensory inputs from being passionately kissed by a tall, strong man in the middle of a thunderstorm>?", it means that the computation of your brain experiencing a passionate kiss during a thunderstorm has been instantiated according to α. Or, in layman's terms, α says you've been kissed by a tall, strong man in the middle of a thunderstorm.[7]
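This notion of instantiation is easy to sketch in code (the helper below is mine, not standard IBP notation): a computation is instantiated according to α iff every computational universe in α agrees on its result.

```python
# Sketch of "instantiated according to alpha": alpha is a set of
# computational universes (dicts mapping computation names to results);
# a computation is instantiated iff only one result survives in alpha.
def instantiated(alpha, computation):
    results = {y[computation] for y in alpha}
    return len(results) == 1

# Two computational universes that agree on 7*9 but disagree on halts(P):
alpha = [
    {"7*9": 63, "halts(P)": True},
    {"7*9": 63, "halts(P)": False},
]
print(instantiated(alpha, "7*9"))       # True: only one result survives
print(instantiated(alpha, "halts(P)"))  # False: physics doesn't pin it down
```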

You can still have imperfect information about the result of a computation without it being instantiated. If you're a straight man who dislikes getting wet, you can conclude that being kissed by a man during a thunderstorm would be unpleasant without the full computation being run. Only knowing the mathematically precise manner in which it's unpleasant requires the computation being fully instantiated. This is how physicalist agents naturally avoid mindcrime; if a physicalist agent wants to know some facts about a computation but having that computation be instantiated could incur a significant loss, the agent may use some heuristics to narrow down the possible results of the computation, but would avoid conducting a detailed simulation that would tell it everything.

For later definitions, we'll need the set elΓ := {(y, α) ∈ Γ × 2^Γ : y ∈ α}. This is the set of (y, α) pairs such that y is consistent with α, where α can be anything in 2^Γ containing y. elΓ is the domain of a physicalist agent's loss function.

Definitions

We start with some definitions from infra-Bayesianism.

Definition 1: A contribution θ on a finite[8] set X is a function from X to the interval [0, 1] with Σ_{x ∈ X} θ(x) ≤ 1. The space of all contributions on X is denoted ΔcX.

Basically, a contribution is a probability distribution, except the "probabilities" may sum to something less than one.
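A minimal sketch of this definition (helper names are mine): on a finite set, a contribution is just a nonnegative assignment of mass with total at most one.

```python
# A contribution on a finite set X, represented as a dict from elements
# of X to [0, 1] with total mass at most 1.
def is_contribution(theta, tol=1e-9):
    return (all(-tol <= p <= 1 + tol for p in theta.values())
            and sum(theta.values()) <= 1 + tol)

# The natural order on contributions: pointwise domination.
def leq(theta1, theta2):
    keys = set(theta1) | set(theta2)
    return all(theta1.get(x, 0.0) <= theta2.get(x, 0.0) for x in keys)

theta = {"x1": 0.3, "x2": 0.4}   # total mass 0.7 < 1: a valid contribution
print(is_contribution(theta))     # True
print(leq({"x1": 0.1}, theta))    # True: dominated pointwise
```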

There is a natural order on ΔcX: Given θ, θ′ ∈ ΔcX, θ ≤ θ′ iff θ(x) ≤ θ′(x) for all x ∈ X.

Definition 2: An ultracontribution[9] Θ on X is a set of contributions on X that is:

  1. Closed (when viewed as a subset of [0, 1]^X)
  2. Convex (If θ, θ′ ∈ Θ and p ∈ [0, 1], then pθ + (1 − p)θ′ ∈ Θ)
  3. Downward closed (If θ ∈ Θ and θ′ ≤ θ, then θ′ ∈ Θ)
  4. Nonempty (Due to downward closure, this equivalently means that Θ must contain the contribution 0)

The space of all ultracontributions on X is denoted □X.

None of the conditions for a set of contributions to be an ultracontribution is of immense conceptual importance; thinking of an ultracontribution as an arbitrary set of contributions is sufficient for developing an intuition.

For thinking about ultracontributions, it's helpful to compare Bayesian agents to infra-Bayesian agents. A Bayesian agent's worldview consists of a single probability distribution, and the Bayesian agent will take whatever actions will minimize the expected value of its loss function for this probability distribution.[10] An infra-Bayesian agent's worldview consists of a single ultracontribution - a set of contributions (which is like a set of probability distributions). An infra-Bayesian agent will take whatever actions will minimize the worst expected loss across all of these contributions. The infra-Bayesian agent regards every contribution in its ultracontribution as a possibility. There is no notion of one contribution being "more likely" than another; every contribution in the ultracontribution is on even footing.
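The decision rule described above can be sketched directly. In this illustration (all names are mine), an ultracontribution is represented by finitely many of its contributions, and the agent minimaxes over them:

```python
# Sketch of the infra-Bayesian decision rule: choose the action that
# minimizes the worst-case expected loss over all contributions in the
# agent's ultracontribution.
def expected_loss(theta, loss):
    return sum(p * loss[x] for x, p in theta.items())

def infra_bayesian_choice(actions, ultracontribution, loss_of):
    """loss_of(action) -> dict mapping outcomes to losses."""
    return min(actions,
               key=lambda a: max(expected_loss(theta, loss_of(a))
                                 for theta in ultracontribution))

# Two outcomes, two actions; the ultracontribution is represented here
# by finitely many contributions, treated on even footing.
ultra = [{"rain": 0.9, "sun": 0.1}, {"rain": 0.1, "sun": 0.9}]
losses = {"umbrella": {"rain": 1, "sun": 2}, "none": {"rain": 10, "sun": 0}}
best = infra_bayesian_choice(["umbrella", "none"], ultra, losses.get)
print(best)  # "umbrella": its worst-case loss (1.9) beats "none"'s (9)
```

Note that neither contribution is "more likely" than the other; the agent simply guards against the worst one.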

Other concepts we will need:

Recalling our previous discussion of Γ and 2^Γ, we now have the tools for defining a computationalist hypothesis.

Definition 3: A computationalist hypothesis is a Θ ∈ □(Γ × 2^Γ) such that

  1. supp Θ ⊆ elΓ
  2. For all mappings s: Γ → Γ, pushing any θ ∈ Θ forward along (y, α) ↦ (s(y), α), while discarding the mass on pairs with s(y) ∉ α, yields a contribution that is also in Θ

The first condition means that computationalist hypotheses are internally consistent. If Γ is the 3-element set {y1, y2, y3}, Θ cannot contain a contribution that gives nonzero probability to an element like (y3, {y1, y2}) that says, "Only computational universes 1 and 2 are consistent with the physical universe; also, computational universe 3 is the one that answers mathematical questions correctly."

The second condition means a computationalist hypothesis cannot assert with high confidence that a computation will have a given result without that computation being instantiated. More concretely, if  contains , condition (2) means that  must also contain the contributions  and . (These contributions can be obtained by letting  or .) It is possible for  to contain the contribution  without containing a contribution that gives nonzero probability to other possibilities. The bottom line: If  claims that a computational universe is consistent with the physical universe, it must incorporate the possibility that this computational universe is correct.  cannot know that  without  getting computed somewhere.
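Here's one way to make the closure condition under mappings s: Γ → Γ concrete. This is a sketch under the assumption that s acts on pairs by (y, α) ↦ (s(y), α), with mass discarded whenever s(y) ∉ α (the helper and this concrete form are mine):

```python
# Pushforward of a contribution on el(Gamma) along s: Gamma -> Gamma,
# discarding mass on pairs where s(y) lands outside alpha.
def push(theta, s):
    """theta: dict from (y, alpha) pairs to mass, alpha a frozenset."""
    out = {}
    for (y, alpha), mass in theta.items():
        sy = s(y)
        if sy in alpha:
            out[(sy, alpha)] = out.get((sy, alpha), 0.0) + mass
    return out

alpha = frozenset(["y1", "y2"])
theta = {("y1", alpha): 1.0}
print(push(theta, lambda y: "y2"))  # all mass moved to ("y2", alpha)
print(push(theta, lambda y: "y3"))  # {}: mass discarded, y3 not in alpha
```

Note how mass can only be lost, never created; this is why contributions (mass ≤ 1) rather than probability distributions are the right objects here.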

Bridge Transforms and Monotonicity

Given a description of the physical universe, how can we determine which programs are instantiated, i.e., select elements of 2^Γ? To answer this question we use bridge transforms: functions that take joint beliefs over computational and physical universes as input and produce beliefs about which computations are instantiated.

We will use Φ to denote the set of all physical universes we're considering. As with Γ, Φ has no mathematical structure beyond being a finite[12] set. Here are some examples of sets that could be used for Φ:

However, the details of Φ are of minimal importance to understanding IBP. We don't care about Φ in its own right; it's just something we'll use to produce computationalist hypotheses. We formalize joint beliefs over computational and physical universes as elements of □(Γ × Φ). Note that entanglements of beliefs about computations and physics are often quite natural. With Φ as a set of possible histories of a cellular automaton, different elements of Φ naturally correspond to computational universes that give different answers to "How will this cellular automaton evolve according to <rules>?". Straightforward entanglements also arise from considering agents; physical universes in which an agent chooses "cooperate" in the Prisoner's Dilemma will correspond to computational universes in which the agent's source code yields "cooperate" when given the appropriate input.

We now define the liberal bridge transform. The liberal bridge transform is a mapping □(Γ × Φ) → □(elΓ × Φ). The resulting ultracontribution contains all contributions that are "consistent" with the ultracontribution over Γ × Φ in a sense that is closely related to the conditions of a computationalist hypothesis.

Definition 4: Let Γ and Φ be finite sets and let Θ ∈ □(Γ × Φ). The liberal bridge transform of Θ is Br(Θ) ∈ □(elΓ × Φ), where θ ∈ Δc(elΓ × Φ) is an element of Br(Θ) if and only if:

  1. For all mappings s: Γ → Γ, pushing θ forward along (y, α, x) ↦ (s(y), x), while discarding the mass on triples with s(y) ∉ α, yields a contribution in Θ

Here, pr is the projection operator. These conditions closely mirror those for a computationalist hypothesis. In fact, the liberal bridge transform provides an alternative characterization of computationalist hypotheses:

Proposition 1: Θ is a computationalist hypothesis if and only if:

For using the bridge transform on  we take  is the diagonal operator that sends  to . In condition (2),  simply takes  to .

For the proof of Proposition 1 we'll be working with two copies of  in the bridge transform; for clarity, we'll denote the one that's treated as the physical universe as , the one produced by the bridge transform as , and write   and  for the subsets  and . First, however, we need a basic lemma:

Lemma 1: For ,

Proof: Consider a . The LHS sends this element

The RHS sends this element

proving Lemma 1 in the case of δ-contributions. Any contribution can be written as a linear combination of δ-contributions and all of the mappings in Lemma 1 are linear, so Lemma 1 holds for all contributions.

Proof of Proposition 1: Condition (1) is the same in Proposition 1 and Definition 3, and both of them also ensure that  is an ultracontribution. Therefore, we only need to worry about condition (2).

First, let  be a computationalist hypothesis. Showing that  is equivalent to showing that, for all  and all , we have . Writing  for some , we have

The first equality is from Lemma 1. The inclusion on the final line follows from the definition of a computationalist hypothesis. To show that the conditions of Proposition 1 imply that  is a computationalist hypothesis, we know that

from condition (2) of Proposition 1 and the definition of the bridge transform. The above equalities show that this equals , so condition (2) of Definition 3 is fulfilled. 

For what follows, we will need to introduce the information order on . Given , we write  iff , and .[13] Then, given , we have  iff for all functions  such that  (that is, for all nondecreasing ), we have

This means that  if for every smidgeon of probability assigned by  there's a corresponding smidgeon of probability assigned by , and the smidgeon of probability assigned by  is assigned to an element of  that is the same as for , except the  component of  may be a superset of the  component of .[14]

We can get a computationalist hypothesis from Br(Θ) by projecting to elΓ. However, every computationalist hypothesis produced this way contains contributions that correspond to the possibility that every computation is instantiated. For example, if , it is easy to see that  from the definition of the liberal bridge transform. Such contributions claim that any given configuration of the physical universe will only be consistent with a single configuration of the computational universe, which implies that all computations are instantiated. More broadly, if  and , we must also have  (Proposition 2.4 [LW · GW] in the original IBP post). This was called the monotonicity principle. The original post on IBP discussed the consequences of only considering computationalist hypotheses that are derived from the liberal bridge transform, and therefore obey the monotonicity principle, in great detail. Here we define a conservative bridge transform that can yield any computationalist hypothesis, and thus is not bound by the monotonicity principle or the limitations discussed in the original IBP post.

Definition 5: Let  and  be finite sets and let . Denote the set of maximal elements of  in the information order by . Define , and let  denote the closed convex hull of .[15] The conservative bridge transform of , denoted , is the downward closure of  in the natural order on contributions, i.e. the set 

The big difference between the liberal and conservative bridge transforms is the step of only considering maximal elements of  in the information order; going from  to  is necessary to make the resulting set closed under , and the rest is just bookkeeping to ensure that we actually end up with an ultracontribution.
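The maximal-elements step is ordinary finite-poset bookkeeping, and can be sketched generically (the helper is mine): keep exactly the elements that are not strictly below any other element.

```python
# Maximal elements of a finite set under a partial order leq:
# keep a iff no b is strictly above it (leq(a, b) but not leq(b, a)).
def maximal_elements(items, leq):
    return [a for a in items
            if not any(leq(a, b) and not leq(b, a) for b in items)]

# Example partial order: subset inclusion on small sets.
items = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
print(maximal_elements(items, lambda a, b: a <= b))  # [frozenset({1, 2})]
```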

Proposition 2 is a computationalist hypothesis.

Proof: For condition (1) of Definition 3 (), we observe that  has its support on .  is closed, convex, and downward closed (in the natural order on contributions), so  satisfies this condition as well.

For condition (2) of Definition 3 (), we again note that it is satisfied by . What remains is showing that it is preserved by taking the downward closure of the closed convex hull. Here, noting that the projection commutes with the operations we're interested in, we let   be an arbitrary subset (not necessarily an ultracontribution) that satisfies  and prove that each of the following operations preserves this condition:

Convex hull: Let , so  for .

Closure: Let  be a convergent sequence and let . We need to show, for any , that there is a sequence in  that converges to . Such a sequence is given by , which proves closure.

Downward closure: If  and , then 

Lemma 2: If  is a computationalist hypothesis, the set of maximal elements of  in the information order is supported on the diagonal of , i.e. the set of elements of the form 

Proof: Let . First we show that for all  we have . Suppose that  with  containing some element . By downward closure, this means that  for some . Let  be the constant function. By the definition of the liberal bridge transform we must have . We compute

This violates the support condition of a computationalist hypothesis since , establishing a contradiction. This shows that for all  we have .

Now we show that for every maximal  we have . Consider the function  defined by

Since  we have  for all  implies that  by Proposition 2.1 [LW · GW] of the original IBP post. Furthermore, Proposition 1 says that , so  is in  iff  is supported on the diagonal, so all maximal elements of  are supported on the diagonal. 

Proposition 3: If  is a computationalist hypothesis, (idempotence).

Proof: First, we show that . Since  is convex, closed, and downward closed, we only need to show that, for  as defined in Definition 5, . Let . Then  for some  in the set of maximal elements in . By Lemma 2,  is supported on the diagonal of , i.e. the set of elements of the form . From this it follows that  since  by Proposition 2.1 [LW · GW] of the original IBP post, and thus 

The inclusion follows from the definition of a computationalist hypothesis and . The final equality can be proved similarly to Lemma 1; all the mappings are linear,   sends

and  sends

Next we show that . By Proposition 1, . Letting  be an element of  and  will be a maximal element of  in the information order  if it is a maximal element of  in the natural order on contributions . This follows from Lemma 2: the maximal elements (in the information order) of  are supported on the diagonal, so the only way for  to not be maximal (in the information order) is to have an  such that  and , which is equivalent to  (in the natural order on contributions).

By Proposition 2.1 [LW · GW] of the original IBP post, we have , so a maximal  in the natural order on contributions must correspond to a maximal  in the natural order on contributions.[16] Looking at the definition of the conservative bridge transform, we see that if  is a maximal element of  then . (To see this, just let  and remember that  is supported on , as is clear from the definition of the conservative bridge transform.) So we must have , and therefore  if  is maximal. Finally,  is downward closed, so if  is a non-maximal element of , i.e.,  for some maximal , the fact that  also ensures that 

Corollary 1: Every computationalist hypothesis can be derived as the conservative bridge transform of some  for some .

Unlike the liberal bridge transform, the conservative bridge transform can yield any computationalist hypothesis - including ones that say that certain programs aren't running. Using the conservative bridge transform instead of the liberal bridge transform eliminates the monotonicity principle.

Properties of computationalist hypotheses

We now state and prove some basic facts about computationalist hypotheses.

Proposition 4: Let  be a boolean function and  be a subset of . If for all  in the support of a computationalist hypothesis  we have  if and only if  is true, then for all  we have either  or .

Proof: Let  and suppose there is a  such that , but  and . Let  be supported on this . Define  by  and . Then  is in the support of  and  is in the support of . Both these contributions are in  by the definition of a computationalist hypothesis. Both of these elements have the same  and exactly one of them has its -component in , contradicting the assumption of the proposition. 

We can let  be, "Does  say Computation #1 is instantiated?". If  is the set of all  which say Computation #2 yields a certain result (let's call it ) and  claims that Computation #1 will only be instantiated if Computation #2 yields the result , then Proposition 4 means that Computation #2 must be instantiated according to .[17] 

Proposition 5: The intersection of any family of computationalist hypotheses is a computationalist hypothesis.

Proof: The intersection of downward-closed closed convex sets is downward-closed, closed, and convex, so the intersection of (potentially infinitely many) ultracontributions is an ultracontribution. Next, let . Then , so we have , and therefore . We also have  from 

Proposition 6: The convex hull of two computationalist hypotheses is a computationalist hypothesis. 

Proof: The convex hull of two closed convex sets is closed and convex. For downward closure, let  and let  for some , and  . Write . The first term is , so it must equal  times some element of  by the downward closure of . The second term must be  because  and  since  is non-negative. By the downward closure of  and , this means that  for some  and , so .

Now we need to show that . Let  be an element of  is a linear operator, so we have

 since the  are computationalist hypotheses, so we have

Proposition 7: A finite mixture of computationalist hypotheses is a computationalist hypothesis. That is, if  are computationalist hypotheses and for all  we have  and , then   is a computationalist hypothesis.

Proof: We need to show that . As in the proof of Proposition 6, this follows from the linearity of  and the fact that the  are computationalist hypotheses. Now we only need to show convexity, closure, and downward closure:

Convexity: Let . Then

Closure: Consider a convergent sequence . We need to show that . We know that  is compact since   is finite, so our sequence must contain a convergent subsequence  for which the sequence  converges to some  for all  since  is closed, so .

Downward closure: From the proof of Proposition 6 we see that the mixture of two computationalist hypotheses is downward closed. That this holds for any  follows by induction since we can consider the mixture of  computationalist hypotheses as a mixture of  and 
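The mixture operation in Proposition 7 can be sketched concretely, under the simplifying assumption (mine, for illustration only) that each ultracontribution is represented by a finite generating set of contributions:

```python
# Finite mixture of ultracontributions, each represented by a finite
# generating set of contributions (dicts from outcomes to mass).
from itertools import product

def mix(generator_sets, probs):
    """All contributions sum_i probs[i] * theta_i, with theta_i drawn
    from the i-th generating set."""
    mixed = []
    for combo in product(*generator_sets):
        out = {}
        for p, theta in zip(probs, combo):
            for x, mass in theta.items():
                out[x] = out.get(x, 0.0) + p * mass
        mixed.append(out)
    return mixed

gens1 = [{"a": 1.0}, {"b": 1.0}]
gens2 = [{"a": 0.5}]
print(mix([gens1, gens2], [0.5, 0.5]))
# [{'a': 0.75}, {'b': 0.5, 'a': 0.25}]
```

Mixing generating sets pointwise like this mirrors how the proof reduces an n-way mixture to repeated two-way mixtures.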

If computationalist hypotheses did not obey any rules such as Propositions 4-7, that would suggest that they were not a natural mathematical concept, and therefore not a promising tool for understanding agent foundations. While none of these properties is surprising, the fact that computationalist hypotheses possess them is (weak) evidence of computationalist hypotheses being natural and useful.

This is far from a complete list of the properties of computationalist hypotheses. The properties of the liberal bridge transform, which is related to computationalist hypotheses via Proposition 1, are investigated more fully in the original IBP post.[18]

Further reading and future research

This post's novel[19] content has been the definitions of computationalist hypotheses and the conservative bridge transform, and proofs concerning some of their properties. The surrounding discussion of infra-Bayesian physicalism has been the minimum necessary to contextualize and understand these novel contributions. The original IBP post [LW · GW] provides many technical results for the liberal bridge transform and a more complete description of physicalist agents.[20] Readers who want to see IBP in action may be interested in the post on IBP and quantum mechanics [LW · GW], which provides several examples.[21] Readers who are interested in how IBP could offer a solution to the alignment problem are encouraged to read about Physicalist Superimitation [LW · GW].

With monotonicity resolved, the biggest hole in the theory of IBP is the current lack of a tie-in to learning theory, and creating learning theory for physicalist agents is the most important IBP-centric research direction. This will include defining a notion of physicalist regret that admits learnability and proving regret bounds for physicalist agents. This is essential since regret bounds are a critical ingredient for proving that an agent is safe and aligned. Regret bounds are also natural candidates for giving a rigorous definition of what it means for an algorithm to be an agent.

Another research direction is to create an axiomatic characterization of a bridge transform - in other words, coming up with a list of properties and proving that there is only one potential bridge transform that satisfies all of them. This could lead to a new and improved definition of a bridge transform (like how the conservative bridge transform is an improvement over the liberal bridge transform in important ways) or give us more confidence that a bridge transform we have already formalized is "correct."

Finally, other directions in the learning-theoretic research agenda [LW · GW] - most notably compositionality and metacognition - will likely be relevant to IBP.

Thanks to Cole Wyeth for reading a draft and offering suggestions, and to Vanessa for coming up with all of the math, providing many rounds of feedback, and being a wonderful wife.

  1. ^

    The problem of privilege is discussed in more depth in the original IBP post [LW · GW].

  2. ^

    While physicalist agents find it easy to reason about embedded agency, certain assumptions (e.g., an uncorrupted memory) are required for formal guarantees and learning.

  3. ^

    A fear of mindcrime may interfere with learnability, however, so it may be necessary to add a small amount of willingness to risk mindcrime back into the physicalist framework, such as by having a physicalist agent be unconcerned about committing mindcrime so long as it's done on a single trusted computer. Even if this is a serious issue, however, IBP still provides a useful framework for formalizing the concept of mindcrime.

  4. ^

    Perhaps excluding mindcrime, which is more of a concern of AI alignment than for agent foundations in the abstract.

  5. ^

    For computations that don't halt, each element of Γ will still specify a result (e.g., the final state of a tape in a Turing machine). All such purported "results" for non-halting computations will be bullshit, but this is a nonissue since it will be impossible for anything that occurs in the physical universe to contradict such a purported result. In essence, whatever an element of Γ says about a particular computation boils down to, "If this computation halts, then its result will be such-and-such."

  6. ^

    We expect everything to generalize smoothly to infinite , but having  be finite simplifies the presentation. Additionally, with infra-Bayesianism it is very natural to focus on a finite quotient set of an infinite  while ignoring everything else. (Mathematically, we're just coarsening over all the results of all but finitely many computations; you don't need to understand what coarsening means to understand the rest of this post.)

  7. ^

    Caveat:  does not necessitate that a tall, strong man has experienced kissing you.

  8. ^

    This can be generalized to infinite  by letting  be a measure.

  9. ^

    These were called homogenous ultracontributions (HUCs) in the original IBP post.

  10. ^

    The "grain of truth" problem says that, in general, an embedded agent with a single probability distribution cannot assign nonzero probability to the true state of the universe. Hence the need for infra-Bayesianism.

  11. ^

    This abuse of notation is like how, given  and , we write  to denote the image of .

  12. ^

    As with , restricting our attention to finite  is purely for simplicity.

  13. ^

    We use the  symbol to distinguish the information order from the natural order on contributions, which we denote with .

  14. ^

    This claim is formalized in Proposition 2.2 [LW · GW] of the original IBP post.

  15. ^

    The convex hull of a set  is the set of all objects that can be written as ; it's the smallest convex superset of . The closed convex hull is the closure of this set, i.e., the smallest closed set containing it.

  16. ^

    Note that projections preserve probability mass; .

  17. ^

    Computation #2 must be fully instantiated if it can yield exactly two results, but it may be only partly instantiated if it has more than two possible results.

  18. ^

    There is at least one property of the liberal bridge transform that the conservative bridge transform does not possess: the refinement of an ultracontribution (that is to say, replacing an ultracontribution  with some subset of ) used as an input to a bridge transform shouldn't add additional possibilities to the result of the bridge transform. Here's a counterexample: Let  and . Let  be the largest possible ultracontribution on , i.e. the set of all possible contributions on it. Let  be the downward closure (in the natural order on contributions) of , i.e. the set of all contributions that are supported on  is a refinement of , but  is supported on  and  is supported on  and  is supported on  (and everything else) due to the monotonicity principle.  can be interpreted as, "all conceivable combinations of computational and physical universes are possible".  , however, confidently states that there is absolutely no connection between computations and physics.

  19. ^

    Novel excluding Vanessa's shortform [LW · GW], on which this post is based.

  20. ^

    This post renders a significant fraction of the original IBP post obsolete, most notably the discussions of the monotonicity requirement. Additionally, non-monotonic IBP allows for many kinds of loss functions for a selfish physicalist agent that has multiple copies running, while the monotonicity requirement had limited the original IBP post to considering agents that are solely concerned with minimizing the loss of their best-off copy.

  21. ^

    Several of the proofs use the monotonicity principle, so investigating quantum mechanics with non-monotonic IBP is a potential research direction.
