Goal-directedness: relativising complexity

post by Morgan_Rogers · 2022-08-18T09:48:40.864Z · LW · GW · 0 comments

Contents

  Relativising Kolmogorov complexity
    Conditional complexity
    Extension complexity
    Modification complexity
    Oracular complexity
    What is relativisation really?
  An algebraic example
    Relative complexity of group elements
    Conditional complexity
    Relativising to subsets
  Conclusions
None
No comments

This is the fourth post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. The funding has come to an end, but I expect to finish off this project as a hobby in the coming months.

My previous post [LW · GW] was all about complexity, and ended with an examination of the complexity of functions. In principle, I should now be equipped to explicitly formalise the criteria I came up with [LW · GW] for evaluating goal-based explanations [LW · GW]. However, several of the structures whose complexity I need to measure take the form of transformations from one structure of a given type to another. This brings us into the domain of relative complexity. I have chosen to present this in a separate post, not just because of the daunting length of my last post, but also because I think this is a subject which this community would benefit from some explicit clarification of. In future posts I shall finally put all of the pieces together and get to the bottom of this goal-directedness business.

Recall that last time that I described complexity as a quantity assigned to a structure that measures how many simple pieces are needed to construct it. The general concept which is under scrutiny this time is a variant of the above, where we consider instead a quantity assigned to a structure that measures how many simple pieces are needed to construct it from a given resource or starting point. A source of confusion lies in the fact that there are multiple ways to relativise complexity, which is why a closer examination is needed.

In this post I'll be discussing a few implementations on relativised notions of complexity.

Relativising Kolmogorov complexity

Conditional complexity

In this subsection I'll be using standard notation and definitions from Ming Li and Paul Vitányi's textbook,[1] which I will subsequently extend somewhat. They use the convention that the complexity measure is indexed by a partial computable function rather than a given Turing machine which computes it, which makes sense, since the details of the Turing machine are irrelevant in the computation of complexity beyond its input-output behaviour.

The definition of K-complexity for individual strings  which we saw last time involved a choice of universal Turing machine , computing partial function , say, and measured the length of the shortest input  (encoding a program) such that . This gave the plain Kolmogorov complexity, which I'll now denote as .

Conditional Kolmogorov complexity allows an auxiliary string  to be passed as a "free" input to the program, so that the corresponding complexity is , where  is the string obtained from  and  by concatenating them and adding a prefix consisting of  copies of  followed by a  (so that the Turing machine encoding  can reliably separate the concatenated inputs). For conciseness, I'll denote  as , interpreting  as an input to the program  (with  playing the role of an interpreter or compiler).

This notion of conditional complexity contains some distinct possibilities which I alluded to in the introduction and would like to separate out; hopefully the algebraic example in a later section will help to clarify the distinct cases I'll highlight here.

Consider a situation where . For many such cases, the shortest program  producing  given  is  := print. Similarly, if  contains many copies of , we can use "print(-)" to insert those copies into the output being produced. On the other hand, if  has nothing to do with , the most efficient program will be one that simply ignores the extra input and produces  directly. In this way, the input  can be thought of as a resource for obtaining the output .

Extension complexity

A special case of the above is when  is a prefix of , but where the remainder of  is not easily compressible[2] using . As long as  has non-trivial length, the most efficient program for producing  might be of the form "print and then append ", where  is a program producing the suffix of  with no input. In this situation we are using  as a starting point for obtaining the output . I can refine this into an independent definition by removing the cost of the initial print command: , where  is the concatenation of  and the output of  at . With this (non-standard) definition we can no longer use  as a resource, and we can only obtain strings which extend  (all other strings having infinite complexity). I hence call this the extension complexity.[3]

Modification complexity

An alternative option, for a set-up where we work with Turing machines having separate input and output tapes, say, is to initialize the output tape with a given string . We could also impose further constraints on the model, such as proposing that the machine cannot read from the output tape, only write to it, so that the initial output cannot be used as a resource in the sense discussed above. I'll choose a universal machine set up this way, , and write  to indicate that this machine is initialised with  in the output tape. Then we get a notion which I'll call modification complexity, , which measures the shortest program for generating the string  with initial output .

Assuming that the machine can also delete symbols in the output, all strings have a finite modification complexity, so this is a genuinely different measure than the extension complexity.

Oracular complexity

I explained last time that we can equip a Turing machine with an oracle, which can perform some decision problems "for free", or rather in a single step of computation. The details are a little technical. Given a subset , we can define a Turing machine with a reference tape which includes an oracle query state which writes a  to the output tape if the content of the reference tape is a member of , and writes a  otherwise. Naming the oracle after the subset defining it, we obtain a class of -computable partial functions, and corresponding -universal Turing machines, with a necessarily different enumeration of the programs than for ordinary Turing machines. Nonetheless, given the partial function  computed by such a universal machine, we can duplicate the definitions given above with all of the same conclusions.

Comparing complexities arising from different oracles might be interesting, but it's beyond the scope of this post.

What is relativisation really?

We saw last time [LW · GW] that even after fixing a collection of objects containing a particular object, there may still be many choices to make before a complexity measure is fully specified over that collection (and hence on that object). On the other hand, we also saw that some choices are natural to some extent, in that they themselves minimize some higher measure of complexity. Whether "natural" or not, there may be a default choice of complexity measure for some collection of objects. Given such a situation, the various types of relative complexity are ways of expressing modifications to the default complexity measure, using all of the features involved (implicitly or explicitly) in the definition of the default complexity measure. We can interpret the above examples in these terms:

It's worth noting that the cost of any of these contributions to complexity can vary, either because of the explicit implementation of using the resources or operations being provided, or by artificially attaching a cost to such operations. I explored variations in costs last time [LW · GW]. The takeaway is that I'm only scratching the surface of the range of notions of relative complexity that are possible. It's time for some more instances in a different setting!

An algebraic example

Relative complexity of group elements

Recall the word length complexity measure on elements in [LW · GW] a group [LW · GW] which I presented last time. Explicitly, I am given some generators  of a group ; the word length complexity  of an element , is the length of a shortest presentation of  as a product of the generators.[7]

In this and other constructions of elements in algebras, we take the "empty" construction as the starting point by default, such that the identity element always has a complexity of 0, being identified with the empty product of generators. This is a natural choice insofar as it is the only choice which can be made independently of any further details of the structure of the group. However, such a choice may not always be available. To illustrate this, I'll remind you how algebraic objects can act on other mathematical objects, including sets, spaces or other algebraic objects.

For concreteness, I'll specifically talk about (right[8]) group actions. Suppose I have a group  as above acting on a set , which formally means that each element  and  determines an element  in such a way that  for all elements . Given two elements , we can measure the complexity of "getting to  from ," denoted , to be the complexity of the (unique) element  such that , if it exists, and define the complexity to be infinite otherwise.

The crucial feature of this situation which differs from the complexity measures we saw last time is that we must choose not one but two elements of , and there is in general no distinguished choice for either element which might allow us to eliminate one of the choices. A sensible reaction to this example is to entertain also a relative version of complexity of elements in an algebra, where we may choose the starting point for the construction to be something other than the identity element. This is justified in this particular case by the fact that the group  acts on its underlying set, which I'll denote by , by multiplication, and hence as a special case of the above we can define the complexity of "getting to  from ," denoted , to be the complexity of the (unique) element  such that ; in this case such an element always exists, and is equal to 

Generalizing to other algebraic contexts, we may no longer have uniqueness or existence of the element . This may be the case in a monoid, for example. In that case, we must minimize the complexity across elements satisfying the equation, which makes this relativized complexity measure more interesting.

I have chosen notation consistent with that for modification complexity introduced above, since one way of interpreting  is as "the cost of the most efficient way to modify  to obtain ", and because it amounts to choosing a starting point for the construction other than the default of the identity element. We evidently have . As such, for any third element , we have a general inequality , which is a directed version of the triangle inequality, and taking  to be the identity element we get the special case, .

The above is the discrete case, but it works just as well for the continuous case: the topological group  of rotations in the plane is homeomorphic as a space to the circle . A conventional identification of  with the unit circle in the plane sends the identity element ( rotation) with the point , but any other choice would be valid. If we measure the complexity of a rotation as the size of the smallest angle of rotation from the identity producing it, the relative complexity  of two rotations is the smaller angle separating them as points on a (uniformly parametrized) circle.

Conditional complexity

Alternatively, we could consider the "cost of building  using ", in the sense of adding  to the set of generators. The resulting complexity measure depends on how much each copy of  costs[9]. If the cost is greater than or equal to , the complexity measure collapses to the original complexity measure in terms of the generators . At the other extreme, if the cost is zero, the complexity is a measure of how close  is to a power of ; we shall return to this case below. Between these is the case where the cost is one, where we recover the ordinary complexity of  after adding  to the set of generators. As ad hoc notation, if the cost of each copy of  is , I'll write  for the corresponding conditional complexity of . For any value of , we have .

If the relative complexity  treats  as a new starting point for constructions, the conditional complexity treats  as an extra building block which we can use in the construction, without shifting the starting point; it's altering the default choice of generators. Unlike the default of starting from the identity, there may be no default choice [LW · GW] of generators for a given group, so remember that this notation assumes a choice has been established previously. Again, relative complexity measures are really just a way of expressing complexity measures by comparison with some pre-existing (default) choice.

Relativising to subsets

Rather than individual elements, we could consider subsets which we are allowed to freely choose from. Let  be a subset; then we can define the relative complexity  to be . As particular cases relating to the previous definitions, we have that , and if  is the subgroup of  generated by , then .

We can also define the extension complexity  in the evident way. Indeed, if  is actually a subgroup of , or if we replace  with the subgroup it generates, the -cost version of the extension complexity coincides with the relative complexity in the quotient right -action , or as an equation, .

I'm hoping that if you have some experience in algebra, you might be able to come up with some further examples of relativised complexity measures for yourself, or to identify how this example can be extended by analogy to another relativised version of Kolmogorov complexity.

Conclusions

If the equations and inequalities I've sprinkled through this post aren't enlightening, the general takeaway from them is that relative notions of complexity allow us to interpolate between complexity measures, or see one type of complexity measure as an extreme case of another. They also allow us to acknowledge and adjust parameters which a default point of view on a setting would take for granted.

Mine is not a conventional take on complexity. Complexity seems too often to be assumed as an intrinsic property of an object of study, with various desirata attached regarding how this quantity should behave. I hope that I have managed to illustrate through sheer breadth of possibility that this is a reductive point of view. There is a vast web of notions of complexity and a deep foundation of assumptions underlying any given complexity measure. Understanding the greater structure should promote understanding of any particular choice of complexity one selects.

  1. ^

    Do not mistake this for an endorsement of said textbook. The second chapter opens with the sentence, "The most natural approach to defining the quantity of information is clearly to define it in relation to the individual object [...] rather than in relation to a set of objects from which the individual object may be selected," a sentiment which I am strongly opposed to (as my exposition in the present and previous post hopefully make apparent). Having expressed this opinion, however, the authors go on in a later paragraph to set up their notation with the sentence, "Denote the set of objects by , and assume some standard enumeration of objects  be natural numbers ," in blatant opposition to their "natural approach"...

  2. ^

    I didn't introduce the notion of compressibility last time, but for this conditional case it amounts to saying that having  at our disposal does not shorten the length of the program required to output the suffix of .

  3. ^

    I would have called this "prefix complexity", but that is already the name of a variant of Kolmogorov complexity explained in Chapter 3 of Li and Vitányi's textbook, involving prefix-free codes.

  4. ^

    It's hard to make the cost reduction precise, although I could probably provide some bounds on it. The recursive nature of Turing computation means that any string produced in the running of the program may subsequently be used by the program an effectively unlimited number of times with a corresponding similarly reduced complexity cost; providing  as a basic building block only negates the initial cost of generating the string. 

  5. ^

    Digital files contain a header specifying some parameters of the file before the actual body of the file (the data you see when you open it) begins. If you write a program that outputs a file, that header will usually take a default value, so that the length of your programme is an upper bound on some instance of extension complexity.

  6. ^

    Continuing the above example, you could write a program which produces a file starting from a template file, at which point the length of your program is an upper bound on the modification complexity from the template to the final file produced, now ignoring the file header.

  7. ^

    I'm aware that this is not great notation for the complexity, since it clashes with the notation conventionally used for centralizers in group theory. Please invent original notation if you use this concept in your own work!

  8. ^

    If we instead used left actions, we would get a dual definition of relative complexity later on.

  9. ^

    For the purposes of this post, I will assume that the costs are constant, although I mentioned in the last post that this need not be the case. Indeed, if we allow any function of the number of instances of  in the construction, we can also express the relative complexity  as an instance of the resulting more general type of complexity measure.

  10. ^

    This example is hiding some subtleties of formalization. I'm implicitly computing the relative complexity of the graphs as the minimum number of some basic operations (which include edge deletion and duplication) required to produce an output graph isomorphic to the target graph. The transformation I describe is hence not a graph homomorphism, and if I were to enforce the rule that transformations should be graph homomorphisms, then the resulting composite would in fact not by the identity homomorphism and hence would carry complexity greater than 0.

0 comments

Comments sorted by top scores.