Integers as Compression

post by Chris_Leong · 2020-10-28T07:36:09.192Z · LW · GW · 7 comments

I would like to propose a conception of what integers are. This is something of a misnomer: I don't actually want to claim that there is one true conception of what integers are, but it is a convenient way of describing what this post is about. I actually believe that there are multiple starting points from which we can produce a concept of integers. Nonetheless, this strikes me as a particularly beautiful approach, and it highlights some aspects that other conceptions may not.

Suppose we have the following sequences:

aaaa aaaa

bbbcc bbbcc

abcd abcd

If sequences of this form are fairly common, then the number 2 will be useful for compressing information about these sequences. In particular, we can imagine writing the first sequence as 2{aaaa} instead.

Next, let's imagine taking the sequence 2{aaaa} and appending 3{aaaa}. This would give us the sequence 5{aaaa}, which leads to a natural notion of addition. Nesting gives multiplication in the same way: 4{2{aaaa}} = 8{aaaa}. So now we have arithmetic as well.
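A minimal Python sketch of this notation may help. The `Repeat` class and its method names are hypothetical, introduced only for illustration: appending two runs of the same pattern adds their counts, and nesting a run inside another multiplies them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repeat:
    """Hypothetical stand-in for the n{pattern} notation."""
    count: int
    pattern: str

    def expand(self) -> str:
        # Decompress: write the pattern out `count` times.
        return self.pattern * self.count

    def append(self, other: "Repeat") -> "Repeat":
        # Appending runs of the same pattern adds the counts.
        if other.pattern != self.pattern:
            raise ValueError("patterns differ; counts cannot be merged")
        return Repeat(self.count + other.count, self.pattern)

    def nest(self, outer: int) -> "Repeat":
        # outer{count{pattern}} flattens to (outer * count){pattern}.
        return Repeat(outer * self.count, self.pattern)


two = Repeat(2, "aaaa")
three = Repeat(3, "aaaa")
five = two.append(three)   # 2{aaaa} followed by 3{aaaa}
eight = two.nest(4)        # 4{2{aaaa}}
```

Here `five` compares equal to `Repeat(5, "aaaa")` and `eight` to `Repeat(8, "aaaa")`, and expanding either gives the same string as expanding the parts and joining them.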

We aren't limited to exact matches either. Consider:

aab aac aax

If the last character is irrelevant, we could write this as 3{aa_}. This is useful for producing a conceptualisation of objects, as objects are almost never exactly the same. If we have a photograph, we can imagine utilising a form of lossy compression to represent the data present. For example, we might only care about the fact that there are three people in the photo and not about any of the details. This can further be extended to video and streams of experience. Alternatively, instead of compressing streams of experience, we could imagine compressing some kind of mental model.
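The inexact case can be sketched the same way. The function name `compress_lossy` and the choice of `_` as the "don't care" character are assumptions made for this example, not a fixed scheme:

```python
def compress_lossy(chunks, template, wildcard="_"):
    """Return (count, template) if every chunk fits the template, else None.

    Positions holding the wildcard are treated as irrelevant detail.
    """
    def matches(chunk):
        return len(chunk) == len(template) and all(
            t == wildcard or t == c for t, c in zip(template, chunk)
        )

    if all(matches(chunk) for chunk in chunks):
        return len(chunks), template
    return None


# "aab aac aax" compresses to 3{aa_}; the last character is discarded.
result = compress_lossy("aab aac aax".split(), "aa_")
```

The compression is lossy in exactly the sense described: from `(3, "aa_")` we can recover that there were three chunks of the form aa_, but not what the third characters were.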

One major advantage of talking about compression and sequences instead of objects is that to talk about objects you need to be able to set up a significant amount of metaphysical infrastructure. In contrast, sequences are fairly simple objects. We don't need these sequences to exist in an abstract sense, but only as objects in our mind. We could try defining numbers using abstract objects instead, but these don't naturally provide a notion of addition without something like a notion of space. Sets can provide a useful conception of numbers as well, but the abstract sets in our head are difficult to relate to the concrete sets in the world, while the compression conception links back fairly naturally. But beyond this, I don't believe that we need a unique conception of numbers; different conceptions can illuminate different aspects of them.

One objection might be that we can't use compression to define numbers because that requires sequences and we need numbers to define sequences formally. This doesn't actually seem like that significant a flaw to me. If someone proposes laws of logic, the only way that we can check that these laws of logic are reasonable is to combine them using some kind of logic or meta-logic. Similarly, the only way to check the reliability of our senses (like sight or hearing) is to use sense-data itself to check for consistency. And we can't falsify our assumptions about the world without making assumptions about how the world works in order to interpret our experimental results.

At this point, it is natural to ask whether this is useful or just kind of cool. Since, as far as I'm concerned, this is just a useful conception and not the one true conception of numbers, its direct applications are somewhat limited. On the other hand, sometimes these conceptions can have indirect applications. For example, the set theory conception of numbers suggests the possibility that "four" mightn't have any meaning outside of something more specific like "four cats", "four dogs" or "four apples".

The compression conception of integers emphasises how numbers appear out of our processing of data, rather than from our experiences of the world. I'm still unsure whether this constitutes evidence for integers existing in the map rather than the territory. But it's interesting to think that we would be able to define the notion of numbers even if there didn't exist an external world.

7 comments

comment by noggin-scratcher · 2020-10-28T19:03:41.255Z · LW(p) · GW(p)

Seems quite similar in spirit to the way of defining numbers into existence around the 'successor' function.

https://en.wikipedia.org/wiki/Successor_function

Replies from: Chris_Leong
comment by Chris_Leong · 2020-10-29T03:36:57.994Z · LW(p) · GW(p)

That's a good point. Successor functions are one form of compression. I guess one neat thing with this conception is that it also links up with the definition of numbers in terms of objects. As discussed above we can think about representing the objects in an image as a form of lossy compression. So it ties these two conceptions together quite neatly.

comment by Slider · 2020-10-29T14:48:19.336Z · LW(p) · GW(p)

Python has semantics in which int * string is the string repeated that many times. However, this is expansion rather than compression.

There, + is concatenation, so "hi"+"world" becomes "hiworld", which is somewhat compatible with "a"+"a"="aa"=2*"a".

However, "a"+"b"!="b"+"a": concatenation doesn't naturally commute. So while you might get a system that is a lot like arithmetic, it might be different. This difference might or might not matter.
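These behaviours can be checked directly with built-in string operations; a short sketch (the variable names are just for illustration):

```python
repeated = 2 * "a"          # int * str repeats the string
joined = "hi" + "world"     # + concatenates
ab, ba = "a" + "b", "b" + "a"

print(repeated == "a" + "a")   # repetition agrees with repeated concatenation
print(ab == ba)                # the concatenated strings themselves differ
print(len(ab) == len(ba))      # but their lengths agree
```

The last line hints at where ordinary arithmetic sits: the strings do not commute, but the counts attached to them do.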

Sequences are not free of ontological baggage. It is not clear that if you have two apples nearby, they are in any particular sequence. Sequences might be natural in that a brain usually goes from one state to another; that is, a time series is often relevant. Another sense-processing alternative would be to say that a + b = ab means something like: the big experience of ab holds as parts the experience of a and the experience of b, without there being a before or after relation between a and b.

Replies from: Chris_Leong
comment by Chris_Leong · 2020-10-29T21:34:19.069Z · LW(p) · GW(p)

Well we could add another notion:

<3{aa}, 4{bb}> 

This would stand for 3 aa's and 4 bb's, possibly interspersed in any order.
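One way to sketch this order-free notation in Python is with `collections.Counter`, which behaves like a multiset. The helper `as_bag` and the fixed chunk size of 2 are assumptions made for this example:

```python
from collections import Counter


def as_bag(sequence, chunk_size):
    """Split a string into fixed-size chunks and count them, ignoring order."""
    chunks = [
        sequence[i:i + chunk_size]
        for i in range(0, len(sequence), chunk_size)
    ]
    return Counter(chunks)


# Two different orderings of three aa's and four bb's.
interleaved = as_bag("aabbaabbbbaabb", 2)
grouped = as_bag("aa" * 3 + "bb" * 4, 2)

print(interleaved == grouped)  # both reduce to the same counts
```

Because only the counts are kept, the two orderings become indistinguishable, which is what lets the combination commute.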

comment by kithpendragon · 2020-10-29T10:58:48.114Z · LW(p) · GW(p)

To check understanding by way of grossly oversimplified illustration: The appropriate mind meat might identify a "person" in some sensory scene by pattern-matching against stored archetypes. The presence of the person could be indicated on the map as a single character (such as a stick figure) that points to the archetype, rather than the long-form description of the chunk of sense data that makes up the person in my perception. "People", then, can be the stick figure character and a scalar indicating how often in the scene the "person" archetype can be matched. During decompression (memory recall), the appropriate module can use the scalars attached to the pointers to fill in the parameters for the repeat command while building person objects from the archetype.

That took a turn for the computer-sciencey, but that's my background. For what it's worth, your conception of "what is Number" extends neatly and intuitively on what I was already using by explaining why the brain might benefit from tagging patterns with a scalar. Thanks for the detail!

Replies from: Chris_Leong
comment by Chris_Leong · 2020-10-29T21:39:55.011Z · LW(p) · GW(p)

Yeah, sounds like you understand it. I suppose I should add that if you're using the repeat command you won't be able to rebuild the scene because you won't know where the objects are. But maybe all you need to know about a scene is that there are 10 horses and 5 sheep and 1 farmer and the actual positions are irrelevant.

Update: Actually on reflection, I suppose we could add a new notation that said repeat n times at these co-ordinates. Then we'd be able to actually reconstruct the scene.

Replies from: kithpendragon
comment by kithpendragon · 2020-10-30T11:54:38.988Z · LW(p) · GW(p)

Agreed. The quantity scalar is certainly not the only metadata that could be stored. If I was actually writing a program, the objects_in_scene array would probably be allowed to contain as many details as the system decided was relevant. Then the scalar would be the size of the array of objects, with each object having a pointer to an archetype and properties defined for any of those details. In fact, the size need not actually be stored, but can be reconstructed easily by examination of the array itself using something like objects_in_scene.count(). For other objects, it might make sense to count more explicitly. An object referring to a group archetype, for example, might be given a size property if the system cared to do so. From what I've read, it seems likely that (in human brains) this property will mostly store an exponent rather than trying to determine an exact number.
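A rough Python sketch of that layout, with loud caveats: `SceneObject`, the `properties` dictionary, and the `size_exponent` key are all hypothetical names, and Python's built-in `len()` stands in for the `objects_in_scene.count()` call mentioned above.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    archetype: str  # stands in for a pointer to a stored archetype
    properties: dict = field(default_factory=dict)  # any extra details


# The scene from the example: 10 horses, 5 sheep and 1 farmer.
objects_in_scene = (
    [SceneObject("horse") for _ in range(10)]
    + [SceneObject("sheep") for _ in range(5)]
    + [SceneObject("farmer")]
)

# The scalar is not stored; it is reconstructed from the array itself.
total = len(objects_in_scene)
horses = sum(1 for obj in objects_in_scene if obj.archetype == "horse")

# A group archetype that records only an order of magnitude, not a count.
herd = SceneObject("herd", {"size_exponent": round(math.log10(1200))})
```

Positions could be added as one more entry in `properties`, which would cover the "repeat n times at these co-ordinates" idea as well.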

I expect this can get extremely complicated in a human brain! Evolution isn't much for intra-system optimization just for efficiency's sake, after all.