Probabilistic reasoning for description and experience
post by Q Home · 2022-09-27T10:57:06.217Z · LW · GW · 0 commentsContents
Connected houses Kindness Some more analogies Classification Properties into probabilities Intro Rob Gonsalves Crash Bandicoot 1 Jacek Yerka Metrics Crash Bandicoot 3 Negative objects Hyper objects Universe of orders Superrationality Connection with recursion Theory How is this related to anything? Formalization Equivalence of properties Super-correlated form Examples Universe of universes Onion layers Argumentation, hypotheses Causal rationality, Descriptive rationality 2 types of rationality should be connected Brainstorming/spitballing None No comments
For some time I wanted to apply the idea of probabilistic reasoning (used for predicting things) to describing things, making analogies between things. This is important because your hypotheses (predictions) depend on the way you see the world. If you could combine predicting and describing into a single process, you would unify cognition.
Fuzzy logic and fuzzy sets is one way to do it. The idea is that something can be partially true (e.g. "humans are ethical" is somewhat true) or partially belong to a class (e.g. a dog is somewhat like a human, but not 100%). Note that "fuzzy" and "probable" are different concepts. A dog is somewhat like a human, not "probably" a human. But fuzzy logic isn't enough to unify predicting and describing. Because it doesn't tell how descriptions can depend on each other (unlike probability: probability tells how events can depend on each other). For example, how can the definition of "animal" depend on the definition of "human"? Fuzzy logic doesn't answer such questions.
I have a different principle for unifying probability and description. Here it is:
Properties of objects aren't contained in specific objects. Instead, there's a common pool that contains all possible properties. Objects take their properties from this pool. But the pool isn't infinite. If one object takes 80% of a certain property from the pool, other objects can take only 20% of that property (e.g. "height"). Socialism for properties: it's not your "height", it's our "height".
How can an object "take away" properties of other objects? For example, how can a tall object "steal" height from other objects? Well, imagine there are multiple interpretations of each object. Interpretation of one object affects interpretation of all other objects. It's just a weird axiom. Like a Non-Euclidean geometry. This sounds strange, but this connects probability and description. And this is new.
What do we need this axiom for? In general, for describing things happening simultaneously in your mental state. Because in mental states things tend to mix together. More specifically, my axiom can be used in classification and argumentation. Before showing how to use it I want to explain it a little bit more with some analogies. Core claims of the post:
- There are two important types/usages of probability. One of them is the usual probability and the other is my "axiom".
- There are 2 main types of things in a mental state: things that don't share properties and things that do. A special type of probabilistic reasoning can explain the difference between your subjective experiences (qualia). And people's experiences.
- There are 2 types of rationality: one focuses on causality and the other focuses on describing ideas. We need to connect those 2 types.
Connected houses
Imagine two houses, A and B. Those houses are connected in a specific way.
When one house turns on the light at 80%, the other turns on the light only at 20%.
When one house uses 60% of the heat, the other uses only 40% of the heat.
(When one house turns on the red light, the other turns on the blue light. When one house is burning, the other is freezing.)
Those houses take electricity and heat from a common pool. And this pool doesn't have infinite energy.
Kindness
Usually people think about qualities as something binary: you either has it or not. For example, a person can be either kind or not.
For me an abstract property such as "kindness" is like the white light. Different people have different colors of "kindness" (blue kindness, green kindness...). Every person has kindness of some color. But nobody has all colors of kindness.
Abstract kindness is the common pool (of all ways to express it). Different people take different parts of that pool.
Some more analogies
Theism. You can compare the common pool of properties to the "God object", a perfect object. All other objects are just different parts of the perfect object. You also can check out Monadology by Gottfried Leibniz.
Spectrum. You can compare the common pool of properties to the spectrum of colors. Objects are just colors of a single spectrum.
Ethics. Imagine that all your good qualities also belong (to a degree) to all other people. And all bad qualities of other people also belong (to a degree) to you. As if people take their qualities from a single common pool.
Buddhism. Imagine that all your desires and urges come (to a degree) from all other people. And desires and urges of all other people come (to a degree) from you. There's a single common pool of desires. This is somewhat similar to karma. There's also the concept of "values handshakes" [? · GW]: when different beings decide to share each other's values.
Quantum analogy. See quantum entanglement. When particles become entangled, they take their properties from a single common pool (quantum state).
Subdivision rule. Check out Finite subdivision rule. You can compare the initial polygone to the common pool of properties. And different objects are just pieces of that polygone.
Classification
To explain how to classify objects using my principle, I need to explain how to order them with it.
I'll explain it using fantastical places and videogame levels, because those things are formal and objective enough (they can be modeled as 3D shapes). But I believe the same classification method can be applied to any objects, concepts and even experiences.
Basically, this is an unusual model of contextual thinking. If we can formalize this specific type of contextual thinking, then maybe we can formalize contextual thinking in general. This topic will sound very esoteric, but it's the direct application of the principle explained above.
Properties into probabilities
First, we need to start thinking about properties in terms of probabilities. How?
Imagine that a property is like a slider. The whole slider is 100%. Let's say we think about height. "Very tall" means that the slider is higher than 50%. "Gigantic" means that the slider is closer to 100%. "Small" means that the slider is lower than 50%. "Microscopic", "teensy-tinsy" means that the slider is close to 0%.
But that's just how we think about properties when we use language! We always (by default) talk about relative quantities in a certain context. So, this idea should be familiar.
Also notice interesting properties of concepts such as "genius" and "fool". We can't imagine a universe with 100 geniuses and 1 fool and 1 person between the fool and the geniuses. That would be like a mountain standing on a toothpick, it feels weird and disproportionate. Human concepts usually assume something stronger than simply relativity of properties: different objects "fight" for having a certain amount of a property. 99% of people can't be geniuses. 99% of objects can't have 99% of a property. Or that's very unlikely. My idea just takes this to the extreme. And I think this is a hint that people can do a lot of probabilistic reasoning not (directly) covered by Bayes' rule.
Intro
(I interpret paintings as "real places": something that can be modeled as a 3D shape. If a painting is surreal, I simplify it a bit in my mind.)
Take a look at those places: image.
Let's compare 2 of them: image. Let's say we want to know the "height" of those places. We don't have a universal scale to compare the places. Different interpretations of the height are possible.
If we're calling a place "very tall" - we need to understand the epithet "very tall" in probabilistic terms, such as "70-90% tall" - and we need to imagine that this probability is taken away from all other places. We can't have two different "very tall" places. Probability should add up to 100%. (The actual rule of distributing properties may be more complicated, but that's a good enough intuition.)
Now take a look at another place (A): image (I ignore the cosmos to simplify it). Let's say we want to know how enclosed it is. In one interpretation, it is massively enclosed by trees. In another interpretation, trees are just a decorative detail and can be ignored. Let's add some more places for context: image. They are definitely more open than the initial place, so we should update towards more enclosed interpretation of (A). All interpretations should be correlated and "compatible". It's as if we're solving a puzzle.
You can say that properties of places are "expandable". Any place contains a seed of any possible property and that seed can be expanded by the context. "Very tall place" may mean Mt. Everest or a molehill depending on context. You can compare it to a fractal: every small piece of a fractal can be expanded into the entire thing. And I think it's also very similar to how human language, human concepts work.
You also may call it "amplification of evidence": any smallest piece of evidence (or even absence of any evidence) can be expanded into very strong evidence by context. We have a situation like in the Raven paradox, but even worse.
Rob Gonsalves
Rob Gonsalves. Rest in Peace.
Places in random order: image.
My ordering of the places: image.
I used 2 metrics to evaluate the places:
- Is the space of the place "box-like" and small or not?
- Is the place enclosed or open?
The places go from "box-like and enclosed" to "not box-like and open" in my ordering.
But to see this you need to look at the places in a certain way, reason about them in a certain way:
- Place 1 is smaller than it seems. Because Place 5 is similar and "takes away" its size.
- Place 2 is more box-like than it seems. Because similar places 4 and 6 are less box-like.
- Place 3 is more enclosed than it seems. Because similar places 4 and 6 "take away" its openness.
- Place 5 is more open than it seems. Because similar places 1 and 2 "take away" its closedness.
Almost any property of any specific place can be "illusory". But when you look at the places in context you can deduce their properties via the process of elimination.
If you see this, then it doesn't matter if you understand every position in the order or not. You see the places the same way I see them and think about them the same way.
Crash Bandicoot 1
Crash Bandicoot N. Sane Trilogy
My ordering of some levels: image. Videos of the levels: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6.
I used 2 metrics to evaluate the levels:
- Is the level stretched vertically or horizontally?
- Is the level easy to separate into similar square-like pieces or not? (like a patchwork)
The levels go from "vertical and separable" to "horizontal and not separable".
But to see this you need to note:
- Level 1 is very vertical: it's just a vertical wall. So it "takes away" verticality from levels 2 and 3.
- From levels 1-3, level 3 is the most horizontal. Because it's the least similar to the level 1.
- Levels 4-6 repeat the same logic, but now the levels are harder to separate into similar square-like pieces. Why? Because levels 1 and 2 are very easy to separate (they have repeating patterns on the walls), so they "take away" separability from all other levels.
Any question about any property of any level is answered by another question: is this property already "occupied" by some other level?
Jacek Yerka
Places in random order: image.
My ordering of the places: image.
I used 2 metrics to evaluate the places:
- Can the place fit inside a box-like space? (not too big, not too small)
- Is the place inside or outside of something small?
The places go from "box-like and outside" to "not box-like and inside".
But to see this you need to note:
- Place 1 could be interpreted as being inside of a town. But similar Place 5 is inside a single road. So it takes away "inside-ness" from Place 1.
- Place 2 is more "outside" than it seems. Because similar Place 6 fits inside an area with small tiles. So it takes away "inside-ness" from Place 2.
- Place 3 is not so tall as it seems. Because similar Place 6 is very tall. So it takes away height from Place 3.
If you feel this relativity of places' properties, then you understand how I think about places. You don't need to understand a specific order of places perfectly.
Metrics
When we use 2 metrics, those metrics are not independent of each other, they overlap.
And one of the metrics needs to be more important (have the "emphasis"). The emphasized metric is kind of like a prior for the other metric.
But we also have other "priors": our initial impressions about the places.
How to choose the metrics for an order? What happens if we change which metric is emphasized? I'll talk about it (much) later in the post.
Crash Bandicoot 3
My ordering of some levels: image. Videos of the levels: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6, Level 7
I used 1 metrics to evaluate the levels:
- Does the level create a 3D space (box-like, not too big, not too small) or 2D space (flat surface) or 0D space (shapeless, cloud-like)?
Levels go from 3D to 2D to 0D.
But to see this you need to note:
- Levels 6 and 7 are less box-like than they seem. Because similar levels 1 and 2 already create small box-like spaces. So they take away "box-like" feature from levels 6 and 7.
- Level 3 is more box-like than it seems. Because levels 4 and 5 create more dense flat surfaces. So they take away flatness of Level 3.
Each level is described by all other levels. This recursive logic determines what features of the levels matter.
Negative objects
When objects take their properties from a single pool of properties, there may appear "negative objects". It happens when objects A and B take away opposite properties from a third object C (with equal force). For example, A may take away height from C. But B takes away shortness (anti-height) from C. So, "negative objects" are like contradictions. You can't fit a negative object anywhere in the order of positive objects.
Let's get back to Crash Bandicoot 3 and add two levels: image. Videos of the levels: Level -2, Level -1
- Take a look at Level -2. It's too empty for levels 6 and 7 (and too box-like). But it's too big and shapeless for levels 1 and 2. And it's obviously not a flat surface. So, it doesn't fit anywhere. Maybe it's just better to place it in its own order.
- Similar thing is true for Level -1. It's too different from levels 6 and 7 and it's too small for levels 1 and 2.
- Levels -2 and -1 are also both inside some kind of structures. This adds confusion when you compare them to other levels.
Note that negative levels are still connected with all the other levels anyway: their properties are still determined by properties of all other levels, just in a more complicated way.
You can order negative levels by using the metrics for positive levels. In the case above, you can do it like this:
- Take negative levels. Cut out their larger parts. Now they're just like the positive levels.
- Order them the same way you ordered positive levels.
Hyper objects
There are also "hyper objects" (hyper positive and hyper negative objects). Such objects take "too much" or "too little" from the common pool of properties compared to normal objects.
How do hyper objects appear? I may not be able to explain it. Maybe a hyper object appears when an object takes a property (equally strong) from objects with very different amounts of that property. This was very confusing and vague, so here's an analogy: imagine a number that's very-very, but equally far away from the numbers 2 and 5. It has distance 10 from both 2 and 5. How can this be? This number should go somewhere "sideways"... it must be a complex number. So, you can compare hyper objects to complex numbers.
An example of hyper levels for Crash Bandicoot 3: image. Video of the levels: "Bye Bye Blimps", "N. Gin"
- "Bye Bye Blimps" is like a flat surface, but utterly gigantic. But it's also shapeless like levels 6 and 7, yet bigger than them/equally big, but in a different way.
- "N. Gin" is identical to "Bye Bye Blimps" in this regard.
Universe of orders
Here's my ordering of all levels of Crash Bandicoot 3: image. With some connections: image. The ordering has 4 parts: normal/hyper, positive/negative. Videos of additional levels: Level 1a, Level -4a, Level -3a, Level -2a.
Different orders are like different universes, but they still affect each other. Without context of the other orders, we wouldn't know if Level 1a is greater or smaller than the Level 2a.
Hyper negative order (-4a, -3a...) can be analyzed in terms of its own context, because it has enough levels. But it also can be analyzed in terms of its connections to another order (normal positive order). You can draw connections between levels in different orders, but levels are going to fight for those connections. Level -4a could be connected to Levels 7 and 8, but this connections is taken away by Levels -2a and -1a. (And even by Level 2a from another order.)
This can lead to a sort of circular reasoning: you derive order B from order A. Order B then verifies its own structure independently and can be used to derive A. So, we get "If A, then B; if B, then A"
You also can compare a positive and negative order pair to the Möbius strip. Just imaging connecting the ends of those orders. And when you add hyper orders... maybe you can compare the resulting structure to the Klein bottle.
By the way, when you make an order B based on order A... you can treat places of order A as the property for order B! It's another layer of recursion/fractal-like properties of all of this. And a (major) hint how we could formally model this.
P.S. And here's a potentially important analogy for understanding different orders, even though it's very weird. Imagine that the difference between pain and pleasure is subjective. And the only reason why you don't interpret your experience as constant extreme pain is because your experience is separated into special chunks. And extremely painful experiences are pushed far away outside of the universe of normal experiences. I think this conveys the idea about the function of the orders.
Superrationality
You can imagine that the common pool of properties is the pool of "resources" and different objects fight for those resources. And splitting a single pool into "positive/negative" and "normal/hyper" pools is just the best way to split resources between the objects.
https://en.wikipedia.org/wiki/Superrationality#Probabilistic_strategies
An eccentric trillionaire contacts 20 people, and tells them that if one and only one of them send him or her a telegram (assumed to cost nothing) by noon the next day, that person will receive a billion dollars. If they receive more than one telegram or none at all, no one will get any money, and communication between players is forbidden. In this situation, the superrational thing to do (if it is known that all 20 are superrational) is to send a telegram with probability p=1/20—that is, each recipient essentially rolls a 20-sided die and only sends a telegram if it comes up "1". This maximizes the probability that exactly one telegram is received.
Connection with recursion
Recursion. If objects take their properties from the common pool, it means they don't really have (separate) identities. It also means that a property (X) of an object is described in terms of all other objects. So, the property (X) is recursive, it calls itself to define itself.
For example, imagine we have objects A, B and C. We want to know their heights. In order to do this we may need to evaluate those functions:
- A(height), B(height), C(height)
- A(B(height)), A(C(height)) ...
- A(B(C(height))), A(C(B(height))) ...
With negative objects (-A, -B, -C) and hyper objects (A', B', C') things may get even more complicated:
- A(-A(A'(-A'(height))))...
A priori assumptions about objects should allow us to simplify this and avoid cycles.
Fractals. See Coastline paradox. You can treat a fractal as an object with multiple interpretations (where an interpretation depends on the scale). Objects taking their properties from the common pool = fractals taking different scales from the common range. Orders turning into properties for other orders = fractals.
Theory
How is this related to anything?
You may be asking "How can ordering things be related to anything?" Prepare for a little bit abstract argument.
Any thought/experience is about multiple things coexisting in your mental state. So, any thought/experience is about direct or indirect comparison between things. And any comparison can be described by an order or multiple orders.
- If compared things don't share properties, then you can order them using "arithmetic" (absolute measurements, uncorrelated properties). In this case everything happening in your mental state is absolutely separated, it's a degenerate case.
- If compared things 100% share properties, then you can order them using my method (pool of properties, absolutely correlated properties). In this case everything happening in your mental state is mixed into a single process.
- If compared things partially share properties, then you can use a mix between "arithmetic" and my method. In this case everything happening in your mental state partially breaks down into separate processes.
So, "my orders + arithmetic orders" is something like a Turing machine: a universal model that can describe any thought/experience, any mental state. Of course, a Turing machine can describe anything my method can describe, but my method is more high-level.
Formalization
I know that what I described above doesn't automatically specify a mathematical model. But I think we should be able to formalize my idea easily enough. If not, then my idea is wrong.
We have those hints for formalization:
- The idea about the common pool of properties. Connection with probability.
- Connection with recursion.
- The idea of "negative objects" and "hyper objects". Connection with superrationality/splitting resources.
- Equivalence between orders of objects and properties of objects: any property of a bunch of objects can be turned into another bunch of objects.
- Equivalence of properties. Another equivalence between different orders. (See below.)
- We can test the formalization on comparing 3D shapes (maybe even 2D shapes). Easy to model and formalize.
- Connection to hypotheses, rationality. To Bayes' rule. (See below.)
- We can try a special type of brainstorming/spitballing based on my idea. (See below.)
To be honest, I'm bad at math. I based my theory on synesthesia-like experiences and conceptual ideas. But if the information above isn't enough, I can try to give more. I have experience of making my idea more specific, so I could guess how to make the idea even more specific (if we encounter a problem of formalization). Please, help me with formalizing this idea.
Equivalence of properties
(If examples with places confuse you too much, skip this part of the post.)
It's an important part of my "axiom" about the common pool of properties. It partially restates the "axiom" and partially strengthens it:
An object has only 1 property. All properties of an object are equivalent/correlated. All properties of an object are versions of a single property. An object is described by a "sum" of an infinity of versions of the same property. But the same property plays different roles in different objects, like a word in different contexts. Each object is "absolutely unique".
So, not only equivalent properties of different objects are correlated, but also different properties of a single object are correlated.
When I was talking about different orders above, you might've been asking "how do we choose the properties for an order, does our choice matter?". This part of the "axiom" is needed to answer that question.
Super-correlated form
You can describe properties of an object in a "super-correlated form": as if any property is caused by any other property. By itself it's just a linguistic trick. It doesn't reduce the complexity of description.
For example, imagine you see a small red cube in the desert. That's a random object in a random place. But you can see it like this:
- It's red because red doesn't contrast too much with yellow.
- It has sharp edges because the desert doesn't have sharp edges.
- It's solid and whole because the desert isn't solid and consists of particles.
- It's small so it contrasts with big empty space of the desert.
You could cook up a similar story about any other thing in the desert. And the story could go on forever. Each correlation has multiple versions and you can combine correlations into new ones.
It seems like a funny, but useless way to describe objects... but it's not useless when we combine it with our method of classification.
Examples
Let's get back to the paintings of Jacek Yerka: image.
We can describe all properties of those places in the "super-correlated form". For example, in terms of volume.
- Volume of the place is in the center of the place's world.
- The city sits on a side of a volume.
- The city creates a volume.
- The big mountain under the city has some volume.
- The place has layers of different types of volume.
- This place is inside of a volume created by itself.
- In a way the place sits of multiple sides of a volume (the volume of trees).
- The place has "fractal volume": the global volume of the forest, the volume between the trees, the volume inside the trees.
- The place sits on a side of the global volume.
- Most of the global volume of the place is empty.
- The balconies create pieces of volume.
- The place sits on a side of a volume (or the global volume) but it's fundamentally inverted compared to the Place 3.
- Volume of the place is in the center of the place's world.
- Smaller elements of the place (various wooden stuff) create the volume.
- Most of the volume of the place is empty, but in a different way compared to the Place 3.
I hope you get the idea. Any property of the place can be described in terms of "versions" of a single property. And the same property/version is woven into a unique context in every unique place.
If you leave only 2 versions of the property, you'll get a simple metric for ordering the places. For example, we may leave "Does the volume occupy the center of the world?" (center) and "Is the volume created by smaller pieces of the place?" (size): the places in the order go from "big and not in the center" to "small and in the center". Emphasis is on the size metric.
The same would happen if we chose many other pairs of versions. So, if the places fought for every version of the property, the result would converge to the same outcome. (But we can simplify the infinite calculation by "ignoring" most of the versions of the property. This way we skip an infinity of steps of the calculation.)
Or would it? What if we chose a different emphasis, for example?
Universe of universes
You may say that the result actually converges to an infinity of possible outcomes: but each outcome corresponds to its own order (normal/hyper, positive/negative, etc.).
Remember I said there's a universe of orders? You could say there's a universe of universes. Each universe has the same content (places), but this content is arranged in different ways. Because metrics of different universes have different emphases.
For example, this metric:
- The places in the order go from "big and not in the center" to "small and in the center". Emphasis is on the size metric.
Creates a universe with such orders: image. But this metric:
- The places in the order go from "big and not in the center" to "small and in the center". Emphasis is on the center metric.
Creates a universe with such orders: image. Negative order is equivalent to the positive order in a different universe.
Different universes can learn from each other, the same way like different orders do, I guess. If we share more or less the same priors, we don't risk to run into completely different universes. But if we do, we can understand each other anyway.
Onion layers
All properties of an object are versions of a single property. An object is described by a "sum" of an infinity of versions of the same property.
One way to imagine this is to imagine a fractal. A fractal has an infinity of scales and each scale has a slightly different "version" of fractal's properties (e.g. length, area). Like an infinite onion. Likely it isn't a perfect analogy, but I hope it helps.
P.S.: and I feel that Leibniz' Monadology should be mentioned again. His philosophy is the most similar thing to my idea.
(III) Composite substances or matter are "actually sub-divided without end" and have the properties of their infinitesimal parts (§65). A notorious passage (§67) explains that "each portion of matter can be conceived as like a garden full of plants, or like a pond full of fish. But each branch of a plant, each organ of an animal, each drop of its bodily fluids is also a similar garden or a similar pond". There are no interactions between different monads nor between entelechies and their bodies but everything is regulated by the pre-established harmony (§§78–9). Much like how one clock may be in synchronicity with another, but the first clock is not caused by the second (or vice versa), rather they are only keeping the same time because the last person to wind them set them to the same time. So it is with monads; they may seem to cause each other, but rather they are, in a sense, "wound" by God's pre-established harmony, and thus appear to be in synchronicity.
Leibniz uses the notion of a fractal and rejects the idea of causality (it will be important later) in favor of very-very strong global correlation.
Argumentation, hypotheses
You can apply the same idea (about the "common pool") to hypotheses and argumentation:
- You can describe a hypothesis in terms of any other hypothesis. You also can simplify it along the way (let's call it "regularization"). Recursion and circularity is possible in reasoning.
- Truth isn't attached to a specific hypothesis. Instead there's a common "pool of truth". Different hypotheses take different parts of the whole truth. The question isn't "Is the hypothesis true?", the question is "How true is the hypothesis compared to others?" And if the hypotheses are regularized it can't be too wrong.
- Alternatively: "implications" of a specific hypothesis aren't attached to it. Instead there's a common "pool of implications". Different hypotheses take different parts of "implications".
- Conservation of implications: if implications of a hypothesis are simple enough, they remain true/likely even if the hypothesis is wrong. You can shift the implications to a different hypothesis, but you're very unlikely to completely dissolve them. For example, if you derived some implications of moral realism being true, then those implications are true even if moral realism is wrong. Like a smile without a cat, implications without a belief.
- In usual rationality (hypotheses don't share truth) you try to get the most accurate opinions about every single thing in the world. You're "greedy". But in this approach (hypotheses do share truth) it doesn't matter how wrong you are about everything unless you're right about "the most important thing". But once you're proven right about "the most important thing", you know everything. A billion wrongs can make a right. Because any wrong opinion is correlated with the ultimate true opinion, the pool of the entire truth.
- You can't prove a hypothesis to be "too bad" because it would harm all other hypotheses. Because all hypotheses are correlated, created by each other. When you keep proving something wrong the harm to other hypotheses grows exponentially.
- Motivated reasoning is valid: truth of a hypothesis depends on context, on the range of interests you choose. Your choice affects the truth.
- Any theory is the best (or even "the only one possible") on its own level of reality. For example, on a certain level of reality modern physics doesn't explain weather better than gods of weather.
In a way it means that specific hypotheses/beliefs just don't exist, they're melted into a single landscape. It may sound insane ("everything is true at the same time and never proven wrong" and also relative!). But human language, emotions, learning, pattern-matching and research programs often work like this. It's just a consequence of ideas (1) not being atomic statements about the world and (2) not being focused on causal reasoning, causal modeling. And it's rational to not start with atomic predictions when you don't have enough evidence to locate atomic hypotheses.
Causal rationality, Descriptive rationality
You can split rationality into 2 components. The second component isn't explored. My idea describes the second component:
- Causal rationality. Focused on atomic independent hypotheses about the world. On causal explanations, causal models. Answers "WHY this happens?". Goal: to describe a specific reality in terms of outcomes. To find what's true "right now".
- Descriptive rationality. Focused on fuzzy and correlated hypotheses about the world. On patterns and analogies. Answers "HOW this happens?". Goal: to describe all possible (and impossible) realities in terms of each other. To find what's true "in the future".
Causal and Descriptive rationality work according to different rules. Causal uses Bayesian updating. Descriptive uses "the common pool of properties + Bayesian updating", maybe.
- "Map is not the territory" is true for Causal rationality. It's wrong for Descriptive rationality: every map is a layer of reality.
- "Uncertainty and confusion is a part of the map, not the territory". True for Causal rationality. Wrong for Descriptive rationality: the possibility of an uncertainty/confusion is a property of reality.
- "Details make something less likely, not more" (Conjunction fallacy). True for Causal rationality. Wrong for Descriptive rationality: details are not true or false by themselves, they "host" kernels of truth, more details may accumulate more truth.
- For Causal rationality, math is the ideal of specificity. For Descriptive rationality, math has nothing to do with specificity: an idea may have different specificity on different layers of reality.
- In Causal rationality, hypotheses should constrain outcomes, shouldn't explain any possible outcome. In Descriptive rationality... constraining depends on context.
- Causal rationality often conflicts with people. Descriptive rationality tries to minimize the conflict. I believe it's closer to how humans think.
- Causal rationality assumes that describing reality is trivial and should be abandoned as soon as possible. Only (new) predictions matter.
- In Descriptive rationality, a hypothesis is somewhat equivalent to the explained phenomenon. You can't destroy a hypothesis too much without destroying your knowledge about the phenomenon itself. It's like hitting a nail so hard that you destroy the Earth.
Example: Vitalism. It was proven wrong [LW · GW] in causal terms. But in descriptive terms it's almost entirely true. Living matter does behave very different from non-living matter. Living matter does have a "force" that non-living matter doesn't have (it's just not a fundamental force). Many truths of vitalism were simply split into different branches of science: living matter is made out of special components (biology/microbiology) including nanomachines/computers!!! (DNA, genetics), can have cognition (psychology/neuroscience), can be a computer (computer science), can evolve (evolutionary biology), can do something like "decreasing entropy" (an idea by Erwin Schrödinger, see entropy and life). On the other hand, maybe it's bad that vitalism got split into so many different pieces. Maybe it's bad that vitalism failed to predict reductionism. However, behaviorism did get overshadowed by cognitive science (some living matter did turn out to be more special than it could be). Our judgement of vitalism depends on our choices, but at worst vitalism is just the second best idea. Or the third best idea compared to some other version of itself... Absolute death of vitalism is astronomically unlikely and it would cause most of reductionism and causality to die too along with most of our knowledge about the world. Vitalism partially just restates our knowledge ("living matter is different from non-living"), so it's strange to simply call it wrong. It's easier to make vitalism better than to disprove it.
Perhaps you could call the old version of vitalism "too specific given the information about the world": why should "life-like force" be beyond laws of physics? But even this would be debatable at the time. By the way, the old sentiment "Science is too weak to explain living things" can be considered partially confirmed: 19th century science lacked a bunch of conceptual breakthroughs. And "only organisms can make the components of living things" is partially just a fact of reality: skin and meat don't randomly appear in nature. This fact was partially weakened, but also partially strengthened with time. The discovery of DNA strengthened it in some ways. It's easy to overlook all of those things.
In Descriptive rationality, an idea is like a river. You can split it, but you can't stop it. And it doesn't make sense to fight the river with your fists: just let it flow around you. However, if you did manage to split the river into independent atoms, you get Causal rationality.
2 types of rationality should be connected
I think causal rationality has some problems and those problems show that it has a missing component:
- Rationality is criticized for dealing with atomic hypotheses about the world. For not saying how to generate new hypotheses and obtain new knowledge. Example: critique by nostalgebraist. See "8. The problem of new ideas"
- You can't use causal rationality to be critical of causal rationality. In theory you should be able to do it, but in practice people often don't do it. And causal rationality doesn't model human argumentation, even for the most important topics such as AI safety. So we end up arguing like anyone argues.
- Doomsday argument, Pascal's mugging [? · GW]. Probability starts to behave weird when we add large numbers of (irrelevant) things to our world.
- The problem of modesty. Should you assume that you're just an average person?
- Weird addition in ethics. Repugnant conclusion, "Torture vs. Dust Specks" [? · GW].
- Causal rationality doesn't give/justify an ethical theory [? · GW]. Doesn't say how to find it if you want to find it.
- Causal rationality doesn't give/justify a decision theory [? · GW]. There's a problem with logical uncertainty (uncertainty about implications of beliefs).
I'm not saying that all of this is impossible to solve with Causal rationality. I'm saying that Causal rationality doesn't give any motivation to solve all of this. When you're trying to solve it without motivation you kind of don't know what you're doing. It's like trying to write a program in bytecode without having high-level concepts even in your mind. Or like trying to ride an alien device in the dark: you don't know what you're doing and you don't know where you're doing.
What and where are we doing when we're trying to fix rationality?
Brainstorming/spitballing
If you want to help with formalizing the idea, one thing we can do is... brainstorming/spitballing.
My approach is this: think about all possible connections between an idea and anything else. Any connection contains some kernel of truth, even the most crazy/improbable one. So, try to find the best way to aggregate and expand those kernels of truth. In order for this to work you need to give me information about formal ideas.
Equivalent perspective: there 100% guaranteed to be a useful true analogy between any idea A and any idea B. The only question is in how many pieces you need to split A (/how many times you need to slightly modify A) for it to become useful. This can be used for effective search of connections, I hope.
I split the connections I thought about into 5 categories. I aim for "2 bits of similarity" in my analogies: if 2 different details of 2 ideas are similar, then that's good enough.
1) Analogies about updating objects:
- Neural networks. "Everything affects everything", backpropagation (made a mistake? go back and alter something). Possible connection: you may compare neurons to objects and their weights to the common property (probability). Identity of objects is a landscape that gets adjusted (step by step) to fit reality. More specifically, you can compare different layers of neurons to different orders of objects (see [LW · GW]).
- "Iterated Amplification" by Paul Christiano. It's about, for example, compressing a tree into a single node. Possible connection: nodes are like objects, each object gets compressed into any other object. You also can compress orders into properties for other orders.
- I tried to suggest a recursive learning method for a neural network here [LW · GW]. What I suggested may 100% fail to work, but I think the conceptual idea is worth considering. Especially in context of this post. All ideas here are just a more fundamental version of that suggestion. (conversation on Reddit, No Participation link) Sorry if I sounded too arrogant in my LW-post, I'm confident in my idea because it's based on my experience.
2) Analogies about locality in a space:
- Maybe you can compare negative orders and hyper orders to negative probability and complex probability. Each order works the same way, so how can they be described by different probabilities? Maybe the answer is that "negative" and "hyper" are relative terms: when you're in a specific order places from other orders seem like having absurd negative and complex values... but inside of those orders the values are not negative and not complex. It's like the arbitrary choice of basis vectors.
- k-nearest neighbors algorithm. Possible connection: in k-NN objects fight for identity of other objects. E.g.: more blue objects cause more nearby objects to be classified as blue. The opposite happens in my idea.
- Logarithmic scale, gravity in general relativity. Possible connection: those concepts of non-linear spaces may help to explain the rule of distributing a property (probability) between objects.
- Principal component analysis. PCA seeks the most uncorrelated properties. My method of classification seeks the most correlated. PCA slices an object into pieces/layers of details and stores them. My method slices an object into pieces/layers and tells you how to transform them into one another. Maybe my method is an addition to the PCA. (a conversation on Reddit, No Participation link; I haven't come up with the idea of using probabilities at the moment of the conversation)
- Word embedding. Likely connection: "the common pool of properties" (orders of objects) should present some alternative to this representation of objects. Word embeddings memorize context, but don't model the change of context. Orders of objects do.
3) Analogies about identity, extra correlations and extra uncertainty in decisions:
- Counterfactual worlds can be compared to negative and hyper orders. And to universes [LW · GW] of orders.
- Decision theories [? · GW] beyond Causal (CDT) and Evidental (EDT), i.e. TDT/UDT/FDT. Reasons to expect some connection: unusual decision theories deal with questions of identity, extra correlations, unusual uncertainty about impossible possibilities (logical uncertainty).
- Causal decision theory vs. Evidental decision theory and others. Possible connection: Maybe you can compare the conflict/overlap between decision theories to that between Causal and Descriptive rationality.
- Dempster–Shafer theory. I should mention that. "Generalisation of Bayesian theory". Possible connections: it's about combining beliefs from different sources (1), it deals with power sets (2). Maybe you can compare (1) and (2) to combining probabilities (properties) of different objects.
Note: I'm 100% sure that questions posed by exotic decision theories are not purely theoretical. E.g. thinking "What my actions tell about me? What kind of person should I be?" may be a part of human thinking.
4) Analogies about paths, graphs:
- Dijkstra's algorithm for finding the shortest paths. Possible connection: you can compare paths to objects, nodes to parts of properties "in the common pool". Different objects fight for those properties.
- Electric circuits, hydraulic analogy. Possible connection: you can compare electricity (or water) to properties in the common pool. Potentially useful hydraulic analogy: connected water containers on different levels, the water is like the common property (probability).
- Markov chain. Possible connection: you can compare states to objects and their probability to the common property.
- Three utilities problem. Possible connection: houses fight for utilities by splitting the common ground, we could compare houses to objects and utilities to properties. And different shapes of lines are different degrees of probabilities. Shapes that use 3D (torus) or twisted (Möbius strip) properties of the graph are negative and complex degrees of probability.
5) Analogy about combining different types of relativity/dynamics:
- Quantum gravity. How to make spacetime dynamic and matter uncertain at the same time? How to deal with infinities (of self-actions)? Possible connection: (1) it's about different types of relative/dynamic/uncertain things (2) it's about self-actions.
- There's a possibility of comparing "the common pool of properties" to more specific physical concepts. E.g., you can compare orders of objects to both (connected) universes and wave functions. See universal wavefunction. And maybe you can compare adding objects in an order or splitting orders to quantization (e.g. the black body problem). Compare "ultraviolet catastrophe" to separating objects and orders.
- Computational complexity theory. That's kind of about uncertainty (complexity). Possible connection: Organizing objects in an order is a problem. Organizing objects in a "hyper order" is a more complicated problem (relative to the normal order's algorithm). So, the universe of all orders is the universe of complexity of inputs, the universe of algorithms that can easily solve specific sets of inputs. And it may be analogous to human thinking: different people solve different problems more easily. And maybe you can predict what algorithms you need for solving something effectively. Know how similar outputs of different algorithms will be.
Not saying that my idea really has anything to do with Physics (or formal complexity theory). Just saying that physicists have experience of dealing with different uncertain/dynamic things. I believe that any idea tries to describe the entire reality and there's no sense in trying to stop it. But the key word is "trying".
I think we shouldn't search all those categories too deep anyway, they're just possible reference points.
P.S.: this post replaces and advances everything I wrote here [LW · GW] and here [LW · GW]. Here [LW · GW] I tried to introduce probability for comparing ideas, but didn't have the "axiom". And I think that "super-correlated" [LW · GW] descriptions of objects and recursion [LW · GW] can be important for learning human values (see this argument [LW · GW] and those thought experiments [LW · GW]). I'm not asking you to read those links, I leave them here just in case, for context.
0 comments
Comments sorted by top scores.