Comments
I should have written "algebraic complement", which becomes logical negation or set-theoretic complement depending on the model of the theory.
Anyway, my intuition on why open sets are an interesting model for concepts is this: "I know it when I see it" seems to describe a lot of the way we think about concepts. Often we don't have a precise definition that could adjudicate all the edge cases, but we pretty much have a strong intuition about when a concept does apply. This is what happens with recursively enumerable sets: if a number belongs to an R.E. set, you will find out, but if it doesn't, you need to wait an infinite amount of time. Systems that take seriously the idea that confirmation of truth is easy fall under the banner of "geometric logic", whose algebraic models are frames, and topologies are just frames of subsets. So I see the relation between "facts" and "concepts" a little bit like the relation between "points" and "open sets", but more in an "internal language of a topos" or "pointless topology" fashion: we don't have access to points per se, only to open sets, and we imagine points as infinite chains of ever more precise open sets.
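To make the R.E. intuition concrete, here is a tiny Haskell illustration (my own toy example, not from any library): membership in a set given by an enumeration can be confirmed in finite time, but non-membership can only be "confirmed" by waiting forever.

```haskell
-- A recursively enumerable set, presented as a (possibly infinite) enumeration.
-- `elem` returns True after finitely many steps if n is in the set,
-- but never returns if n is absent from an infinite enumeration.
semiDecide :: [Integer] -> Integer -> Bool
semiDecide enumeration n = n `elem` enumeration

-- Example: the set of squares. `semiDecide squares 49` is True;
-- `semiDecide squares 50` diverges.
squares :: [Integer]
squares = [ k * k | k <- [0..] ]
```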
It also happens to me when I get to solve a problem that many others have, and realize in retrospect that it was a combination of luck, knowing the right people, and skills that you don't know how to transfer, possibly because they are genetic traits. It must be frustrating to hear, after a question like "how have you conquered your social anxiety?", the condensed answer "mostly luck".
On the other hand, it makes you think when you realize how much these kinds of social status boosters have permeated every step of the hierarchical ladder of any large organization... and yet, somehow, things still work out.
There is, at least at a mathematical / type theoretic level.
In intuitionistic logic, $\neg A$ is translated to $A \to \bot$, which is the type of processes that turn an element of $A$ into an element of $\bot$, but since $\bot$ is empty, the whole type is absurd as long as $A$ is instantiated (if not, then the only member is the empty identity). This is also why $A \to \neg\neg A$ holds constructively, but $\neg\neg A \to A$ does not.
Closely related to constructive logic is topology, and indeed if concepts are open sets, the logical complement is not a concept. Topology is also nice because it formalizes the notion of an edge case.
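To spell out the type-theoretic reading, here is a minimal Haskell sketch (using Data.Void for the empty type; the helper names are mine): negation as a function into the empty type, double-negation introduction, and ex falso.

```haskell
import Data.Void (Void, absurd)

-- Constructive negation: a proof of "not A" is a function from A into the empty type.
type Not a = a -> Void

-- A -> not (not A): given x, refute any claimed refutation of A by feeding it x.
doubleNegIntro :: a -> Not (Not a)
doubleNegIntro x k = k x

-- The converse, Not (Not a) -> a, has no total definition for an arbitrary a:
-- a refutation of a refutation gives us no way to conjure an actual inhabitant.

-- Ex falso quodlibet: from an element of the empty type, anything follows.
exFalso :: Void -> b
exFalso = absurd
```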
One thing to remember when talking about distinction/defusion is that it's not a free operation: if you distinguish two things that you previously considered the same, you need to store at least one more bit of information than before. That is something that demands effort and energy. Sometimes you need to store a lot more bits. You cannot simply become superintelligent by defusing everything in sight.
Sometimes, making a distinction is important, but some other times, erasing distinctions is more important. Rationality is about creating and erasing distinctions to achieve a more truthful or more useful model.
This is also why I vowed to never object that something is "more complicated" if I cannot offer a better model: it's always very easy to inject distinctions; the harder part is to make those distinctions matter.
I don't think you need the concept of evidence. In Bayesian probability, the concept of evidence is equivalent to the concept of truth: on one hand P(X|X) = 1, so whatever you consider evidence is true; on the other hand, if P(X) = 1 then P(A /\ X) = P(A|X), so you can treat true sentences as evidence without changing anything else.
Add to this that good rationalist practice is to never assign P(A) = 1 to anything, so that nothing is ever actually true or actually evidence. You can do epistemology exclusively in the hypothetical: what happens if I consider this true? And then derive consequences.
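For reference, the derivation behind the claim that true propositions behave exactly like evidence is one application of the product rule:

```latex
P(X) = 1 \;\Rightarrow\; P(A \wedge \neg X) \le P(\neg X) = 0
\;\Rightarrow\; P(A \mid X) \;=\; \frac{P(A \wedge X)}{P(X)} \;=\; P(A \wedge X) \;=\; P(A),
\qquad \text{and trivially } P(X \mid X) = 1.
```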
Well, I share most of your points. I think that in 30 years millions of people will try to relocate to more fertile areas. And I think that not even the firing of the clathrate gun will force humans to coordinate globally. Although I am a bit more optimistic about technology, the current status quo is broken beyond repair.
This is surprising when coupled with the fact that particles do not have a definite spin direction before you measure them. The anti-correlation is maintained non-locally, but the directions are decided by the experiment.
A better example is: take two spheres, send them far away from each other, then make one sphere spin around any axis that you want. How surprised would you be to learn that the other sphere spins around the same axis in the opposite direction?
How probable is it that someone knows their internal belief structure? How probable is it that someone who knows their internal belief structure tells you about it truthfully, instead of using a self-serving lie?
The causation order in the scenario is important. If the mother is instantly killed by the truck, then she cannot feel any sense of pleasure after the fact. But if you want to say that the mother feels the pleasure during the attempt or before, then I would say that the word "pleasure" here is assuming the meaning of "motivation", and the points raised by Viliam in another comment are valid, it becomes just a play on words, devoid of intrinsic content.
So far, Bayesian probability has been extended to infinite sets only as a limit of continuous transfinite functions. So I'm not quite sure of the official answer to that question.
On the other hand, what I know is that even common measure theory cannot talk about the probability of a singleton if the support is continuous: no sigma-algebra on $\mathbb{R}$ supports the atomic elements.
And if you're willing to bite the bullet, and define such an algebra through the use of a measurable cardinal, you end up with an ultrafilter that allows you to define infinitesimal quantities
Under the paradigm of probability as extended logic, it is wrong to distinguish between empirical and demonstrative reasoning, since classical logic is just the limit of Bayesian probability with probabilities 0 and 1.
Besides that, category theory was born more than 70 years ago! Sure, very young compared to other disciplines, but not *so* young. Also, the work of Lawvere (the first to connect categories and logic) began in the 70's, so it dates at least forty years back.
That said, I'm not saying that category theory cannot in principle be used to reason about reasoning (the effective topos is a wonderful piece of machinery), it just cannot say that much right now about Bayesian reasoning
Yeah, my point is that they aren't truth values per se, not intuitionistic or linear or MVs or anything else
I've also dabbled in the matter, and I have two observations:
- I'm not sure that probabilities should be understood as truth values. I cannot prove it, but my gut feeling is telling me that they are two different things altogether. Sure, operations on truth values should turn into operations on probabilities, but their underlying logic is different (probabilities after all should be measures, while truth values are algebras)
- While 0 and 1 are not (good) epistemic probabilities, they are of paramount importance in any model of probability. For example, P(X|X) = 1, so 0/1 should be included in any model of probability
The way it's used in the set theory textbooks I've read is usually this:
- define a successor function on a set $S$: $S^{+} = S \cup \{S\}$;
- assume the existence of an inductive set that contains a set and all its successors. This is a weak and very limited form of infinite induction.
- Use Replacement on the inductive set to define a general form of transfinite recursion.
- Use transfinite recursion and the union operation to define the step "taking the limit of a sequence".
So, there is indeed the assumption of a kind of infinite process before the assumption of the existence of an infinite set, but it's not (necessarily) the ordinal $\omega$. You also can't use it to deduce anything else; you still need Replacement. The same can be said for the existence and uniqueness of the empty set, which can be deduced from the axiom schema of Separation.
This approach is neither equivalent to nor weaker than having transfinite recursion by fiat; it's the only correct way if you want to make the fewest new assumptions.
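For readers without a set theory text at hand, the first two steps look roughly like this in the usual ZF notation (the exact phrasing of the inductive-set assumption varies by textbook):

```latex
x^{+} := x \cup \{x\} \qquad \text{(successor)}
\\[4pt]
\exists I\,\big(\varnothing \in I \;\wedge\; \forall x\,(x \in I \rightarrow x^{+} \in I)\big) \qquad \text{(an inductive set exists)}
```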
Anyway, as far as I can tell, having a well-defined theory of sets is crucial to the definition of the surreals, since they are based on set operations and ontology, and use infinite sets of every kind.
On the other hand, I don't understand your problem with the impredicativity of the definitions of the surreals. These are usually resolved into recursive definitions, and since ZF sets are well-founded, you never run into any problems.
> Transfinite induction does feel a bit icky in that finite prooflines you outline a process that has infinitely many steps. But as limits have a similar kind of thing going on I don't know whether it is any ickier.
Well, transfinite induction / recursion reduces (at least in ZF set theory) to the existence of an infinite set and the Replacement axiom schema (the image of a set under a class function is a set). I suspect you don't trust the latter.
The first link in the article is broken...
Obviously, only the wolves that survive.
Beware of selection bias: even if veterans show more productivity, it could just be because military training has selected for those with higher discipline.
The diagram at the beginning is very interesting. I'm curious about the arrow from relationships to results... care to explain? Does it refer to joint work or collaborations?
On the other hand, it's not surprising to me that AI alignment is a field that requires much more research and math skill than software-writing skill... the field is completely new and not very well formalized yet; probably your skill set is misaligned with the needs of the market.
> The first thing that you must accept in order to seek sense properly is the claim that minds actually make sense
This is somewhat weird to me. Since Kahneman & Tversky, we know that system 2 is mostly good at rationalizing the actions taken by system 1, to create a self-coherent narrative. Not only do minds in general not make sense, my own mind in particular lacks any sense. I'm here just because my system 1 is well adjusted to this modern environment; I don't *need* to make any sense.
From this perspective, "making sense" appears to be a tiring and pointless exercise...
Isn't "just the right kind of obsession" a natural ability? It's not that you can orient your 'obsessions' at will...
Two of my favorite categories show that they really are everywhere: the free category on any graph and the presheaves on gamma.
The first: take any directed graph, unfocus your eyes and instead of arrows consider paths. That is a category!
The second: take any finite graph. Take sets and functions that realize this graph. This is a category; moreover, you can make it dagger-compact, so you can do quantum mechanics with it. Take as the finite graph gamma the graph with just two vertices and two arrows between them. Sets and functions that realize this graph are... any graph! So, CT allows you to do quantum mechanics with graphs.
Amazing!
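For the curious, a toy Haskell sketch of the first construction (purely illustrative, my own names): morphisms in the free category on a graph are paths, identities are empty paths, and composition is concatenation of composable paths.

```haskell
-- Free category on a directed graph: objects are vertices, morphisms are paths.
data Edge v = Edge { src :: v, tgt :: v, label :: String } deriving Show

-- A morphism: the identity at a vertex, or a chain of edges from one vertex to another.
data Path v = Id v | Path v [Edge v] v deriving Show

-- Identity morphisms come for free: the empty path at each vertex.
idPath :: v -> Path v
idPath = Id

-- Composition (diagrammatic order: first p, then q), defined only when endpoints match.
compose :: Eq v => Path v -> Path v -> Maybe (Path v)
compose (Id a)         q@(Path b _ _) | a == b  = Just q
compose p@(Path _ _ b) (Id c)         | b == c  = Just p
compose (Id a)         (Id b)         | a == b  = Just (Id a)
compose (Path a es b)  (Path b' fs c) | b == b' = Just (Path a (es ++ fs) c)
compose _ _ = Nothing  -- endpoints don't match: not composable
```

Composition is associative and the empty paths are units, which is exactly what makes it a category.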
Lambda calculus is, though, the internal language of a very common kind of category (cartesian closed categories), so, in a sense, category theory allows lambda calculus to do computations not only with functions, but also with sets, topological spaces, manifolds, etc.
While I share your enthusiasm toward categories, I find suspicious the claim that CT is the correct framework from which to understand rationality. Around here, rationality is mainly equated with Bayesian probability, and the categorial grasp of probability or even measure theory is less than impressive. The most interesting fact I've been able to dig up is that the Giry monad is the codensity monad of the inclusion of convex spaces into measure spaces, hardly an illuminating fact (basically a convoluted way of saying that probabilities are the most general way of forming convex combinations out of measures).
I've searched and searched for categorial answers or hints about the problem of extending probabilities to other kinds of logic (or even simply extending it to classical predicate logic), but so far I've had no luck.
The difference between the two is literally a single summation, so... yeah?
I'd like to point out a source of confusion around Occam's Razor that I see you're falling for; dispelling it will make things clearer: "you should not multiply entities without necessity!" This means that Occam's Razor helps decide between competing theories if and only if they have the same explanatory and predictive power. But in the history of science, it was almost never the case that competing theories had the same power. Maybe it happened a couple of times (epicycles, the Copenhagen interpretation), but in all other instances a theory was selected not because it was simpler, but because it was much more powerful.
Contrary to popular misconception, Occam's razor gets to be used very, very rarely.
We do have, anyway, a formalization of that principle in algorithmic information theory: Solomonoff induction. An agent that, to predict the outcome of a sequence, places the highest probabilities on the shortest compatible programs will eventually outperform every other class of predictors. The catch here is the word 'eventually': in every measure of complexity, there's a constant that offsets the values, due to the choice of the reference universal Turing machine. Different references will assign different complexities to the same initial programs, but all measures will converge after a finite amount of data.
This is also why I think that the problem of explaining thunder with "Thor vs. clouds" is such a poor example of Occam's razor: Solomonoff induction is a formalization of Occam's razor for theories, not explanations. Due to the aforementioned constant, you cannot have an absolutely simpler model of a finite sequence of events. There's no such thing; it will always depend on the complexity of the starting Turing machine. However, you can have eventually simpler models of infinite sequences of events (infinite-sequence predictors are equivalent to programs). In that case, the natural-event program will prevail, because it will allow one to control the outcomes better.
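For reference, the two formal facts this relies on, in standard algorithmic information theory notation ($U$ and $V$ are universal prefix machines, and $U(p) = x*$ means program $p$ outputs a string beginning with $x$):

```latex
M_U(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|}
\qquad \text{(Solomonoff prior: shorter programs carry exponentially more weight)}
\\[4pt]
K_U(x) \;\le\; K_V(x) + c_{U,V}
\qquad \text{(invariance: complexities agree up to a machine-dependent constant)}
```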
I arrived at the same conclusion when I tried to make sense of the Metaethics Sequence. My summary of Eliezer's writings is: "morality is a bunch of mental computations shared between most human beings". Morality thus grew out of our evolutionary history, and it should not be surprising that in extreme situations it might become incoherent or maladaptive.
Only if you believe that morality should be systematic, universal, and coherent can you say that extreme examples uncover something interesting about people's morality.
Otherwise, extreme situations are as interesting as saying that people cannot mentally factor long numbers.
First of all, the community around LW2.0 can only be loosely called a movement: I don't think there's anyone who explicitly endorses *every* technique or theory that has appeared here. LW is not CFAR, is not the Alignment Forum, etc. So I would caution against enticing someone into LW by saying that the community supports this or that technique.
The main advantage of rationality, in its present stage, is defensive: if you're aspiring to be rational, you won't waste time attending religious gatherings that you despise; you won't waste money buying ineffective treatments (sugar pills, crystals, etc.); you won't waste resources following people who mistake fiction for facts. At the moment, rationality is just a very good filter for every product, piece of knowledge, and praxis that society presents to you (hint: 99% of those things are crap).
On the other hand, what you can or should do with all the resources you're not wasting, is something rationality cannot answer in full today. Metaethics and akrasia are, after all, the greatest unsolved problems of our community.
There have been notable attempts (e.g. Torture vs. Dust Specks or the Basilisk), but nothing has emerged with the clarity and effectiveness of Bayesian reasoning. Effective Altruism and MIRI are perhaps the most famous examples of trying to solve the most pressing problems. A definitive framework, though, still eludes us.
In Foerster's paper, he links the increase in productivity linearly with the increase in population. But Scott has also proposed that the rate of innovation is slowing down, due to productivity increasing only logarithmically with population. So maybe Foerster's model is still valid, and 1960 is only the year when we exhausted the almost linear part of progress (the "low-hanging fruit").
Perhaps nowadays we combine the exponential growth of population with the logarithmic increase in productivity, to get the linear economic growth we see.
Algebraic topology is the discipline that studies geometries by associating them with algebraic objects (usually, groups or vector spaces) and observing how changing the underlying space affects the related algebras. In 1941, two mathematicians working in that field sought to generalize a theorem that they had discovered, and needed to show that their solution was still valid for a larger class of spaces, obtained by "natural" transformations. Natural, at that point, was a term lacking a precise definition, and only meant something like "avoiding arbitrary choices", in the same way a finite-dimensional vector space is naturally isomorphic to its double dual, while it's isomorphic to its dual only through the choice of a basis.
The need to make precise the notion of naturality for algebraic topology led them to the definition of natural transformation, which in turn required the notion of functor which in turn required the notion of category.
This answers questions 1 and 2: category theory was born to give a precise definition of naturality, and was sought to generalize the "universal coefficient theorem" to a larger class of spaces.
This story is told with a lot of details in the first paragraphs of Riehl's wonderful "Category theory in context".
To answer question 3, though: even if category theory was rapidly expanding during the '50s and '60s, it was only with the work of Lawvere (whom I consider a genius on par with Gödel) in the '70s that it became a foundational discipline: guided by his intuitions, category theory became the unifying language for every branch of mathematics, from geometry to computation to logic to algebra. Basically, it showed how the variety of mathematical disciplines are just different ways of saying the same thing.
Is it really that different, besides the halo effect? It strongly depends on the details, though: if the two say the exact same thing, how are things different?
The concept of a "fake framework", as elucidated in the original post, seems to me to be that of a model of reality that hides some complexity, sometimes even to the point of being very wrong, but that is nonetheless useful because it makes some other complex area manageable.
On the other hand, when I read the quotes you presented, I see a rich tapestry of metaphors and jargon, of which the proponent himself says that they can be wrong... but I fail completely to see what part of reality they make manageable. These frameworks seem to just add complexity to complexity, without any real leverage over reality. This draws those frameworks nearer to fiction than to useful but simplified models.
For example, if there's no post-rational stage of development, what use is the advice not to confuse it with a pre-rational stage of development? If Enlightenment is not a thing, what use is the exhortation to come up with a chronologically robust definition of it?
This, to me, is the most striking difference between "Integral Spirituality" and, say, a road map. With a road map, you know exactly what is hidden and why, and it's evident how to use it. With Wilber's framework, it seems to be exactly the opposite.
Maybe this is due to my unfamiliarity with the material... so if someone has actually gotten something useful out of that model, please chime in and tell your experience, and I will stand corrected.
I'm sorry, but you cannot really learn anything from one example. I'm happy that your parents are faring well in their marriage, but if they weren't, would you have learned the same thing?
I've consulted a few statistics on arranged marriage, and they all are:
- underpowered
- showing no significant difference between autonomous and arranged marriages
The latter part is somewhat surprising for a Westerner, but given what you say, the same should be said for an Indian coming from your background.
The only conclusion I can draw fairly conclusively is that, for a long term relationship, the way or the why it started doesn't really matter.
Are you familiar with the concept of fold/unfold? Folds are functions that consume structures and produce values, while unfolds do the opposite. The composition of an unfold plus a fold is called a hylomorphism, of which the factorial is a perfect example: the unfold creates a list from 1 to n, the fold multiplies together the entire list. Your section on the "two-fold recursion" is a perfect description of a hylomorphism: you take a goal, unfold it into a plan composed of a list of micro-steps, then you fold it by executing each one of the micro-steps in order.
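For concreteness, a minimal Haskell rendering of the factorial example, with plain lists instead of a recursion-schemes library; the unfold builds the "plan", the fold "executes" it:

```haskell
import Data.List (unfoldr)

-- Unfold: grow the list [n, n-1 .. 1] from the seed n (the "plan").
countdown :: Integer -> [Integer]
countdown = unfoldr step
  where
    step 0 = Nothing            -- goal reached: stop unfolding
    step k = Just (k, k - 1)    -- emit a micro-step, shrink the seed

-- Fold: consume the plan by multiplying everything together (the "execution").
-- The composition fold . unfold is a hylomorphism.
factorial :: Integer -> Integer
factorial = foldr (*) 1 . countdown
```

Here `countdown 5 == [5,4,3,2,1]` is the intermediate plan and `factorial 5 == 120` is the executed result.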
Luke already wrote that there are at least four factors that feed motivation, and the expectation of success is only one of them. No amount of expectancy can increase drive if the other factors are lacking, and as Eliezer noticed, it's not sane to expect only one factor to be 10x the others so that it alone powers the engine.
What Eliezer is asking is basically whether anyone has solved the basic coordination problem of mankind, and I think he knows very well that the answer to his question is no. Also, because we are operating in a relatively small mindspace (humans' system 1), the fact that no one has solved that problem in hundreds of thousands of years of cooperation points strongly toward the conclusion that such a solution doesn't exist.
Re: the third point, I think it's important to differentiate between $f$ and $f^*$, where $f^*$ is the true prediction, that is, what actually happens when the agent performs the action $a$.
$f(k(f))$ is simply the outcome the agent is aiming at, while $f^*(k(f))$ is the outcome the agent eventually gets. So maybe what's more interesting is a measure of similarity on $B$, with which you can compare the two.
Let's say that $A$ is the set of available actions and $B$ is the set of consequences. $B^A$ is then the set of predictions, where a single prediction associates to every possible action a consequence. $B^A \to A$ is then a choice operator, which selects for each prediction an action to take.
What we have seen so far:
- There's no 'general' or 'natural' choice operator, that is, every choice operator must be based on at least a partial knowledge of the domain or the codomain;
- Unless the possible consequences are trivial, a choice operator will choose the same action for many different predictions; that is, a choice operator only uses certain features of the predictions' space and is indifferent to everything else [1];
- A choice operator naturally defines a 'preferred outcome' operator, which is simply the predicted outcome of the chosen action, and is defined by 'sandwiching' the choice operator between two predictions. I just thought interleave was a better name than sandwich. It's of type $(B^A \to A) \to B^A \to B$.
[1] To show this, let be a partition of and let be the equivalence relation uniquely generated by the partition. Then
> I wonder if there are any plausible examples of this type where the constraints don't look like ordering on B and search on A.
Yes, as I showed in my post, such operators must know at least an element of one of the domains of the function. If one knows at least an element of A, a constant function on that element has the right type. Unfortunately, it's not very interesting.
It's interesting to notice that there's nothing with that type on Hoogle (the Haskell API search engine), so it's not the type of any common utility.
On the other hand, you can still say quite a bit on functions of that type, drawing from type and set theory.
First, let's name a generic function with that type $k : (A \to B) \to A$. It's possible to show that k cannot be parametric in both types. If it were, $(0 \to 1) \to 0$ would be valid, which is absurd ($0 \to 1$ has an element!). It's also possible to show that if k is not parametric in one type, it must have access to at least an element of that type (think about $(A \to 1) \to A$ and $(B \to B) \to B$).
A simple cardinality argument also shows that k must be many-to-one (that is, non-injective): unless B is 1 (the one-element type), $|B^A| > |A|$.
There is an interesting operator that uses k, which I call interleave: $\mathrm{interleave}\;k\;f = f(k(f))$.
Trivially, $\mathrm{interleave} : ((A \to B) \to A) \to (A \to B) \to B$.
It's interesting because partially applying interleave to some k gives the type $(A \to B) \to B$, which is the type of continuations, and I suspect that this is what underlies the common usage of such operators.
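Spelled out in Haskell (the names Choice, interleave, and toCont are mine; this is an illustrative sketch, not any library's API):

```haskell
-- A "choice operator": given a prediction (a -> b), it picks an action in a.
type Choice a b = (a -> b) -> a

-- The bare continuation type: feed it a prediction and it hands back a consequence.
type Cont b a = (a -> b) -> b

-- interleave k f: evaluate the prediction f at the action that k chooses when shown f.
interleave :: Choice a b -> (a -> b) -> b
interleave k f = f (k f)

-- Partially applying interleave to a choice operator is literally a continuation.
toCont :: Choice a b -> Cont b a
toCont = interleave

-- Non-parametric example from above: with an element of a in hand,
-- the constant choice operator has the right type (but isn't very interesting).
constChoice :: a -> Choice a b
constChoice x _ = x
```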
The difference would be that I'm doing it more for myself than for those out there, because I don't expect my youtube video to get out much.
I also don't know if I'll get some attention, I'm doing that entirely for myself: to leave a legacy, to look back and say that I too did something to raise the sanity waterline.
My biggest hurdle currently is video editing.
My motto: "think big, act small, move quickly". I know that my first videos will suck, I've prepared to embrace suckiness and plunge forward anyway.
Honestly, I'm not sure how explaining Bayesian thinking will help people with understanding media claims.
Sometimes important news stories are based entirely on the availability bias or the base-rate fallacy: knowing these is important for cultivating a critical view of the media. To understand why they are wrong, you need probabilistic reasoning. But media awareness is just an excuse, a hook to introduce Bayesian thinking, which will allow me to also talk about how to construct a critical view of science.
These are all excellent tips, thank you!
A much, much easier thing that still works is P(sunrise) = 1, which I expect is how ancient astronomers felt about it.
That entirely depends on your cosmological model, and in all cosmological models I know of, the sun is a definite and fixed object, so usually $P(\text{sunrise}) = 1$.
From what I've understood of the white paper, there's no transaction fee because, instead of rewarding active nodes like in a blockchain, the Tangle punishes inactive nodes. So when a node performs few transactions, other nodes tend to disconnect from it, and in the long run an inactive node will be dropped entirely.
On the other hand, a node has only a partial copy of the entire Tangle at each time, so it is possible to keep it small even when the total volume is large.
Economically, I don't know if switching from incentives to participate to punishments for leaving makes sense.
With the magic of probability theory, you can convert one into the other. By the way, you yourself should search for evidence that you're wrong, as any honest intellectual would do.
This might be a minor or a major nitpick, depending on your point of view: Laplace's rule works only if the repeated trials are thought to be independent of one another. That is why you cannot use it to predict sunrise: even without an accurate cosmological model, it's quite clear that the ball of fire rising in the sky every morning is always the same object. But what prior you use after incorporating that information is another story...
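For reference, the rule in question, derived under a uniform prior on the unknown success rate together with the assumption that trials are independent given that rate, which is exactly the assumption at issue:

```latex
P(\text{success on trial } n+1 \mid s \text{ successes in } n \text{ trials}) \;=\; \frac{s+1}{n+2}.
```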
This is a standard prediction since the unconscious was theorized more than a century ago, so unfortunately it's not good evidence that the model is correct. If what you've written is the only thing that the list has to say, then I would say that no, this is not worth pursuing.
In a vein similar to Erfeyah's comment, I think that your model needs to be developed much more. For example, what predictions does it make that are notably different from other psychological models? It's just an explanation that feels too "overfitted".