Comments
Thanks for the link, MakoYass.
I am familiar with the concept of superrationality, which seems similar to what you are describing. The lack of special relationship between observer moments--let's call it non-continuity--is also a common concept in many mystical traditions. I view both of these concepts as different from the concept of unity, "we are all one".
Superrationality combines a form of unity with a requirement for rationality. I could think that "we are all one" without thinking that we should behave rationally. If I thought "we are all one" and also that "one ought to be rational", the behavior that results might be described as superrational.
Non-continuity is orthogonal to unity. I could think "we are distinct" and still think "I only exist in the moment". This might have been the view of Heraclitus. But I could also think "we are one" and also think "we only exist in the moment." This might be a natural view to have if you think of the universe as an amplitude distribution over a large number of quantum states that is evolving according to some transition function. If you identify with a particular quantum state, then there is no sense in which you have a unique "past" or "future" path, because all "moments" (states) are concurrent: the only thing that is changing is the amplitude flow.
I am not familiar with those concepts. References would be appreciated. 🙏
It seems obvious that your change in relationship with suffering constitutes a kind of value shift, doesn't it?
This is not obvious to me. In the first place, I never had the value "avoid suffering", even before I started my practices. Since before I even knew the concept of suffering, I have had the compulsion to avoid suffering, but the value of transcending it.
What's your relationship with value drift? Are you unafraid of it? That gradual death by mutation? The infidelity of your future self?
I am afraid of value drift, but I am even more afraid that the values that I already have are based on incoherent thinking and false assumptions, which, once exposed, would lead me to realize that I have been spending my life pursuing the wrong things entirely.
Because I am afraid of both value drift and value incoherence, I place a high priority on learning how I can upgrade my understanding of my own values, while at the same time being very cautious about which sources I trust and learn from. I cannot seek to improve value coherence without making myself vulnerable to value drift. Therefore, I only invest in learning from sources authored by people who appear to be aligned with my values.
Do you see it as a kind of natural erosion, a more vital aspect of the human telos than the motive aspects it erodes?
No, I do not think that value drift is inevitable, nor do I think the "higher purpose", if such a thing exists, involves constantly drifting. My goal is to achieve a state of value constancy.
Anyone, it seems, can have the experience of “feeling totally fine and at ease while simultaneously experiencing intense … pain”[1]:
It would greatly please me if people could achieve a deeper understanding of suffering just by taking analgesics. If that were the case, perhaps we should encourage people to try them just for that purpose. However, I'm guessing that the health risks, especially cognitive side-effects (a reduction of awareness that would preclude the possibility of gaining any such insight), risks of addiction, and logistical issues surrounding the distribution of drugs for non-medical purposes will render infeasible any attempt to systematically employ analgesics for the purpose of spiritual insight. In all likelihood, we'll be stuck with the same old meditations and pranayamas and asanas for a while.
But the reason you bring up the topic of analgesics, if I am not mistaken, is to challenge the legitimacy of my insight by an argument that boils down to: "the experience you describe could be obtained through drugs, so it must not be that profound". I do not know if you were also expecting to rely on a negative halo effect of "drug usage" to augment your rhetoric, but as you may have guessed from the preceding paragraph, my opinion is that the negative connotations of drug-induced states are due to irrational associations. If we ignore the drugs, then the remaining constituent of your rhetoric is the underlying assumption that "any easily obtained insight must be trivial." That is far from the truth. I believe there are many simple things that people could do, which would profoundly increase their wisdom at a very low cost [1]. But precisely because these things are so simple, people wouldn't take them seriously even if somebody suggested them. (The chance is a bit higher, but still not terribly high, if it's said by the teacher of an expensive paid workshop, or their guru, or their psychotherapist. But the most effective way so far to get people to do these kinds of simple things is to integrate them into some elaborate social ritual.)
But concluding from this that “there’s no such thing as suffering” is a conceptual confusion of the highest order—and not some insight into deep Truth.
I agree that the statement "there’s no such thing as suffering" is false, and not any kind of insight into deep Truth.
That's because I am not claiming that "there's no such thing as suffering." I claim to have an insight which can't be described in words, but the best verbal description of this insight is something like "suffering is an illusion."
I don't even consider this insight to be particularly deep. Like you said, maybe you could get it by taking painkillers. Certainly not an insight into deep Truth. That is not to say that I don't take other people's suffering seriously--far from it, it concerns me greatly. However if you were to compare the difficulty of understanding suffering and the difficulty of understanding consciousness, I think suffering is a far easier problem to resolve.
ETA: And it seems to me to be far from obvious, that it is good or desirable to voluntarily induce in yourself a state akin to a morphine high or a lobotomy… especially if doing so has the additional consequence of leading you into the most elementary conceptual errors
You are wise to be cautious, because neural self-modification could potentially lead to states where one loses all concern for one's own well-being. However, just because an altered state of consciousness is similar to a drug-induced state or a state of neurological impairment doesn't, by itself, imply that it should be avoided. It all depends on whether you've taken appropriate steps to control the risk (e.g. by only accessing the state under the guidance of an experienced teacher), and what insight you stand to gain by experiencing that state. The states of consciousness may be the same, but the intentions and degree of control make all the difference. Recreational drug users pursue these states with little or no understanding of the process, little or no control over the outcomes, and with the intention of thrill-seeking, social bonding, or alleviating boredom. Mystics, by contrast, are often backed by a tradition which has precise knowledge of how to attain these states, are equipped with mental tools to control the process, and seek these states with the intention of obtaining insights that will be of enduring value to their lives.
[1] Such as: take just one hour to think and reflect. Dance with abandon. Volunteer to take care of children. Go to the forest, just to look around. Play in the mud. Fast for one day. Learn and go see where your food comes from, how your clothes are made.
I would assign that a probability less than 0.1, and that's because I already experienced some insights which defy verbal transmission. For instance, I feel that I am close to experientially understanding the question of "what is suffering?" The best way I can formulate my understanding into words is, "there is no such thing as suffering. It is an illusion." I don't think additional words or higher-context instructions would help in conveying my understanding to someone who cannot relate to the experience of feeling totally fine and at ease while simultaneously experiencing intense physical and emotional pain.
I don't think the Buddha ever attempted to describe the Truth in words. Sometimes he would give a koan to a student who just needed a little push. But most of his sutras were devoted to giving instructions for how students could work toward the Truth, and also just practical advice on how to live skillfully.
I'm reducing my subjective probability that you will abandon rationality...
I suppose what you are attempting is similar to what Buddha did in the first place. The sages of his time must have felt pained to see their beautiful non-dualism sliced and diced into mass-produced sutras, rather than the poems and songs and mythology which were, up until then, the usual vehicle of expression for these truths.
I guess I'm just narcissistic enough to still be a Quinean naturalist and say 'yep, that is also me.'
Considering God to be part of yourself is very elevated and good. The only problem is the things that you don't consider to be part of yourself. So I guess what I am saying is that you should amplify your narcissism :)
The truths of General Relativity cannot be conveyed in conventional language. But does one have to study the underlying mathematics before evaluating its claims?
Just as there exists a specialized language that accurately conveys General Relativity, there similarly exists a specialized language (mythological language) for conveying mystical truths. However, I think the wrong approach would be to try to understand that language without having undergone the necessary spiritual preparation. As St. Paul says in 1 Corinthians 2:14:
The natural person does not accept the things of the Spirit of God, for they are folly to him, and he is not able to understand them because they are spiritually discerned.
This echoes countless similar statements in other traditions, the most famous (and probably oldest) being "the Tao that can be told is not the eternal Tao."
That is not to say that one cannot approximate the truth by means of analogy. You can approximately capture the truth of General Relativity in the statement, "gravity bends space." This approximation of the truth is useful because it allows you to understand certain consequences of that truth, such as gravitational lensing. Hence, even someone untrained in physics can be convinced of General Relativity, because they understand an approximate version of it, which in turn intuitively explains phenomena such as the Hubble Space Telescope photo of a horseshoe Einstein ring.
Likewise, approximations to the Truth abound in the various spiritual traditions. "God exists, and is the only entity that exists. I am God, you are God, we are all one being" is one such approximation [1]. It is an approximation because the words "you" and "God" are not well-defined. My own definition of these terms has been continually evolving as I progress in spirituality.
One consequence of this approximation is that the feeling that we are separate individuals must be flawed. I will take another analogy from physics: the fundamental forces are thought to be different aspects of a single unified force, which become distinct at lower energy levels; at sufficiently high energy levels, they unify.
Similarly, I have experienced that at high levels of awareness, my feeling of distinctness from other people and from the rest of the universe is reduced. It starts to seem like "my thoughts" and "my feelings" are not mine, they are just the thoughts generated by one particular mind, and furthermore I feel like I can start to feel what others are feeling. I have encountered others who appear to be even higher on the "energy scale." One day I was greeted extremely warmly out of the blue by a homeless man who was just working on the street. I responded to him in kind.
A consequence of the statement "we are all One" is that we should be able to experience this unity. If there exist people who experience this as a reality (and not just as an altered state), they should be able to detect the thoughts and feelings of others around them. I find it plausible that such people exist, both from my reading and from my encounters with people such as the homeless man. But it does bother me that there exists no known scientific mechanism that would enable us to read each other's minds, other than some very speculative ideas about consciousness being based on quantum phenomena.
I do not expect that this particular example should be particularly convincing to a skeptic. I know that there exist non-mystical theories for explaining non-dualistic states, for instance Jill Bolte Taylor's theory that it is caused by a switch in dominance from the left to the right hemisphere. What ultimately convinced me to 'cross over' was not a single experience or insight but rather the aggregation of many experiences: listing all of these would distract from the point I am trying to make. I suggest that any curious individuals consult the much richer collection of data on personal experiences that exists in the religious studies literature; my own experiences are nothing special in comparison.
My aim for now was just to address your question about how a claim can be evaluated in the absence of the necessary cognitive framework to understand its content. To summarize, one limited form of evaluation can be obtained by learning of the different approximations of the truth, and then evaluating consequences of those approximations in comparison to empirical data.
That said, at a certain stage of maturity, one who is seeking the truth should stop bothering with approximations, because the approximations will not give you that necessary cognitive framework to really understand the truth. Reading popular science books can never give you the understanding that a Ph. D. physicist has obtained from rigorous training. You have to "sit down and learn the math", or in the case of spirituality, to follow your chosen path. If the approximations have any value, it would only be in giving the hope that the skeptic needs before they can make the commitment to seek the real thing.
[1] Another approximation, equally valid in my view, would be that "God created you and loves you." Note that combining this with the first approximation yields the near-tautology that "You love yourself." Still, even a statement such as "God loves you", which might be parsed as something logically trivial, can take on a new profundity to one who has undergone the proper cultivation.
Another approximation is "there is no self." Or that "everything is nothing." Combine those two, and you get "everything is self." The name of the Hindu god Shiva is sometimes interpreted as meaning "no-thing."
I started out as a self-identified rationalist, got fascinated by mysticism and 'went native.' Ever since, I have been watching the rationality community from the sidelines to see if anyone else will 'cross over' as well.
I predict that if Romeo continues to work on methods for teaching meditation, he will eventually also 'go mystical' and publicly rescind his claim that all perceived metaphysical insights can be explained as pathological disconnects with reality caused by neural rewiring. Conditional on his continuing to teach, I predict with 70% probability that this occurs within the next 10 years.
There are two theories that both lead me to make this prediction, which I call Theory M and Theory D. I ascribe probability 0.8 to Theory M and 0.15 to Theory D. I ascribe a probability of 0.02 to their intersection.
Theory M ('Mystical') is that there exists a truth that cannot be expressed in conventional language, and that truth has been rediscovered independently by hundreds of seekers throughout history, and that many of the most established spiritual traditions--Taoist, Buddhist, Hindu, Kabbalistic, Christian, Sufi, and many others--were founded for the purpose of disseminating that same truth. If Theory M is true, and conditional on Romeo continuing to teach, I predict that his role in the community will motivate him to deepen his practice and will also bring him in contact with teachers from other spiritual practices. These experiences will catalyze his own mystical insights.
Theory D ('Delusional') is that spiritual seekers expose themselves to mechanisms of self-delusion that are so strong that they would even convince someone who is initially highly identified with rationality and skepticism to start assigning a high probability that Theory M is true.
The main reason I place so much confidence in Theory M is similar to the reason why string theorists place so much confidence in string theory: so many aspects of reality that were previously baffling to me are suddenly very comprehensible and elegant when I understand them through Theory M. Secondary reasons are my personal experiences, but I concede that some aspect of these could be due to delusions so I will not defend them here.
The reason I have so much confidence in Theory D is that I consider myself to have a high capacity for rational thinking, that I am very well-versed in the ideas of the rationality community, and that I consider myself knowledgeable about a variety of disciplines necessary for understanding reality: neuroscience, psychology, sociology, statistics, philosophy, biology, physics, religious history. I hold a Ph.D. in Statistics and am currently working as a research scientist in a research institution specializing in mental health research. And yet I have daily experiences which reaffirm my confidence in Theory M. If Theory M is not true, then I would conclude that the delusion I am suffering is incredibly strong, lies outside all mental disorders I know about, and evades my own attempts to self-diagnose. I admit that I have little incentive to eliminate the delusion, since it makes my life so pleasant. I admit that I perhaps could have protected myself from delusion even more solidly by embedding myself deeply in a physical social community dedicated to rationality. I have not done that. However, I am surrounded by scientists at my workplace.
If Theory M is true, then I warmly wish for all of you reading this to discover its truth. If it is not true, I wish for my delusion to be eliminated so that I can stop living in a pleasant fantasy and re-align myself with rationalists who are trying to maximize their positive impact on the world.
I expect that my views will not find much support among you, but I challenge you to judge my claims by the professed standards of your own community. If you feel yourself strongly disagreeing with me, I challenge you to engage me as a fellow rationalist (or to point out how I have violated the community rules of discourse) rather than succumbing to the knee-jerk reaction of dismissing a set of beliefs which make you uncomfortable.
Cool, I will take a look at the paper!
Great comment, mind if I quote you later on? :)
That said, if you have example problems where a logically omniscient Bayesian reasoner who incorporates all your implicit knowledge into their prior would get the wrong answers, those I want to see, because those do bear on the philosophical question that I currently see Bayesian probability theory as providing an answer to--and if there's a chink in that armor, then I want to know :-)
It is well known where there might be chinks in the armor: what happens when two logically omniscient Bayesians sit down to play a game of poker? Bayesian game theory is still at a very early stage of development (in fact, I'm guessing it's one of the things MIRI is working on), and there could be all kinds of paradoxes lurking in wait to supplement the ones we've already encountered (e.g. two-boxing).
If the game is really working like they say it is, then the frequentist is often concentrating probability around some random psi for no good reason, and when we actually draw random thetas and check who predicted better, we'll see that they actually converged around completely the wrong values. Thus, I doubt the claim that, setting up the game exactly as given, the frequentist converges on the "true" value of psi. If we assume the frequentist does converge on the right answer, then I strongly suspect either (1) we should be using a prior where the observations are informative about psi even if they aren't informative about theta or (2) they're making an assumption that amounts to forcing us to use the "tortured" prior. I wouldn't be too surprised by (2),
The frequentist result does converge, and it is possible to make up a very artificial prior which allows you to converge to psi. But the fact that you can make up a prior that gives you the frequentist answer is not surprising.
A useful perspective is this: there are no Bayesian methods, and there are no frequentist methods. However, there are Bayesian justifications for methods ("it does well in the average case") and frequentist justifications ("it does well asymptotically or in a minimax sense"). If you construct a prior in order to converge to psi asymptotically, then you may be formally using Bayesian machinery, but the only justification you could give for your method is completely frequentist.
Ok. So the scenario is that you are sampling only from the population f(X)=1.
EDIT: Correct, but you should not be too hung up on the issue of conditional sampling. The scenario would not change if we were sampling from the whole population. The important point is that we are trying to estimate a conditional mean of the form E[Y|f(X)=1]. This is a concept commonly seen in statistics. For example, the goal of non-parametric regression is to estimate a curve defined by f(x) = E[Y|X=x].
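For concreteness, here is a minimal numerical sketch of the quantity in question (the distribution of X, the function g, and the choice of f are all made up for illustration): the "naive" estimate of E[Y|f(X)=1] is just the sample average of Y over the observations with f(X)=1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process: Y = g(X) + noise, with g = sin
# and the event of interest f(X) = 1 taken to be {X > 1}.
n = 10_000
x = rng.normal(size=n)
y = np.sin(x) + rng.normal(scale=0.5, size=n)
selected = x > 1.0                      # the observations with f(X) = 1

naive_estimate = y[selected].mean()     # sample average over that subpopulation
print(naive_estimate)                   # approximates E[Y | X > 1] = E[sin(X) | X > 1]
```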
Can you exhibit a simple example of the scenario in the section "A non-parametric Bayesian approach" with an explicit, simple class of functions g and distribution over them, for which the proposed procedure arrives at a better estimate of E[ Y | f(X)=1 ] than the sample average?
The example I gave in my first reply (where g(x) is known to be either one of two known functions h(x) or j(x)) can easily be extended into the kind of fully specified counterexample you are looking for: I'm not going to bother to do it, because it's very tedious to write out and it's frankly a homework-level problem.
Is the idea that it is intended to demonstrate, simply that prior knowledge about the joint distribution of X and Y would, combined with the sample, give a better estimate than the sample alone?
The fact that prior information can improve your estimate is already well known to statisticians. But statisticians disagree on whether or not you should try to model your prior information in the form of a Bayesian model. Some Bayesians have expressed the opinion that one should always do so. This post, along with Robins/Ritov/Wasserman's paper, provides counterexamples where the full non-parametric Bayesian model gives much worse results than the "naive" approach which ignores the prior.
Update from the author:
Thanks for all of the comments and corrections! Based on your feedback, I have concluded that the article is a little bit too advanced (and possibly too narrow in focus) to be posted in the main section of the site. However, it is clear that there is a lot of interest in the general subject. Therefore, rather than posting this article to main, I think it would be more productive to write a "Philosophy of Statistics" sequence which would provide the necessary background for this kind of post.
The confusion may come from mixing up my setup and Robins/Ritov's setup. There is no missing data in my setup.
I could write up my intuition for the hierarchical model. It's an almost trivial result if you don't assume smoothness, since for any x1,...,xn the parameters g(x1)...g(xn) are conditionally independent given p and distributed as F(p), where F is the maximum entropy Beta with mean p (I don't know the form of the parameters alpha(p) and beta(p) off-hand). Smoothness makes the proof much more difficult, but based on high-dimensional intuition one can be sure that it won't change the result substantially.
It is quite possible that estimating E[Y] and E[Y|event] are "equivalently hard", but they are both interesting problems with quite different real-world applications. The reason I chose to write about estimating E[Y|event] is that I think it is easier to explain than importance sampling.
I didn't reply to your other comment because although you are making valid points, you have veered off-topic since your initial comment. The question of "which observations to make?" is not a question of inference but rather one of experimental design. If you think this question is relevant to the discussion, it means that you understand neither the original post nor my reply to your initial comment. The questions I am asking have to do with what to infer after the observations have already been made.
By "importance sampling distribution" do you mean the distribution that tells you whether Y is missing or not?
Right. You could say the cases of Y1|D=1 you observe in the population are an importance sample from Y1, the hypothetical population that would result if everyone in the population were treated. E[Y1], the quantity to be estimated, is the mean of this hypothetical population. The importance sampling weights are q(x) = Pr[D=1|x]/p(x), where p(x) is the marginal distribution of X (i.e. you invert these weights to get the average); the importance sampling distribution is the conditional density of X|D=1.
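Here is a minimal numerical sketch of that inverse-weighting idea (the data-generating process and the propensity function are made up, and Pr[D=1|x] is assumed known):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: Y1 = 2X + noise is the outcome under treatment,
# and D indicates whether Y1 is observed, with known Pr[D=1 | X=x].
n = 50_000
x = rng.uniform(0, 1, n)
y1 = 2 * x + rng.normal(scale=0.1, size=n)
propensity = 0.1 + 0.8 * x              # Pr[D = 1 | X = x], assumed known
d = rng.binomial(1, propensity)

# Naive average over the treated only: biased, because treated units
# over-represent large x.
print("naive   :", y1[d == 1].mean())

# Inverse-probability-weighted (Horvitz-Thompson) estimate of E[Y1];
# multiplying by d uses only the observed outcomes.
print("weighted:", np.mean(d * y1 / propensity))
print("truth   :", 1.0)                 # E[Y1] = E[2X] = 1 for X ~ Uniform(0, 1)
```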
I will go ahead and answer your first three questions.
Objective Bayesians might have "standard operating procedures" for common problems, but I bet you that I can construct realistic problems where two Objective Bayesians will disagree on how to proceed. At the very least, the Objective Bayesians need an "Objective Bayesian manifesto" spelling out what the canonical procedures are. For the "coin-flipping" example, see my response to RichardKennaway, where I ask whether you would still be content to treat the problem as coin-flipping if you had strong prior information on g(x).
MaxEnt is not invariant to parameterization, and I'm betting that there are examples where it works poorly. Far from being a "universal principle", it ends up being yet another heuristic, joining the ranks of asymptotic optimality, minimax, minimax relative to an oracle, etc. Not to say these are bad principles--each of them is very useful, but when and where to use them is still subjective.
That would be great if you could implement a Solomonoff prior. It is hard to say whether implementing an approximate algorithmic prior which doesn't produce garbage is easier or harder than encoding the sum total of human scientific knowledge and heuristics into a Bayesian model, but I'm willing to bet that it is. (This third bet is not a serious bet, the first two are.)
It is worth noting that the issue of non-consistency is just as troublesome in the finite setting. In fact, in one of Wasserman's examples he uses a finite (but large) space for X.
Yes, I think you are missing something (although it is true that causal inference is a missing data problem).
It may be easier to think in terms of the potential outcomes model. Y0 is the outcome under no treatment, Y1 is the outcome under treatment, and you only ever observe either Y0 or Y1, depending on whether D=0 or 1. Generally you are trying to estimate E[Y1] or E[Y0] or their difference.
The point is that the quantity Robins and Wasserman are trying to estimate, E[Y], does not depend on the importance sampling distribution. Whereas the quantity I am trying to estimate, E[Y|f(X)], does depend on f. Changing f changes the population quantity to be estimated.
It is true that sometimes people in causal inference are interested in estimating things like E[Y1 - Y0|D=1], e.g. "the treatment effect on the treated." However, this is still different from my setup, because D is a random variable, as opposed to an arbitrary function of the known variables like f(X).
My example is very similar to the Robins/Wasserman example, but you end up drawing different conclusions. Robins/Wasserman show that you can't make sense of importance sampling in a Bayesian framework. My example shows that you can't make sense of "conditional sampling" in a Bayesian framework. The goal of importance sampling is to estimate E[Y], while the goal of conditional sampling is to estimate E[Y|event] for some event.
We did talk about this before, that's how I first learnt of the R/W example.
I do not need to model the process f by which that population was selected, only the behaviour of Y within that population?
There are some (including myself and presumably some others on this board) who see this practice as epistemologically dubious. First, how do you decide which aspects of the problem to incorporate into your model? Why should one only try to model E[Y|f(X)=1] and not the underlying function g(x)=E[Y|x]? If you actually had very strong prior information about g(x), say that "I know g(x)=h(x) with probability 1/2 or g(x) = j(x) with probability 1/2" where h(x) and j(x) are known functions, then in that case most statisticians would incorporate the underlying function g(x) in the model; and in that case, data for observations with f(X)=0 might be informative for whether g(x) = h(x) or g(x) = j(x). So if the prior is weak (as it is in my main post) you don't model the function, and if the prior is strong, you model the function (and therefore make use of all the observations)? Where do you draw the line?
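To make that concrete, here is a small simulation sketch (the particular choices of h, j, f, the noise level, and the X-distribution are all made up, and the Bayes estimate also uses the known distribution of X): the posterior over {h, j} uses the observations with f(X)=0, and the resulting estimate of E[Y|f(X)=1] beats the plain sample average over the f(X)=1 cases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: X ~ Uniform(0, 1), f(x) = 1 iff x > 0.9, and the
# regression function g is known to be one of two candidates h or j,
# each with prior probability 1/2.  Y | X = x ~ N(g(x), sigma^2).
def h(x): return 0.3 + 0.0 * x
def j(x): return 1.0 * x
sigma, n, n_sims = 0.2, 200, 500

def cond_mean(g, n_mc=200_000):
    # E[g(X) | f(X) = 1], computed once by Monte Carlo over X.
    x = rng.uniform(0, 1, n_mc)
    return g(x[x > 0.9]).mean()

Eh, Ej = cond_mean(h), cond_mean(j)      # roughly 0.30 and 0.95
truth = Ej                               # suppose the true g is j

naive_err, bayes_err = [], []
for _ in range(n_sims):
    x = rng.uniform(0, 1, n)
    y = j(x) + rng.normal(0, sigma, n)

    mask = x > 0.9                       # the f(X) = 1 cases
    naive = y[mask].mean()               # plain sample average

    # Posterior over {h, j} uses *all* observations, including f(X) = 0.
    def loglik(g): return -0.5 * np.sum((y - g(x)) ** 2) / sigma**2
    diff = np.clip(loglik(j) - loglik(h), -700, 700)
    ph = 1.0 / (1.0 + np.exp(diff))      # P(g = h | data), equal priors
    bayes = ph * Eh + (1 - ph) * Ej

    naive_err.append(naive - truth)
    bayes_err.append(bayes - truth)

print("naive RMSE:", np.sqrt(np.mean(np.square(naive_err))))
print("Bayes RMSE:", np.sqrt(np.mean(np.square(bayes_err))))
```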
I agree, most statisticians would not model g(x) in the cancer example. But is that because they have limited time and resources (and are possibly lazy) and because using an overcomplicated model would confuse their audience, anyways? Or because they legitimately think that it's an objective mistake to use a model involving g(x)?
Good catch, it should be Beta(991, 11). The prior is uniform = Beta(1, 1) and the data is (990 successes, 10 failures).
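For reference, a quick check of that conjugate update (a minimal sketch using scipy's Beta distribution):

```python
from scipy import stats

# Uniform prior Beta(1, 1) plus 990 successes and 10 failures gives the
# posterior Beta(1 + 990, 1 + 10) = Beta(991, 11).
posterior = stats.beta(1 + 990, 1 + 10)
print(posterior.mean())            # 991 / 1002, about 0.989
print(posterior.interval(0.95))    # central 95% credible interval
```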
How do you get the top portion of the second payoff matrix from the first? Intuitively, it should be by replacing the Agent A's payoff with the sum of the agents' payoffs, but the numbers don't match.
Most people are altruists but only to their in-group, and most people have very narrow in-groups. What you mean by an altruist is probably someone who is both altruistic and has a very inclusive in-group. But as far as I can tell, there is a hard trade-off between belonging to a close-knit, small in-group and identifying with a large, diverse but weak in-group. The time you spend helping strangers is time taken away from potentially helping friends and family.
Like V_V, I don't find it "reasonable" for utility to be linear in things we care about.
I will write a discussion topic about the issue shortly.
EDIT: Link to the topic: http://lesswrong.com/r/discussion/lw/mv3/unbounded_linear_utility_functions/
I'll need some background here. Why aren't bounded utilities the default assumption? You'd need some extraordinary arguments to convince me that anyone has an unbounded utility function. Yet this post and many others on LW seem to implicitly assume unbounded utility functions.
Let's talk about Von Neumann probes.
Assume that the most successful civilizations exist digitally. A subset of those civilizations would selfishly pursue colonization; the most convenient means would be through Von Neumann machines.
Tipler (1981) pointed out that due to exponential growth, such probes should already be common in our galaxy. Since we haven't observed any, we must be alone in the universe. Sagan and Newman countered that intelligent species should actually try to destroy probes as soon as they are detected. This counterargument, known as "Sagan's response," doesn't make much sense if you assume that advanced civilizations exist digitally. For these civilizations, the best way to counter another race of Von Neumann probes is with their own Von Neumann probes.
Others (who have not been identified by the Wikipedia article) have tried to explain the visible absence of probes by theorizing how civilizations might deliberately limit the expansion range of the probes. But why would any expansionist civilization even want to do so? One explanation would be to avoid provoking other civilizations. However, it still remains to be explained why the very first civilizations, which had no reason to fear other alien civilizations, would limit their own growth. Indeed, any explanation of the Fermi paradox has to be able to explain why the very first civilization would not have already colonized the universe, given that the first civilization was likely to be aware of their uncontested claim to the universe.
The first civilization either became dominated by a singleton, or remained diversified into the space age. For the following theory, we have to assume the latter--besides, we should hope for our own sake that singletons don't always win. If the civilization remains diverse, at least some of the factions transition to a digital existence, and given the advantages provided for civilizations existing in that form, we could expect the digitalized civilizations to dominate.
Digitalized civilizations still have a wide range of possible value systems. There exist hedonistic civilizations, which gain utility from having immense computational power for recreational simulations or proving useless theorems, and there also exist civilizations which are more practically focused on survival. But any type of civilization has to act in self-preservation.
Details of the strategic interactions of the digitalized civilizations depend on speculative physics and technology, particularly the economics of computation. If there are dramatic economies of scale in computation (for example, if quantum computers provide an exponential scaling of utility with cost), then it becomes plausible that distinct civilizations would cooperate. However, all known economies of scale have limits, in which case the most likely outcome is for distinct factions to maintain control of their own computing resources. Without such an incentive for cooperation, the civilizations would have to be wary of threats from the other civilizations.
Any digitalized civilization has to protect itself from being compromised from within. Rival civilizations with completely incompatible utility functions could still exploit each other's computing resources. Hence, questions about the theoretical limitations of digital security and data integrity could be relevant to predicting the behavior of advanced civilizations. It may turn out to be easy for any civilization to protect a single computational site. However, any civilization expanding to multiple sites would face a much trickier security problem. Presumably, the multiple sites should be able to interact in some way, since otherwise, what is the incentive to expand? However, any interaction between a parent site and a child site opens the parent site (and therefore the entire network) to compromise.
Colonization sites near any particular civilization quickly become occupied, hence a civilization seeking to expand would have to send a probe to a rather distant region of space. The probe should be able to independently create a child site, and then eventually this child site should be able to interact with the parent site. However, this then requires the probe to carry some kind of security credentials which would allow the child site to be authenticated by the parent site in the future. These credentials could potentially be compromised by an aggressor. The probe has a limited capacity to protect itself from compromise, and hence there is a possibility that an aggressor could "capture" the probe, without being detected by the probe itself. Thus, even if the probe has self-destruction mechanisms, they could be circumvented by a sufficiently sophisticated approach. A compromised probe would behave exactly the same as a normal probe, and succeed in creating a child site. However, after the compromised child site has started to interact with the parent, at some point, it can launch an attack and capture the parent network for the sake of the aggressor.
Due to these considerations, civilizations may be wary of sending Von Neumann probes all over the universe. Civilizations may still send groups of colonization probes, but the probes may delay colonization so as to hide their presence. One might imagine that a "cold war" is already in progress in the universe, with competing probes lying hidden even within our own galaxy, locked in stalemate for billions of years.
Yet, new civilizations are basically unaffected by the cold war: they have nothing to lose from creating a parent site. Nevertheless, once a new civilization reaches a certain size, they have too much to lose from making unsecured expansions.
But some civilizations might be content to simply make independent, non-interacting "backups" of themselves, and so have nothing to fear if their probes are captured. It still remains to explain why the universe isn't visibly filled with these simplistic "backup" civilizations.
Sociology, political science and international politics, economics (graduate level), psychology, psychiatry, medicine.
Undergraduate mathematics, Statistics, Machine Learning, Intro to Apache Spark, Intro to Cloud Computing with Amazon
Thanks--this is a great analysis. It sounds like you would be much more convinced if even a few people already agreed to tutor each other--we can try this as a first step.
That's OK, you can get better. And you can use any medium which suits you. It could be as simple as assigning problems and reading, then giving feedback.
This is an interesting counterexample, and I agree with Larry that using priors which depend on pi(x) is really no Bayesian solution at all. But if this example is really so problematic for Bayesian inference, can one give an explicit example of some function theta(x) for which no reasonable Bayesian prior is consistent? I would guess that only extremely pathological and unrealistic examples theta(x) would cause trouble for Bayesians. What I notice about many of these "Bayesian non-consistency" examples is that they require consistency over very large function classes: hence they shouldn't really scare a subjective Bayesian who knows that any function you might encounter in the real world would be much better behaved.
In terms of practicality, it's certainly inconvenient to have to compute a non-parametric posterior just to do inference on a single real parameter phi. To me, the two practical aspects of actually specifying priors and actually computing the posterior remain the only real weakness of the subjective Bayesian approach (or the Likelihood principle more generally.)
PS: Perhaps it's worth discussing this example as its own thread.
EDIT: Edited my response to be more instructive.
On some level it's fine to make the kinds of qualitative arguments you are making. However, to assess whether a given hypothesis is really robust to parameters like the ubiquity of civilizations, colonization speed, and alien psychology, you have to start formulating models and actually quantify the size of the parameter space which would result in a particular prediction. A while ago I wrote a tutorial on how to do this:
http://lesswrong.com/lw/5q7/colonization_models_a_tutorial_on_computational/
which covers the basics, but to incorporate alien psychology you would have to formulate the relevant game-theoretic models as well.
The pitfall of the kinds of qualitative arguments you are making is that you risk confusing the fact that "I found a particular region of the parameter space where your theory doesn't work" with the conclusion that "Your theory only works in a small region of the parameter space." It is true that under certain conditions regarding ubiquity of civilizations, colonization speed, and alien diplomatic strategy, Catastrophe Engines end up being built on every star. However, you go on to claim that this outcome occurs in most of the parameter space, and that the Fermi Paradox is only observed in a small exceptional part of the parameter space. Given my experience with this kind of modeling, I predict that Catastrophe Engines actually are robust to all but the most implausible assumptions about ubiquity of intelligent life, colonization speed, and alien psychology, but you obviously don't need to take my word on it. On the other hand, you'd have to come up with some quantitative models to convince me of the validity of your criticisms. In any case, continuing to argue on a purely philosophical level won't serve to resolve our disagreement.
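To illustrate what I mean by quantifying the parameter space (this is only a toy skeleton, not the model from the tutorial, and every number in it is made up): sample civilization counts and expansion speeds over a grid, simulate whether Earth would already have been reached, and report what fraction of the grid is actually consistent with a silent sky.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy skeleton (not the model from the tutorial; every number is made up):
# expansionist civilizations appear uniformly in a sphere of radius R over
# the last T years and expand at a fixed fraction of light speed.  We scan
# a grid over (expected number of civilizations, expansion speed) and ask
# in what fraction of that grid Earth would already have been reached.
R = 5e9   # light years
T = 5e9   # years

def prob_earth_reached(n_expected, speed_c, n_sims=200):
    hits = 0
    for _ in range(n_sims):
        n = rng.poisson(n_expected)
        if n == 0:
            continue
        d = R * rng.uniform(0, 1, n) ** (1 / 3)   # distances, uniform in volume
        t = T * rng.uniform(0, 1, n)              # birth times (years ago)
        hits += bool(np.any(speed_c * t >= d))    # someone has reached us
    return hits / n_sims

n_grid = np.logspace(-1, 3, 9)        # expected number of civilizations
v_grid = np.linspace(0.01, 0.99, 9)   # expansion speed as a fraction of c
reached = np.array([[prob_earth_reached(n, v) for v in v_grid] for n in n_grid])
print("fraction of the grid where Earth is likely already reached:",
      (reached > 0.5).mean())
```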
The second civ would still avoid building them too close to each other. This is all clear if you do the analysis.
Thanks for the references.
I am interested in answering questions of "what to want." Not only is it important for individual decision-making, but there are also many interesting ethical questions. If a person's utility function can be changed through experience, is it ethical to steer it in a direction that would benefit you? Take the example of religion: suppose you could convince an individual to convert to a religion, and then further convince them to actively reject new information that would endanger their faith. Is this ethical? (My opinion is that it depends on your own motivations. If you actually believed in the religion, then you might be convinced that you are benefiting others by converting them. If you did not actually believe in the religion, then you are being manipulative.)
Ordinarily, yes, but you could imagine scenarios where agents have the option to erase their own memories or essentially commit group suicide. (I don't believe these kinds of scenarios are extreme beyond belief--they could come up in transhuman contexts.) In this case nobody even remembers which action you chose, so there is no extrinsic motivation for signalling.
The second civilization would just go ahead and build them anyways, since doing so maximizes their own utility function. Of course, there is an additional question of whether and how the first civilization will try to stop this from happening, since the second civ's Catastrophe Engines reduce their own utility. If the first civ ignores them, the second civ builds Catastrophe Engines the same way as before. If the first civ enforces a ban on Catastrophe Engines, then the second civ colonizes space using conventional methods. But most likely the first civ would eliminate the second civ (the "Berserker" scenario.)
For the original proposal:
Explain:
- A mechanism for explosive energy generation on a cosmic scale might also explain the Big Bang.
Invalidate:
- Catastrophe Engines should still be detectable due to extremely concentrated energy emission. A thorough infrared sky survey would rule them out along with more conventional hypotheses such as Dyson spheres.
- If it becomes clear there is no way to exploit vacuum energy, this eliminates one of the main candidates for a new energy source.
- A better understanding of the main constraints for engineering Matrioshka brains: if heat dissipation considerations already limit the size of a single brain, then there is no point in considering speculative energy sources.
Disclaimer: I am lazy and could have done more research myself.
I'm looking for work on what I call "realist decision theory." (A loaded term, admittedly.) To explain realist decision theory, contrast with naive decision theory. My explanation is brief since my main objective at this point is fishing for answers rather than presenting my ideas.
Naive Decision Theory
1. Assumes that individuals make decisions individually, without need for group coordination.
2. Assumes individuals are perfect consequentialists: their utility function is only a function of the final outcome.
3. Assumes that individuals have utility functions which do not change with time or experience.
4. Assumes that the experience of learning new information has neutral or positive utility.
Hence a naive decision protocol might be:
1. A person decides whether to take action A or action B.
2. An oracle tells the person the possible scenarios that could result from action A or action B, with probability weightings.
3. The person subconsciously assigns a utility to each scenario. This utility function is fixed. The person chooses action A or B based on which action maximizes expected utility (see the sketch below).
As a consequence of the above assumptions, the person's decision is the same regardless of the order of presentation of the different actions.
Note: we assume physical determinism, so the person's decision is even known in advance to the oracle. But we suppose the oracle can perfectly forecast counterfactuals; to emphasize this point, we might call it a "counterfactual oracle" from now on.
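Here is the naive protocol as a few lines of code (the actions, scenarios, and utility numbers are invented purely for illustration); note that the choice depends only on the oracle's forecasts and the fixed utility function, not on presentation order.

```python
from typing import Callable

Scenario = str
Forecast = list[tuple[float, Scenario]]   # (probability, scenario) pairs

def naive_choice(forecasts: dict[str, Forecast],
                 utility: Callable[[Scenario], float]) -> str:
    # Expected utility of each action under the oracle's forecast,
    # with a fixed utility function over final scenarios.
    expected = {
        action: sum(p * utility(s) for p, s in outcomes)
        for action, outcomes in forecasts.items()
    }
    return max(expected, key=expected.get)

# Hypothetical example: the oracle's forecasts for actions A and B.
forecasts = {
    "A": [(0.7, "modest gain"), (0.3, "status quo")],
    "B": [(0.5, "large gain"), (0.5, "large loss")],
}
utility = {"modest gain": 1.0, "status quo": 0.0,
           "large gain": 3.0, "large loss": -4.0}.get
print(naive_choice(forecasts, utility))   # -> "A"
```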
It should be no surprise that the above model of utility is extremely unrealistic. I am aware of experiments demonstrating non-transitivity of preferences, for instance. Realist decision theory contrasts with naive decision theory in several ways.
Realist Decision Theory
1. Acknowledges that decisions are not made individually but jointly with others.
2. Acknowledges that in a group context, actions have a utility in and of themselves (signalling), separate from the utility of the resulting scenarios.
3. Acknowledges that an individual's utility function changes with experience.
4. Acknowledges that learning new information constitutes a form of experience, which may itself have positive or negative utility.
Relaxing any one of the four assumptions radically complicates the decision theory. Consider relaxing only conditions 1 and 2: then game theory becomes required. Consider relaxing only 3 and 4, so that for all purposes only one individual exists in the world: then points 3 and 4 mean that the order in which a counterfactual oracle presents the relevant information to the individual affects the individual's final decision. Furthermore, an ethically implemented decision procedure would allow the individual to choose which pieces of information to learn. Therefore there is no guarantee that the individual will even end up learning all the information relevant to the decision, even if time is not a limitation.
It would be great to know which papers have considered relaxing the assumptions of a "naive" decision theory in the way I have outlined.
I mostly agree with you, but we may disagree on the implausibility of exotic physics. Do you consider all explanations which require "exotic physics" to be less plausible than any explanation that does not? If you are willing to entertain "exotic physics", then are there many ideas involving exotic physics that you find more plausible than Catastrophe Engines?
In the domain of exotic physics, I find Catastrophe Engines to be relatively plausible, since there are already analogues of Catastrophe Engines in known physics: for example, nuclear chain reactions. It is quite natural to think that a stronger method of energy production would result in even greater risks, and finally the inherent uncertainty of quantum physics implies that one can never eliminate the risk of any machine, regardless of engineering. Note that my explanation holds no matter how small the risk lambda actually is (though I implicitly assumed that the universe has infinite lifetime: for my explanation to work, the expected life of the Catastrophe Engine has to be at most on the same order as the lifetime of the universe).
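To spell out that parenthetical condition, here is a minimal formalization, assuming meltdown is a memoryless (Poisson) failure process with per-engine rate lambda:

```latex
% Assuming a memoryless failure process with per-engine meltdown rate \lambda:
P(\text{no meltdown by time } t) = e^{-\lambda t},
\qquad
\mathbb{E}[\text{lifetime}] = \tfrac{1}{\lambda},
% so "expected life at most on the order of the universe's lifetime" becomes
\tfrac{1}{\lambda} \lesssim T_{\mathrm{universe}}
\;\Longleftrightarrow\;
\lambda \gtrsim \tfrac{1}{T_{\mathrm{universe}}}.
```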
It is also worth noting that there are many variants of the Catastrophe Engine hypothesis that have the same consequences but which you might find more or less plausible. Perhaps these Engines don't have "meltdown", but it is necessary that they experience some kind of interference from other nearby Engines that would prevent them from being built too closely to each other. You could suppose that the best Matrioshka Brains produce chaotic gravity waves that would interfere with other nearby Brains, for instance.
Personally, I find explanations that require implausible alien psychology to be less plausible than explanations that require unknown physics. I expect most higher civilizations to be indifferent about our existence unless we pose a substantial threat, and I expect a sizable fraction of higher civilizations to value expansion. Perhaps you have less confidence in our understanding of evolutionary biology than our understanding of physics, hence our disagreement.
For the sake of discussion, here is my subjective ranking of explanations by plausibility:
- There are visible signs of other civilizations, we just haven't looked hard enough.
- Most expansionist civilizations develop near light-speed colonization, making it very unlikely for us to exist in the interval between when their civilization becomes visible and when our planet is colonized
- We happen to be the first technologically advanced civilization in our visible universe
- Most artifacts are invisible due to engineering considerations (e.g. the most efficient structures are made out of low-density nanofibers, or dark matter).
- Colonization is much, much more difficult than we anticipated.
- Defensively motivated "berserkers". Higher civs have delicate artifacts that could actually be harmed by much less advanced spacefaring species, hence new spacefaring species are routinely neutralized. It still needs to be explained why most of the universe hasn't been obviously manipulated, hence "Catastrophe Engines" or a similar hypothesis. Also, it needs to be explained why we still exist, since it would be presumably very cheap to neutralize our civilization.
- Some "great filters" lie ahead of us: such as nuclear war. Extremely implausible because you would also have to explain why no species could manage to evolve with better cooperation skills.
- "Galactic zoo" hypotheses and other explanations which require most higher civilizations to NOT be expansionist. Extremely implausible because many accidentally created strong AIs would be expansionist.
I ignore the hypothesis that "we are in a simulation" because it doesn't actually help explain why we would be the only species in the simulation.
EDIT: Modified the order
There are only a limited number of ideas we can work on
You are right in general. However, it is also a mistake to limit your scope to too few of the most promising ideas. Suppose we put a number K on the number of different explanations we should consider for the Fermi paradox. What number K do you think would give the best tradeoff between thoroughness and time?
It's not a contest. And although my explanation invokes unknown physics, it makes specific predictions which could potentially be validated or invalidated, and it has actionable consequences. Could you elaborate on what criteria make an idea "worth entertaining"?
Regardless of whether ETs are sending signals, presumably we should be able to detect Type II or Type III civilizations given most proposals for what such civilizations should look like.
There exists a technological plateau for general intelligence algorithms, and biological neural networks already come close to optimal. Hence, recursive self-improvement quickly hits an asymptote.
Therefore, artificial intelligence represents a potentially much cheaper way to produce and coordinate intelligence compared to raising humans. However, it will not have orders of magnitude more capability for innovation than the human race. In particular, if humans are unable to discover breakthroughs enabling vastly more efficient production of computational substrate, then artificial intelligence will likewise be unable. In that case, unfriendly AI poses an existential threat primarily through dangers that we can already imagine, rather than unanticipated technological breakthroughs.
There is no way to raise a human safely if that human has the power to exponentially increase their own capabilities and survive independently of society.
You can try to reduce philosophy to science, but how can you justify the scientific method itself? To me, philosophy refers to the practice of asking any kind of "meta" question. To question the practice of science is philosophy, as is the practice of questioning philosophy. The arguments you make are philosophical arguments--and they are good arguments. But to make a statement to the effect of "all philosophy is cognitive science" is too broad a generalization.
What Socrates was doing was asking "meta" questions about intuitions that most people take for granted. Now what you are doing is asking "meta" questions about what Socrates was doing. Was his goal to study how people think about justice? Perhaps. But that is saying that Socrates' goal was to find the truth. Perhaps his goal was more than that: he wanted to personally convince people to question their own intuitions via the "Socratic method."
But this does not fit into the scientific framework, because in science it is accepted that there is a universal truth. From the scientific point of view, Socrates should just design some experiment to test peoples' intuition about justice, publish the findings, and be satisfied that he uncovered some of this universal truth. But would that be enough to convince an average person to question their own intuitions? Perhaps in our age, it would be enough, since most people accept science. But I doubt it even now, and certainly people back in Socrates' time would not be convinced if he wrote his findings, proclaiming it as universal truth. He had to seek out individuals and personally convince them to question their own thinking.
You can't boil down philosophy to the process of "seeking universal truth", because for one, that is the definition of science, and two, philosophy is the place you start before you assume things about the universal truth. Of course, philosophy can lead you to science, if philosophical arguments convince you about the existence of universal truth. Once you accept the "universal truth", then looking back, most of philosophy looks like nonsense. But you shouldn't discount the importance of the process of asking apparently silly questions that got you, and the rest of civilization, to where we are now!
Hopefully people here do not interpret "rationalists" as synonymous for "the LW ingroup." For one, you can be a rationalist without being a part of LW. And secondly, being a part of LW in no way certifies you as a rationalist, no matter how many internal "rationality tests" you subject yourself to.
A different kind of "bias-variance" tradeoff occurs in policy-making. Take college applications. One school might admit students based only on the SAT score. Another admits students based on scores, activities, essays, etc. The first school might reject a lot of exceptional people who just happen to be bad at test-taking. The second school tries to make sure it accepts those kinds of exceptional people, but in the process of doing so, it will admit more unexceptional people with bad test scores who somehow manage to impress the admissions committee. The first school is "biased" against exceptional students with bad test scores; the second school has more "variance" because, by attempting to capture the students that the first school wrongly rejects, it admits more low-quality students as well. You might interpret this particular example as "sensitivity vs specificity."
Another example would be a policy for splitting tips at a restaurant. One policy would be to have all the staff split the tips equally. Another policy would be to have no splitting of tips. Splitting tips incurs bias, not splitting incurs variance. An intermediary policy would be to have each staff member keep half of their own tips, and to contribute the other half to be redistributed.
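A quick simulation of the tip example (the tip amounts and noise level are made up) makes the tradeoff concrete: pooling pulls every server's expected payout toward the group mean (bias) while shrinking its night-to-night spread (variance).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulation (all numbers made up): each server's nightly tips are
# noisy around a personal mean.  Pooling a fraction of the tips biases
# each payout toward the group average but reduces its night-to-night
# variance; keeping all tips is unbiased but noisy.
true_means = np.array([80.0, 100.0, 140.0])   # three servers' average nightly tips
noise_sd = 40.0
n_nights = 20_000

def payouts(keep_fraction):
    tips = true_means + rng.normal(0, noise_sd, size=(n_nights, 3))
    pooled = tips.mean(axis=1, keepdims=True)
    return keep_fraction * tips + (1 - keep_fraction) * pooled

for k in (1.0, 0.5, 0.0):   # keep everything, split half, split everything
    pay = payouts(k)
    bias = np.abs(pay.mean(axis=0) - true_means).mean()
    spread = pay.std(axis=0).mean()
    print(f"keep {k:.0%} of own tips: mean |bias| = {bias:5.1f}, mean sd = {spread:5.1f}")
```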
EM202623997 state complexity hierarchy
Relative to any cellular automaton capable of universal computation, initial states can be classified according to a nested hierarchy of complexity classes. The first three levels of the hierarchy were informally known since the beginnings of cellular automata theory in the 20th century, and the next two levels were also speculated to exist, motivated by the idea of formalizing an abstract notion of "organism" and an abstract notion of "sentience", respectively. EM202623997, a descendant of the Musk-Tao-Mirzakhani mergemind, formalized a definition of these two additional levels of the hierarchy, and argued in their 2107 paper that the formalization provides a basis for the aforementioned theories of "organism" and "sentience". The EM138716-EM198274 theorem established that states at the fifth level exist for any universal CA; on the other hand, no examples of states at the fourth level have been found for any cellular automaton, other than somewhat unsatisfying examples such as "an encoding of a program designed to systematically search for states of the fourth level".
CA theorists generally agree that the EM202623997 hierarchy represents a conceptual leap in CA theory, but there is still debate as to whether the fourth and fifth hierarchies actually constitute concepts of "organism" and "sentience" as claimed. In particular, according to the transcoding theorem, any software program, and in particular a simulated universe (of potentially unbounded size) containing a simulated genetic self-replicator or an emulated brain, can be encoded as a finite state for a universal CA; but it is not known whether such encodings are included in the fourth level of the hierarchy. Amusingly, the main issue is the uncertainty as to whether or not all such self-replicators are destined to self-destruct [citation needed].
1. All states. This is the highest level attainable by any state sequence with a repeated state.
2. Aperiodic. No state ever repeats. A basic example of this is the glider gun in the Game of Life.
3. Computational Irreducibility. The state sequence cannot be matched by a non-universal cellular automaton; for example, the state consisting of a single 1 for Rule 30 (a minimal sketch of this example follows after this list).
4. Level IV. The state sequence attains states of unbounded fractal entropy dimension. EM202623997 showed that the proportion of states of size N in the third hierarchy with a supremum fractal entropy dimension above any constant times exp(N) goes to zero as N goes to infinity, i.e. "most" states in the third hierarchy are not contained in the fourth hierarchy. EM202623997 further argued that fractal entropy dimension captures the "macrostructures" exhibited by ecosystems. Additional work has confirmed that simulations of interacting and evolving self-replicators seem to increase in fractal entropy dimension.
5. Level V. The state sequence reaches an unbounded level of simulation dimension. The formal definition is extremely technical, but the idea is that the state sequence contains space-localized subsequences which can be interpreted as simulations of the CA on a coarser scale, which may in turn contain space-localized subsequences with the same property, etc. The simulation dimension is a measure of such nesting; however, due to difficulties in formalizing what it means for a space-localized subsequence to contain a "simulation of the CA on a coarser scale", the simulation dimension is a real-valued quantity (and in practice, uncomputable) rather than a whole number. EM202623997 proved that this property implies unbounded fractal entropy dimension, and argued that requiring unboundedness agrees with the work of philosophers studying "universal characteristics of sentient, rational agents".
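For anyone who wants to play with the Rule 30 example from level 3, here is a minimal sketch in plain Python (my own throwaway code, with a wrap-around boundary; it illustrates only the standard elementary CA, not anything in the EM202623997 formalism):

```python
# Rule 30 started from a single 1 cell (the level-3 example above), printed as text.
RULE, WIDTH, STEPS = 30, 63, 31

cells = [0] * WIDTH
cells[WIDTH // 2] = 1                      # single 1 in the middle

for _ in range(STEPS):
    print("".join("#" if c else "." for c in cells))
    # New cell value = bit of RULE indexed by the (left, center, right) neighborhood.
    cells = [
        (RULE >> (4 * cells[(i - 1) % WIDTH] + 2 * cells[i] + cells[(i + 1) % WIDTH])) & 1
        for i in range(WIDTH)
    ]
```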
Levels IV and V were originally named "life-like" and "sentience-like", respectively, in the original paper by EM202623997, but the terms were never widely adopted.
Philosophical reaction
Philosopher EM19387 criticized the definition of the fifth hierarchy as conflating sentience with self-preservation. EM202623997 responded by speculating that "any universe capable of producing sentience can also produce ambitious sentience", and hence the fifth hierarchy will, in practice, capture "most" state sequences which can be considered sentient.
Quantum analogues
Developing an analogous hierarchy for quantum cellular automata is an active area of research. In particular, a quantum CA could easily encode a universe based on string theory; hence, conditional on the accuracy of string theory in describing our universe, one can very directly ask whether a universe similar to our own (but modified to have unbounded informational capacity) falls into the fourth hierarchy. Our own universe probably does not, according to most theories of cosmology, which indicate a bound on the informational capacity of our universe.
Simulated dream state experiments
Simulated dream state experiments (SDSEs) are computer simulation experiments involving simulated human sentiences in a dream state. Since the passing of the Banford agreement (1) in 2035, SDSEs have been the exclusive means of ethically conducting simulation experiments on simulated human sentiences without active consent (2), although contractual consent (3) is still universally required for SDSEs. SDSEs have widespread scientific, commercial, educational, political, military, and legal applications. Scientific studies using SDSEs have been used to develop accelerated dream learning techniques; SDSEs are also employed as part of the scientific process itself, as a means of controlling creative hypothesis bias (4). Commercial applications of SDSEs include screening of job applicants and simulated consumer testing. Simulated ordeals are a major use of SDSEs in legal, political, and military contexts, for the purpose of enforcing the ethical integrity or good faith of the subject.
Scientific status
SDSEs are widely accepted within the scientific community as a valid substitute for waking-state simulations. The rapid development of silicooneirology (5) provided the understanding necessary to influence dream states with a high degree of control, including fine control of the subject's degree of lucidity. In a series of studies [1], [2], Walker et al. demonstrated a high correlation between the behavior of one set of simulated subjects participating in a battery of waking-state simulated experiments and that of an identical set of copies participating in the equivalent SDSEs.
Controversy
SDSEs are still banned in the European Union and in most Latin American countries. Additionally, most religious organizations are against SDSEs. Pope Clementine VIII has stated that the use of SDSEs conflicts with the concept of faith.
A number of international scandals have erupted over the alleged use of SDSEs as a torture mechanism [3], [4]. Such abuse of SDSEs is considered a war crime under the updated Geneva Convention.
Critics of SDSEs point out that the widespread acceptance of SDSEs is due to the universal emotional tendency of humans to devalue dream experiences as "unreal"; SDSEs, however, differ from natural dreams due to the external manipulation of the dream state. The status of SDSEs with long subjective time frames is also commonly disputed.
Author Jose Hernandez specifically criticized the effect of randomized SDSEs, especially simulated ordeals, on everyday life in his book "Simulation Shock." Hernandez argues that the ubiquity of SDSEs creates a sense of continual unease as to whether or not one is participating in an SDSE at any given moment. However, surveys conducted by sociologists [4], [5] find a general increase in self-reported happiness levels among employees at companies which began the use of randomized SDSEs.
(1) Banford agreement: A provision of the UN Council of Human Rights prohibiting waking-state simulation experiments on simulated human sentiences without active consent, even if contractual consent was given.
(2) Active consent: In the context of experiments on simulated human sentiences, active consent refers to three conditions: 1. the subjects are aware of their simulated nature, 2. the subjects have the right to end the simulation at any time, 3. the subjects have, at all times, the means to communicate to the experimenters that the simulation should be ended.
(3) Contractual consent: Requires the subjects to be aware of their simulated nature and to provide informed consent to the experiment prior to its setup. Additionally, the consenting individual (not the simulated copy) has the right to withdraw their simulated copy from the experiment at any time, and also has a number of rights regarding the confidentiality of all obtained data.
(4) Creative hypothesis bias: A statistical bias resulting from using the same data to formulate a set of hypotheses and to test those hypotheses. According to the Popperian (traditional) philosophy of science, an ideal scientific study avoids the creative hypothesis bias by completely separating the process of formulating hypotheses from the process of testing them, with an independent experiment for each hypothesis (a toy demonstration follows after these footnotes). In post-Popperian philosophy of science, creative hypothesis bias is controlled by the use of SDSEs, in which simulated copies of the original research team are presented with synthetically generated data (control group) or the original data (treatment group).
(5) Silicooneirology: The scientific study of sleep via simulation. In contrast to SDSEs, simulated sleep studies by definition make no attempt to interfere with the sleeping brain, providing only the pre-specified set of input signals given in the Handbook of the International Society of Silicooneirology.
[1],[2],...: fake references
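As promised in footnote (4), here is a toy demonstration of the creative hypothesis bias itself (my own throwaway code, illustrating the classical "independent experiment" control rather than anything SDSE-specific). It generates pure-noise data with many candidate variables, "formulates" a hypothesis by picking the variable with the largest apparent effect, and then tests it either on the same data or on a fresh dataset; the inflated hit rate in the first case is the bias.

```python
import random, statistics

random.seed(1)

def dataset(n_vars=20, n_obs=30):
    """Pure noise: no variable actually differs between group A and group B."""
    return [([random.gauss(0, 1) for _ in range(n_obs)],
             [random.gauss(0, 1) for _ in range(n_obs)]) for _ in range(n_vars)]

def effect(a, b):
    """Standardized difference in means -- a crude test statistic."""
    return abs(statistics.mean(a) - statistics.mean(b)) / (statistics.pstdev(a + b) or 1e-9)

THRESHOLD = 0.52   # roughly a two-sided p~0.05 cutoff for this statistic at 30 per group

same_data_hits = fresh_data_hits = 0
TRIALS = 2_000
for _ in range(TRIALS):
    explore = dataset()
    # "Formulate" a hypothesis by picking the variable with the largest apparent effect...
    best = max(range(len(explore)), key=lambda i: effect(*explore[i]))
    # ...then test it on the same data (biased) or on an independent dataset (unbiased).
    same_data_hits += effect(*explore[best]) > THRESHOLD
    confirm = dataset()
    fresh_data_hits += effect(*confirm[best]) > THRESHOLD

print(f"'significant' on the exploratory data: {same_data_hits / TRIALS:.1%}")
print(f"'significant' on independent data:     {fresh_data_hits / TRIALS:.1%}")
```

Testing on the exploratory data should flag a "significant" effect far more often than the nominal 5%, while testing on independent data should stay near 5%, since there is no real effect to find.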
Daniel grew up as a poor kid, and one day he was overjoyed to find $20 on the sidewalk. Daniel could work hard to become a trader on Wall Street, yet he decides to become a teacher instead, because of his positive experiences tutoring a few kids while in high school. But as a high school teacher, he will only teach a thousand kids in his career, while as a trader he would have been able to make millions of dollars. If he multiplied his positive experience with one kid by a thousand, it still probably wouldn't compare with the joy of finding $20 on the sidewalk times a million.