The problem, as various people have pointed out, is that this implies an intelligence capable of taking over the world, but not capable of working out that when a human asks it to pursue a certain goal, the human would not want that goal pursued in a way that leads to the destruction of the world.
The entity providing the goals for the AI wouldn't have to be a human, it might instead be a corporation. A reasonable goal for such an AI might be to 'maximize shareholder value'. The shareholders are not humans either, and what they value is only money.
I find "false positive" and "false negative" also a bit confusing, albeit less so than "type I" and "type II" errors. Perhaps because of a programming background, I usually interpret 'false' and 'negative' (and '0') as the same thing. So is a 'false positive' something that is false but is mistaken as positive, or something that is positive (true), but that is mistaken as false (negative)? In other words, does 'false' apply to the postiveness (it is actually negative, but classified as positive), to being classified as positive (it is actually positive, but classified as positive)?
Perhaps we should call false positives "spurious" and false negatives "missed".
The link you provided contains absolutely no physics, as far as I can tell. Nor is there any math aside from some basic logic. So I am skeptical on whether this theory is correct (or even falsifiable).
Physics is built on top of mathematics, and almost all of mathematics can be built on top of ZFC (there are other choices). But there is as much time in ZFC as there are words in a single pixel on your screen.
Why would the price of necessities rise?
There are three reasons why the price might go up:
1. demand increases
2. supply decreases
3. inflation
Right now, everyone is already consuming these necessities, so if UBI is introduced, demand will not go up. So 1 would not be true.
Supply could go down if enough people stop working. But if this reduces supply of the necessities, there is a strong incentive for people on just UBI to start working again. There is also increasing automation. So I find 2 unlikely.
That leaves 3, inflation. I am not an economist, but as far as I understand this shouldn't be a significant factor.
Define the sequence S by
S(0) = 0
S(n+1) = 1 + S(n)
This is a sequence of natural numbers. This sequence does not converge, which means that the limit as n goes to infinity of S(n) is not a natural number (nor a real number, for that matter).
You could try to write it as a function of time, S'(t) such that S'(1-0.5^n) = S(n). That is, S'(0)=0, S'(0.5)=1, S'(0.75)=2, etc. A possible formula is S'(t) = -log_2(1-t). You could then ask what is S'(1). The answer is that this is the same as the limit S(infinity), or as log(0), which are both not defined. So in fact S' is not a function from numbers between 0 and 1 inclusive to natural or real numbers, since the domain excludes 1.
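As a quick sanity check, here is a minimal numerical sketch (nothing above depends on it) of the formula S'(t) = -log_2(1-t) at the step times t = 1 - 0.5^n:

```python
import math

# Evaluate the candidate formula S'(t) = -log_2(1 - t) at t = 1 - 0.5^n,
# the time at which the n-th step has completed.
for n in range(5):
    t = 1 - 0.5 ** n
    print(n, t, -math.log2(1 - t))  # the third column equals n (as a float)

# At t = 1 the expression becomes -log_2(0), which is undefined, matching
# the point above that S' has no value at t = 1.
```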
You can similarly define a sequence of distributions over the natural numbers by
T(0) = {i -> 0.5 * 0.5^i}
T(n+1) = the same as T(n) except two values swapped
This is the example that you gave above. The sequence T(n) doesn't converge (I haven't checked, but the discussion above suggests that it doesn't), meaning that the limit "lim_{n->inf} T(n)" is not defined.
This question presupposes that the task will ever be done

Sure. It's called super-tasks.
From mathematics we know that not all sequences converge. So the sequence of distributions that you gave, or my example of the sequence 0,1,2,3,4,... both don't converge. Calling them a supertask doesn't change that fact.
What mathematicians often do in such cases is to define a new object to denote the hypothetical value at the end of sequence. This is how you end up with real numbers, distributions (generalized functions), etc. To be fully formal you would have to keep track of the sequence itself, which for real numbers gives you Cauchy sequences for instance. In most cases these objects behave a lot like the elements of the sequence, so real numbers are a lot like rational numbers. But not always, and sometimes there is some weirdness.
From the wikipedia link:
In philosophy, a supertask is a countably infinite sequence of operations that occur sequentially within a finite interval of time.
This refers to something called "time". Most of mathematics, ZFC included, has no notion of time. Now, you could take a variable and call it time. And you can say that a given countably infinite sequence "takes place" in finite "time". But that is just you putting semantics on this sequence and this variable.
What can one expect after this super-task is done to see?
This question presupposes that the task will ever be done. Since, if I understand correctly, you are doing an infinite number of swaps, you will never be done.
You could similarly define a super-task (whatever that is) of adding 1 to a number. Start with 0, at time 0 add 1, add one more at time 0.5, and again at 0.75. What is the value when you are done? Clearly you are counting to infinity, so even though you started with a natural number, you don't end up with one. That is because you don't "end up" at all.
Why do you believe that? And do you also believe that ZF is inconsistent?
Not about the game itself, but the wording of the questions is a bit confusing to me:
In the above network, suppose that we were to observe the variable labeled "A". Which other variables would this influence?
The act of observing a variable doesn't influence any of the variables; it only changes your beliefs about them. The only things influencing a variable are its parents in the Bayesian network.
The wheels in this case come from robust statistics.
One example of a good robust estimator of the center is the [truncated mean](https://en.wikipedia.org/wiki/Truncated_mean). To put it simply: throw away the lowest x% and the highest x% of the samples, and take the mean of the rest. If x=0 you get the regular mean; as x approaches 50% you get the median.
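Here is a minimal sketch of that estimator (my own illustrative code and numbers, using numpy for convenience; x is given as a fraction rather than a percentage):

```python
import numpy as np

def truncated_mean(samples, x):
    """Drop the lowest and highest fraction x of the samples, mean the rest.

    x = 0.0 gives the ordinary mean; as x approaches 0.5 it approaches the median.
    """
    s = np.sort(np.asarray(samples, dtype=float))
    k = int(len(s) * x)               # how many samples to drop on each side
    trimmed = s[k:len(s) - k] if k > 0 else s
    return trimmed.mean()

data = [1.0, 2.0, 2.5, 3.0, 1000.0]   # one wild outlier
print(truncated_mean(data, 0.0))      # 201.7 -- dragged up by the outlier
print(truncated_mean(data, 0.2))      # 2.5   -- the outlier is thrown away
```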
I wonder if I should write up a rant or a better attempt at exposition, against this "clearly the problem is underdetermined" position.
You should.
An AI would be yet another node in our network, and participate in this process of throwing blegg-concepts at each other probably far better than any human can, but still just a node.
Why would an AI be a single node? I can run two programs in parallel right now on my computer, and they can talk to each other just fine. So if communication is necessary for intelligence, why couldn't an AI be split up into many communicating sub-AIs?
But alas, I fear that Professor Riddle would not have found lasting happiness in Hogwarts."
"Why not? "
"Because I still would've been surrounded by idiots, and I wouldn't have been able to kill them," Professor Quirrell said mildly.
The solution seems obvious (albeit hard and dangerous): make the students smarter so they are no longer idiots.
I cannot be truly killed by any power known to me, and lossing Sstone will not sstop me from returning, nor sspare you or yourss my wrath. Any impetuous act you are contemplating cannot win the game for you, boy.
The last sentence was not said in parseltongue. Could it be that Quirrell used English because it is a lie, and he believes that there is something that Harry could do to win?
I switched to The Old Reader, which, as the name suggests, is pretty close to Google Reader in functionality.
You can measure this by looking at the spoken or written works of the group. When talking to an Axiomist of Choice, you would on average agree 97% of the time with what they are saying, since the other 3% they are talking about yogic flying.
Of course in real life people also make a lot of small talk, which is probably not ideological at all. This is less of an issue when looking at writing.
Wikipedia lists the safe upper limit of vitamin D as 4000 IU (100ug), so taking 10000 could be unhealthy.
If everyone (or just most people) think like you, then seeing people suffer makes them suffer as well. And that makes their friends suffer, and so on. So, by transitivity, you should expect to suffer at least a little bit when people who you don't know directly are suffering.
But I don't think it is about the feeling. I also don't really feel anything when I hear about some number of people dying in a far away place. Still, I believe that the world would be a better place if people were not dying there. If I am in a position to help people, I believe that in the long run the result is better if I just shut up and multiply and help many far away people, rather than caring mostly about a few friends and neighbors.
This is another good explanation, either instead of or in addition to the Great Filter.
It could be that there are many local optima to life, that are hard to escape. And that intelligence requires an unlikely local optimum. This functions like an early Great Filter, but in addition, failing this filter (by going to a bad local optimum) might make it impossible to start over.
For example, you could imagine that it were possible to evolve a gray goo like organism which eats everything else, but which is very robust to mutations, so it doesn't evolve further.
I do not agree. Consider these payoffs for the same game: ...
Different payoffs imply a different game. But even in this different game, the simultaneous-move version would be equivalent. With regard to choosing between X and Y, the existence of choice A still doesn't matter, because if player 1 chose A, X and Y have the same payoff. The only difference is how much player 2 knows about what player 1 did, and therefore how much player 2 knows about the payoff he can expect. But that doesn't affect his strategy or the payoff that he gets in the end.
If Player 2 gets to move he is uncertain as to what Player 1 did. He might have a different probability estimate in the game I gave than one in which strategy A did not exist, or one in which he is told what Player 1 did.
In a classical game all the players move simultaneously. So to repeat, your game is:
- player 1 chooses A, B or C
- then, if player 1 chose B or C, player 2 is told that (but not which), and he chooses X or Y
- payoffs are (A,-) -> (3,0); (B,X) -> (2,0); (B,Y) -> (2,2); (C,X) -> (0,1); (C,Y) -> (6,0)
The classical game equivalent is
- player 1 chooses A, B or C
- without being told the choice of player 1, player 2 chooses X or Y
- payoffs are as before, with (A,X) -> (3,0); (A,Y) -> (3,0).
I hope you agree that the fact that player 2 gets to make a (useless) move in the case that player 1 chooses A doesn't change the fundamentals of the game.
In this classical game player 2 also has less information before making his move. In particular, player 2 is not told whether or not player 1 chose A. But this information is completely irrelevant for player 2's strategy, since if player 1 chooses A there is nothing that player 2 can do with that information.
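As a quick check, a brute-force search over pure strategy profiles of the simultaneous-move version above (an illustrative sketch, taking (A,X) and (A,Y) to both give (3,0) and the rest as listed) finds exactly one pure equilibrium:

```python
from itertools import product

# Payoffs for the simultaneous-move version described above.
payoff = {
    ('A', 'X'): (3, 0), ('A', 'Y'): (3, 0),
    ('B', 'X'): (2, 0), ('B', 'Y'): (2, 2),
    ('C', 'X'): (0, 1), ('C', 'Y'): (6, 0),
}

def is_pure_nash(a1, a2):
    u1, u2 = payoff[(a1, a2)]
    # Neither player may gain by unilaterally deviating.
    best1 = all(payoff[(d, a2)][0] <= u1 for d in 'ABC')
    best2 = all(payoff[(a1, d)][1] <= u2 for d in 'XY')
    return best1 and best2

print([s for s in product('ABC', 'XY') if is_pure_nash(*s)])  # [('A', 'X')]
```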
I'm not convinced that the game has any equilibrium unless you allow for trembling hands.
If the players choose (A,X), then the payoff is (4,0). Changing his choice to B or C will not improve the payoff for player 1, and switching to Y doesn't improve the payoff for player 2. Therefore this is a Nash equilibrium. It is not stable, since player 2 can switch to Y without getting a worse payoff.
It's not equivalent because of the uncertainty.
Could you explain what you mean? What uncertainty is there?
Also, even if it were, lots of games have Nash equilibra that are not reasonable solutions so saying "this is a Nash equilibrium" doesn't mean you have found a good solution.
For example, consider the simultaneous move game where we each pick A or B. If we both pick B we both get 1. If anything else happens we both get 0. Both of us picking A is a Nash equilibrium, but is also clearly unreasonable.
This game has two equilibria: a bad one at (A,A) and a good one at (B,B). The game from this post also has two equilibria, but both involve player one picking A, in which case it doesn't matter what player two does (or in your version, he doesn't get to do anything).
This game is exactly equivalent to the standard one where player one chooses from (A,B,C) and player two chooses from (X,Y), with the payoff for (A,X) and for (A,Y) equal to (3,0). When choosing what choice to make, player two can ignore the case where player one chooses A, since the payoffs are the same in that case.
And as others have said, the pure strategy (A,X) is a Nash equilibrium.
Thanks, that cleared things up.
Right, but P_0(s) is defined for statements s in F. Then suddenly you talk about P_0(s | there is no contradiction of length <= D), but the thing between parentheses is not a statement in F. So, what is the real definition of P_D? And how would I compute it?
P_D(s) := P_0(s | there are no contradictions of length <= D).
You have not actually defined what P_0(a | b) means. The usual definition would be P_0(a | b) = P_0(a & b) / P_0(b). But then, by definition of P_0, P_0(a & b) = 0.5 and P_0(b) = 0.5, so P_0(a | b) = 1. Also, the statement "there are no contradictions of length <= D" is not even a statement in F.
As I understand it, the big difference between Bayesian and frequentist methods is in what they output. A frequentist method gives you a single prediction $z_t$, while a Bayesian method gives you a probability distribution over the predictions, $p(z_t)$. If your immediate goal is to minimize a known (or approximable) loss function, then frequentist methods work great. If you want to combine the predictions with other things as part of a larger whole, then you really need to know the uncertainty of your prediction, and ideally you need the entire distribution.
For example, when doing OCR, you have some model of likely words in a text, and a detector that tells you what character is present in an image. To combine the two, you would use the probability of the image containing a certain character and multiply it by the probability of that character appearing at this point in an English sentence. Note that I am not saying that you need to use a fully Bayesian model to detect characters, just that you somehow need to estimate your uncertainty and be able to give alternative hypotheses.
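To make the combination step concrete, here is a toy sketch with made-up numbers (not anyone's actual OCR model):

```python
# Made-up output of a character detector and a language model for one image.
detector = {'c': 0.55, 'e': 0.40, 'x': 0.05}   # P(character | image)
language = {'c': 0.10, 'e': 0.85, 'x': 0.05}   # P(character | surrounding text)

# Multiply the two sources of evidence and renormalize.
combined = {ch: detector[ch] * language[ch] for ch in detector}
total = sum(combined.values())
combined = {ch: p / total for ch, p in combined.items()}

print(combined)  # 'e' now wins, even though the detector preferred 'c'
```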
In summary, combining multiple models is where Bayesian reasoning shines. You can easily paste multiple models together and expect to get a sensible result. On the other hand, for getting the best result efficiently, state of the art frequentist methods are hard to beat. And as always, the best thing is to combine the two as appropriate.
After writing this I realize that there is a much simpler prior on finite sets S of consistent statements: simply have a prior over all sets of statements, and keep only the consistent ones. If your language is chosen such that it contains X if and only if it also contains ¬X, then this is equivalent to choosing a truth value for each basic statement, and a uniform prior over these valuations would work fine.
I am not convinced by the problematic example in the "Scientific Induction in Probabilistic Mathematics" writeup. Let's say that there are n atoms ϕ(1)..ϕ(n). If you don't condition, then because of symmetry all consistent sets S drawn from the process have equal probability, so the prior on S is uniform. The probability that ϕ(i) is in a consistent set S drawn from the process is therefore exactly 1/2 for all i, by
P(ϕ(i)) = ∑{S} 1[ϕ(i)∈S] * P(S);
this must be true by symmetry, because μ(x)=μ(¬x). Now what you should do to condition on some statement X is simply throw out the sets S which don't satisfy that statement, i.e.
P(ϕ(i)|X) = ∑{S} 1[ϕ(i)∈S] * P(S) * 1[X(S)] / ∑{S} P(S) * 1[X(S)]
Since the prior on S was uniform, it will still be uniform on the restricted set after conditioning. So
P(ϕ(i)|X) = ∑{S} 1[ϕ(i)∈S] * 1[X(S)] / ∑{S} 1[X(S)]
This should just be 90% in the example where X is "90% of the ϕ are true".
The mistake in the writeup is to directly define P(S|X) in an inconsistent way.
To avoid drowning in notation, let's consider a simpler example with the variables a, b and c. We will first pick a or ¬a uniformly, then b or ¬b, and finally c or ¬c. Then we try to condition on X="exactly one of a,b,c is true". You obviously get prior probabilities P(S) = 1/8 for all consistent sets.
If you condition the right way, you get P(S) = 1/3 for the sets with one true atom, and P(S)=0 for the other sets. So then
P(a|X) = P(a|{a,¬b,¬c})P({a,¬b,¬c}|X) + P(a|{¬a,b,¬c})P({¬a,b,¬c}|X) + P(a|{¬a,¬b,c})P({¬a,¬b,c}|X)
= 1/3
What the writeup does instead is first pick a or ¬a uniformly. If it picks a, we know that b and c are false. If we pick ¬a we continue. The uniform choice of a is akin to saying that
P({a,b,c}|X) = P(a) * P({b,c}|a,X).
But that first term should be P(a|X), not P(a)!
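A tiny enumeration of this a, b, c example (my own illustrative code) gives the same 1/3:

```python
from itertools import product

# All 8 truth assignments to (a, b, c), each with prior probability 1/8,
# conditioned on X = "exactly one of a, b, c is true".
assignments = list(product([True, False], repeat=3))
consistent_with_X = [s for s in assignments if sum(s) == 1]

# Conditioning the right way: uniform over the 3 surviving assignments.
p_a_given_X = sum(s[0] for s in consistent_with_X) / len(consistent_with_X)
print(p_a_given_X)  # 1/3

# The procedure criticised above instead fixes P(a) = 1/2 before conditioning
# on X, which would give P(a|X) = 1/2 rather than 1/3.
```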
Do you have any evidence for your claim that people need these two layers? As far as I can tell this is just something for which you can make up a plausible sounding story.
there are people who merely procrastinate on the LW website, and there are people who join some of the organizations mentioned here
There is a (multidimensional) continuum of people on LW. It is not as black and white as you make it out to be.
The question is whether 'change' signifies only a magnitude or also a direction. The average magnitude of the change in belief when doing an experiment is larger than zero. But the average of change as vector quantity, indicating the difference between belief after and before the test, is zero.
If you drive your car to work and back, then the average velocity of your trip is 0, but the average speed is positive.
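A small numerical illustration of the same point (illustrative numbers only, not from the original discussion):

```python
# A test for hypothesis H with prior P(H) = 0.3, true-positive rate 0.9
# and false-positive rate 0.2.
prior = 0.3
p_pos = 0.9 * prior + 0.2 * (1 - prior)      # probability of a positive result
post_pos = 0.9 * prior / p_pos               # belief after a positive result
post_neg = 0.1 * prior / (1 - p_pos)         # belief after a negative result

avg_change = p_pos * (post_pos - prior) + (1 - p_pos) * (post_neg - prior)
avg_size = p_pos * abs(post_pos - prior) + (1 - p_pos) * abs(post_neg - prior)
print(avg_change)  # 0 (up to rounding): the signed changes cancel out
print(avg_size)    # > 0: the expected magnitude of the update is positive
```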
Not all engineering is about developing products to sell to consumers. Engineers also design bridges and rockets. I don't think these are subject to the open market in any meaningful sense.
but it turns out the real number is something like 20-30%.
Citation needed
A priori I wouldn't trust Omega to be fair. I only know that he doesn't lie. If Omega also said that he chose the logical statement in some fair way, then that would assure me that the logical coin is identical to a normal coin. He could do this either using real uncertainty, like rolling a die to pick from a set of statements of which half are true, or using logical uncertainty himself, by not calculating the digit of pi before deciding to make the bet and having a prior that assigns 50% probability to either outcome.
How does a logical coin work exactly? To come up with such a thing, wouldn't Omega first need to pick a particular formula? If the statement is about the nth digit of pi, then he needs to pick n. Was this n picked at random? What about the sign of the test itself? If not, how can you be sure that the logical coin is fair?
In general, you cannot compare the utilities of two different agents, since a positive affine transformation doesn't change an agent's behavior. So (12, 12) is really (12a+a₀, 12b+b₀). How would you even count the utility for another agent without doing it in their terms?
We don't have this problem in practice, because we are all humans, and have similar enough utility functions. So I can estimate your utility as "my utility if I were in your shoes". A second factor is perhaps that we often use dollars as a stand-in for utilons, and dollars really can be exchanged between agents. Though a dollar for me might still have a higher impact than a dollar for you.
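A quick sketch (hypothetical numbers) of why such a transformation is invisible from the outside: it never changes which option the agent picks.

```python
# Utilities an agent assigns to its options (made-up numbers).
utilities = {'take_deal': 12.0, 'refuse': 7.0, 'walk_away': 3.0}

a, a0 = 2.5, -40.0   # any a > 0 and any offset a0
transformed = {act: a * u + a0 for act, u in utilities.items()}

print(max(utilities, key=utilities.get))      # 'take_deal'
print(max(transformed, key=transformed.get))  # still 'take_deal'
```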
The Central Limit Theorem (The Middle Thing-It-Goes-To Idea-You-Can-Show-Is-True-With-Numbers - when you take lots of Middle Numbers of lots of groups, it looks like the Normal Line!)
Does it really simplify things if you replace "limit" with "thing-it-goes-to" and "theorem" with "idea-you-can-show-is-true-with-numbers"? IMO this is a big problem with the up-goer five style of text: you can still try to use complex concepts by combining words. And because you have to describe the concept with inadequate words, it actually becomes harder to understand what you really mean.
There are two purposes of writing simple English:
- writing for children
- writing for non-native speakers
In both cases is "sun-colored stuff that comes out of the ground" really the way you would explain it? I would sooner say something like: "yellow is the color of the sun, it looks like . People like shiny yellow metal called gold, because there is little of it".
I suppose the actual reason we are doing this is
- artificially constrained writing is fun.
If your boyfriend or girlfriend has a different meaning for 'box' than you do, and you give them a line, not only will they be cross with you, but you will be wrong, and that is almost as bad
"give them a line" and "be cross with you" are expressions that make no sense with the literal interpretation of these words.
You are right, I was getting confused by the name. And the Wikipedia article is pretty bad in that it doesn't give a proper concise definition, at least none that I can find. SEP is better.
It still looks like you need some consequentialism in the explanation, though.
That is not what utilitarianism means. It means doing something is good if what happens is good, and doing something is bad if what happens is bad. It doesn't say which things are good and bad.
Complement it with the fact that it costs about 800 thousand dollars to raise a mind, and an adult mind might be able to create value at rates high enough to continue existing.
An adult, yes. But what about the elderly? Of course this is an argument for preventing the problems of old age.
that is a good argument against children.
Is it? It just says that you should value adults over children, not that you should value children over no children. To get one of these valuable adult minds you have to start with something.
An update: your comment (among others) prompted me to do some more reading. In particular, the Stanford Encyclopedia of Philosophy article on Causal Decision Theory was very helpful in making the distinction between CDT and EDT clear to me.
I still think you can mess around with the notion of random variables, as described in this post, to get a better understanding of what you are actually predicting. But I suppose that this can be confusing to others.
So every human has a right to their continued existence. That's a good argument. Thanks.
Existing people take priority over theoretical people. Infinitely so.
Does this mean that I am free to build a doomsday weapon that, 100 years from now, kills everyone born after September 4th, 2013, if that gets me a cookie?
This should be obvious, as the reverse conclusion ends up with utter absurdities of the "Every sperm is sacred" variety.
Not necessarily. It would merely be your obligation to have as many children as possible, while still ensuring that they are healthy and well cared for. At some point having an extra child will make all your children less well off.
Once a child is born, it has as much claim on our consideration as every other person in our light cone
Why is there a threshold at birth? I agree that it is a convenient point, but it is arbitrary.
Reject this axiom and you might as well commit suicide over the guilt of the billions of potentials children you could have that are never going to be born.
Why should I commit suicide? That reduces the number of people. It would be much better to start having children. (Note that I am not saying that this is my utility function).
Are old humans better than new humans?
This seems to be a hidden assumption of cryonics / transhumanism / anti-deathism: we should do everything we can to prevent people from dying, rather than investing those resources into making more (or more productive) children.
"Most of the causal inference community" agrees that causal models are made up of potential outcomes, which on the unit level are propositional logical variables that determine how some "unit" (person, etc.) responds in a particular way (Y) to a hypothetical intervention on the direct causes of Y.
Is Y a particular way of responding (e.g. Y = "the person dies"), or is it a variable that denotes whether the person responds in that way (e.g. Y=1 if the person dies and 0 otherwise)? I think you meant the latter.
If we don't know which unit we are talking about, we average over them to get a random variable Y(pa(Y)).
How does averaging over propositional logical variables give you a random variable? I am afraid I am getting confused by your terminology.
I think it's fine if you want to advocate a "new view" on things. I am just worried that you might be suffering from a standard LW disease of trying to be novel without adequately understanding the state of play, and why the state of play is the way it is.
I wasn't trying to be novel for the sake of it. Rather, I was just trying to write down my thoughts on the subject. As I said before, if you have some specific pointers to the state of the art in this field, then that would be much appreciated. Note that I have a background in computer science and machine learning, so I am somewhat familiar with causal models.
and moreover the only way to give the right answer to these kinds of questions is to be isomorphic to the "CDT algorithm" for these kinds of questions.
That sounds interesting. Do you have a link to a proof of this statement?
I don't know. That is to say: it is original research, but probably subconsciously inspired by and stolen from many other sources.
What you call "EEDT" I would call "expected value calculation," and then I would use "decision theory" to describe the different ways of estimating conditional probabilities (as you put it).
I think there are three steps in the reduction:
1. A "decision problem": "given my knowledge, what action should I take", which is answered by a "decision theory/procedure"
2. Expected values: "given my knowledge, what is the expected value of taking action A"
3. Conditional probability estimation: "given my knowledge, what is my best guess for the probability P(X|Y)"
The reduction from 1 to 2 is fairly obvious, just take the action that maximizes expected value. I think this is common to all decision theories.
The reduction of 2 to 3 is what is done by EEDT. To me this step was not so obvious, but perhaps there is a better name for it.
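To spell out what I mean by the first two steps, here is a bare-bones sketch (all names and numbers are placeholders) of how a conditional probability estimate (step 3) mechanically yields expected values (step 2) and a chosen action (step 1):

```python
# Placeholder estimates of P(outcome | action), however they were obtained
# (this is where the decision theories actually differ), plus utilities.
p_outcome_given_action = {
    'action_A': {'good': 0.9, 'bad': 0.1},
    'action_B': {'good': 0.2, 'bad': 0.8},
}
utility = {'good': 100.0, 'bad': 0.0}

def expected_value(action):
    return sum(p * utility[o] for o, p in p_outcome_given_action[action].items())

best = max(p_outcome_given_action, key=expected_value)
print({a: expected_value(a) for a in p_outcome_given_action}, best)
# {'action_A': 90.0, 'action_B': 20.0} action_A
```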
Basically, yes. What separates EDT and CDT is whether they condition on the joint probability distribution or use the do operator on the causal graph; there's no other difference.
So the difference is in how to solve step 3, I agree. I wasn't trying to obscure anything, of course. Rather, I was trying to advocate that we should focus on problem 3 directly, instead of problem 1.
It is right that expected value calculation is potentially nonobvious, and so saying "we use expected values" is meaningful and important, but I think that the language to convey the concepts you want to convey already exists, and you should use the language that already exists.
Do you have some more standard terms I should use?
What is a 'closed system' in this context? What is the definition of 'Universe'?
A typical system studied in introductory probability theory might be a die roll. This is a system with a sample space of 6 states. It is an abstract thing that doesn't interact with anything else. In reality, when you roll a die, that die is part of the real world, the same world that also contains you, the earth, etc. That is what I meant by 'universe'.
For closed systems, I was thinking of the term as used in physics:
a closed system is a physical system which doesn't exchange any matter with its surroundings, and isn't subject to any force whose source is external to the system.