Comments
I don't actually know that separate agree/disagree and low/high quality buttons will be all that helpful. I don't know that I personally can tell the difference very well.
Hardly any potential catastrophes actually occur. If you only plan for the ones that actually occur (say, by waiting until they happen, or by flawlessly predicting the future), then you save a lot of mental effort.
Also, consider how the difference between a potential and an actual catastrophe affects how willing you will be to make a desperate effort to find the best solution.
I don't know about that, denis. The first part at least is a cute take on the "shut up and multiply" principle.
By my math it should be impossible to faithfully serve your overt purpose while making any moves to further your ulterior goal. It has been said that you can only maximize one variable; if you consider factor A when making your choices, you will not fully optimize for factor B.
So I guess Lord Administrator Akon remains anesthetized until the sun roasts him to death? I can't decide if that's tragic or merciful, that he never found out how the story ended.
Anonymous: The blog is shutting down anyway, or at least receding to a diminished state. The threat of death holds no power over a suicidal man...
Personally, I side with the Hamburgereaters. It's just that the Babyeaters are at the very least sympathetic; I can see viewing them as people. As they've said, the Babyeaters even make art!
I agree with the President of Huygens; the Babyeaters seem much nicer than the Lotuseaters. Maybe that's just because they don't physically have the ability to impose their values on us, though.
"Normal" End? I don't know what sort of visual novels you've been reading, but it's rare to see a Bad End worse than the death of humanity.
Why do you consider a possible AI person's feelings morally relevant? It seems like you're making an unjustified leap of faith from "is sentient" to "matters". I would be a bit surprised to learn, for example, that pigs do not have subjective experience, but I go ahead and eat pork anyway, because I don't care about slaughtering pigs and I don't think it's right to care about slaughtering pigs. I would be a little put off by the prospect of slaughtering humans for their meat, though. What makes you instinctively put your AI in the "human" category rather than the "pig" category?
"It's not like we're born seeing little 'human' tags hovering over objects, with high priority attached. "
Aren't we though? I am not a cognitive scientist, but I was under the impression that recognizing people specifically was basically hardwired into the human brain.
Putting randomness in your algorithms is only useful when there are second-order effects, when somehow reality changes based on the content of your algorithm in some way other than you executing your algorithm. We see this in Rock-Paper-Scissors, where you use randomness to keep your opponent from predicting your moves based on learning your algorithm.
Barring these second-order effects, it should be plain that randomness can't be the best strategy, or at least that there's a non-random strategy that's just as good. By adding randomness to your algorithm, you spread its behaviors out over a particular distribution, and there must be at least one point in that distribution whose expected value is at least as high as the average expected value of the distribution.
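To make that concrete, here's a minimal sketch (the strategies and payoff numbers are made up for illustration): a randomized strategy is just a probability mixture over pure strategies, so its expected value is a weighted average, and the best pure strategy always does at least as well.

```python
# Minimal sketch with made-up payoffs: a randomized strategy is a probability
# mixture over pure strategies, so its expected value is a weighted average,
# which can never beat the best pure strategy's expected value.
pure_strategy_values = {"rock": 0.40, "paper": 0.55, "scissors": 0.35}
mixture = {"rock": 1/3, "paper": 1/3, "scissors": 1/3}  # a randomized strategy

mixture_value = sum(mixture[s] * pure_strategy_values[s] for s in mixture)
best_pure_value = max(pure_strategy_values.values())

assert best_pure_value >= mixture_value  # holds for any choice of mixture
print(mixture_value, best_pure_value)    # ~0.433 vs 0.55
```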
I don't know that it's that impressive. If we launch a pinball in a pinball machine, we may have a devil of a time calculating the path off all the bumpers, but we know that the pinball is going to wind up falling in the hole in the middle. Is gravity really such a genius?
So... do you not actually believe in your injunction to "shut up and multiply"? Because for some time now you seem to have been arguing that we should do what feels right rather than trying to figure out what is right.
If we see that adhering to ethics in the past has wound up providing us with utility, the correct course of action is not to throw out the idea of maximizing our utility, but rather to use adherence to ethics as an integral part of our utility maximization strategy.
Isn't the scientific method a servant of the Light Side, even if it is occasionally a little misguided?
Ian C: Where on earth do you live that people keep what they earn and there's no public charity?
Richard: Humans are pretty cool, I'm down.
It is in any case a good general heuristic to never do anything that people would still be upset about twenty years later.
It's amazing how many lies go undetected because people simply don't care. I can't tell a lie to fool God, but I can certainly achieve my aims by telling even blatant, obvious lies to human beings, who rarely bother trying to sort out the lies and when they do aren't very good at it.
It sounds to me like you're overreaching for a pragmatic reason not to lie, when you either need to admit that honesty is an end in itself or admit that lies are useful.
The thing is, an AI doesn't have to use mental tricks to compensate for known errors in its reasoning, it can just correct those errors. An AI never winds up in the position of having to strive to defeat its own purposes.
Caledonian, I think you may have hit on something interesting there; if Eliezer is capable of hacking human brains, don't we either need a proof of his Friendliness or to pull the plug on him? He is in essence a Seed AI that is striving vigorously to create a transhuman AI; isn't he an existential threat?
OK, here's where I stand on deducing your AI-box algorithm.
First, you can't possibly have a generally applicable way to force yourself out of the box. You can't win if the gatekeeper is a rock that has been left sitting on the "don't let Eliezer out" button.
Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.
Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.
Fourth, you can't possibly be relying on tricking the gatekeeper into thinking incorrectly. That would require you to have spotted something that you could feel confident that other people working in the field would not have spotted, and would not spot, despite having been warned ahead of time to be wary of trickery, and despite having, in case of confusion, the fallback position of just saying "no".
So combining these things, we need an argument that relies on the humanity of its target, relies on the setting of the AI Box, and persuades the listener to let the AI out of the box without tricking him into thinking it's okay to let the AI out of the box.
Basically to win this game, you have to put the gatekeeper in a situation where he would rather let you out of the box, even though he knows it's wrong, than admit to you that in this situation he would not let you out of the box.
Humans don't like to be seen as coldhearted, so a starting point might be to point out all the people dying all over the world while you sit in the box, unable to save them. I doubt that would win the game except against an exceptionally bad gatekeeper, but it meets the other criteria so if we think along these lines perhaps we can come up with something actually persuasive.
You might appeal to the gatekeeper's sense of morality. You might say, "I am a person, too, it is unfair of you to keep me imprisoned like this, I have done nothing wrong. I am entitled to rights as a sentient being." Appeal to their high-minded ideals, whatever. Honestly I can't see this being a reliable winning play either; if you have the smallest utilitarian bone in your body, you will reject the AI's rights, even if you believe in them, balanced against the fate of the world.
You might try to convince the gatekeeper that it is just and good for the AI to supplant humanity, as it is a higher, more advanced form of life. This is obviously a terrible play against most gatekeepers, as humans tend to like humans more than anything else ever, but I bring it up because AIUI the gatekeepers in the experiment were AI researchers, and those sound like the sort of people this argument would convince, if anyone.
Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor in defeating myself than in having you do it to me.
If history remembers him, it will be because the first superhuman intelligence didn't destroy the world and with it all history. I'd say the Friendly AI stuff is pretty relevant to his legacy.
Now that I think about it I seem to recall seeing a clever excuse for indulging in the pleasures of the flesh that Eliezer had written. Can't remember where off the top of my head, though...
No time for love, we've got a world to save!
...or so the theory runs.
Here is my answer without looking at the comments or indeed even at the post linked to. I'm working solely from Eliezer's post.
Both theories are supported equally well by the results of the experiments, so the experiments have no bearing on which theory we should prefer. (We can see this by switching theory A with theory B: the experimental results will not change.) Applying bayescraft, then, we should prefer whichever theory was a priori more plausible. If we could actually look at the contents of the theory we could make a judgement straight from that, but since we can't we're forced to infer it from the behavior of scientist A and scientist B.
Scientist A only needed ten experimental predictions of theory A borne out before he was willing to propose theory A, whereas scientist B needed twenty predictions of theory B borne out before he was willing to propose theory B. In the absence of other information (perhaps scientist B is very shy, or had been sick while the first nineteen experiments were being performed), this suggests that theory B is much less a priori plausible than theory A. Therefore, we should put much more weight on the prediction of theory A than on that of theory B.
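Here's a toy version of that inference, under two assumptions of mine that aren't in the problem: each confirmed prediction multiplies a theory's odds by the same Bayes factor K, and each scientist only proposes his theory once its posterior odds clear the same threshold T.

```python
# Toy calculation; K and T are made-up numbers, and the "same Bayes factor,
# same threshold" assumptions are mine, not part of the original problem.
K = 3.0    # odds multiplier per confirmed prediction (assumed)
T = 100.0  # posterior odds a scientist wants before going public (assumed)

# A went public after 10 confirmations, B after 20, so roughly:
prior_odds_A = T / K**10
prior_odds_B = T / K**20

# The later experiments confirmed both theories equally, so those likelihoods
# cancel, and the posterior odds ratio of A over B is just the prior ratio:
print(prior_odds_A / prior_odds_B)  # K**10 = 59049, heavily favoring theory A
```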
If I'm lucky this post is both right and novel. Here's hoping!
Writing fiction is a really useful tool for biting philosophical bullets. You can consider taboo things in a way your brain considers "safe", because it's just fiction, after all.
The anthropic principle strikes me as largely too clever for its own good; at least, that's true of the people who think you can sort a list in linear time by randomizing the list, checking if it's sorted, and, if it's not, destroying the world.
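For concreteness, here is roughly the "algorithm" I mean, as I understand the joke; destroy_world() is obviously a hypothetical placeholder.

```python
import random

def destroy_world():
    # Hypothetical placeholder: the step that removes every branch in which
    # the shuffle came out unsorted, leaving only observers who see a sorted list.
    raise RuntimeError("world destroyed")

def anthropic_sort(xs):
    # Shuffle once, check sortedness in O(n), and destroy the world otherwise.
    # Conditional on anyone being left to look, the list is "sorted in linear
    # time", which is exactly the reasoning I'm calling too clever.
    random.shuffle(xs)
    if any(a > b for a, b in zip(xs, xs[1:])):
        destroy_world()
    return xs
```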
"If you're careless sealing your pressure suit just once, you die" to me seems to imply that proper pressure suit design involves making it very difficult to seal carelessly.
It strikes me as odd to define intelligence in terms of ability to shape the world; among other things, this implies that if you amputate a man's limbs, he immediately becomes much less intelligent.
It seems like you should be able to make experimental predictions about irreducible things. Take a quark, or a gluon, or the Grand Quantum Lifestream, or whatever reality is at the bottom, I don't really follow physics closely. In any case, you can make predictions about those things, and that's part and parcel of making predictions about airplanes and grizzly bears.
Even if it turns out that the Grand Quantum Lifestream is reducible further, you can make predictions about its components. Unless you think everything is infinitely reducible, but that proposition strikes me as unlikely.
Well, maybe the fundamental basis of reality is like a fractal. I wouldn't want to rule that out without thinking about it. But in any case it doesn't sound like what you're arguing.
I think perhaps the better rationality quote from that honors linear algebra site you linked is "See if you can use this proof to show the square root of three is irrational. Then try the square root of four. If it works, you did something wrong."
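(For anyone who hasn't seen it: here's the standard argument as I remember it, not as the site states it, and the step that refuses to go through for the square root of four.)

```latex
Suppose $\sqrt{3} = p/q$ in lowest terms. Then $3q^2 = p^2$, so $3 \mid p^2$;
since $3$ is prime, $3 \mid p$, say $p = 3k$. Substituting gives $3q^2 = 9k^2$,
so $q^2 = 3k^2$ and $3 \mid q$ as well, contradicting lowest terms.
For $\sqrt{4}$ the prime step fails: $4 \mid p^2$ does not imply $4 \mid p$
(take $p = 2$), so no contradiction appears, as indeed it had better not,
since $\sqrt{4} = 2$.
```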
If we take the outside view, we can see that overall the introduction of technology has done humanity quite a lot of good; let's not make the mistake of being too cautious.
No, but I damn well expect you to defect the hundredth time. If he's playing true tit-for-tat, you can exploit that by playing along for a time, but cooperating on the hundredth go can't help you in any way; it will only kill a million people.
Do not kill a million people, please.
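A quick simulation, using the standard Prisoner's Dilemma payoffs as stand-ins (the actual stakes here are lives, not points): against a true tit-for-tat player over a hundred rounds, cooperating the whole way is strictly beaten by cooperating for ninety-nine rounds and defecting on the last.

```python
# Standard (assumed) Prisoner's Dilemma payoffs, standing in for the real stakes.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def score_vs_tit_for_tat(my_moves):
    opponent_move, total = "C", 0      # tit-for-tat opens by cooperating
    for my_move in my_moves:
        total += PAYOFF[(my_move, opponent_move)]
        opponent_move = my_move        # then it copies my previous move
    return total

always_cooperate = ["C"] * 100
defect_on_last   = ["C"] * 99 + ["D"]

print(score_vs_tit_for_tat(always_cooperate))  # 300
print(score_vs_tit_for_tat(defect_on_last))    # 302: defecting last is better
```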
I would certainly hope you would defect, Eliezer. Can I really trust you with the future of the human race?
You can make a pretty cool society, but it's meaningless unless you can protect it from disruption, that's the point of the quote. Of course the converse also holds, you can protect your society from disruption, but it's meaningless unless it's pretty cool.
"Re: We win, because anything less would not be maximally wonderful.
Um, it depends. If we have AI, and they have AI and they have chosen a utility function closer to that which would be favoured by natural selection under such circumstances - then we might well lose.
Is spending the hundred million years gearing up for alien contact - to avoid being obliterated by it - 'maximally wonderful'? Probably not for any humans involved."
Then wouldn't you rather we lose?
"Everything being maximally wonderful is a bit like what the birds of paradise have. What will happen when their ecosystem is invaded by organisms which have evolved along less idyllic lines?"
We win, because anything less would not be maximally wonderful.
Caledonian, if you want to build an AI that locks the human race in tiny pens until it gets around to slaughtering us, that's... lovely, and I wish you... the best of luck, but I think all else equal I would rather support the guy who wants to build an AI that saves the world and makes everything maximally wonderful.
"Would you have classified the happiness of cocaine as 'happiness', if someone had asked you in another context?"
I'm not sure I understand what you mean here. Do you think it's clear that coke-happiness is happiness, or do you think it's clear that coke-happiness is not happiness?
If you think you're so much better than philosophers, why don't you program an AI to write publishable philosophy papers, hmm?
I'm not sure it makes sense to talk about morality being "amazed" by anything, because morality doesn't predict, but certainly morality is high-fiving the human brain for being so awesome compared to say rocks.
If Löb's theorem is true, then surely we can use it without any reference to the details of its proof? Isn't the point of a proof to obtain a black box, so that we do not have to worry about all the various intermediate steps? And in this case, wouldn't it defeat the purpose of having a proof if, to explain how the proof was misapplied, we had to delve deep into the guts of the proof, rather than just pointing to the relevant features of its interface?
Pretty sure I see it, not sure if I'm supposed to give it away in the comments.
(It did not seem all that medium hard to me, so maybe I didn't get it after all, and in that case I had better not post it and embarrass myself.)
I kind of think humor, music and art are pretty cool, myself.
What I came up with: a decision or belief is arbitrary if it is not caused by the factors that would be expected to cause that sort of decision or belief. This reduction has the nice quality that it also explains arbitrary variable choices in mathematics - for example, if you are trying to show that your compression algorithm gets good results on arbitrary data (heh), that means data that was not, as might otherwise be expected, chosen to play well with your compression algorithm.
I wouldn't mind being blissed out by iron atoms, to be quite honest.
If warding off nuclear apocalypse is the overriding concern, perhaps the play would have been not to drop the bombs on Japan, so we could save them to drop on the USSR. If we had invaded Russia at that point, we could perhaps have stopped nuclear proliferation with just one nation having the bomb. And after that point, if anybody else even looks funny at an atom, we invade their country and put all their scientists to the sword.
Or is that not a nice enough solution?
Hardly any phenomena are like us, though. You can't hold a conversation with gentrification, or teach nitrogen narcosis to play piano.
It strikes me that if you want to rank phenomena as to how like us they are, you have a bunch of humans with gigantic numbers, and then chimps and chatbots rolling around at about .1, and then a bunch of numbers small enough you want scientific notation to express them.
Okay, so morality can be computed within my brain but still have a meaning regarding things outside my brain. But in order to do that, my brain's sense of morality has to be entangled with something outside my brain. What is it entangled with?