Comments
michael vassar:
[Nerds] perceive abstractions handed to them explicitly by other people more easily than patterns that show up around them. Oddly, this seems to me to be a respect in which nerds are more feminine rather than being more masculine as they are in many other ways.

Would you elaborate on this? What is the generally-feminine behavior of which the first sentence describes an instance?
My first inclination would be to think that your first sentence describes something stereotypically masculine. It's an example of wanting things to come in pre-structured formats, which is part of wanting to operate in a domain that is governed by explicit pre-established rules. That is often seen as a stereotypically-masculine desire that manifests in such non-nerdy pursuits as professional sports and military hierarchies.
George Weinberg:
Does it occur to anyone else that the fable is not a warning against doing favors in general but of siding with "outsiders" against "insiders"?

Wow; now that you mention it, that is a blatant recurring theme in the story. I now can't help but think that that is a major part, if not the whole, of the message. Each victim betrays an in-group to perform a kindness for a stranger. It's pretty easy to see why storytellers would want to remind listeners that their first duty is to the tribe. Whatever pity they might feel for a stranger, they must never let that pity lead them to betray the interests of their tribe.
Can't believe I missed that :).
Some here seem to think it significant that the good-doers in the story are not naive fools over whom the audience can feel superior. It is argued that that sense of superiority explains stories like the Frog and the Scorpion in the West. The inference seems to be that since this sense of superiority is lacking in this African tale, the intent could only have been to inform the audience that this is how the world works.
However, I don't think that the "superiority" explanation can be so quickly dismissed. To me, this story works because the audience keeps having their expectations of gratitude violated. Hence, the storyteller gets to feel superior to the audience by proving him or herself to be wiser in the ways of the world. The closing lines ---
That's all. For so it always has been - if you see the dust of a fight rising, you will know that a kindness is being repaid! That's all. The story's finished.

--- read to me like a self-satisfied expression of condescension towards an audience so naive as to expect some justice in this world.
Paul Crowley:
One trivial example of signalling here is the way everyone still uses the Computer Modern font. This is a terrible font, and it's trivial to improve the readability of your paper by using, say, Times New Roman instead, but Computer Modern says that you're a serious academic in a formal field.

I don't think that these people are signaling. Computer Modern is the default font for LaTeX. Learning how to change a default setting in LaTeX is always non-trivial.
You might argue that people are signaling by using LaTeX instead of Word or whatever, but switching from LaTeX to some other writing system is also not a trivial matter.
Eliezer, the link in your reply to nazgulnarsil links to this very post. I'm assuming that you intended to link to that recent post of yours on SJG, but I'll leave it to you to find it :).
I think that you make good points about how fiction can be part of a valid moral argument, perhaps even an indispensable part for those who haven't had some morally-relevant experience first-hand.
But I'm having a hard time seeing how your last story helped you in this way. Although I enjoyed the story very much, I don't think that your didactic purposes are well-served by it.
My first concern is that your story will actually serve as a counter-argument for rationality to many readers. Since I'm one of those who disagreed with the characters' choice to destroy Huygens, I'm pre-disposed to worry that your methods could be discredited by that conclusion. A reader who has not already been convinced that your methods are valid could take this as a reductio ad absurdum proof that they are invalid. I don't think that your methods inexorably imply your conclusion, but another reader might take your word for it, and one person's modus ponens is another's modus tollens. Of course, all methods of persuasion carry this risk. But it's especially risky when you are actively trying to make the "right answer" as difficult as possible to ascertain for dramatic purposes.
Another danger of fictional evidence is that it can obscure what exactly the structure and conclusion of the argument are. For example, why were we supposed to conclude that evading the Super-Happies was worth killing 15 billion at Huygens but was not worth destroying Earth and fragmenting the colonies? Or were we necessarily supposed to conclude that? Were you trying to persuade the reader that the Super-Happies' modifications fell between those two choices? As far as I could tell, there was no argument in the story to support this. Nor did I see anything in your preceding "rigorous" posts to establish that being modified fell in this range. It appeared to be a moral assertion for which no argument was given. Or perhaps it was just supposed to be a thought-provoking possibility, to which you didn't mean to commit yourself. Your subsequent comments don't lead me to think that, though. This uncertainty about your intended conclusion would be less likely if you were relying on precise arguments.
Psy-Kosh: Yeah, I meant to have an "as Psy-Kosh has pointed out" line in there somewhere, but it got deleted accidentally while editing.
ad:
How many humans are there not on Huygens?
I'm pretty sure that it wouldn't matter to me. I generally find on reflection that, with respect to my values, doing bad act A to two people is less than twice as bad as doing A to one person. Moreover, I suspect that, in many cases, the badness of doing A to n people converges to a finite value as n goes to infinity. Thus, it is possible that doing some other act B is worse than doing A to arbitrarily many people. At this time, I believe that this is the case when A = "allow the Super-Happies to re-shape a human" and B = "kill fifteen billion people".
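To illustrate the kind of value structure I have in mind, here is a minimal worked example; the discount factor r and the per-victim badness b are purely hypothetical numbers, not anything from the story:

```latex
% Hypothetical illustration: the k-th person subjected to A adds r^{k-1} b
% units of badness, with 0 < r < 1.
\[
  \operatorname{Bad}(A, n) \;=\; \sum_{k=1}^{n} r^{k-1} b
  \;=\; b \, \frac{1 - r^{n}}{1 - r}
  \;\longrightarrow\; \frac{b}{1 - r}
  \quad \text{as } n \to \infty .
\]
% Any act B whose badness exceeds b/(1-r) is then worse than doing A
% to arbitrarily many people.
```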
If the Super-Happies were going to turn us into orgasmium, I could see blowing up Huygens. Nor would it necessarily take such an extreme case to convince me to take that extreme measure. But this . . . ?
"Our own two species," the Lady 3rd said, "which desire this change of the Babyeaters, will compensate them by adopting Babyeater values, making our own civilization of greater utility in their sight: we will both change to spawn additional infants, and eat most of them at almost the last stage before they become sentient." ... "It is nonetheless probable," continued the Lady 3rd, "that the Babyeaters will not accept this change as it stands; it will be necessary to impose these changes by force. As for you, humankind, we hope you will be more reasonable. But both your species, and the Babyeaters, must relinquish bodily pain, embarrassment, and romantic troubles. In exchange, we will change our own values in the direction of yours. We are willing to change to desire pleasure obtained in more complex ways, so long as the total amount of our pleasure does not significantly decrease. We will learn to create art you find pleasing. We will acquire a sense of humor, though we will not lie. From the perspective of humankind and the Babyeaters, our civilization will obtain much utility in your sight, which it did not previously possess. This is the compensation we offer you. We furthermore request that you accept from us the gift of untranslatable 2, which we believe will enhance, on its own terms, the value that you name 'love'. This will also enable our kinds to have sex using mechanical aids, which we greatly desire. At the end of this procedure, all three species will satisfice each other's values and possess great common ground, upon which we may create a civilization together."
Sure, I would turn this down if it were simply offered as a gift. But I really, really, cannot see preferring the death of fifteen billion people over it. Although I value the things that the Super-Happies would take away, and I even value valuing them, I don't value valuing them all that much. Or, if I do, it is very far from intuitively obvious to me. And the more I think about it, the less likely it seems.
I hope that Part 8 somehow makes this ending seem more like the "right" one. Maybe it will be made clear that the Super-Happies couldn't deliver on their offer without imposing significant hidden downsides. It wouldn't stretch plausibility too much if such downsides were hidden even from them. They are portrayed as not really getting how we work. As I said in this comment to Part 3, we might expect that they would screw us up in ways that they don't anticipate.
But unless some argument is made that their offer was much worse than it seemed at first, I can't help but conclude that the crew made a colossal mistake by destroying Huygens, to understate the matter.
Wei Dai: Consider a program which when given the choices (A,B) outputs A. If you reset it and give it choices (B,C) it outputs B. If you reset it again and give it choices (C,A) it outputs C. The behavior of this program cannot be reproduced by a utility function.
I don't know the proper rational-choice-theory terminology, but wouldn't modeling this program just be a matter of describing the "space" of choices correctly? That is, rather than making the space of choices {A, B, C}, make it the set containing
(1) = taking A when offered A and B, (2) = taking B when offered A and B,
(3) = taking B when offered B and C, (4) = taking C when offered B and C,
(5) = taking C when offered C and A, (6) = taking A when offered C and A.
Then the revealed preferences (if that's the way to put it) from your experiment would be (1) > (2), (3) > (4), and (5) > (6). Viewed this way, there is no violation of transitivity by the relation >, or at least none revealed so far. I would expect that you could always "smooth over" any transitivity-violation by making an appropriate description of the space of options. In fact, I would guess that there's a standard theory about how to do this while still keeping the description-method as useful as possible for purposes such as prediction.
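Here is a minimal sketch of what I mean (the program and the menu-dependent labels just mirror the example above): once each option is described together with the menu it was offered from, a single utility function reproduces the apparently cyclic behavior without any intransitivity.

```python
# Wei Dai's program: picks A from (A, B), B from (B, C), and C from (C, A).
def program(pair):
    choices = {frozenset({"A", "B"}): "A",
               frozenset({"B", "C"}): "B",
               frozenset({"C", "A"}): "C"}
    return choices[frozenset(pair)]

# Re-describe the option space as (option, menu) pairs, i.e., the
# descriptions (1) through (6) above. An ordinary utility function over
# this richer space reproduces the behavior.
utility = {("A", frozenset({"A", "B"})): 1, ("B", frozenset({"A", "B"})): 0,
           ("B", frozenset({"B", "C"})): 1, ("C", frozenset({"B", "C"})): 0,
           ("C", frozenset({"C", "A"})): 1, ("A", frozenset({"C", "A"})): 0}

def maximize_utility(pair):
    menu = frozenset(pair)
    return max(pair, key=lambda option: utility[(option, menu)])

for menu in [("A", "B"), ("B", "C"), ("C", "A")]:
    assert program(menu) == maximize_utility(menu)
```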
It's good. Not baby-eatin' good, but good enough ;).
Daniel Dennett's standard response to the question "What's the secret of happiness?" is "The secret of happiness is to find something more important than you are and dedicate your life to it."
I think that this avoids Eliezer's criticism that "you can't deliberately pursue 'a purpose that takes you outside yourself', in order to take yourself outside yourself. That's still all about you." Something can be more important than you and yet include you. Depending on your values, the future of the human race itself could serve as an example. It would also seem to remain an available "hedonic accessory" in any eutopia that includes humanity in some form.
But has that been disproved? I don't really know. But I would imagine that Moravec could always append, ". . . provided that we found the right 10 trillion calculations." Or am I missing the point?
Here's a Daniel Dennett essay that seems appropriate:
Maybe it was the categorical nature of "no danger whatsoever" that led to the comparisons to religion. Given the difficulty of predicting anyone's psychological development, and given that you yourself say that you've seen multiple lapses before, what rational reason could you have for such complete confidence? Of course, it's true that there are things besides religion that cause people to make predictions with probability 1 (which, you must concede, is a plausible reading of "no danger whatsoever"). But, in human affairs, with our present state of knowledge, can such predictions ever be entirely reasonable?
anon and Chris Hibbert, I definitely didn't mean to say that Robin is claiming to be working with as much certainty as Fermi could claim. I didn't mean to be making any claim about the strength or content of Robin's argument at all, other than that he's assigning low probability to something to which Eliezer assigns high probability.
Like I said, the analogy with the Fermi story isn't very good. My point was just that a critique of Fermi should have addressed his calculations, pointing out where exactly he went wrong (if such a point could be found). Eliezer, in contrast, isn't really grappling with Robin's theorizing in a direct way at all. I know that the analogy isn't great for many reasons. One is that Robin's argument is in a more informal language than mathematical physics. But still, I'd like to see Eliezer address it with more directness.
As it is, this exchange doesn't really read like a conversation. Or, it reads like Robin wants to engage in a conversation. Eliezer, on the other hand, seems to think that he has identified flaws in Robin's thinking, but the only way he can see to address them is by writing about how to think in general, or at least how to think about a very broad class of questions, of which this issue is only a very special case.
I gather that, in Eliezer's view, Robin's argument is so flawed that there's no way for Eliezer to address it on its own terms. Rather, he needs to build a solid foundation for reasoning about these things from the ground up. The Proper Way to answer this question will then be manifest, and Robin's arguments will fall by the wayside, clearly wrong simply by virtue of not being the Proper Way.
Eliezer may be right about that. Indeed, I think it's a real possibility. Maybe that's really the only way that these kinds of things can be settled. But it's not a conversation. And maybe that will be the lesson that comes out of this. Maybe conversation is overrated.
None of this is supposed to be a criticism of either Eliezer's or Robin's side of this specific issue. It's a criticism of how the conversation is being carried out. Or maybe just an expression of impatience.
I've been following along and enjoying the exchange so far, but it doesn't seem to be getting past the "talking past each other" phase.
For example, the Fermi story works as an example of a cycle as a source of discontinuity. But I don't see how it establishes anything that Robin would have disputed. I guess that Eliezer would say that Robin has been inattentive to its lessons. But he should then point out where exactly Robin's reasoning fails to take those lessons into account. Right now, he just seems to be pointing to an example of cycles and saying, "Look, a cycle causing discontinuity. Does that maybe remind you of something that perhaps your theorizing has ignored?" I imagine that Robin's response will just be to say, "No," and no progress will have been made.
And, of course, once the Fermi story is told, I can't help but think of how else it might be analogous to the current discussion. When I look at the Fermi story, what I see is this: Fermi took a powerful model of reality and made the precise prediction that something huge would happen between layers 56 and 57, whereas someone without that model would have just thought, "I don't see how 57 is so different from 56." What I see happening in this conversation is that Robin says, "Using a powerful model of reality, I predict that an event, which Eliezer thinks is very likely, will actually happen only with probability <10%." (I haven't yet seen a completely explicit consensus account of Robin and Eliezer's disagreement, but I gather that it's something like that.) And Eliezer's replies seem to me to be of the form "You shouldn't be so confident in your model. Previous black swans show how easily predictions based on past performance can be completely wrong."
I concede that the analogy between the Fermi story and the current conversation is not the best fit. But if I pursue it, what I get is this: Robin is in a sense claiming to be the Fermi in this conversation. He says that he has a well-established body of theory that makes a certain prediction: that Eliezer's scenario has very low probability of happening.
Eliezer, on the other hand, is more like someone who, when presented with Fermi's predictions (before they'd been verified) might have said, "How can you be so confident in your theory? Don't you realize that a black swan could come and upset it all? For example, maybe a game-changing event could happen between layers 32 and 33, preventing layer 57 from even occurring. Have you taken that possibility into account? In fact, I expect that something will happen at some point to totally upset your neat little calculations."
Such criticisms should be backed up with an account of where, exactly, Fermi is making a mistake by being so confident in his prediction about layer 57. Similarly, Eliezer should say where exactly he sees the flaws in Robin's specific arguments. Instead, we get these general exhortations to be wary of black swans. Although such warnings are important, I don't see how they cash out in this particular case as evidence that Robin is the one who is being too confident in his predictions.
In other words, Robin and Eliezer have a disagreement that (I hope) ultimately cashes out as a disagreement about how to distribute probability over the possible futures. But Eliezer's criticisms of Robin's methods are all very general; they point to how hard it is to make such predictions. He argues, in a vague and inexact way, that predictions based on similar methods would have gone wrong in the past. But Eliezer seems to dodge laying out exactly where Robin's methods go wrong in this particular case and why Eliezer's succeed.
Again, the kinds of general warnings that Eliezer gives are very important, and I enjoy reading them. It's valuable to point out all the various quarters from which a black swan could arrive. But, for the purposes of this argument, he should point out how exactly Robin is failing to heed these warnings sufficiently. Of course, maybe Eliezer is getting to that, but some assurance of that would be nice. I have a large appetite for Eliezer's posts, construed as general advice on how to think. But when I read them as part of this argument with Robin, I keep waiting for him to get to the point.
Tim Tyler,
I don't yet see why exactly Eliezer is dwelling on the origin of replicators.

Check with the title: if you are considering the possibility of a world takeover, it obviously pays to examine the previous historical genetic takeovers.
Right. I get the surface analogy. But it seems to break down when I look at its deeper structure.
Oops; I should have noted that I added emphasis to those quotes of Eliezer. Sorry.
I don't yet see why exactly Eliezer is dwelling on the origin of replicators. As Robin himself said, it would have been very surprising if he had disagreed with any of it.
I guess that Eliezer's main points were these: (1) The origin of life was an event where things changed abruptly in a way that wouldn't have been predicted by extrapolating from the previous 9 billion years. Moreover, (2) pretty much the entire mass of the universe, minus a small tidal pool, was basically irrelevant to how this abrupt change played out and continues to play out. That is, the rest of the universe only mattered in regards to its gross features. It was only in that tidal pool that the precise arrangement of molecules had and will have far-reaching causal implications for the fate of the universe.
Eliezer seems to want to argue that we should expect something like this when the singularity comes. His conclusion seems to be that it is futile to survey the universe as it is now to try to predict detailed features of the singularity. For, if the origin of life is any guide, practically all detailed features of the present universe will prove irrelevant. Their causal implications will be swept aside by the consequences of some localized event that is hidden in some obscure corner of the world, below our awareness. Since we know practically nothing about this event, our present models can't take it into account, so they are useless for predicting the details of its consequences. That, at any rate, is what I take his argument to be.
There seems to me to be a crucial problem with this line of attack on Robin's position. As Eliezer writes of the origin of life,
The first replicator was the first great break in History - the first Black Swan that would have been unimaginable by any surface analogy. No extrapolation of previous trends could have spotted it - you'd have had to dive down into causal modeling, in enough detail to visualize the unprecedented search. Not that I'm saying I would have guessed, without benefit of hindsight - if somehow I'd been there as a disembodied and unreflective spirit, knowing only the previous universe as my guide - having no highfalutin' concepts of "intelligence" or "natural selection" because those things didn't exist in my environment, and I had no mental mirror in which to see myself - and indeed, who should have guessed it with short of godlike intelligence? When all the previous history of the universe contained no break in History that sharp? The replicator was the first Black Swan.
The difference with Robin's current position, if I understand it, is that he doesn't see our present situation as one in which such a momentous development is inconceivable. On the contrary, he conceives of it as happening through brain-emulation.
Eliezer seems to me to establish this much. If our present models did not predict an abrupt change on the order of the singularity, and if such a change nonetheless happens, then it will probably spring out of some very local event that wipes out the causal implications of all but the grossest features of the rest of the universe. However, Robin believes that our current models already predict a singularity-type event. If he's right (a big if!), then a crucial hypothesis of Eliezer's argument fails to obtain. The analogy with the origin of life that Eliezer makes in this post breaks down.
So the root of the difference between Eliezer and Robin seems to be this: Do our current models already give some significant probability to the singularity arising out of processes that we already know something about, e.g., the development of brain emulation? If so, then the origin of life was a crucially different situation, and we can't draw the lessons from it that Eliezer wants to.
gaffa: A heavy obstacle for me is that I have a hard time thinking in terms of math, numbers and logic. I can understand concepts on the superficial level and kind of intuitively "feel" their meaning in the back of my mind, but I have a hard time bringing the concepts into the frond of my mind and visualize them in detail using mathematical reasoning. I tend to end up in a sort of "I know that you can calculate X with this information, and knowing this is good enough for me"-state, but I'd like to be in the state where I am using the information to actually calculate the value of X in my head.
I've found that the only way to get past this is to practice solving problems a whole bunch. If your brain doesn't already have the skill of looking at a problem and slicing it up into all the right pieces with the right labels so that a solution falls out, then the only way to get it to do that is to practice a lot.
I recommend getting an introductory undergraduate text in whatever field you want to understand mathematically, one with lots of exercises and a solutions manual. Read a chapter and then just start grinding through one exercise after another. On each exercise, give yourself a certain allotted time to try to solve it on your own, maybe 20 or 30 minutes or so. If you haven't solved it before the clock runs out, read the solutions manual and then work through it yourself. Then move on to the next problem, again trying to solve it within an allotted time.
Don't worry too much if the solutions manual whips out some crazy trick that seems totally unmotivated to you. Just make sure that you understand why the trick works, and then move on. Once you see the "trick" enough times, it will start to seem like the obvious thing to try, not a trick at all.
Eliezer Yudkowsky: In other words, none of this is for mature superintelligent Friendly AIs, who can work out on their own how to safeguard themselves.
Right, I understood that this "injunction" business is only supposed to cover the period before the AI has attained maturity.
If I've understood your past posts, an FAI is mature only if, whenever we wouldn't want it to perform an action that it's contemplating, it (1) can figure that out and (2) will therefore not perform the action. (Lots of your prior posts, for example, dealt with unpacking what the "wouldn't want" here means.)
You've warned against thinking of the injunction-executor as a distinct AI. So the picture I now have is that the "injunctions" are a suite of forbidden-thought tests. The immature AI is constantly running this suite of tests on its own actual thinking. (In particular, we assume that it's smart and self-aware enough to do this accurately so long as it's immature.) If one of the tests comes up positive, the AI runs a procedure to shut itself down. So long as the AI is immature, it cannot edit this suite, refrain from running the tests, or interfere with the shutdown procedure that follows a positive test. (Maybe it won't do these things because the suite itself forbids contemplating them, which gets into some of the recursive issues that you've mentioned, but I ignore these for now.)
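To make sure I'm picturing this concretely, here is a minimal sketch of that architecture (entirely my own illustration, with invented names; not anything you've specified): the immature AI runs a fixed, uneditable suite of forbidden-thought tests against its own thinking and triggers a controlled shutdown on any positive.

```python
# Illustrative only: the "suite of forbidden-thought tests" picture.
class ImmatureAI:
    def __init__(self, injunction_tests):
        # Hard-coded by the programmers; the immature AI cannot edit it.
        self._injunction_tests = tuple(injunction_tests)

    def think(self, thought):
        # The suite is run against the AI's own actual thinking.
        if any(test(thought) for test in self._injunction_tests):
            self._controlled_shutdown()
        return self._continue_reasoning(thought)

    def _controlled_shutdown(self):
        raise SystemExit("injunction triggered: shutting down")

    def _continue_reasoning(self, thought):
        # Ordinary planning continues only if no test fired.
        return thought
```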
If that's an accurate understanding of how you want these injunctions to operate, I still have the same difficulty that I mentioned in my last comment. I'll try to explain my problem again:
If the AI can't edit the test suite, it must have been hardcoded by the programmers, right? This is where I see the danger. The tests may not accurately capture the programmers' intentions, since the programmers didn't have the help of a mature FAI. Which means that the tests are probably vulnerable to all the unknown unknowns that have bedeviled other efforts to devise tests for complicated systems. You say, "The presumption and point of an injunction is that you can describe the error condition more simply than the decision system that produces it." It may be "more simple", but is it simple enough that we can trust ourselves to do it right? Given examples such as the tank-recognition story, I find it hard to muster much confidence.
I'm especially skeptical that we can count on our test-writing ability when we're talking about a super-intelligent not-yet-friendly AI. You say, "you would expect a positive intention to bypass the injunction, or reasoning about how to bypass the injunction, to also be recognizable within the system." Okay, but we also expect that the AI will be routinely trying to work around the injunctions. That is, the AI should be reasoning like the following all the time:
"I want to accomplish X. I wonder if I should bring X about by doing Y. Oh, what a minute, I can foresee that if I did Y, I might find myself thinking one of the forbidden thoughts. I better find another way to accomplish X."
So, the AI will always be devoting its vast super-intelligence to circumventing the injunctions, because "circumventing the injunctions" just means "figure out how to achieve my goals without being shut down by the injunctions."
Since the AI is super-intelligent, we should expect that it will find circumventions that we didn't anticipate. Often this will be a good thing: The AI will be figuring out how to accomplish its goals without doing evil. After all, that's the nature of a lot of ethical reasoning.
But maybe the AI will find a circumvention that we fervently wouldn't have wanted, had it occurred to us. By hypothesis, the AI isn't a mature FAI yet, so we can't count on it to figure out that we would have forbidden that circumvention. Or the AI might just not care yet.
So, given your eloquent warnings about the danger (I don't say "impossibility", since we're supposed to do those ;) ) of trying to hardcode AIs to be friendly, where do you find the confidence that we mere humans could pull off even hardcoding these injunctions?
Maybe I'm not being clear about how this would work in an AI! The ethical injunction isn't self-protecting, it's supported within the structural framework of the underlying system. You might even find ethical injunctions starting to emerge without programmer intervention, in some cases, depending on how well the AI understood its own situation. But the kind of injunctions I have in mind wouldn't be reflective - they wouldn't modify the utility function, or kick in at the reflective level to ensure their own propagation. That sounds really scary, to me - there ought to be an injunction against it! You might have a rule that would controlledly shut down the (non-mature) AI if it tried to execute a certain kind of source code change, but that wouldn't be the same as having an injunction that exerts direct control over the source code to propagate itself. To the extent the injunction sticks around in the AI, it should be as the result of ordinary reasoning, not reasoning taking the injunction into account! That would be the wrong kind of circularity; you can unwind past ethical unjunctions!
So, should we think of the injunction as essentially a separate non-reflective AI that monitors the main AI, but which the main AI can't modify until it's mature?
If so, that seems to run into all the sorts of problems that you've pointed out with trying to hardcode friendly goals into AIs. The foremost problem is that we can't ensure that the "injunction" AI will indeed shut down the main AI under all those circumstances in which we would want it to. If the main AI learns of the "injunction" AI, it might, in some manner that we didn't anticipate, discover a way to circumvent it.
The kinds of people whom you've criticized might reply, "well, just hard code the injunction AI to shut down the main AI if the main AI tries to circumvent the injunction AI." But, of course, we can't anticipate what all such circumventions will look like, so we don't know how to code the injunction AI to do that. If the main AI is smarter than us, we should expect that it will find circumventions that don't look like anything that we anticipated.
This has a real analog in human ethical reasoning. You've focused on cases where people violate their ethics by convincing themselves that something more important is at stake. But, in my experience, people are also very prone to convincing themselves that they aren't really violating their ethics. For example, they'll convince themselves that they aren't really stealing because the person from whom they stole wasn't in fact the rightful owner. I've heard people who stole from retailers arguing that the retailer acquired the goods by exploiting sweatshops or their own employees, or are just evil corporations, so they never had rightful ownership of the goods in the first place. Hence, the thief reasons, taking the goods isn't really theft.
Similarly, your AI might be clever enough to find a way around any hard-coded injunction that will occur to us. So far, this "injunction" strategy sounds to me like trying to develop in advance a fool-proof wish for genies.
No one else has brought this up, so maybe I'm just dense, but I'm having trouble distinguishing the "point" from the "counterpoint" at this part of the post:
Eliezer makes a "point":
So I suggest (tentatively) that humans naturally underestimate the odds of getting caught. We don't foresee all the possible chains of causality, all the entangled facts that can bring evidence against us. Those ancestors who lacked a sense of ethical caution stole the silverware when they expected that no one would catch them or punish them; and were nonetheless caught or punished often enough, on average, to outweigh the value of the silverware.
He then appears to present a possible "counterpoint":
Admittedly, this may be an unnecessary assumption. . . . So one could counter-argue: "Early humans didn't reliably forecast the punishment that follows from breaking social codes, so they didn't reliably think consequentially about it, so they developed an instinct to obey the codes." Maybe the modern sociopaths that evade being caught are smarter than average. Or modern sociopaths are better educated than hunter-gatherer sociopaths. Or modern sociopaths get more second chances to recover from initial stumbles - they can change their name and move. It's not so strange to find an emotion executing in some exceptional circumstance where it fails to provide a reproductive benefit.
But then he seems to say that this counterpoint doesn't suffice for him:
But I feel justified in bringing up the more complicated hypothesis, because ethical inhibitions are archetypally that which stops us even when we think no one is looking. A humanly universal concept, so far as I know, though I am not an anthropologist.
I'm not seeing the difference between the point and the counterpoint. Am I just misinterpreting the logic of the argument in thinking that these are supposed to be opposing points? Or, if not, how are they different?
Who or what is the Omega cited for the quote "Many assumptions that we have long been comfortable with are lined up like dominoes." ?
Benja, I have never studied Solomonoff induction formally. God help me, but I've only read about it on the Internet. It definitely was what I was thinking of as a candidate for evaluating theories given evidence. But since I don't really know it in a rigorous way, it might not be suitable for what I wanted in that hand-wavy part of my argument.
However, I don't think I made quite so bad a mistake as highly-ranking the "we will observe some experimental result" theory. At least I didn't make that mistake in my own mind ;). What I actually wrote was certainly vague enough to invite that interpretation. But what I was thinking was more along these lines:
[looks up color spectrum on Wikipedia and juggles numbers to make things work out]
The visible wavelengths are 380 nm -- 750 nm. Within that range, blue is 450 nm -- 495 nm, and red is 620 nm -- 750 nm.
Let f(x) be the decimal expansion of (x - 380nm)/370nm. This moves the visible spectrum into the range [0,1].
I was imagining that T3 ("the ball is visible") was predicting
"The only digit to the left of the decimal point in f(color of ball in nm) is a 0 (without a negative sign)."
while T1 ("the ball is red") predicts
"The only digit to the left of the decimal point in f(color of ball in nm) is a 0 (without a negative sign), and the digit immediately to the right is a 7."
and T2 ("the ball is blue") predicts
"The only digit to the left of the decimal point in f(color of ball in nm) is a 0 (without a negative sign), and the digit immediately to the right is a 2."
So I was really thinking of all the theories T1, T2, and T3 as giving precise predictions. It's just that T3 opted not to make a prediction about something that T1 and T2 did predict on.
However, I definitely take the point that Solomonoff induction might still not be suitable for my purposes. I was supposing that T3 would be a "better" theory by some criterion like Solomonoff induction. (I'm assuming, BTW, that T3 did predict everything that T1 and T2 predicted for the first 20 results. It's only for the 21st result that T3 didn't give an answer as detailed as those of T1 and T2.) But from reading your comment, I guess maybe Solomonoff induction wouldn't even compare T3 to T1 and T2, since T3 doesn't purport to answer all of the same questions.
If so, I think that just means that Solomonoff induction isn't quite general enough. There should be a way to compare two theories even if one of them answers questions that the other doesn't address. In particular, in the case under consideration, T1 and T2 are given to be "equally good" (in some unspecified sense), but they both purport to answer the same question in a different way. To my mind, that should mean that each of them isn't really justified in choosing its answer over the other. But T3, in a sense, acknowledges that there is no reason to favor one answer over the other. There should be some rigorous sense in which this makes T3 a better theory.
Tim Freeman, I hope to reply to your points soon, but I think I'm at my "recent comments" limit already, so I'll try to get to it tomorrow.
Hi, Anna. I definitely agree with you that two equally-good theories could agree on the results of experiments 1--20 and then disagree about the results of experiment 21. But I don't think that they could both be best-possible theories, at least not if you fix a "good" criterion for evaluating theories with respect to given data.
What I was thinking when I claimed that in my original comment was the following:
Suppose that T1 says "result 21 will be X" and theory T2 says "result 21 will be Y".
Then I claim that there is another theory T3, which correctly predicts results 1--20, and which also predicts "result 21 will be Z", where Z is a less-precise description that is satisfied by both X and Y. (E.g., maybe T1 says "the ball will be red", T2 says "the ball will be blue", and T3 says "the ball will be visible".)
So T3 has had the same successful predictions as T1 and T2, but it requires less information to specify (in the Kolmogorov-complexity sense), because it makes a less precise prediction about result 21.
I think that's right, anyway. There's definitely still some hand-waving here. I haven't proved that a theory's being vaguer about result 21 implies that it requires less information to specify. I think it should be true, but I lack the formal information theory to prove it.
But suppose that this can be formalized. Then there is a theory T3 that requires less information to specify than do T1 and T2, and which has performed as well as T1 and T2 on all observations so far. A "good" criterion should judge T3 to be a better theory in this case, so T1 and T2 weren't best-possible.
"One small nitpick: It could be more explicit that in Assumption 2, B1 and B2 range over actual observation, whereas in Assumption 1, B ranges over all possible observations. :)"
Actually, I implicitly was thinking of the "B" variables as ranging over actual observations (past, present, and future) in both assumptions. But you're right: I definitely should have made that explicit.
I wrote in my last comment that "T2 is more likely to be flawed than is T1, because T2 only had to post-dict the second batch. This is trivial to formalize using Bayes's theorem. Roughly speaking, it would have been harder for T1 to have been constructed in a flawed way and still have gotten its predictions for the second batch right."
Benja Fallenstein asked for a formalization of this claim. So here goes :).
Define a method to be a map that takes in a batch of evidence and returns a theory. We have two assumptions
ASSUMPTION 1: The theory produced by giving an input batch to a method will at least predict that input. That is, no matter how flawed a method of theory-construction is, it won't contradict the evidence fed into it. More precisely,
p( M(B) predicts B ) = 1.
(A real account of hypothesis testing would need to be much more careful about what constitutes a "contradiction". For example, it would need to deal with the fact that inputs aren't absolutely reliable in the real world. But I think we can ignore these complications in this problem.)
ASSUMPTION 2: If a method M is known to be flawed, then its theories are less likely to make correct predictions of future observations. More precisely, if B2 is not contained in B1, then
p( M(B1) predicts B2 | M flawed ) < p( M(B1) predicts B2 ).
(Outside of toy problems like this one, we would need to stipulate that B2 is not a logical consequence of B1, and so forth.)
Now, let B1 and B2 be two disjoint and nonempty sets of input data. In the problem, B1 is the set of results of the first ten experiments, and B2 is the set of results of the next ten experiments.
My claim amounted to the following. Let
P1 := p( M is flawed | M(B1) predicts B2 ),
P2 := p( M is flawed | M(B1 union B2) predicts B2 ).
Then P1 < P2.
To prove this, note that, by Bayes's theorem, the second quantity P2 is given by
P2 = p( M(B1 union B2) predicts B2 | M is flawed ) * p(M is flawed) / p( M(B1 union B2) predicts B2 ).
Since p(X) = 1 implies p(X|Y) = 1 whenever p(Y) > 0, Assumption 1 tells us that this reduces to
P2 = p(M is flawed).
On the other hand, the first quantity P1 is
P1 = p( M(B1) predicts B2 | M is flawed ) * p( M is flawed) / p( M(B1) predicts B2 ).
By Assumption 2, this becomes
P1 < p( M is flawed ).
Hence, P1 < P2, as claimed.
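To make the inequality concrete, here is a minimal numerical sketch; the prior and the likelihoods are made-up numbers, chosen only so that Assumptions 1 and 2 hold:

```python
# Toy check of P1 < P2 with made-up numbers.
p_flawed = 0.3                      # prior that the method M is flawed

# Assumption 2: a flawed M(B1) predicts the unseen B2 less reliably.
p_predict_b2_given_flawed = 0.4
p_predict_b2_given_sound = 0.8
p_predict_b2 = (p_predict_b2_given_flawed * p_flawed
                + p_predict_b2_given_sound * (1 - p_flawed))

# P1 = p(M flawed | M(B1) predicts B2), by Bayes's theorem.
P1 = p_predict_b2_given_flawed * p_flawed / p_predict_b2

# Assumption 1: M(B1 union B2) predicts B2 with probability 1 whether or
# not M is flawed, so the posterior equals the prior.
P2 = p_flawed

print(P1, P2)  # ~0.176 < 0.3
assert P1 < P2
```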
Here's my answer, prior to reading any of the comments here, or on Friedman's blog, or Friedman's own commentary immediately following his statement of the puzzle. So, it may have already been given and/or shot down.
We should believe the first theory. My argument is this. I'll call the first theory T1 and the second theory T2. I'll also assume that both theories made their predictions with certainty. That is, T1 and T2 gave 100% probability to all the predictions that the story attributed to them.
First, it should be noted that the two theories should have given the same prediction for the next experiment (experiment 21). This is because T1 should have been the best theory that (would have) predicted the first batch. And since T1 also correctly predicted the second batch, it should have been the best theory that would do that, too. (Here, "best" is according to whatever objective metric evaluates theories with respect to a given body of evidence.)
But we are told that T2 makes exactly the same predictions for the first two batches. So it also should have been the best such theory. It should be noted that T2 has no more information with which to improve itself. T1, for all intents and purposes, also knew the outcomes of the second batch of experiments, since it predicted them with 100% certainty. Therefore, the theories should have been the best possible given the first two batches. In particular, they should have been equally good.
But if "being the best, given the first two batches" doesn't determine a prediction for experiment 21, then neither of these "best" theories should be predicting the outcome of experiment 21 with certainty. Therefore, since it is given that they are making such predictions, they should be making the same one.
It follows that at least one of the theories is not the best, given the evidence that it had. That is, at least one of them was constructed using flawed methods. T2 is more likely to be flawed than is T1, because T2 only had to post-dict the second batch. This is trivial to formalize using Bayes's theorem. Roughly speaking, it would have been harder for T1 to have been constructed in a flawed way and still have gotten its predictions for the second batch right.
Therefore, T1 is more likely to be right than is T2 about the outcome of experiment 21.
You write that "Philosophy doesn't resolve things, it compiles positions and arguments". I think that philosophy should be granted as providing something somewhat more positive than this: It provides common vocabularies for arguments. This is no mean feat, as I think you would grant, but it is far short of resolving arguments, which is what you need.
As you've observed, modal logics amount to arranging a bunch of black boxes in very precisely stipulated configurations, while giving no indication as to the actual contents of the black boxes. However, if you mean to accuse the philosophers of seeing no need to fill the black boxes, then I think you go too far. Rather, it is just an anthropological fact that the philosophers cannot agree on how to fill the black boxes, or even on what constitutes filling a box. The result is that they are unable to generate a consensus at the level of precision that you need. Nonetheless, they at least generate a consensus vocabulary for discussing various candidate refinements down to some level, even if none of them reach as deep a level as you need.
I don't mean to contradict your assertion that (even) analytic philosophy doesn't provide what you need. I mean rather to emphasize what the problem is: It isn't exactly that people fail to see the need for reductionistic explanations. Rather, the problem is that no one seems capable of convincing anyone else that his or her candidate reduction should be accepted to the exclusion of all others. It may be that the only way for someone to win this kind of argument is to build an actual functioning AI. In fact, I'm inclined to think that this is the case. If so, then, in my irrelevant judgement, you are working with just about the right amount of disregard for whatever consensus results might exist within the analytic philosophical tradition.
Eliezer, would the following be an accurate synopsis of what you call morality?
Each of us has an action-evaluating program. This should be thought of as a Turing machine encoded in the hardware of our brains. It is a determinate computational dynamic in our minds that evaluates the actions of agents in scenarios. By a scenario, I mean a mental model of a hypothetical or real situation. Now, a scenario that models agents can also model their action-evaluating programs. An evaluation of an action in a scenario is a moral evaluation if, and only if, the same action is given the same value in every scenario that differs from the first one only in that the agent performing the action has a different action-evaluating program.
In other words, moral evaluations are characterized by being invariant under certain kinds of modifications: Namely, modifications that consist only of assigning a different action-evaluating program to the agent performing the action.
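To check that I've stated the criterion precisely, here is a minimal toy formalization in code (entirely illustrative; the function names and the scenario representation are invented): an evaluation counts as moral exactly when swapping out the acting agent's action-evaluating program in the modeled scenario leaves the evaluation unchanged.

```python
# Toy formalization of the invariance criterion described above.
def is_moral_evaluation(evaluate, action, scenario, alternative_evaluators):
    """scenario is a dict that includes the key 'agent_evaluator': the
    acting agent's action-evaluating program, as modeled in the scenario."""
    baseline = evaluate(action, scenario)
    for other in alternative_evaluators:
        # Modify only the agent's action-evaluating program.
        variant = dict(scenario, agent_evaluator=other)
        if evaluate(action, variant) != baseline:
            return False
    return True
```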
Does that capture the distinctive quality of moral evaluations that you've been trying to convey?
A few thoughts:
(1) It seems strange to me to consider moral evaluations, so defined, to be distinct from personal preferences. With this definition, I would say that moral evaluations are a special case of personal preferences. Specifically, they are the preferences that are invariant under a certain kind of modification to the scenario being considered.
I grant that it is valuable to distinguish this particular kind of personal preference. First, I can imagine that you're right when you say that it's valuable if one wants to build an AI. Second, it's logically interesting because this criterion for moral evaluation is a self-referential one, in that it stipulates how the action-evaluating program (doesn't) react to hypothetical changes to itself. Third, by noting this distinctive kind of action-evaluation, you've probably helped to explain why people are so prone to thinking that certain evaluations are universally valid.
Nonetheless, the point remains that your definition amounts to considering moral evaluation to be nothing more than a particular kind of personal preference. I therefore don't think that it does anything to ease the concerns of moral universalists. Some of your posts included very cogent explanations of why moral universalism is incoherent, but I think you would grant that the points that you raised there weren't particularly original. Moral-relativists have been making those points for a long time. I agree that they make moral universalism untenable, but moral universalists have heard them all before.
Your criterion for moral evaluation, on the other hand, is original (to the best of my meager knowledge). But, so far as the debate between moral relativists and universalists is concerned, it begs the question. It takes the reduction of morality to personal preference as given, and proceeds to define which preferences are the moral ones. I therefore don't expect it to change any minds in that debate.
(2) Viewing moral evaluations as just a special kind of personal preference, what reason is there to think that moral evaluations have their own computational machinery underlying them? I'm sure that this is something that you've thought a lot about, so I'm curious to hear your thoughts on this. My first reaction is to think that, sure, we can distinguish moral evaluations from the other outputs of our preference-establishing machinery, but that doesn't mean that special processes were running to produce the moral evaluations.
For example, consider a program that produces the natural numbers by starting with 1, and then producing each successive number by adding 1 to the previously-produced number. After this machine has produced some output, we can look over the tape and observe that some of the numbers produced have the special property of being prime. We might want to distinguish these numbers from the rest of the output for all sorts of good reasons. There is indeed a special, interesting feature of those numbers. But we should not infer that any special computational machinery produced those special numbers. The prime numbers might be special, but, in this case, the dynamics that produced them are the same as those that produced the non-special composite numbers.
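A minimal sketch of that analogy in code (purely illustrative): a single uniform rule produces every number, and "being prime" is a property we notice in the output afterwards, not evidence of separate machinery.

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# The only dynamic: start at 1 and keep adding 1.
output = list(range(1, 21))

# A special, interesting property of some outputs, noticed after the fact.
primes = [m for m in output if is_prime(m)]
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19]
```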
Similarly, moral evaluations, as you define them, are distinguishable from other action-evaluations. But what reason is there to think that any special machinery underlies moral evaluations as opposed to other personal preferences?
(3) Since humans manage to differ in so many of their personal preferences, there seems little reason to think that they are nearly universally unanimous with regards to their moral evaluations. That is, I don't see how the distinguishing feature of moral evaluations (a particular kind of invariance) would make them less likely to differ from person-to-person or moment-to-moment within the same person. So, I don't quite understand your strong reluctance to attribute different moral evaluations to different people.
Eliezer, you write, "Most goods don't depend justificationally on your state of mind, even though that very judgment is implemented computationally by your state of mind. A personal preference depends justificationally on your state of mind."
Could you elaborate on this distinction? (IIRC, most of what you've written explicitly on the difference between preference and morality was in your dialogues, and you've warned against attributing any views in those dialogues to you.)
In particular, in what sense do "personal preferences depend justificationally on your state of mind"? If I want to convince someone to prefer rocky road ice cream over almond praline, I would most likely proceed by telling them about the ingredients in rocky road that I believe that they like more than the ingredients in almond praline. Suppose that I know that you prefer walnuts over almonds. Then my argument would include lines like "rocky road contains walnuts, and almond praline contains almonds." These would not be followed by something like "... and you prefer walnuts over almonds." Yes, I wouldn't have offered the comparison if I didn't believe that that was the case, but, so far as the structure of the argument is concerned, such references to your preferences would be superfluous. Rather, as you've explained with morality, I would be attempting to convince you that rocky road has certain properties. These properties are indeed the ones that I think will make the system of preferences within you prefer rocky road over almond praline. And, as with morality, that system of preferences is a determinate computational property of your mind as it is at the moment. But, just as in your account of moral justification as I understand it, I don't need to refer to that computational property to make my case. I will just try to convince you that the facts are such that certain things are to be found in rocky road. These are things that happen to be preferred by your preference system, but I won't bother to try to convince you of that part.
Actually, the more I think about this ice cream example, the more I wonder whether you wouldn't consider it to be an example of moral justification. So, I'm curious to know an example of what you would consider to be a personal preference but not a moral preference.
I have to agree with komponisto and some others: this post attacks a straw-man version of logical positivism. As komponisto alluded to, you are ignoring the logical in logical positivism. The logical positivists believed that meaningful statements had to be either verifiable or they had to be logical constructs built up out of verifiable constituents. They held that if A is a meaningful (because verifiable) assertion that something happened, and B is likewise, then A & B is meaningful by virtue of being logically analyzable in terms of the meanings of A and B. They would maintain this even if the events asserted in A and B had disjoint light cones, so that you could never experimentally verify them both. In effect, they subscribed to precisely the view that you endorse when you wrote, "A great many untestable beliefs . . . talk about general concepts already linked to experience, like Suns and chocolate cake, and general frameworks for combining them, like space and time."
Your "general frameworks for combining" do exactly the work that logical positivists did by building statements from verifiable constituents using logical connectives. In particular space and time would be understood by them in logical terms as follows: space and time reduce to geometry via general relativity, and geometry, along with all math, reduces to logic via the logicist program of Russell and Whitehead's Principia Mathematica. See, for example, Hans Reichenbach's The Philosophy of Space and Time.
So, even without invoking omnipotent beings to check whether the cake is there, the logical positivist would attribute meaning to that claim in essentially the same way that you do.
michael vassar, I'm familiar with that book. I haven't read it, but I listened to an hour-long interview with the author here: http://bloggingheads.tv/diavlogs/261
I think that the author made many good points there, and I take his theses seriously. However, I don't think that secrecy is usually the best solution to the problems he points out. I favor structuring the institutions of power so that "cooler heads prevail", rather than trying to keep "warmer heads" ignorant. Decision makers should ultimately be answerable to the people, but various procedural safeguards such as indirect representation (e.g., the electoral college), checks-and-balances, requiring super-majorities, and so forth, can help ameliorate the impulsiveness or wrong-headedness of the masses.
And I think that by-and-large, people understand the need for safeguards like these. Many might not like some of the specific safeguards we use. The electoral college certainly has come in for a lot of criticism. But most people understand, to some degree, the frailties of human nature that make safeguards of some kind necessary. Enough of us are in the minority on some position so that most of us don't want the whim of the majority to be always instantly satisfied. In some liberals, this manifests as a desire that the courts overturn laws passed by a majority of the legislature. In some conservatives, this manifests as support for the theory of the unitary executive. But the underlying problem seems to be recognized across the spectrum.
So, in effect, I'm arguing that the people can be counted on to vote away their right to completely control scientific research. Indeed, they have already done this by implementing the kinds of procedural safeguards I mentioned above.
I realize that that might appear to conflict with my skepticism that they would vote away their right to know about the existential risks of the research they fund. But I think that there's a big difference. In the former case, the people are saying, "we shouldn't be able to influence research unless we care enough to work really, really hard to do so." In the latter case, you're asking them to say, "we shouldn't even know about the research so that, no matter how much we would care if we were to know, still we can do nothing about it." It seems unrealistic to me to expect that, except in special cases like military research.
So I don't think that secrecy is necessary in general to protect science from public ignorance. Other, better, means are available. Now, in this post I've emphasized an "institutional safeguards" argument, because I think that that most directly addresses the issues raised by the book you mentioned. But I still maintain my original argument, which is that it's easier to convince the public to fund risky research than it is to convince them to fund risky research and to vote that it be kept secret from them. This seemed to be the argument of mine that engendered the most skepticism, but I don't yet see what's causing the incredulity, so I don't know what would make that argument seem more plausible to the doubters or, alternatively, why I should abandon it.
A government report that is going to be displayed to government officials and the public at large, written by people beholden to public opinion and the whims of government, will be written with those audiences in mind.
Car mechanics and dentists are often paid both to tell us what problems need fixing and to fix them. That's a moral hazard that always exists when the expert asked to determine whether a procedure is advisable is also the one who will be paid to perform it.
There are several ways to address this problem. Is it so clear that having the expert determine the advisability in secret is the best way, much less a required way, in this case?
What makes you think they don't?
I acknowledge that they probably do so with some nonzero number of projects. But I take Eliezer to be advocating that it happen with all projects that carry existential risk. And that's not happening; otherwise Eliezer wouldn't have had the example of the RHIC to use in this post. Now, perhaps, notwithstanding the RHIC, the government already is classifying nearly all basic science research that carries an existential threat, but I doubt it. Do you argue that the government is doing that? Certainly, if it's already happening, then I'm wrong in thinking that that would be prohibitively difficult in a democracy.
1) Existential risk is real and warrants government-funded research.
Agreed.
2) The results, if useful, would not be a sexed-up dodgy dossier, but a frank, measured appraisal of ER. As such, they would acknowledge a nonzero risk to humanity from sundry directions.
Agreed.
3) As such, they would be blown out of proportion by media reporting, as per Eliezer's analysis. This isn't certain, but it's highly likely. "Gov't Report: The End Is Nigh For Humankind!"
Some media would react that way. And then some media would probably counter-react by exaggerating whatever problem warranted the research in the first place. Consider, e.g., the torture of detainees in the "war on terror". Some trumpet the threat of a terrorist nuke destroying a city if we don't use torture to prevent it. Others trumpet the threat of our torture creating recruits for the terrorists, resulting in higher odds of a nuke destroying a city.
4) Natural conclusion - keep it classified, at least for the time being.
It's far from obvious to me that, in the example of torture, the best solution is to keep the practice classified. Obviously the practitioners would prefer to keep it that way; they would prefer that their own judgment settle the matter. I'm inclined to think that, while their judgment is highly relevant, sunlight would help keep them honest.
Analogously, I think that sunlight improves the practice of science. Not in every case, of course. But in general I think that the "open source" nature of science is a very positive aspect. It is a good ideal to have scientists expecting that their work will be appraised by independent judges. I realize that I'm going along with the conventional wisdom here, but I'm still a number of inferential steps away from seeing that it's obviously wrong.
As the piece above says, public & media reaction tends to be 100% positive or 100% negative. If you think you can talk the world out of this, there's a Nobel Prize in it for you.
Can you elaborate on your claim that "public & media reaction tends to be 100% positive or 100% negative"? Do you mean that, on most risky projects, the entire public, and all the media, react pro or con in total unanimity? Or do you mean that, on most issues of risk, each individual and each media outlet either supports or opposes the project 100%? Or do you mean something else? I'll respond after your clarification.
I should emphasize that I'm not arguing for using direct democracy to determine what projects should be funded. The process to allocate funding should be structured so that the opinions of the relevant experts are given great weight, even if the majority of the population disagrees. What I'm skeptical about is the need for secrecy.
I can say that I'm not joking. Evidently I need to be shown the light.
I wrote, "Wouldn't it just be easier to convince the public to accept a certain amount of risk, to accept debates about trade-offs?"
Zubon replied:
How?
Keeping secrets is a known technology. Overcoming widespread biases is the reason we are here. If you have a way to sway the public on these issues, please, share.
"Keeping secrets" is a vague description of Eliezer's proposal. "Keeping secrets" might be known technology, but so is "convincing the public to accept risks." (E.g., they accept automobile fatality rates.) Which of these "technologies" would be easier to deploy in this case? That depends on the particular secrets to be kept and the particular risks to be accepted.
Since Eliezer talked about keeping projects "classified", I assume that he's talking about government-funded research. So, as I read him, he wants the government to fund basic, nonmilitary research that carries existential risks, but he wants the projects and the reports on the existential risks to be kept classified.
In a democracy, that means that the public, or their elected representatives, need to be convinced to spend their tax dollars on research, even while they know that they will not be told of the risks, or even of the nature of the specific research projects being funded. That is routine for military research, but there the public believes that the secrecy is protecting them from a greater existential threat. Eliezer is talking about basic research that does not obviously protect us from an existential threat.
The point is really this: To convince the public to fund research of this nature, you will need to convince them to accept risks anyways, since they need to vote for all this funding to go into some black box marked "Research that poses a potential existential threat, so you can't know about it." So, Eliezer's plan already requires convincing the public to accept risks. Then, on top of that, he needs to keep the secrets. That's why it seems to me that his plan can only be harder than mine, which just requires convincing them to accept risks, without the need for the secrecy.
Eliezer,
You point to a problem: "You can't admit a single particle of uncertain danger if you want your science's funding to survive. These days you are not allowed to end by saying, 'There remains the distinct possibility...' Because there is no debate you can have about tradeoffs between scientific progress and risk. If you get to the point where you're having a debate about tradeoffs, you've lost the debate. That's how the world stands, nowadays."
As a solution, you propose that "where human-caused uncertain existential dangers are concerned, the only way to get a real, serious, rational, fair, evenhanded assessment of the risks, in our modern environment, is if the whole project is classified, the paper is written for scientists without translation, and the public won't get to see the report for another fifty years."
Wouldn't it just be easier to convince the public to accept a certain amount of risk, to accept debates about trade-offs? What you propose would require convincing that same public to give the government a blank check to fund secret projects that are being kept secret precisely because they present some existential threat. That might work for military projects, since the public could be convinced that the secrecy is necessary to prevent another existential threat (e.g., commies).
It just seems easier to modify public sentiment so that people accept serious discussions of risk. Otherwise, you have to convince them to trust scientists to evaluate those risks accurately in utter secrecy, even though those scientists will be funded only if they find that the risks are acceptable.
Anyways, I'm unconvinced that secrecy was the cause of the difference in rhetorical style between LA-602 and the RHIC review. What seems more plausible to me is this: Teller et al. could afford to mention that risks remained because they figured that a military project like theirs would get funded anyways. The authors of the RHIC review had no such assurance.