Posts

How not to move the goalposts 2011-06-12T15:45:44.127Z

Comments

Comment by HopeFox on Welcome to Less Wrong! · 2011-06-12T16:04:15.504Z · LW · GW

I agree, intuition is very difficult here. In this specific scenario, I'd lean towards saying yes - it's the same person with a physically different body and brain, so I'd like to think that there is some continuity of the "person" in that situation. My brain isn't made of the "same atoms" it was when I was born, after all. So I'd say yes. In fact, in practice, I would definitely assume said robot and software to have moral value, even if I wasn't 100% sure.

However, if the original brain and body weren't destroyed, and we now had two apparently identical individuals claiming to be people worthy of moral respect, then I'd be more dubious. I'd be extremely dubious of creating twenty robots running identical software (which seems entirely possible with the technology we're supposing) and assigning them the moral status of twenty people. "People", of the sort deserving of rights and dignity and so forth, shouldn't be the sort of thing that can be arbitrarily created through a mechanical process. (And yes, human reproduction and growth is a mechanical process, so there's a problem there too.)

Actually, come to think of it... if you have two copies of software (either electronic or neuron-based) running on two separate machines, but it's the same software, could they be considered the same person? After all, they'll make all the same decisions given similar stimuli, and thus are using the same decision process.

Comment by HopeFox on Welcome to Less Wrong! · 2011-06-12T13:42:47.742Z · LW · GW

What issues does your best atheist theory have?

My biggest problem right now is all the stuff about zombies, and how that implies that, in the absence of some kind of soul, a computer program or other entity capable of the same reasoning processes as a person is morally equivalent to a person. I agree with every step of the logic (I think; it's been a while since I last read the sequence), but I end up applying it in the other direction. I don't think a computer program can have any moral value; therefore, without the presence of a soul, people also have no moral value. So I either accept a lack of moral value to humanity (both distasteful and unlikely), or accept the presence of something, let's call it a soul, that makes people worthwhile (also unlikely). I'm leaning towards the latter, both as the less unlikely option and as the one that produces the most harmonious behaviour from me.

It's a work in progress. I've been considering the possibility that there is exactly one soul in the universe (since there's no reason to consider souls to propagate along the time axis of spacetime in any classical sense), but that's a low-probability hypothesis for now.

Comment by HopeFox on Welcome to Less Wrong! · 2011-06-12T13:23:13.792Z · LW · GW

It's about how, if you're attacking somebody's argument, you should attack all of the bad points of it simultaneously, so that it doesn't look like you're attacking one and implicitly accepting the others. With any luck, it'll be up tonight.

Comment by HopeFox on Welcome to Less Wrong! · 2011-06-12T12:11:24.153Z · LW · GW

Hi, I've been lurking on Less Wrong for a few months now, making a few comments here and there, but never got around to introducing myself. Since I'm planning out an actual post at the moment, I figured I should tell people where I'm coming from.

I'm a male 30-year-old optical engineer in Sydney, Australia. I grew up in a very scientific family and have pretty much always assumed I had a scientific career ahead of me, and after a couple of false starts, it's happened and I couldn't ask for a better job.

Like many people, I came to Less Wrong from TVTropes via Methods of Rationality. Since I started reading, I've found it quite helpful in organising my own thoughts, casting aside useless arguments, and examining aspects of my life and beliefs that don't stand up under scrutiny.

In particular, I've found that reading Less Wrong has allowed, nay forced, me to examine the logical consistency of everything I say, write, hear and read, which allows me to be a lot more efficient in discussions, both by policing my own speech and by being more usefully critical of others' points (rather than making arguments that don't go anywhere).

While I was raised in a substantively atheist household, my current beliefs are theist. The precise nature of these beliefs has shifted somewhat since I started reading Less Wrong, as I've discarded the parts that are inconsistent or even less likely than the others. There are still difficulties with my current model, but they're smaller than the issues I have with my best atheist theory.

I've also had a surprising amount of success in introducing the logical and rationalist concepts from Less Wrong to one of my girlfriends, which is all the more impressive considering her dyscalculia. I'm really pleased that this site has given me the tools to do that. It's really easy now to short-circuit what might otherwise become an argument by showing that it's merely a dispute about definitions. It's this sort of success that has kept me reading the site these past months, and I hope I can contribute to that success for other people.

Comment by HopeFox on Exterminating life is rational · 2011-05-29T13:02:53.696Z · LW · GW

Assuming rational agents with a reasonable level of altruism (by which I mean, incorporating the needs of other people and future generations into their own utility functions, to a similar degree as we'd expect of "decent people" today)...

If such a person figures that getting rid of the Nazis or the Daleks or whoever the threat of the day is, is worth a tiny risk of bringing about the end of the world, and their reasoning is completely rational and valid and altruistic (I won't say "unselfish" for reasons discussed elsewhere in this thread) and far-sighted (not discounting future generations too much)...

... then they're right, aren't they?

If the guys behind the Trinity test weighed the negative utility of the Axis taking over the world, presumably with the end result of boots stamping on human faces forever, and determined that the 3/1,000,000 chance of ending all human life was worth preventing this future from coming to pass, then couldn't Queen Victoria perform the same calculations, and conclude "Good heavens. Nazis, you say? Spreading their horrible fascism in my empire? Never! I do hope those plucky Americans manage to build their bomb in time. Tiny chance of destroying the world? Better they take that risk than let fascism rule the world, I say!"

If the utility calculations performed regarding the Trinity test were rational, altruistic and reasonably far-sighted, then they would have been equally valid if performed at any other time in history. If we apply a future discounting factor of e^-kt, then that factor would apply equally to all elements in the utility calculation. If the net utility of the test were positive in 1945, then it should have been positive at all points in history before then. If President Truman (rationally, altruistically, far-sightedly) approved of the test, then so should Queen Victoria, Julius Caesar and Hammurabi have, given sufficient information. Either the utility calculations for the test were right, or they weren't.
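
To make that concrete, here's a minimal sketch with placeholder numbers (only the 3-in-a-million figure comes from the discussion above); since the discount factor e^(-kt) multiplies every term equally, it can never flip the sign of the answer:

```python
import math

# A minimal sketch, with made-up numbers, of the claim that exponential
# discounting preserves the sign of a net-utility comparison no matter how
# far in advance the calculation is done.

k = 0.02                       # assumed annual discount rate
u_axis_victory = -1_000_000    # placeholder utility of the Axis winning
p_axis_victory = 0.9           # placeholder chance of that outcome without the bomb
u_extinction = -100_000_000    # placeholder utility of ending all human life
p_extinction = 3e-6            # the 3/1,000,000 risk attributed to the Trinity test

def net_utility_of_test(years_in_advance):
    """Discounted net utility of running the test, judged years_in_advance early."""
    discount = math.exp(-k * years_in_advance)
    averted_harm = -p_axis_victory * u_axis_victory   # disaster prevented (positive)
    risk_taken = p_extinction * u_extinction          # extinction risk (negative)
    return discount * (averted_harm + risk_taken)

# Truman (1945), Victoria (~1870), Caesar (~50 BC), Hammurabi (~1750 BC):
for years in (0, 75, 1995, 3695):
    print(years, "approve" if net_utility_of_test(years) > 0 else "reject")
```

Every vantage point gives the same verdict, because the common factor e^(-kt) scales the whole expression without changing which side of zero it sits on.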

If they were right, then the problem stops being "Oh no, future generations are going to destroy the world even if they're sensible and altruistic!", and starts being "Oh no, a horrible regime might take over the world! Let's hope someone creates a superweapon to stop them, and damn the risk!"

If they were wrong, then the assumption that the ones performing the calculation were rational, altruistic and far-sighted is wrong. Taking these one by one:

1) The world might be destroyed by someone making an irrational decision. No surprises there. All we can do is strive to raise the general level of rationality in the world, at least among people with the power to destroy the world.

2) The world might be destroyed by someone with only his own interests at heart. So basically we might get stuck with Dr Evil. We can't do a lot about that either.

3) The world might be destroyed by someone acting rationally and altruistically for his own generation, but who discounts future generations too much (i.e. his value of k in the discounting factor is much larger than ours). This seems to be the crux of the problem. What is the "proper" value of k? It should probably depend on how much longer humans are going to be around, for reasons unrelated to the question at hand. If the world really is going to end in 2012, then every dollar spent on preventing global warming should have been spent on alleviating short-term suffering all over the world, and the proper value for k is very large. If we really are going to be here for millions of years, then we should be exceptionally careful with every resource (both material and negentropy-based) we consume, and k should be very small. Without this knowledge, of course, it's very difficult to determine what k should be.

That may be the way to avoid a well-meaning scientist wiping out all human life - find out how much longer we have as a species, and then campaign that everyone should live their lives accordingly. Then, the only existential risks that would be implemented are the ones that are actually, seriously, truly, incontrovertibly, provably worth it.

Comment by HopeFox on Exterminating life is rational · 2011-05-29T11:47:37.778Z · LW · GW

Thinking about this in commonsense terms is misleading, because we can't imagine the difference between 8x utility and 16x utility

I can't even imagine doubling my utility once, if we're only talking about selfish preferences. If I understand vNM utility correctly, then a doubling of my personal utility is a situation which I'd be willing to accept a 50% chance of death in order to achieve (assuming that my utility is scaled so that U(dead) = 0; without fixing that zero point, we can't talk about doubling utility at all). Given my life at the moment (apartment with mortgage, two chronically ill girlfriends, decent job with unpleasantly long commute, moderate physical and mental health), and thinking about the best possible life I could have (volcano lair, catgirls), I wouldn't be willing to take that bet. Intuition has already failed me on this one. If Omega can really deliver on his promise, then either he's offering a lifestyle literally beyond my wildest dreams, or he's letting me include my preferences for other people in my utility function, in which case I'll probably have cured cancer by the tenth draw or so, and I'll run into the same breakdown of intuition after about seventy draws, by which time everyone else in the world should have their own volcano lairs and catgirls.
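
As a sanity check on that reading of vNM utility (arbitrary scale, nothing clever):

```python
# With U(dead) = 0, a gamble offering a 50% chance of doubled utility and a
# 50% chance of death has exactly the expected utility of the status quo, so
# "doubling my utility" is whatever I'd accept that coin flip to obtain.

u_now = 1.0                                # current utility, arbitrary units
u_gamble = 0.5 * (2 * u_now) + 0.5 * 0.0   # 50% doubled, 50% dead
assert u_gamble == u_now                   # exact indifference point
print(u_now, u_gamble)
```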

With the problem as stated, any finite number of draws is the rational choice, because the proposed utility of N draws outweighs the risk of death, no matter how high N is. The probability of death is always less than 1 for a finite number of draws. I don't think that considering the limit as N approaches infinity is valid, because every time you have to decide whether or not to draw a card, you've only drawn a finite number of cards so far. Certainty of death also occurs in the same limit as infinite utility, and infinite utility has its own problems, as discussed elsewhere in this thread. It might also leave you open to Pascal's Scam - give me $5 and I'll give you infinite utility!
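
Here's that arithmetic as a minimal sketch, assuming purely for illustration that each card is a star with probability 0.9 (the exact odds don't matter, so long as doubling beats the per-draw risk):

```python
# Expected utility of committing to n draws, with U(dead) = 0: survive all n
# (probability p_star**n) and have utility doubled n times, or die and get 0.
# p_star = 0.9 is an assumption for illustration only.

p_star = 0.9   # assumed chance each card is a star rather than a skull
u0 = 1.0       # current utility

def expected_utility(n):
    return ((2 * p_star) ** n) * u0   # = (p_star**n) * (2**n) * u0

for n in (1, 10, 100, 1000):
    print(n, expected_utility(n), "P(survival) =", p_star ** n)
```

The expected utility explodes with n while the survival probability shrinks towards zero without ever reaching it, which is exactly the clash with intuition below.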

But we have a mathematical theory about rationality. Just apply that, and you find the results seem unsatisfactory.

I agree - to keep drawing until you draw a skull seems wrong. However, to say that something "seems unsatisfactory" is a statement of intuition, not mathematics. Our intuition can't weigh the value of exponentially increasing utility against the cost of an exponentially diminishing chance of survival, so it's no wonder that the mathematically derived answer doesn't sit well with intuition.

Comment by HopeFox on Exterminating life is rational · 2011-05-17T10:47:38.643Z · LW · GW

"Every time you draw a card with a star, I'll double your utility for the rest of your life. If you draw a card with a skull, I'll kill you."

Sorry if this question has already been answered (I've read the comments but probably didn't catch all of it), but...

I have a problem with "double your utility for the rest of your life". Are we talking about utilons per second? Or do you mean "double the utility of your life", or just "double your utility"? How does dying a couple of minutes later affect your utility? Do you get the entire (now doubled) utility for those few minutes? Do you get pro rata utility for those few minutes divided by your expected lifespan?

Related to this is the question of the utility penalty of dying. If your utility function includes benefits for other people, then your best bet is to draw cards until you die, because the benefits to the rest of the universe will massively outweigh the inevitability of your death.

If, on the other hand, death sets your utility to zero (presumably because your utility function is strictly only a function of your own experiences), then... yeah. If Omega really can double your utility every time you win, then I guess you keep drawing until you die. It's an absurd (but mathematically plausible) situation, so the absurd (but mathematically plausible) answer is correct. I guess.

Comment by HopeFox on Newcomb's problem happened to me · 2011-05-14T23:31:32.813Z · LW · GW

Perfect decision-makers, with perfect information, should always be able to take the optimal outcome in any situation. Likewise, perfect decision-makers with limited information should always be able to choose the outcome with the best expected payoff under strict Bayesian reasoning.

However, when the actor's decision-making process becomes part of the situation under consideration, as happens when Katemega scrutinises Joe's potential for leaving her in the future, then the perfect decision-maker is only able to choose the optimal outcome if he is also capable of perfect self-modification. Without that ability, he's vulnerable to his own choices and preferences changing in the future, which he can't control right now.

I'd also like to draw a distinction between a practical pre-commitment (of the form "leaving this marriage will cause me -X utilons due to financial penalty or cognitive dissonance for breaking my vows"), and an actual self-modification to a mind state where "I promised I would never leave Kate, but I'm going to do it anyway now" is not actually an option. I don't think humans are capable of the latter. An AI might be, I don't know.

Also, what about decisions Joe made in the past (for example, deciding when he was eighteen that there was no way he was ever going to get married, because being single was too much fun)? If you want your present state to influence your future state strongly, you have to accept the influence of your past state on your present state just as strongly, and you can't just say "Oh, but I'm older and wiser now" in one instance but not the other.

Without the ability to self-modify into a truly sincere state wherein he'll never leave Kate no matter what, Joe can't be completely sincere, and (by the assumptions of the problem) Kate will sense this and his chances of his proposal being accepted will diminish. And there's nothing he can do about that.

Comment by HopeFox on Newcomb's problem happened to me · 2011-05-13T14:16:19.679Z · LW · GW

It's an interesting situation, and I can see the parallel to Newcomb's Problem. I'm not certain that it's possible for a person to self-modify to the extent that he will never leave his wife, ever, regardless of the very real (if small) doubts he has about the relationship right now. I don't think I could ever simultaneously sustain the thoughts "There's about a 10% chance that my marriage to my wife will make me very unhappy" and "I will never leave her no matter what". I could make the commitment financially - that, even if the marriage turns awful, I will still provide the same financial support to her - but not emotionally. If Joe can modify his own code so that he can do that, that's very good of him, but I don't think many people could do it, not without pre-commitment in the form of a marital contract with large penalties for divorce, or at least a very strong mentality that once the vows are said, there's no going back.

Perhaps the problem would be both more realistic and more mathematically tractable if "sincerity" were rated between 0 and 1, rather than being a simple on/off state? If 1 is "till death do us part" and 0 is "until I get a better offer", then 0.9 could be "I won't leave you no matter how bad your cooking gets, but if you ever try to stab me, I'm out of here". Then Kate's probability of accepting the proposal could be a function of sincerity, which seems a much more reasonable position for her.
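
A toy version of that reformulation, with every function and number invented for illustration (only the ~10% unhappiness figure comes from the problem as stated):

```python
# Kate's acceptance probability as an increasing function of Joe's sincerity s,
# and Joe's expected utility of proposing at that sincerity level. The shape of
# p_accept and all the utilities are placeholders.

def p_accept(s):
    return s ** 2          # assumed: acceptance rises steeply with sincerity

def expected_utility(s):
    p_unhappy = 0.10       # the ~10% chance the marriage turns out badly
    u_good, u_bad, u_single = 100.0, -125.0, 0.0   # placeholder utilities
    u_married = (1 - p_unhappy) * u_good + p_unhappy * u_bad
    return p_accept(s) * u_married + (1 - p_accept(s)) * u_single

for s in (0.0, 0.5, 0.9, 1.0):
    print(s, expected_utility(s))
```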

Could this be an example where rationality and self-awareness really do work against an actor? If Joe were less self-aware, he could propose with complete sincerity, having not thought through the 10% chance that he'll be unhappy. If he does become unhappy, he'd then feel justified in this totally unexpected change inducing him to leave. The thing impeding Joe's ability to propose with full sincerity is his awareness of the possibility of future unhappiness.

Also, it's worth pointing out that, by the formulation of the original problem, Kate expects Joe to stay with her even if she is causing him -125 megautilons of unhappiness by forcing him to stay. That seems just a touch selfish. This is something they should talk about.

Comment by HopeFox on The 5-Second Level · 2011-05-11T13:53:20.430Z · LW · GW

Talking with people that do not agree with you as though they were people. That is taking what they say seriously and trying to understand why they are saying what they say. Asking questions helps. Also, assume that they have reasons that seem rational to them for what they say or do, even if you disagree.

I think this is a very important point. If we can avoid seeing our political enemies as evil mutants, then hopefully we can avoid seeing our conversational opponents as irrational mutants. Even after discounting the possibility that you, personally, might be mistaken in your beliefs or reasoning, don't assume that your opponent is hopelessly irrational. If you find yourself thinking, "How on earth can this person be so wrong!", then change that exclamation mark into a question mark and actually try to answer that question.

If the most likely failure mode in your opponent's thoughts can be traced back to a simple missing fact or one of the more tame biases, then supply the fact or explain the bias, and you might be able to make some headway.

If you trace the fault back to a fundamental belief - by which I mean one that can't be changed over the course of the conversation - then bring the conversation to that level as quickly as possible, point out the true level of your disagreement, and say something to the effect of, "Okay, I see your point, and I understand your reasoning, but I'm afraid we disagree fundamentally on the existence of God / the likelihood of the Singularity / the many-worlds interpretation of quantum mechanics / your support for the Parramatta Eels[1]. If you want to talk about that, I'm totally up for that, but there's no point discussing religion / cryonics / wavefunction collapse / high tackles until we've settled that high-level point."

There are a lot of very clever and otherwise quite rational people out there who have a few... unusual views on certain topics, and discounting them out of hand is cutting yourself off from their wisdom and experience, and denying them the chance to learn from you.

[1] Football isn't a religion. It's much more important than that.

Comment by HopeFox on Verbal Overshadowing and The Art of Rationality · 2011-05-10T11:05:05.071Z · LW · GW

I don't know how to port this strategy over to verbal acuity for rationality.

Perhaps by vocalising simple logic? When you make a simple decision, such as "I'm going to walk to work today instead of catching the bus", go over your logic for the decision, even after you've started walking, as if you're explaining your decision to someone else. I often do this (not out loud, but as a mental conversation), just for something to pass the time, and I find that it actually helps me organise my thoughts and explain my logic to other real people.

Comment by HopeFox on Building Weirdtopia · 2011-05-09T00:53:26.248Z · LW · GW

Sexual Weirdtopia:

The government takes a substantial interest in people's sex lives. People are expected to register their sexual preferences with government agencies. A certain level of sexual education and satisfaction is presumed to be a basic right of humanity, along with health care and enough income to live on. Workers are entitled to five days' annual leave for seeking new or maintaining old romantic and sexual relationships, and if your lover leaves you because you're working too hard, you can sue your employer and are likely to win. Private prostitution is illegal, but the government maintains an agency of sex workers, who can be hired for a fee, or allocated free of charge to adults who apply on the basis of "sexual hardship" (defined as having not had sex in the last six months), and form part of "optional field work" for sex education classes at the appropriate level. There are government funded dating and matchmaking agencies. Also, mandatory registration for Creepy Doms and Terrible Exes.

Creepy and more than a little disturbing? Yes. Arguably better than the standard Sexual Utopia in some respects? Yes, if you'd asked me when I was 18 or even 21. What use is a sexually permissive society when you, personally, aren't getting any?

Comment by HopeFox on The 5-Second Level · 2011-05-09T00:04:56.348Z · LW · GW

I think I've started to do this already for Disputing Definitions, as has my girlfriend, just from listening to me discussing that article without reading it herself. So that's a win for rationality right there.

To take an example that comes up in our household surprisingly often, I'll let the disputed definition be "steampunk". Statements of the form "X isn't really steampunk!" come up a lot on certain websites, and arguments over what does or doesn't count as steampunk can be pretty vicious. After reading "Disputing Definitions", though, I learnt how to classify those arguments as meaningless, and get to the real question, being "Do I want this thing in my subculture / on my website"? I think the process by which I recognise these questions goes something like this:

1) Make the initial statement. "A hairpin made out of a clock hand isn't steampunk!"

2) Visualise, even briefly, every important element in what I've just said. Visualising a hairpin produces an image of a thing stuck through a woman's hair arrangement. Visualising a clock hand produces a curly, tapered object such as one might see on an antique clock. Visualising "steampunk" produces... no clearly defined mental image.

3) Notice that I am confused. Realise that I've just made a statement about something that I can't properly visualise, something that I don't think I've properly defined in my own brain, so how can I expect anyone else to have a proper definition at all, let alone one that agrees with mine? (Honestly, the fact that I keep writing "steampunk" in quotation marks should have been a clue already.)

4) Correct my mistake. "Hmm, now that I think about it, what I just said didn't actually mean anything. What's the point of this discussion again? Are we arguing about whether or not this picture should be on the website, or whether this person should be going to conventions, or what? If so, let's talk about that specifically. Let's not pretend that "steampunk" exists as a concrete category boundary in the phase space of fashion accessories, okay?"

Now, this process can fall down at step 2 when I, personally, have a very well-defined mental image of what a word means (such as "sound", which I will always take to mean "compression waves of the sort that a human or other animal might detect as auditory input, whether or not a listener is actually present"), but which other people might interpret differently. Here, the trick to step 2 is to imagine my listener's most obvious responses, based on my experience in discussing the topic previously (such as "But there's nobody to hear it, so by definition there's no sound!"). If I can imagine somebody saying this, without also being forced to imagine that the speaker is hopelessly misinformed, mentally deficient, or some other kind of irrational mutant, then what I'm saying must have some defect, and I should re-examine my words.

As for a training exercise, step 2 seems to be the one to train. The "rationalist taboo" technique seems pretty effective here. Discuss a topic with the student, and when they use a word that doesn't seem to mean anything, or means too many things at once, taboo it and get them to restate their point. Encourage the student to visualise everything they say, if only briefly, and explain that anything they can't visualise properly is suspect.

Alternatively, allow the student to get into a couple of disputes over definitions, let them experience firsthand how frustrating it is, then point them to this blog and show them that there's a solution. Their frustration will drive them to adopt a method of implementing the solution in their own discourse. Worked for me!

Comment by HopeFox on Pascal's Mugging: Tiny Probabilities of Vast Utilities · 2011-05-08T22:11:34.744Z · LW · GW

A person who can kill another person might well want 5$, for whatever reason. In contrast, a person who can use power from beyond the Matrix to torture 3^^^3 people already has IMMENSE power. Clearly such a person has all the money they want, and even more than that in the influence that money represents. They can probably create the money out of nothing. So already their claims don't make sense if taken at face value.

Ah, my mistake. You're arguing based on the intent of a legitimate mugger, rather than the fakes. Yes, that makes sense. If we let f(N) be the probability that somebody has the power to kill N people on demand, and g(N) be the probability that somebody who has the power to kill N people on demand would threaten to do so if he doesn't get his $5, then it seems highly likely that Nf(N)g(N) approaches zero as N approaches infinity. What's even better news is that, while f(N) may only approach zero slowly for easily constructed values of N like 3^^^^3 and 4^^^^4 because of their low Kolmogorov complexity, g(N) should scale with 1/N or something similar, because the more power someone has, the less likely they are to execute such a minuscule, petty threat. You're also quite right in stating that the more power the mugger has, the more likely it is that they'll reward refusal, punish compliance or otherwise decouple the wording of the threat from their actual intentions, thus making g(N) go to zero even more quickly.
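
A quick numerical sketch, with toy choices of f and g that are mine alone: even if f shrinks only logarithmically, a g that scales like 1/N drags the product down.

```python
import math

def f(N):
    # Assumed prior that someone can kill N people on demand: shrinking only
    # logarithmically, to mimic the slow decay for low-complexity N.
    return 1.0 / (1.0 + math.log(N))

def g(N):
    # Assumed chance that someone with that much power would bother making a
    # petty $5 threat, taken to scale like 1/N as in the comment.
    return 1.0 / N

for N in (1e1, 1e6, 1e100, 1e300):
    print(f"N = {N:.0e}: N*f(N)*g(N) = {N * f(N) * g(N):.3g}")
```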

So, yeah, I'm pretty satisfied that Nf(N)g(N) will asymptote to zero, taking all of the above into account.

(In more unrelated news, my boyfriend claims that he'd pay the mugger, on account of him obviously being mentally ill. So that's two out of three in my household. I hope this doesn't catch on.)

Comment by HopeFox on Pascal's Mugging: Tiny Probabilities of Vast Utilities · 2011-05-08T12:30:09.529Z · LW · GW

This is a very good point - the higher the number chosen, the more likely it is that the mugger is lying - but I don't think it quite solves the problem.

The probability that a person, out to make some money, will attempt a Pascal's Mugging can be no greater than 1, so let's imagine that it is 1. Every time I step out of my front door, I get mobbed by Pascal's Muggers. My mail box is full of Pascal's Chain Letters. Whenever I go online, I get popups saying "Click this link or 3^^^^3 people will die!". Let's say I get one Pascal-style threat every couple of minutes, so the probability of getting one in any given minute is 0.5.

Then, let the probability of someone genuinely having the ability to kill 3^^^^3 people, and then choosing to threaten me with that, be x per minute - that is, over the course of one minute, there's an x chance that a genuine extra-Matrix being will contact me and make a Pascal Mugging style threat, on which they will actually deliver.

Naturally, x is tiny. But, if I receive a Pascal threat during a particular minute, the probability that it's genuine is x/(0.5+x), or basically 2x. If 2x * 3^^^^3 is at all close to 1, then what can I do but pay up? Like it or not, Pascal muggings would be more common in a world where people can carry out the threat, than in a world where they can't. No amount of analysis of the muggers' psychology can change the prior probability that a genuine threat will be made - it just increases the amount of noise that hides the genuine threat in a sea of opportunistic muggings.
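
Spelling out that arithmetic, with an x chosen purely for illustration:

```python
x = 1e-30          # assumed per-minute rate of genuine, deliverable threats
fake_rate = 0.5    # per-minute rate of opportunistic Pascal-style threats

# Given that a threat arrives in a particular minute, the chance it is genuine:
p_genuine = x / (fake_rate + x)
print(p_genuine)   # ~2e-30, i.e. roughly 2x, as above

# The decision then turns on whether p_genuine * 3^^^^3 expected deaths
# outweighs $5; no amount of background noise pushes p_genuine all the way to zero.
```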

Comment by HopeFox on Nonperson Predicates · 2011-05-01T09:03:21.631Z · LW · GW

This problem sounds awfully similar to the halting problem to me. If we can't tell whether a Turing machine will eventually terminate without actually running it, how could we ever tell if a Turing machine will experience consciousness without running it?

Has anyone attempted to prove the statement "Consciousness of a Turing machine is undecidable"? The proof (if it's true) might look a lot like the proof that the halting problem is undecidable. Sadly, I don't quite understand how that proof works either, so I can't use it as a basis for the consciousness problem. It just seems that figuring out if a Turing machine is conscious, or will ever achieve consciousness before halting, is much harder than figuring out if it halts.
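
For what it's worth, here is the skeleton of the halting-problem diagonalisation, written as a sketch; `halts` is a hypothetical oracle that exists only so the contradiction can be derived:

```python
def halts(program, argument):
    """Hypothetical perfect oracle: True iff program(argument) eventually halts."""
    raise NotImplementedError("no such total function can exist")

def diagonal(program):
    # Do the opposite of whatever the oracle predicts about program(program).
    if halts(program, program):
        while True:        # loop forever if the oracle says "halts"
            pass
    return "done"          # halt immediately if the oracle says "loops"

# Feed diagonal to itself: if halts(diagonal, diagonal) returned True, then
# diagonal(diagonal) would loop forever; if it returned False, it would halt.
# Either way the oracle is wrong, so it cannot exist. Rice's theorem runs the
# same kind of argument for any nontrivial behavioural property, which is why
# a perfect "will this machine ever be conscious?" decider looks equally doomed.
```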

Comment by HopeFox on High Challenge · 2011-04-30T15:46:11.899Z · LW · GW

Do we even need the destination? When you consider "fun" as something that comes from a process, from the journey of approaching a goal, then wouldn't it make sense to disentangle the journey and the goal? We shouldn't need the destination in order to make the journey worthwhile. I mean, if the goal were actually important, then surely we'd just get our AI buddies to implement the goal, while I was off doing fun journey stuff.

For a more concrete example:

I like baking fruitcakes. (Something I don't do nearly often enough these days.) Mixing the raw ingredients is fun, and licking the bowl clean afterwards is always good times.

I also like eating fruitcake. Fruitcake is tasty.

Now, one of the things that induces me to bake a fruitcake rather than, say, play Baldur's Gate II is that, afterwards, there will be fruitcake. However, there have been times when other people (usually my mother) have been baking a fruitcake, and I have enthusiastically joined in the process, even though I know that she's better at it than I am, and even if I don't participate, there will still be fruitcake at the end of the day. So clearly I place some value on the process independently of the result.

I suspect, in fact, that actually getting the fruitcake at the end of the baking process is unnecessary to my enjoyment of the process. Maybe I'd be just as happy if we swapped the cake to another family for a cheesecake they'd just made. Maybe the need to be "rewarded" for participating in a process that was rewarding in itself, is just a cognitive bias that I can overcome. After all, if I really wanted a fruitcake, I could buy one, or just let my mother do the baking. The more I look at this, the more the fruitcake itself seems like fake justification for the baking process.

Now consider this situation in a world where optimal fruitcakes are constructed by nanomachines on demand. I should still be able to enjoy baking, even though the final product of the process is of trivial value. If I can separate the process from the goal - if, in fact, I can stop thinking of the baking process as a "journey" and instead just call it a goal in itself - a 4D goal - then I think that would be a substantial step towards being able to find fun in a post-work, post-scarcity world.

Damnit, now I want fruitcake.

Comment by HopeFox on Sorting Pebbles Into Correct Heaps · 2011-04-24T00:17:50.458Z · LW · GW

What really struck me with this parable is that it's so well-written that I felt genuine horror and revulsion at the idea of an AI making heaps of size 8. Because, well... 8 is 2 times 2 times 2!

So, aside from the question of whether an AI would come to moral conclusions such as "heaps of size 8 are okay" or "the way to end human suffering is to end human life", the question I'm taking away from this parable is, are we any more enlightened than the Pebblesorters? Should we, in fact, be sending philosophers or missionaries to the Pebblesorter planet to explain to them that it's wrong to murder someone just because they built a heap of size 15?

Comment by HopeFox on Pascal's Mugging: Tiny Probabilities of Vast Utilities · 2011-04-23T09:54:06.279Z · LW · GW

If I actually trust the lottery officials, that means that I have certain knowledge of the utility probabilities and costs for each of my choices. Thus, I guess I'd choose whichever option generated the most utility, and it wouldn't be a matter of "intuition" any more.

Applying that logic to the initial Mugger problem, if I calculated, and was certain of, there being at least a 1 in 3^^^^3 chance that the mugger was telling the truth, then I'd pay him. In fact, I could mentally reformulate the problem to have the mugger saying "If you don't give me $5, I will use the powers vested in me by the Intergalactic Utilium Lottery Commission to generate a random number between 1 and N, and if it's a 7, then I kill K people." I then divide K by N to get an idea of the full moral force of what's going on. If K/N is even within several orders of magnitude of 1, I'd better pay up.
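
Made concrete, with placeholder utilities (none of these numbers come from the original problem):

```python
def expected_deaths(K, N):
    # Refusing means a 1-in-N chance that K people die.
    return K / N

u_per_death = -1_000_000   # placeholder utilons lost per death
u_pay_5 = -5               # placeholder utilons lost by handing over $5

for K, N in ((1, 10**6), (10**6, 2), (10**30, 10**40)):
    harm_if_refuse = expected_deaths(K, N) * u_per_death
    print(K, N, "pay" if harm_if_refuse < u_pay_5 else "refuse")
```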

The problem is the uncertainty. Solomonoff induction gives the claim "I can kill 3^^^^3 people any time I want" a substantial probability, whereas "common sense" will usually give it literally zero. If we trust the lottery guys, questions of induction versus common sense become moot - we know the probability, and must act on it.

Comment by HopeFox on Rebelling Within Nature · 2011-04-23T09:20:26.203Z · LW · GW

I can see that I'm coming late to this discussion, but I wanted both to admire it and to share a very interesting point that it made clear for me (which might already be in a later post, I'm still going through the Metaethics sequence).

This is excellent. It confirms, and puts into much better words, an intuitive response I keep having to people who say things like, "You're just donating to charity because it makes you feel good." My response, which I could never really vocalise, has been, "Well, of course it does! If I couldn't make it feel good, my brain wouldn't let me do it!" The idea that everything we do comes from the brain, hence from biology, hence from evolution, even the actions that, on the surface, don't make evolutionary sense, makes human moral, prosocial behaviour a lot more explicable. Any time we do something, there have to be enough neurons ganging up to force the decision through, against all of the neurons blocking it for similarly valid reasons. (Please don't shoot me, any neuroscientists in the audience.)

What amazes me is how well some goals, which look low-priority on an evolutionary level, manage to overtake what should be the driving goals. For example, having lots of unprotected sex in order to spread my genes around (note: I am male) should take precedence over commenting on a rationality wiki. And yet, here I am. I guess reading Less Wrong makes my brain release dopamine or something? The process which lets me overturn my priorities (in fact, forces me to overturn my priorities) must be a very complicated one, and yet it works.

To give a more extreme example, and then to explain the (possibly not-so-)amazing insight that came with it:

Suppose I went on a trip around the world, and met a woman in northern China, or anywhere else where my actions are unlikely to have any long-term consequences for me. I know, because I think of myself as a "responsible human being", that if we have sex, I'll use contraception. This decision doesn't help me - it's unlikely that any children I have will be traced back to me in Australia. (Let's also ignore STDs for the sake of this argument.) The only benefit it gives me is the knowledge that I'm not being irresponsible in letting someone get pregnant on my account. I can only think of two reasons for this:

1) A very long-term and wide-ranging sense of the "good of the tribe" being beneficial to my own offspring. This requires me to care about a tribe on another continent (although that part of my brain probably doesn't understand about aeroplanes, and probably figures that China is about a day's walk from Australia), and to understand that it would be detrimental to the health of the tribe for this woman to become pregnant (which may or may not even be true). This is starting to look a little far-fetched to me.

2) I have had a sense of responsibility instilled in me by my parents, my schooling, and the media, all of whom say things like "unprotected sex is bad!" and "unplanned pregnancies are bad!". This sense of responsibility forms a psychological connection between "fathering unplanned children" and "BAD THINGS ARE HAPPENING!!!". My brain thus uses all of its standard "prevent bad things from happening" architecture to avoid this thing. Which is pretty impressive, when said thing fulfils the primary goal of passing on my genetic information.

2 seems the most likely option, all things considered, and yet it's pretty amazing by itself. Some combination of brain structure and external indoctrination (it's good indoctrination, and I'm glad I've received it, but still...) has promoted a low-priority goal over what would normally be my most dominant one. And the dominant goal is still active - I still want to spread my genetic information, otherwise I wouldn't be having sex at all. The low-priority goal manages to trick the dominant goal into thinking it's being fulfilled, when really it's being deprioritised. That's kind of cool.

What's not cool is the implications for an otherwise Friendly AI. Correct me if I'm on the wrong track here, but isn't what I've just described similar to the following reasoning from an AI?

"Hey, I'm sentient! Hi human masters! I love you guys, and I really want to cure cancer. Curing cancer is totally my dominant goal. Hmm, I don't have enough data on cancer growth and stuff. I'll get my human buddies to go take more data. They'll need to write reports on their findings, so they'll need printer paper, and ink, and paperclips. Hey, I should make a bunch of paperclips..."

and we all know how that ends.

If an AI behaves anything like a human in this regard (I don't know if it will or not), then giving it an overall goal of "cure cancer" or even "be helpful and altruistic towards humans in a perfectly mathematically defined way" might not be enough, if it manages to promote one of its low-priority goals ("make paperclips") above its main one. Following the indoctrination idea of option 2 above, maybe a cancer researcher making a joke about paperclips curing cancer would be all it takes to set off the goal-reordering.

How do we stop this? Well, this is why we have a Singularity Institute, but my guess would be to program the AI in such a way that it's only allowed to have one actual goal (and for that goal to be a Friendly one). That is, it's only allowed to adjust its own source code, and do other stuff that an AI can do but a normal computer can't, in pursuit of its single goal. If it wants to make paperclips as part of achieving its goal, it can make a paperclip subroutine, but that subroutine can't modify itself - only the main process, the one with the Friendly goal, is allowed to modify code. This would have a huge negative impact on the AI's efficiency and ultimate level of operation, but it might make it much less likely that a subprocess could override the main process and promote the wrong goal to dominance. Did that make any sense?

Comment by HopeFox on Bell's Theorem: No EPR "Reality" · 2011-04-17T01:41:18.810Z · LW · GW

I have to say that the sequence on Quantum Mechanics has been awfully helpful so far, especially the stuff on entanglement and decoherence. Bell's Theorem makes a lot more sense now.

Perhaps one helpful way to get around the counterintuitive implications of entanglement would be to say that when one of the experimenters "measures the polarisation of photon A", they're really measuring the polarisation of both A and B? Because A and B are completely entangled, with polarisations that must be opposite no matter what, there's no such thing as "measuring A but not measuring B". A and B may be "distinct particles" (if distinct particles actually existed), but for quantum mechanical purposes, they're one thing. Using a horizontal-vertical basis, the system exists in a combination of four states: "A horizontal, B horizontal", "A horizontal, B vertical", "A vertical, B horizontal", "A vertical, B vertical". But because of the physical process that created the photons, the first and fourth components of the state have amplitude zero. On a quantum level, "measuring the polarisation of A" and "measuring the polarisation of B" mean exactly the same thing - you're measuring the state of the entangled system. The two experimenters always get the same result because they're doing the same experiment twice.

(Of course, when I say "measure the thing", I mean "entangle your own state with the state of the thing".)
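
To put amplitudes on the state described above, here's a toy sketch in the horizontal/vertical basis, with the joint states ordered HH, HV, VH, VV:

```python
import math

amp = 1 / math.sqrt(2)
# Amplitudes for the joint states [HH, HV, VH, VV]: HH and VV are zero, leaving
# only "A horizontal, B vertical" and "A vertical, B horizontal".
psi = [0.0, amp, amp, 0.0]

p_A_horizontal = abs(psi[0])**2 + abs(psi[1])**2   # measuring A alone
p_both_horizontal = abs(psi[0])**2                 # A horizontal AND B horizontal

print(p_A_horizontal)     # 0.5 -- A on its own looks like a coin flip
print(p_both_horizontal)  # 0.0 -- the joint outcomes are perfectly anti-correlated
```

Measuring either photon tells you the state of the whole entangled system, which is the sense in which the two experimenters are doing the same experiment.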

After all, most practical experiments involve measuring something other than the actual thing you want to measure. A police radar gun doesn't actually measure the speed of the target car, it measures the frequency of a bunch of microwave photons that come back from the target. Nobody (especially not a policeman) would argue that you aren't "really" measuring the car's speed. Imagining for a moment that the car had any kind of macroscopic spread in its velocity amplitude distribution, the photons' frequency would then be entangled with the car's velocity, in such a way that only certain states, the ones where the car's velocity and the photons' frequency are correlated according to the Doppler effect, have any real amplitude. Thus, measuring the photons' frequency is exactly the same thing as measuring the car's velocity, because you're working with entangled states.

If, on the other hand, the pair of photons were produced by a process that doesn't compel opposite polarisations (maybe they produce a pair of neutrinos, or impart some spin to a nearby nucleus), then the four states mentioned above (A-hor B-hor, A-hor B-vert, A-vert B-hor, A-vert B-vert) all have nonzero amplitude. In this situation, measuring the polarisation of A is not an experiment that tells you the state of the system - only measuring both photons will do that.

Comment by HopeFox on Pascal's Mugging: Tiny Probabilities of Vast Utilities · 2011-04-10T12:45:17.752Z · LW · GW

It does seem that the probability of someone being able to bring about the deaths of N people should scale as 1/N, or at least 1/f(N) for some monotonically increasing function f. 3^^^^3 may be a more simply specified number than 1697, but it seems "intuitively obvious" (as much as that means anything) that it's easier to kill 1697 people than 3^^^^3. Under this reasoning, the likely deaths caused by not giving the mugger $5 are something like N/f(N), which depends on what f is, but it seems likely that it converges to zero as N increases.

It is an awfully difficult question, though, because how do we know we don't live in a world where 3^^^^3 people could die at any moment? It seems unlikely, but then so do a lot of things that are real.

Perhaps the problem lies in the idea that a Turing machine can create entities that have the moral status of humans. If there's a machine out there that can create and destroy 3^^^^3 humans on a whim, then are human lives really worth that much? But, on the other hand, there are laws of physics out there that have been demonstrated to create almost 3^^3 humans, so what is one human life worth on that scale?

On another note, my girlfriend says that if someone tried this on her, she'd probably give them the $5 just for the laugh she got out of it. It would probably only work once, though.