Posts

Is friendly AI "trivial" if the AI cannot rewire human values? 2012-05-09T17:48:24.657Z

Comments

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-11T16:09:01.673Z · LW · GW

Thanks, I appreciate that. I have no problem with people disagreeing with me; confronting disagreement is how people (myself included) grow. However, I was taken aback by the amount of downvoting I received merely for disagreeing with people here, and it was all the more concerning that simply responding to people's arguments effectively guaranteed even more downvotes, in a system tied to how much you can participate in the community. At least on the discussion board side of the site, I expected downvoting to be reserved for posts that derail topics, flame, ignore the arguments presented to them, and so on, not for posts one merely disagrees with. As someone who does academic research in AI, I thought this could be a fun, lively online community for discussing it, but having my discussion board posting privileges removed because people did not agree with things I said (the main post didn't even assert anything; it asked for feedback) has made me reconsider. I'm glad to see that not everyone here thinks this was an appropriate use of downvoting, but I feel the community at large has spoken about how it uses it, and when this thread ends I'll probably be moving on.

Thanks for your support, though; I do appreciate it.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-11T15:22:42.292Z · LW · GW

I appreciate that sentiment and I'll also add that I appreciate that even in your prior post you made an effort to suggest what you thought I was driving at.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-11T04:17:30.171Z · LW · GW

When you think of a nation conquering another, the US and Japan is really what comes to your mind? Are you honestly having trouble grasping the distinction I was making? Because personally, I'm really not interested in continuing an irrelevant semantics debate.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-11T04:07:17.108Z · LW · GW

Yes. I find it odd that this argument is derailed into demanding a discussion on the finer points of the semantics for "conquer."

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T22:16:06.705Z · LW · GW

Conquer is typically used to mean that you take over the government and run the country, not just win a war.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T20:22:24.877Z · LW · GW

You're missing the point of talking about opposition. The AI doesn't want the outcome of opposition, because that has terrible effects on the well-being it's trying to maximize, unlike the Nazis. This isn't about winning the war; it's about the consequences of war for the measured well-being of the people involved, and of everyone else left living in a society where an AI kills people for what amounts to thought-crime.

And if the machine thinks that's the best way to make people happy (for whatever horrible reason--perhaps it is convinced by the Repugnant Conclusion and wants to maximize utility by wiping out all the immiserated Russians), we're still in trouble.

This specifically violates the assumption that the AI has a good model of how any given human measures their well-being.

However, if you're trying to describe an AI that is set to maximize human value, understands the complexities of the human mind, and won't make such mistakes, then you are describing friendly AI.

The assumption is that it models human well-being at least as well as the best a human can model another person's well-being function. However, this constraint by itself does not solve friendly AI, because in a less constrained version of the problem than the one I outlined, the standard worry about an AI trying to maximize what humans value is that it will rewire what humans value into something easier to maximize. The entire purpose of this post is to ask whether it could achieve this without the ability to manually rewire human values (e.g., could it be done through persuasion?). In other words, you're claiming friendly AI is easier to solve than the constrained question I posed in the post.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T20:06:55.492Z · LW · GW

And as for the others? Or are you saying the AI trying to maximize well-being will try, and succeed, in effectively wiping out everyone and then conditioning future generations to have the desired, easily maximized values? If so, that behavior is conditioned on the AI being very confident in its ability to pull it off, because otherwise the chance of failure and the cost of war would massively drop the expected value of human well-being. I also think you should make clear what values you think it would try to push humans toward in order to maximize them more easily.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T19:56:00.567Z · LW · GW

We also didn't conquer Japan; we won the war. Those are two different things.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T17:28:59.373Z · LW · GW

Considering there were many people in Germany who vehemently disliked the Nazis too (even setting the Jews aside), it seems like a pretty safe bet that after being conquered we wouldn't suddenly have viewed the Nazis as great people. Why do you think otherwise?

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T13:58:16.137Z · LW · GW

Let's lose the silly straw-man arguments. I've already explicitly commented on how I don't believe the universe is fair, and I think from that it should be obvious that I don't believe really bad things can't happen. As far as moral progress goes, I think it happens insofar as it's functional. Morals that lead to more successful societies win the competition and stick around. This often moves societies (not necessarily all the people in them) toward greater tolerance and less violence, because oppressing people and allowing more violence tends to have bad effects inside the society.

If we had been weaker, the Nazis could have won. That's not even the central point, though. For kicks, let's assume the Nazis had won the war. What would that mean? It would still mean that other humans were in huge opposition and went to war over it, causing innumerable deaths. After the Nazis won, there would also surely be people wildly unhappy with the situation. This presents a serious problem for an AI trying to maximize well-being: it would not want to do things that lead to mass outrage and opposition, because that fails its own metrics.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T13:47:27.550Z · LW · GW

A hunter-gatherer lifestyle is not sustainable for a large-scale, complex society. It is not an arrangement we would favor at all, and I'm struggling to see why an AI would try to make us value it, or how you think a society with technology advanced enough to build strong AI could be convinced to adopt it.

Views on killing animals are more flexible, since the reason humans object to it seems to come from a level of innate compassion for life itself. So I could see that value being more manipulable as a result. I don't see what that has to do with a doomsday set of values, though.

1950s gender roles were abandoned because (1) women didn't like them (in which case maximizing people's well-being would suggest not having such gender roles) and (2) they were less productive for society, in that suppressing women limits the set of contributions society can draw on.

I don't think you've presented a set of doomsday values that humans could be manipulated into holding by persuasion alone, or demonstrated why they would be values the AI would prefer humans to have in order to maximize them more easily.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T04:04:37.951Z · LW · GW

Most of the changes that brought us to where we are now seem to be a result of what works better in a complex society, and I therefore have difficulty accepting that a society in the highly advanced state it would be in by the time we had strong AI could be pushed to a non-productive, doomsday set of values. So let's make the argument clearer: what set of values do you think the AI could push us to through persuasion that would amount to what we consider a doomsday scenario, while also allowing the AI to more easily satisfy well-being?

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T02:42:18.497Z · LW · GW

I feel like I've already responded to this argument multiple times in various other responses I've made. If you think there's something I've overlooked in those responses let me know, but this seems like a restatement of things I've already addressed. Also, if there is something in one of the responses I've made with which you disagree and have a different reason than what's been presented, let me know.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T02:02:43.351Z · LW · GW

There is a profound difference between being persuasive and manipulating all of a human's sensory input. Is your argument not that it would try to persuade, but that an AI would hook all humans up to a computer that controls everything we perceive? If you want to make that your argument, I'm game for discussing it, but it should be made clear that this is a very different argument from an AI trying to change people's minds through persuasion. So let's discuss it. Manipulating humans' senses this way seems to imply a massive deployment and integration of technology not available today, but that's okay; we should expect technology to improve incredibly by the time we can build strong AI. But so long as we assume that such hugely improved, widely integrated technology is available and would let the AI pull the wool over everyone's eyes, we must also assume that humans have used that same technology to better themselves and to build wildly intelligent computer security systems, in which case it seems a stretch to posit that an AI could do this without anyone noticing.

How is this different from saying it's not going to let me take actions that cause extreme outrage? I hope you aren't planning on building an AI that has a sense of personal responsibility and doesn't care if humans subvert its utility function as long as it didn't cause them to do so.

I suppose if your actions were extreme enough in the outrage they caused, we might make a case for thwarting them, even by the reasoning of the AI. I don't know you, but my guess is you're thinking of, say, religious fundamentalists' feelings about you? Such outrage on its own is (1) somewhat limited and counterbalanced by others and (2) counterproductive for humanity to act on, in which case the better response is not to thwart your actions but to work toward tolerance. But let's contrast this with an AI trying to effectively replace mankind with easily satisfied humans, and consider how people would respond to that. I think it's clear that humans would work toward shutting such an AI down and would respond with extreme concern for their livelihood. The fact that we're sitting here talking about this as a doomsday scenario seems to be evidence of that concern. Given that, it just doesn't seem to be in the AI's interest to make that choice; it would cause too great a collapse in the well-being of a humanity profoundly concerned about the situation.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-10T01:29:23.253Z · LW · GW

Do you honestly think a universe the size of ours can only support six billion people before reaching the point of diminishing returns?

That's not my point. The point is that people aren't going to be happy if an AI starts making people who are easier to maximize for the sole reason that they're easier to maximize. The very fact that we are discussing this as a hypothetical in which doing so is considered a problem suggests we would see it as a problem.

If you allow it to use the same tools but better, it will be enough. If you don't, it's likely to only try to do things humans would do, on the basis that they're not smart enough to do what they really want done.

You seem to be trying to break the hypothetical assumption on the grounds that I have not specified complete criteria that would prevent an AI from rewiring the human brain. I'm not interested in trying to find a set of rules that would prevent an AI from rewiring human brains (I never tried to provide any; that's why it's called an assumption), because I'm not posing it as a solution to the problem. I've made this assumption to try to generate discussion of all the ways things could still break down, since discussion typically seems to stop at "it will rewire us". Asserting "yeah, but it would rewire us because you haven't strongly specified how it couldn't" isn't relevant to what I'm asking, since I'm trying to get specifically at what it could do besides that.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T23:48:01.063Z · LW · GW

I'm not sure how common it is, but I at least consider total well-being to be important. The more people the better. The easier to make these people happy, the better.

You must also consider that well-being need not be defined as a positive function. Even if it were, if the gain from adding a person is smaller than the resulting drop in everyone else's well-being, adding people wouldn't be beneficial unless the AI could, without being stopped, create a great many such people.
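
To put the accounting roughly (my own symbols, under the aggregate measure being assumed): adding one new person with well-being g, at an average cost of c to each of the N existing people, raises the total only if

g > N · c,

so unless the AI can add enough such people, fast enough and unopposed, the losses to everyone already here dominate.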

An AI is much better at persuasion than you are. It would pretty much be able to convince you whatever it wants.

I'm sure it'd be better than me (unless I'm also heavily augmented by technology, but we can set that aside for now). On what grounds can you say it would be able to persuade me of anything it wants? Intelligence doesn't mean you can do anything, and I think this claim needs to be justified.

Our best neuroscientists are still mere mortals. Also, even among mere mortals, making small changes towards someones values are not difficult, and I don't think significant changes are impossible. For example, the consumer diamond industry would be virtually non-existant if De Beers didn't convince people to want diamonds.

I know they're mere mortals. We're operating under the assumption that the AI's methods of value manipulation are limited to what we can do ourselves, in which case rewiring is not something we can do to any great effect. The point of the assumption is to ask what the AI could do without more direct manipulation. To that end, only persuasion has been offered, and as I've stated, I'm not seeing a compelling argument for why an AI could persuade anyone of anything.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T23:37:58.882Z · LW · GW

I don't think I live in a fair universe at all. Regardless, acknowledging that we don't live in a fair universe doesn't support your claim that an AI would be able to radically change the values of all humans on earth without outrage from others through persuasion alone.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T22:42:21.395Z · LW · GW

All human opinions cannot be created by persuasion alone, because opinions have to start somewhere. People can and do think for themselves, and that's what creates opinions. They might then persuade others to hold those opinions as well, but persuasion is clearly not the sole source, and it's not as if persuasion is a one-way process where you hit the persuade button and the other person is switched. Your argument seems to be that any human can be persuaded to any opinion at any time, and I just can't buy that. Humans are malleable, and we've made a huge number of mistakes in the past, but I don't see us as so bad that anyone can have their mind changed to anything regardless of the merit behind it. This entire site is built around getting people not to be arbitrarily malleable and to require rationality in making decisions, on the view that there are objective conclusions and we should strive for them. Is this site and community a failure, then? Are all of its people subject to mere persuasion in spite of rationality, unable to think for themselves?

Regarding actions that cause outrage, I never said you were constrained by the outrage of others. I said that an AI maximizing human well-being is not going to take actions that cause extreme outrage.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T21:53:41.163Z · LW · GW

But the AI isn't being dropped into a completely undeveloped society. It will be dropped into an extremely developed society with values already in place. If the AI were dropped back into the era of early man, I could see major cause for concern. I don't see the values humanity has developed being radically and entirely changed into something we consider so unsavory by persuasion alone. That doesn't mean no one could be affected, but I can't see such a thing going down without outrage from large segments of humanity, which is not what the AI wants.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T21:47:24.082Z · LW · GW

You make my point right there. World War II: we went to war in defiance of the Nazis and refused to be assimilated. Many people in Germany didn't even like what the Nazis were doing. And finally, the Nazis didn't care about our outrage or the deaths in the resulting war. An AI trying to maximize well-being will, by definition, care profoundly about that.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T21:43:53.594Z · LW · GW

And yet as time goes on, civilization is progressing toward more secular values. It will be interesting to see where we are by the time strong AI is possible, especially since we will undoubtedly be changing ourselves to improve our own capabilities. As I said in another comment, I think assuming that humanity in its totality can be persuaded into unsavory values, even through religion, is too negative a view of humanity. Humanity's history with religion is also filled with defiance, and an AI that values human well-being will not be pleased by the outrage it provokes as it tries to gain followers through persuasive means.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T21:39:11.477Z · LW · GW

This is not meant to be a resolution to FAI, since you can't stop technology. It's meant to probe whether the bad behavior of an AI ultimately depends on future technology that can change humans more directly. I'm asking the question because the answer may give insights into how to tackle the problem.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T21:35:00.515Z · LW · GW

And yet humanity is resistant to large-scale effects, because we also fight changes in values that are destructive (like Nazism). Are you suggesting that through persuasive means an AI could convert the values of all humanity to something unsavory? I think that is too negative a view of humanity. You might suggest conditioning from birth, but this would provoke outrage from the rest of humanity, which the AI, by our utility definition, is trying to avoid.

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T21:23:11.785Z · LW · GW

Can you give examples of what you think humans' capability to rewire another person's values is?

As for what justifies the assumption? Nothing. I'm not asking it because I think AIs will actually be limited this way; I'm asking it so we can identify where the real problem lies. That is, I'm curious whether the real problem with bad AI behavior is entirely specific to advances in biological technology that eventual AIs will have access to but we don't today. If we can conclude that this is the case, it might help us understand how to tackle the problem. Another way to think of the question: take such an AI robot and drop it into today's society. Will it start behaving badly immediately, or will it have to develop technology we don't have today before it can behave badly?

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T21:18:58.685Z · LW · GW

Thanks for the link, I'll give it a read.

Creating new people is potentially a problem, but I'm not entirely convinced. Let me elaborate. When you say:

What you need to do is program it so that it does what people would like if they were smarter, faster, and more the people they wish they were. In other words, use CEV.

Doesn't this more or less restate, in different words, that it models human well-being and tries to maximize that? I imagine that when you phrase it this way, such an AI wouldn't create new people who are easier to maximize, because that isn't what humans would want. And if that's not what humans would want, doesn't that just mean it's viewed negatively in their well-being, so my original definition suffices? Assuming humans don't want the AI to make new people simply because they're easier to maximize, then if it created such a person, everyone on earth would view this negatively and their well-being would drop. In fact, it might lead to humans shutting the AI down, so the AI deduces that it cannot create new people who are easier to maximize. The only possible hole I see is if the AI could suddenly create an enormous number of people at once.

Also, it's very hard to define what exactly constitutes "rewiring a human brain". If you make it too general, the AI can't do anything, because that would affect human brains. If you make it too specific, the AI would have some slight limitations on how exactly it messes with people's minds.

Indeed it's difficult to say precisely; that's why I used what we can do now as an analogy. I can't really rewire a person's values at all except through persuasion or other such methods. Even our best neuroscientists can't, unless I'm ignorant of some profound advances. The most we can really do is tweak pleasure centers (which, as I stated, isn't the metric for well-being) or effectively break the brain so the person is non-operational, and I'd argue that non-operational humans have effectively zero well-being anyway (for similar reasons as to why I'd say a bug has a lower scale of well-being than a human does).

Comment by Alerus on Is friendly AI "trivial" if the AI cannot rewire human values? · 2012-05-09T21:00:55.694Z · LW · GW

What is wrong with the statement? The idea I'm trying to convey is that I, as a person now, cannot go and forcefully rewire another person's values. The only way I can affect them is to be persuasive in argument, or perhaps to be deceptive about certain things to move them toward a different position (e.g., consider the state of politics).

In contrast, one of the concerns for the future is that an AI may have the technological ability to manipulate a person more directly. So the question I'm asking is: is the future technology at an AI's disposal the only reason it could behave "badly" under such a utility function?

Also, please avoid such comments. I am interested in having this discussion, but alluding to having found something wrong in what I posted without saying what you think it is, is profoundly unhelpful and useless to the discussion.

Comment by Alerus on Jason Silva on AI safety · 2012-05-09T18:17:26.571Z · LW · GW

I'm also wildly optimistic. Not because I don't think there are challenges we need to overcome, but because by the time we're able to make an AI as smart as us, I think we'll almost surely have those problems worked out.

Comment by Alerus on Consequentialist Formal Systems · 2012-05-08T23:51:46.847Z · LW · GW

So I think my basic problem here is that I'm not familiar with this construct for decision making, or why it would be favored over others. Specifically, why make logical rules about which actions to take? Why not take an MDP value-learning approach, where the agent chooses the action with the highest predicted utility? If the estimate is bad, it is simply updated, and if that situation arises again, the agent might choose a different action as a result of the latest update.
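
To sketch what I mean (names and numbers are invented; this is just a tabular, Q-learning-style loop, not a claim about the post's formalism):

    import random
    from collections import defaultdict

    # Predicted utility for each (state, action); starts at 0 and is learned.
    Q = defaultdict(float)
    alpha, gamma, epsilon = 0.1, 0.95, 0.1   # learning rate, discount, exploration

    def choose_action(state, actions):
        # Mostly pick the action with the highest predicted utility.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state, next_actions):
        # If the prediction was bad, nudge it toward what actually happened,
        # so the agent may choose differently the next time this state comes up.
        best_next = max(Q[(next_state, a)] for a in next_actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])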

Comment by Alerus on [SEQ RERUN] Science Isn't Strict Enough · 2012-05-08T16:19:32.203Z · LW · GW

I feel like the suggested distinction between Bayes and science is somewhat forced. Before I knew of Bayes, I knew of Occam's razor and its central role in science. I had always been under the impression that science favors simpler hypotheses. If the claim is that we don't see people rigorously adhering to Bayes' theorem when developing hypotheses, the reason is not that science doesn't value the simpler hypotheses suggested by Bayes and priors, but that determining the simplest hypothesis is incredibly difficult in many cases. And this difficulty is acknowledged in the post. As such, I don't see science as diverging from Bayes; the way it is practiced is just a consequence of the admitted difficulty of finding the correct priors and determining the space of hypotheses.

Comment by Alerus on Delayed Gratification vs. a Time-Dependent Utility Function · 2012-05-08T15:00:22.635Z · LW · GW

Yeah I agree that the ripple effect of your personal theft would be negligible. I see it as similar to littering. You do it in a vacuum, no big deal, but when many have that mentality, it causes problems. Sounds like you agree too :-)

Comment by Alerus on Delayed Gratification vs. a Time-Dependent Utility Function · 2012-05-07T20:47:58.145Z · LW · GW

Right, so if you can choose your utility function, then it's better to choose one that can be better maximized. Interestingly, though, if we ever had this capability, I think we could just reduce the problem by using an unbiased utility function. That is, explicit preferences (such as liking math versus history) would be removed and instead we'd work with a more fundamental utility function. For instance, death is pretty much a universal stopping point, since you cannot gain any utility once you're dead, regardless of your function. This would in a sense be the basis of your utility function. We also find that death is better avoided when society works together and develops new technology. Your actions then might be dictated by what you are best at doing to facilitate the functioning and growth of society. This is why I brought up society-damaging actions as being potentially objectively worse. You might be able to come up with specific instances of actions we label society-damaging that seem okay, such as specific instances of stealing, but then they aren't really society-damaging in the grand scheme of things. That said, I think as a rule of thumb stealing is bad in most cases due to the ripple effects of living in a society in which people do that, but that's another discussion. The point is that there may be objectively better choices even if you have no explicit preferences (or you can choose your preferences).

Of course, that's all conditioned on whether you can choose your utility function. For our purposes for the foreseeable future, that is not the case and so you should stick with expected utility functions.

Comment by Alerus on On what rationality-related topic should I give a school presentation? · 2012-05-07T20:16:50.485Z · LW · GW

It's hard for me to gauge your audience, so maybe this wouldn't be terribly useful, but a talk outlining logical fallacies (especially lesser-known ones) and why they are fallacies seems like it would have a high impact since I think the layperson commits fallacies quite frequently. Or should I say, I observe people committing fallacies more often than I'd like :p

Comment by Alerus on Welcome to Less Wrong! (2012) · 2012-05-07T15:48:56.634Z · LW · GW

Hi! So I've actually already made a few comments on this site, but had neglected to introduce myself so I thought I'd do so now. I'm a PhD candidate in computer science at the University of Maryland, Baltimore County. My research interests are in AI and Machine Learning. Specifically, my dissertation topic is on generalization in reinforcement learning (policy transfer and function approximation).

Given this, AI is obviously my biggest interest, but as a result, my study of AI has led me to apply the same concepts to human life and reasoning. Lately, I've also been thinking more about systems of morality and how an agent should reach rational moral conclusions. My knowledge of existing work in ethics is not profound, but my impression is that most systems are at too high a level to make concrete (my metric is whether we could implement it in an AI; if we cannot, then it's probably too high-level for us to reason strongly with it ourselves). Even desirism, which I've examined at least somewhat, seems to be a bit too high-level, but is perhaps closer to the mark than others (to be fair, I may just not know enough about it). In response to these observations, I've been developing my own system of morality that I'd like to share here in the near future to receive input.

Comment by Alerus on [SEQ RERUN] Science Doesn't Trust Your Rationality · 2012-05-07T14:22:27.018Z · LW · GW

I disagree with the quoted part of the post. Science doesn't reject your Bayesian conclusion (provided it is rational); it's simply unsatisfied by the fact that it's a probabilistic conclusion. That is, probabilistic conclusions are never knowledge of truth; they are estimations of the likelihood of truth. Science will look at your Bayesian conclusion and say, "99% confident? That's good! But let's gather more data and raise the bar to 99.9%." Science is the constant pursuit of knowledge. It will never reach it, but it demands that we never stop trying to get closer.

Beyond that, I think in a great many cases (not all) there are also inherent problems with using explicit Bayesian (or other) reasoning about models of reality, because we simply have no idea what the space of hypotheses could be. As such, the best Bayesian reasoning can ever do in this context is give an ordering of models (e.g., this model is better than that one), not definitive probabilities. This doesn't mean science rejects correct Bayesian reasoning, for the reason previously stated, but it does mean that in many contexts you can't get definitive probabilistic conclusions out of Bayesian reasoning in the first place.
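
To make the ordering point concrete (a toy example; the data, priors, and candidate models are all invented): with only the models you happened to think of, you can rank them by unnormalized posterior, but you can't turn those scores into definitive probabilities, because the unconsidered hypotheses are missing from the normalization.

    import math

    data = [1, 1, 0, 1, 1, 1, 0, 1]   # made-up coin flips (1 = heads)

    def log_likelihood(p_heads, data):
        return sum(math.log(p_heads if x else 1.0 - p_heads) for x in data)

    # Candidate models and invented priors; the true hypothesis space is unknown.
    candidates = {
        "fair coin (p=0.5)":   (0.5, math.log(0.6)),
        "biased coin (p=0.8)": (0.8, math.log(0.4)),
    }

    # Unnormalized log posteriors: enough to say which candidate is better,
    # not enough to say how probable either one actually is.
    scores = {name: log_likelihood(p, data) + log_prior
              for name, (p, log_prior) in candidates.items()}
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(name, round(score, 2))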

Comment by Alerus on Delayed Gratification vs. a Time-Dependent Utility Function · 2012-05-07T13:23:08.400Z · LW · GW

Yeah, I agree that you would have to consider time. However, my feeling is that for the utility calculation to be performed at all (that is, even with a fixed utility), you must already account for time through the states you occupy at all subsequent steps; so now you just attach an expected utility calculation to each of those subsequent states (thereby implicitly capturing how long each utility function lasts) instead of the fixed utility. It is possible, I suppose, that the transition probability could be conditional on the previous state's utility function too. That is, if you're really into math one day, it's more likely that you could switch to statistics rather than history right after, whereas if you had already switched to literature, maybe history would be more likely. That makes for a more complex analysis, but again, approximations and all would help :p
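
In symbols (my notation, just a sketch): for each subsequent state, replace the fixed utility with an expectation over which utility function you will hold at that point,

EU(s_{t+1}) = Σ_{u'} P(u_{t+1} = u' | u_t) · u'(s_{t+1}),

where conditioning on the current utility function u_t is what captures the "really into math today, so statistics is a likelier switch than history" effect.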

Regarding your second question, let me make sure I've understood it correctly. You're basically asking: couldn't you change the utility function, i.e., what you value, based on what is most attainable? For instance, if you were likely to wind up stuck in a log cabin whose only entertainment was books on the Civil War, you would change your utility to value Civil War books? Assuming I understood that correctly, if you could do that, I suppose changing your utility to reflect your world would be the best choice. Personally, I don't think humans are quite that malleable, so to an extent you're stuck with who you are. Ultimately, you might also find that some things are objectively better or worse than others; that regardless of the utility function, some things are worse. Things that are damaging to society, for instance, might be objectively worse than the alternatives because the consequent repercussions for you will almost always be bad (jail, a society that doesn't function as well because you just screwed it up, etc.). If true, you would still have some constant guiding principles; it would just mean that there is a set of other paths that are, in a way, equally good.

Comment by Alerus on Delayed Gratification vs. a Time-Dependent Utility Function · 2012-05-06T18:02:35.477Z · LW · GW

So it seems to me that the solution is to use an expected utility function rather than a fixed utility function. Let's speak abstractly for the moment and consider the space of all relevant utility functions (that is, all utility functions that would change the utility evaluation of an action). At each time step, we associate a probability of transitioning from your current utility function to any of these other utility functions. For any given future state, we can then compute the expected utility. When you run your optimization algorithm to determine your action, what you therefore maximize is the expected utility, not the current utility function. So the key will be assigning estimates to the probability of switching to any other utility function. Doing this in an entirely complete way is difficult, I'm sure, but my guess is that you can come up with reasonable estimates that make the reasoning possible.
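
A minimal sketch of that calculation (every name and number here is invented): score each action under every candidate utility function, weighted by the estimated probability of holding that function, and pick the action with the highest expected utility.

    # Candidate utility functions the agent might hold, and the estimated
    # probability of having transitioned to each one by the next time step.
    utility_functions = {
        "loves_math":    lambda outcome: 10 * outcome["math_done"],
        "loves_history": lambda outcome: 10 * outcome["history_read"],
    }
    transition_probs = {"loves_math": 0.7, "loves_history": 0.3}

    actions = {
        "study_math":   {"math_done": 1, "history_read": 0},
        "read_history": {"math_done": 0, "history_read": 1},
    }

    def expected_utility(outcome):
        # Average over which utility function you expect to hold, not just the current one.
        return sum(p * utility_functions[name](outcome)
                   for name, p in transition_probs.items())

    # Maximize expected utility across possible future utility functions.
    best = max(actions, key=lambda a: expected_utility(actions[a]))
    print(best)   # "study_math" under these made-up numbers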

Comment by Alerus on Towards a New Decision Theory for Parallel Agents · 2011-12-25T16:03:00.113Z · LW · GW

I think you may be partitioning things that need not necessarily be partitioned and it's important to note that. In the nicotine example (or the "lock the refrigerator door" example in the cited material), this is not necessarily a competition between the wants of different agents. This apparent dichotomy can also be resolved by internal states as well as utility discount factors.

To be specific, revisit the nicotine problem. When a person decides to quit, they may not be suffering any discomfort, so the utility of smoking at that moment is small. Instead, the eventual utility of a longer life wins out and the agent decides to stop smoking. However, once discomfort sets in, it combines with the action of smoking, because smoking will relieve the discomfort. The individual still assigns utility to not dying sooner (which favors the "don't smoke" action), but the death outcome happens much later. Even though death is far worse than the current discomfort (assuming a "normal" agent ;), so long as the utilities operate under a temporal discount factor, that utility may be discounted, for happening so much further in the future, to the point where it is smaller than the utility of smoking and removing the current discomfort.

At no point have we needed to postulate that these are separate competing agents with different wants, and this seeming contradiction is still perfectly resolved with a single utility function. In fact, wildly different agent behavior can result from mere changes in the discount factor in reinforcement learning (RL) agents, where discount and reward functions are central to the design of the algorithm.
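
A small sketch of that point (all numbers invented): a single utility function plus a temporal discount can favor "don't smoke" when there is no craving and "smoke" once the craving sets in, and changing nothing but the discount factor flips the choice again.

    def value_of_smoking(discomfort_relief, death_penalty, gamma, steps_until_death):
        # Immediate relief now, plus the discounted contribution of dying sooner.
        return discomfort_relief + (gamma ** steps_until_death) * death_penalty

    death_penalty = -1000.0      # utility of the much-later death outcome
    steps = 2000                 # how far in the future that outcome sits

    # No craving: nothing to relieve, so smoking is a (tiny, discounted) loss.
    print(value_of_smoking(0.0, death_penalty, 0.99, steps))    # negative -> don't smoke
    # Craving present: immediate relief outweighs the heavily discounted penalty.
    print(value_of_smoking(5.0, death_penalty, 0.99, steps))    # positive -> smoke
    # Same craving, less discounting: the far-off death matters again.
    print(value_of_smoking(5.0, death_penalty, 0.999, steps))   # negative -> don't smoke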

Now, which answer to the question is true? Is the smoke/don't smoke contradiction a result of competing agents or discount factors and internal states? I suppose it could be either one, but it's important to not assume that these examples directly indicate that there are competing agents with different desires, otherwise you may lose yourself looking for something that isn't there.

Of course, even if we assume that there are competing agents with different desires, it seems to me this can still, at least mathematically, be reduced to a single utility function. All it means is that you apply weights to the utilities of the different agents, and then the standard reasoning mechanisms are employed.
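
In symbols (my notation): with fixed weights w_i over the sub-agents' utilities U_i, the combination U(s) = Σ_i w_i · U_i(s) is itself just one utility function, so the standard expected-utility machinery applies unchanged.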