On Terminal Goals and Virtue Ethics

post by Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2014-06-18T04:00:05.196Z · LW · GW · Legacy · 207 comments

Contents

  Introduction
  Virtue Ethics
  Terminal Goals
  Practicing the art of rationality
  Why write this post?

Introduction

A few months ago, my friend said the following thing to me: “After seeing Divergent, I finally understand virtue ethics. The main character is a cross between Aristotle and you.”

That was an impossible-to-resist pitch, and I saw the movie. The thing that resonated most with me–also the thing that my friend thought I had in common with the main character–was the idea that you could make a particular decision, and set yourself down a particular course of action, in order to make yourself become a particular kind of person. Tris didn’t join the Dauntless caste because she thought they were doing the most good in society, or because she thought her comparative advantage to do good lay there–she chose it because they were brave, and she wasn’t, yet, and she wanted to be. Bravery was a virtue that she thought she ought to have. If the graph of her motivations even went any deeper, the only node beyond ‘become brave’ was ‘become good.’

(Tris did have a concept of some future world-outcomes being better than others, and wanting to have an effect on the world. But that wasn't the causal reason why she chose Dauntless; as far as I can tell, it was unrelated.)

My twelve-year-old self had a similar attitude. I read a lot of fiction, and stories had heroes, and I wanted to be like them–and that meant acquiring the right skills and the right traits. I knew I was terrible at reacting under pressure–that in the case of an earthquake or other natural disaster, I would freeze up and not be useful at all. Being good at reacting under pressure was an important trait for a hero to have. I could be sad that I didn’t have it, or I could decide to acquire it by doing the things that scared me over and over and over again. So that someday, when the world tried to throw bad things at my friends and family, I’d be ready.

You could call that an awfully passive way to look at things. It reveals a deep-seated belief that I’m not in control, that the world is big and complicated and beyond my ability to understand and predict, much less steer–that I am not the locus of control. But this way of thinking is an algorithm. It will almost always spit out an answer, when otherwise I might get stuck in the complexity and unpredictability of trying to make a particular outcome happen.


Virtue Ethics

I find the different houses of the HPMOR universe to be a very compelling metaphor. It’s not because they suggest actions to take; instead, they suggest virtues to focus on, so that when a particular situation comes up, you can act ‘in character.’ Courage and bravery for Gryffindor, for example. It also suggests the idea that different people can focus on different virtues–diversity is a useful thing to have in the world. (I'm probably mangling the concept of virtue ethics here, not having any background in philosophy, but it's the closest term for the thing I mean.)

I’ve thought a lot about the virtue of loyalty. In the past, loyalty has kept me with jobs and friends that, from an objective perspective, might not seem like the optimal things to spend my time on. But the costs of quitting and finding a new job, or cutting off friendships, wouldn’t just have been about direct consequences in the world, like needing to spend a bunch of time handing out resumes or having an unpleasant conversation. There would also be a shift within myself, a weakening in the drive towards loyalty. It wasn’t that I thought everyone ought to be extremely loyal–it’s a virtue with obvious downsides and failure modes. But it was a virtue that I wanted, partly because it seemed undervalued. 

By calling myself a ‘loyal person’, I can aim myself in a particular direction without having to understand all the subcomponents of the world. More importantly, I can make decisions even when I’m rushed, or tired, or under cognitive strain that makes it hard to calculate through all of the consequences of a particular action.

 

Terminal Goals

The Less Wrong/CFAR/rationalist community puts a lot of emphasis on a different way of trying to be a hero–where you start from a terminal goal, like “saving the world”, and break it into subgoals, and do whatever it takes to accomplish it. In the past I’ve thought of myself as being mostly consequentialist, in terms of morality, and this is a very consequentialist way to think about being a good person. And it doesn't feel like it would work. 

There are some bad reasons why it might feel wrong–i.e. that it feels arrogant to think you can accomplish something that big–but I think the main reason is that it feels fake. There is strong social pressure in the CFAR/Less Wrong community to claim that you have terminal goals, that you’re working towards something big. My System 2 understands terminal goals and consequentialism, as a thing that other people do–I could talk about my terminal goals, and get the points, and fit in, but I’d be lying about my thoughts. My model of my mind would be incorrect, and that would have consequences on, for example, whether my plans actually worked.

 

Practicing the art of rationality

Recently, Anna Salamon brought up a question with the other CFAR staff: “What is the thing that’s wrong with your own practice of the art of rationality?” The terminal goals thing was what I thought of immediately–namely, the conversations I've had over the past two years, where other rationalists have asked me "so what are your terminal goals/values?" and I've stammered something and then gone to hide in a corner and try to come up with some. 

In Alicorn’s Luminosity, Bella says about her thoughts that “they were liable to morph into versions of themselves that were more idealized, more consistent - and not what they were originally, and therefore false. Or they'd be forgotten altogether, which was even worse (those thoughts were mine, and I wanted them).”

I want to know true things about myself. I also want to impress my friends by having the traits that they think are cool, but not at the price of faking it–my brain screams that pretending to be something other than what you are isn’t virtuous. When my immediate response to someone asking me about my terminal goals is “but brains don’t work that way!” it may not be a true statement about all brains, but it’s a true statement about my brain. My motivational system is wired in a certain way. I could think it was broken; I could let my friends convince me that I needed to change, and try to shoehorn my brain into a different shape; or I could accept that it works, that I get things done and people find me useful to have around and this is how I am. For now. I'm not going to rule out future attempts to hack my brain, because Growth Mindset, and maybe some other reasons will convince me that it's important enough, but if I do it, it'll be on my terms. Other people are welcome to have their terminal goals and existential struggles. I’m okay the way I am–I have an algorithm to follow.

 

Why write this post?

It would be an awfully surprising coincidence if mine was the only brain that worked this way. I’m not a special snowflake. And other people who interact with the Less Wrong community might not deal with it the way I do. They might try to twist their brains into the ‘right’ shape, and break their motivational system. Or they might decide that rationality is stupid and walk away.


Comments sorted by top scores.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-06-17T20:49:51.431Z · LW(p) · GW(p)

"Good people are consequentialists, but virtue ethics is what works," is what I usually say when this topic comes up. That is, we all think that it is virtuous to be a consequentialist and that good, ideal rationalists would be consequentialists. However, when I evaluate different modes of thinking by the effect I expect them to have on my reasoning, and evaluate the consequences of adopting that mode of thought, I find that I expect virtue ethics to produce the best adherence rate in me, most encourage practice, and otherwise result in actually-good outcomes.

But if anyone thinks we ought not to be consequentialists on the meta-level, I say unto you that lo they have rocks in their skulls, for they shall not steer their brains unto good outcomes.

Replies from: Ruby, jphaas, TheAncientGeek, TruePath
comment by Ruby · 2014-06-18T03:23:42.395Z · LW(p) · GW(p)

If ever you want to refer to an elaboration and justification of this position, see R. M. Hare's two-level utilitarianism, expounded best in this paper: Ethical Theory and Utilitarianism (see pp. 30-36).

To argue in this way is entirely to neglect the importance for moral philosophy of a study of moral education. Let us suppose that a fully informed archangelic act-utilitarian is thinking about how to bring up his children. He will obviously not bring them up to practise on every occasion on which they are confronted with a moral question the kind of archangelic thinking that he himself is capable of [complete consequentialist reasoning]; if they are ordinary children, he knows that they will get it wrong. They will not have the time, or the information, or the self-mastery to avoid self-deception prompted by self-interest; this is the real, as opposed to the imagined, veil of ignorance which determines our moral principles.

So he will do two things. First, he will try to implant in them a set of good general principles. I advisedly use the word 'implant'; these are not rules of thumb, but principles which they will not be able to break without the greatest repugnance, and whose breach by others will arouse in them the highest indignation. These will be the principles they will use in their ordinary level-1 moral thinking, especially in situations of stress. Secondly, since he is not always going to be with them, and since they will have to educate their children, and indeed continue to educate themselves, he will teach them, as far as they are able, to do the kind of thinking that he has been doing himself. This thinking will have three functions. First of all, it will be used when the good general principles conflict in particular cases. If the principles have been well chosen, this will happen rarely; but it will happen. Secondly, there will be cases (even rarer) in which, though there is no conflict between general principles, there is something highly unusual about the case which prompts the question whether the general principles are really fitted to deal with it. But thirdly, and much the most important, this level-2 thinking will be used to select the general principles to be taught both to this and to succeeding generations. The general principles may change, and should change (because the environment changes). And note that, if the educator were not (as we have supposed him to be) archangelic, we could not even assume that the best level-1 principles were imparted in the first place; perhaps they might be improved.

How will the selection be done? By using level-2 thinking to consider cases, both actual and hypothetical, which crucially illustrate, and help to adjudicate, disputes between rival general principles.

Replies from: kybernetikos
comment by kybernetikos · 2014-06-18T23:12:47.790Z · LW(p) · GW(p)

That's very interesting, but isn't the level-1 thinking closer to deontological ethics than virtue ethics, since it is based on rules rather than on the character of the moral agent?

Replies from: Ruby, ialdabaoth
comment by Ruby · 2014-06-19T10:21:38.080Z · LW(p) · GW(p)

My understanding is that when Hare says rules or principles for level-1 he means it generically and is agnostic about what form they'd take. "Always be kind" is also a rule. For clarity, I'd substitute the word 'algorithm' for 'rules'/'principles'. Your level-2 algorithm is consequentialism, but then your level-1 algorithm is whatever happens to consequentially work best - be it inviolable deontological rules, character-based virtue ethics, or something else.
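The two-level structure Ruby describes is essentially an algorithm, and can be sketched in a few lines of Python. All names, situations, and payoffs below are hypothetical and purely illustrative: a consequentialist level-2 selector scores candidate level-1 rules by their total consequences across simulated situations, then commits to the winner for everyday use.

```python
def outcome_value(action: str, situation: str) -> int:
    # Toy payoff table: the consequences of each action in each situation.
    payoffs = {
        ("help", "friend_in_need"): 10,
        ("refuse", "friend_in_need"): -5,
        ("help", "scam"): -10,
        ("refuse", "scam"): 2,
    }
    return payoffs[(action, situation)]

# Candidate level-1 algorithms: cheap rules usable under stress,
# requiring no consequence calculation at decision time.
level1_candidates = {
    "always_help": lambda s: "help",
    "always_refuse": lambda s: "refuse",
    "help_unless_obvious_scam": lambda s: "refuse" if s == "scam" else "help",
}

situations = ["friend_in_need", "friend_in_need", "friend_in_need", "scam"]

def select_level1(candidates, situations):
    # Level-2 thinking: consequentialist selection among level-1 rules.
    return max(
        candidates,
        key=lambda name: sum(outcome_value(candidates[name](s), s)
                             for s in situations),
    )

best = select_level1(level1_candidates, situations)
print(best)  # -> help_unless_obvious_scam
```

Note that the selected rule need not itself be consequentialist in form; only the selection step is.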

comment by ialdabaoth · 2014-06-18T23:52:10.125Z · LW(p) · GW(p)

level-1 thinking is actually based on habit and instinct more than rules; rules are just a way to describe habit and instinct.

Replies from: Ruby, kybernetikos
comment by Ruby · 2014-06-19T10:25:50.029Z · LW(p) · GW(p)

Level-1 is about rules which your habit and instinct can follow, but I wouldn't say they're ways to describe it. Here we're talking about normative rules, not descriptive System 1/System 2 stuff.

comment by kybernetikos · 2014-06-19T00:08:31.909Z · LW(p) · GW(p)

And the Archangel has decided to take some general principles (which are rules) and implant them in the habit and instinct of the children. I suppose you could argue that the system implanted is a deontological one from the Archangel's point of view, and merely instinctual behaviour from the children's point of view. I'd still feel that calling instinctual behaviour 'virtue ethics' is a bit strange.

Replies from: ialdabaoth
comment by ialdabaoth · 2014-06-19T00:14:04.109Z · LW(p) · GW(p)

not quite. The initial instincts are the system-1 "presets". These can and do change with time. A particular entity's current system-1 behavior are its "habits".

comment by jphaas · 2014-06-18T14:15:29.232Z · LW(p) · GW(p)

Funny, I always thought it was the other way around... consequentialism is useful on the tactical level once you've decided what a "good outcome" is, but on the meta-level, trying to figure out what a good outcome is, you get into questions that you need the help of virtue ethics or something similar to puzzle through. Questions like "is it better to be alive and suffering or to be dead", or "is causing a human pain worse than causing a pig pain", or "when does it become wrong to abort a fetus", or even "is there good or bad at all?"

Replies from: CCC, Armok_GoB
comment by CCC · 2014-06-30T09:49:37.897Z · LW(p) · GW(p)

I think that the reason may be that consequentialism requires more computation; you need to re-calculate the consequences for each and every action.

The human brain is mainly a pattern-matching device - it uses pattern-matching to save on computation cycles. Virtues are patterns which lead to good behaviour. (Moreover, these patterns have gone through a few millennia of debugging - there are plenty of cautionary tales about people with poorly chosen virtues to serve as warnings). The human brain is not good at quickly recalculating long-term consequences from small changes in behaviour.

comment by Armok_GoB · 2014-06-29T18:42:57.329Z · LW(p) · GW(p)

What actually happens is you should be consequentialist at even-numbered meta-levels and virtue-based at the odd-numbered ones... or was it the other way around? :p

comment by TheAncientGeek · 2014-06-18T14:56:55.392Z · LW(p) · GW(p)

Say I apply consequentialism to a set of end states I can reliably predict, and use something else for the set I cannot. In what sense should I be a consequentialist about the second set?

Replies from: ialdabaoth
comment by ialdabaoth · 2014-06-18T15:08:49.725Z · LW(p) · GW(p)

In what sense should I be a consequentialist about the second set?

In the sense that you can update on evidence until you can marginally predict end states?

I'm afraid I can't think of an example where there's a meta-level but no predictive capacity at that meta-level. Can you give an example?

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-18T15:17:17.458Z · LW(p) · GW(p)

I have no hope of being able to predict everything...there is always going to be a large set of end states I can't predict?

Replies from: ialdabaoth
comment by ialdabaoth · 2014-06-18T15:32:38.038Z · LW(p) · GW(p)

Then why have ethical opinions about it at all? Again, can you please give an example of a situation where this would come up?

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-18T15:36:11.222Z · LW(p) · GW(p)

Lo! I have been so instructed-eth! See above.

comment by TruePath · 2014-06-22T00:39:28.085Z · LW(p) · GW(p)

"Good people are consequentialists, but virtue ethics is what works,"

To nitpick a little, I don't think consequentialism even allows one to coherently speak about good people, and it certainly doesn't show that consequentialists are such people (standard example of the alien who tortures people when they find consequentialists).

Moreover, I don't believe there is any sense in which one can show that people who aren't consequentialists are making some mistake, or even that people who value other consequences are doing so. You tacitly admit this with your examples of paper clip maximizing aliens, and I doubt you can coherently claim that those who assert that virtue ethics is objectively correct are any less rational than those who assert that consequentialism is correct.

You and I both judge non-consequentialists to be foolish but we have to be careful to distinguish between simply strongly disapproving of their views and actually accusing them of irrationality. Indeed, the actions prescribed by any non-consequentialist moral theory are identical to those prescribed by some consequentialist theory (every possible choice pattern results in a different total world state so you can always order them to give identical results to whatever moral theory you like).

Given this point I think it is a little dangerous to speak to the meta-level. I mean ideally one would simply say I think objectively hedonic/whatever consequentialism is true regardless of what is pragmatically useful. Unfortunately, it's very unclear what the 'truth' of consequentialism even consists of if those who follow a non-consequentialist moral theory aren't logically incorrect.

Pedantically speaking it seems the best one can do is say that when given the luxury of considering situations you aren't emotionally close to and have time to think about you will apply consequentialist reasoning that values X to recommend actions to people and that in such moods you do strive to bind your future behavior as that reasoning demands.

Of course that too is still not quite right. Even in a contemplative mood we rarely become totally selfless, and I doubt you (any more than I) actually strive to bind yourself so that, given the choice, you would torture and kill your loved ones to help n+1 strangers avoid the same fate (assuming those factors aren't relevant to the consequences you say you care about).

Overall it's all a big mess and I don't see any easy statements that are really correct.

comment by [deleted] · 2014-06-16T10:54:38.084Z · LW(p) · GW(p)

I am going to write the same warning I have written to rationalist friends in relation to the Great Filter Hypothesis and almost everything on Overcoming Bias: BEWARE OF MODELS WITH NO CAUSAL COMPONENTS! I repeat: BEWARE NONCAUSAL MODELS!!! In fact, beware of nonconstructive mental models as well, while we're at it! Beware classical logic, for it is nonconstructive! Beware noncausal statistics, for it is noncausal and nonconstructive! All these models, when they contain true information, and accurately move that information from belief to belief in strict accordance with the actual laws of statistical inference, still often fail at containing coherent propositions to which belief-values are being assigned, and at corresponding to the real world.

Now apply the above warning to virtue ethics.

Now let's dissolve the above warning about virtue ethics and figure out what it really means anyway, since almost all of us real human beings use some amount of it.

It's not enough to say that human beings are not perfectly rational optimizers moving from terminal goals to subgoals to plans to realized actions back to terminal goals. We must also acknowledge that we are creatures of muscle and neural-net, and that the lower portions (ie: almost all) of our minds work via reinforcement, repetition, and habit, just as our muscles are built via repeated strain.

Keep in mind that anything you consciously espouse as a "terminal goal" is in fact a subgoal: people were not designed to complete a terminal goal and shut off.

Practicing virtue just means that I recognize the causal connection between my present self and future self, and optimize my future self for the broad set of goals I want to be able to accomplish, while also recognizing the correlations between myself and other people, and optimizing my present and future self to exploit those correlations for my own goals.

Because my true utility function is vast and complex and only semi-known to me, I have quite a lot of logical uncertainty over what subgoals it might generate for me in the future. However, I do know some actions I can take to make my future self better able to address a broad range of subgoals I believe my true utility function might generate, perhaps even any possible subgoal. The qualities created in my future self by those actions are virtues, and inculcating them in accordance with the design of my mind and body is virtue ethics.

As an example, I helped a friend move his heavy furniture from one apartment to another because I want to maintain the habit of loyalty and helpfulness to my friends (cue House Hufflepuff) for the sake of present and future friends, despite this particular friend being a total mooching douchebag. My present decision will change the distribution of my future decisions, so I need to choose for myself now and my potential future selves.

Not really that complicated, when you get past the philosophy-major stuff and look at yourself as a... let's call it, a naturalized human being, a body and soul together that are really just one thing.

Replies from: ialdabaoth, bramflakes, TheAncientGeek, Benquo
comment by ialdabaoth · 2014-06-16T20:39:55.325Z · LW(p) · GW(p)

I will reframe this to make sure I understand it:

Virtue Ethics is like weightlifting. You gotta hit the gym if you want strong muscles. You gotta throw yourself into situations that cultivate virtue if you want to be able to act virtuously.

Consequentialism is like firefighting. You need to set yourself up somewhere with a firetruck and hoses and rebreathers and axes and a bunch of cohorts who are willing to run into a fire with you if you want to put out fires.

You can't put out fires by weightlifting, but when the time comes to actually rush into a fire, bust through some walls, and drag people out, you really should have been hitting the gym consistently for the past several months.

Replies from: None, Leon
comment by [deleted] · 2014-06-16T20:52:12.433Z · LW(p) · GW(p)

That's such a good summary I wish I'd just written that instead of the long spiel I actually posted.

Replies from: ialdabaoth
comment by ialdabaoth · 2014-06-16T21:01:57.818Z · LW(p) · GW(p)

That's such a good summary I wish I'd just written that instead of the long spiel I actually posted.

Thanks for the compliment!

I am currently wracking my brain to come up with a virtue-ethics equivalent to the "bro do you even lift" shorthand - something pithy to remind people that System-1 training is important to people who want their System-1 responses to act in line with their System-2 goals.

Replies from: FeepingCreature, bbleeker
comment by FeepingCreature · 2014-06-17T16:06:56.517Z · LW(p) · GW(p)

something pithy

Rationalists should win?

Maybe with a sidenote how continuously recognizing in detail how you failed to win just now is not winning.

Replies from: KnaveOfAllTrades
comment by KnaveOfAllTrades · 2014-06-17T17:38:57.198Z · LW(p) · GW(p)

'Do you even win [bro/sis/sib]?'

comment by Sabiola (bbleeker) · 2014-06-18T12:13:07.264Z · LW(p) · GW(p)

How about 'Train the elephant'?

comment by Leon · 2014-06-19T01:22:07.267Z · LW(p) · GW(p)

Here's how I think about the distinction on a meta-level:

"It is best to act for the greater good (and acting for the greater good often requires being awesome)."

vs.

"It is best to be an awesome person (and awesome people will consider the greater good)."

where ''acting for the greater good" means "having one's own utility function in sync with the aggregate utility function of all relevant agents" and "awesome" means "having one's own terminal goals in sync with 'deep' terminal goals (possibly inherent in being whatever one is)" (e.g. Sam Harris/Aristotle-style 'flourishing').

Replies from: ialdabaoth
comment by ialdabaoth · 2014-06-19T01:28:12.041Z · LW(p) · GW(p)

So arete, then?

comment by bramflakes · 2014-06-16T16:12:07.144Z · LW(p) · GW(p)

I am going to write the same warning I have written to rationalist friends in relation to the Great Filter Hypothesis and almost everything on Overcoming Bias: BEWARE OF MODELS WITH NO CAUSAL COMPONENTS! I repeat: BEWARE NONCAUSAL MODELS!!! In fact, beware of nonconstructive mental models as well, while we're at it! Beware classical logic, for it is nonconstructive! Beware noncausal statistics, for it is noncausal and nonconstructive! All these models, when they contain true information, and accurately move that information from belief to belief in strict accordance with the actual laws of statistical inference, still often fail at containing coherent propositions to which belief-values are being assigned, and at corresponding to the real world.

Can you explain this part more?

Replies from: None, lmm
comment by [deleted] · 2014-06-16T18:41:28.730Z · LW(p) · GW(p)

With pleasure!

Ok, so the old definition of "knowledge" was "justified true belief". Then it turned out that there were times when you could believe something true, but have the justification be mere coincidence. I could believe "Someone is coming to see me today" because I expect to see my adviser, but instead my girlfriend shows up. The statement as I believed it was correct, but for a completely different reason than I thought. So Alvin Goldman changed this to say, "knowledge is true belief caused by the truth of the proposition believed-in." This makes philosophers very unhappy but Bayesian probability theorists very happy indeed.

Where do causal and noncausal statistical models come in here? Well, right here, actually: Bayesian inference is actually just a logic of plausible reasoning, which means it's a way of moving belief around from one proposition to another, which just means that it works on any set of propositions for which there exists a mutually-consistent assignment of probabilities.

This means that quite often, even the best Bayesians (and frequentists as well) construct models (let's switch to saying "map" and "territory") which not only are not caused by reality, but don't even contain enough causal machinery to describe how reality could have caused the statistical data.

This happens most often with propositions of the form "There exists X such that P(X)" or "X or Y" and so forth. These are the propositions where belief can be deduced without constructive proof: without being able to actually exhibit the object the proposition applies to. Unfortunately, if you can't exhibit the object via constructive proof (note that constructive proofs are isomorphic to algorithms for actually generating the relevant objects), I'm fairly sure you cannot possess a proper description of the causal mechanisms producing the data you see. This means that not only might your hypotheses be wrong, your entire hypothesis space might be wrong, which could make your inferences Not Even Wrong, or merely confounded.

(I can't provide mathematics showing any formal tie between causation/causal modeling and constructive proof, but I think this might be because I'm too much an amateur at the moment. My intuitions say that in a universe where incomputable things don't generate results in real-time and things don't happen for no reason at all, any data I see must come from a finitely-describable causal process, which means there must exist a constructive description of that process -- even if classical logic could prove the existence of and proper value for the data without encoding that constructive decision!)

What can also happen, again particularly if you use classical logic, is that you perform sound inference over your propositions, but the propositions themselves are not conceptually coherent in terms of grounding themselves in causal explanations of real things.

So to use my former example of the Great Filter Hypothesis: sure, it makes predictions, sure, we can assign probabilities, sure, we can do updates. But nothing about the Great Filter Hypothesis is constructive or causal, nothing about it tells us what to expect the Filter to do or how it actually works. Which means it's not actually telling us much at all, as far as I can say.

(In relation to Overcoming Bias, I've ranted similarly about explaining all possible human behaviors in terms of signalling, status, wealth, and power. Paging /u/Quirinus_Quirrell... If they see a man flirting with a woman at a party, Quirrell and Hanson will seem to explain it in terms of signalling and status, while I will deftly and neatly predict that the man wants to have sex with the woman. Their explanation sounds plausible until you try to read its source code, look at the causal machine working, and find that it dissolves into cloud around the edges. My explanation grounds itself in hormonal biology and previous observation of situations where similar things occurred.)

Replies from: Jiro, Eugine_Nier, bramflakes, Friendly-HI
comment by Jiro · 2014-06-16T22:13:20.167Z · LW(p) · GW(p)

So Alvin Goldman changed this to say, "knowledge is true belief caused by the truth of the proposition believed-in." This makes philosophers very unhappy but Bayesian probability theorists very happy indeed.

If I am insane and think I'm the Roman emperor Nero, and then reason "I know that according to the history books the emperor Nero is insane, and I am Nero, so I must be insane", do I have knowledge that I am insane?

Replies from: drnickbone, Friendly-HI
comment by drnickbone · 2014-06-18T16:34:51.602Z · LW(p) · GW(p)

Note that this also messes up counterfactual accounts of knowledge as in "A is true and I believe A; but if A were not true then I would not believe A". (If I were not insane, then I would not believe I am Nero, so I would not believe I am insane.)

We likely need some notion of "reliability" or "reliable processes" in an account of knowledge, like "A is true and I believe A and my belief in A arises through a reliable process". Believing things through insanity is not a reliable process.

Gettier problems arise because processes that are usually reliable can become unreliable in some (rare) circumstances, but still (by even rarer chance) get the right answers.

Replies from: Jiro
comment by Jiro · 2014-06-18T19:54:27.905Z · LW(p) · GW(p)

The insanity example is not original to me (although I can't seem to Google it up right now). Using reliable processes isn't original, either, and if that actually worked, the Gettier Problem wouldn't be a problem.

comment by Friendly-HI · 2014-06-18T15:55:08.881Z · LW(p) · GW(p)

Interesting thought, but surely the answer is no. If I take the word "knowledge" in this context to mean having a model that reasonably depicts reality in its contextually relevant features, then the word "insane" in this specific instance depicts two very different albeit related brain patterns.

Simply put the brain pattern (wiring + process) that makes the person think they are Nero is a different though surely related physical object than the brain pattern that depicts what that person thinks "Nero being insane" might actually manifest like in terms of beliefs and behaviors. In light of the context we can say the person doesn't have any knowledge about being insane, since that person's knowledge does not include (or take seriously) the belief that depicts the presumably correct reality/model of that person not actually being Nero.

Put even more simply, we use the same concept/word to model two related but fundamentally different things. Does that person have knowledge about being insane? It's the tree-and-the-sound problem: the word "insane" is describing two fundamentally different things, yet is wrongly taken to mean the same. I'd claim any reasonable concept of the word "insane" results in you concluding that that person does not have knowledge about being insane in the sense that is contextually relevant in this scenario, while the person might actually have roughly true knowledge about how Nero might have been insane and how that manifested itself. But those are two different things, and the latter is not the contextually relevant knowledge about insanity here.

Replies from: Jiro
comment by Jiro · 2014-06-18T16:11:33.579Z · LW(p) · GW(p)

I don't think that explanation works. One of the standard examples of the Gettier problem is, as eli described, a case where you believe A, A is false, B is true, and the question is "do you have knowledge of (A OR B)". The "caused by the truth of the proposition" definition is an attempt to get around this.

So your answer fails because it doesn't actually matter that the word "insane" can mean two different things--A is "is insane like Nero", B is "is insane in the sense of having a bad model", and "A OR B" is just "is insane in either sense". You can still ask if he knows he's insane in either sense (that is, whether he knows "(A OR B)"), and in that case his belief in (A OR B) is caused by the truth of the proposition.

comment by Eugine_Nier · 2014-06-17T01:22:02.116Z · LW(p) · GW(p)

So to use my former example of the Great Filter Hypothesis: sure, it makes predictions, sure, we can assign probabilities, sure, we can do updates. But nothing about the Great Filter Hypothesis is constructive or causal, nothing about it tells us what to expect the Filter to do or how it actually works. Which means it's not actually telling us much at all, as far as I can say.

Yes it is causal in the same sense that mathematics of physical laws are causal.

In relation to Overcoming Bias, I've ranted on similarly about explaining all possible human behaviors in terms of signalling, status, wealth, and power. Paging /u/Quirinus_Quirrell... If they see a man flirting with a woman at a party, Quirrell and Hanson will seem to explain it in terms of signalling and status, while I will deftly and neatly predict that the man wants to have sex with the woman.

You do realize the two explanations aren't contradictory and are in fact mutually reinforcing? In particular, the man wants to have sex with her and is engaging in status-signalling games to accomplish his goal. Also, his reasons for wanting to have sex with her may themselves include signaling and status.

comment by bramflakes · 2014-06-16T20:32:41.244Z · LW(p) · GW(p)

So to use my former example of the Great Filter Hypothesis: sure, it makes predictions, sure, we can assign probabilities, sure, we can do updates. But nothing about the Great Filter Hypothesis is constructive or causal, nothing about it tells us what to expect the Filter to do or how it actually works. Which means it's not actually telling us much at all, as far as I can say.

?

If the Filter is real, then its effects are what causes us to think of it as a hypothesis. That makes it "true belief caused by the truth of the proposition believed-in", conditional on it actually being true.

I don't get it.

Replies from: None
comment by [deleted] · 2014-06-16T20:58:09.148Z · LW(p) · GW(p)

If the Filter is real, then its effects are what causes us to think of it as a hypothesis.

That could only be true if it lay in our past, or in the past of the other Big Finite Number of other species in the galaxy it already killed off. The actual outcome we see is just an absence of Anyone Else detectable to our instruments so far, despite a relative abundance of seemingly life-capable planets. We don't see particular signs of any particular causal mechanism acting as a Great Filter, like a homogenizing swarm expanding across the sky because some earlier species built a UFAI or something.

When we don't see signs of any particular causal mechanism, but we're still not seeing what we expect to see, I personally would say the first and best explanation is that we are ignorant, not that some mysterious mechanism destroys things we otherwise expect to see.

Replies from: bramflakes
comment by bramflakes · 2014-06-16T21:18:44.146Z · LW(p) · GW(p)

Hm? Why doesn't Rare Earth solve this problem? We don't have the tech yet to examine the surfaces of exoplanets so for all we know the foreign-Earth candidates we've got now will end up being just as inhospitable as the rest of them. "Seemingly life capable" isn't a very high bar at the minute.

Now, if we did have the tech, and saw a bunch of lifeless planets that as far as we know had nearly exactly the same conditions as pre-Life Earth, and people started rattling off increasingly implausible and special-pleading reasons why ("no planet yet found has the same selenium-tungsten ratio as Earth!"), then there'd be a problem.

I don't see why you need to posit exotic scenarios when the mundane will do.

Replies from: None
comment by [deleted] · 2014-06-16T21:21:22.173Z · LW(p) · GW(p)

I don't see why you need to posit exotic scenarios when the mundane will do.

Neither do I, hence my current low credence in a Great Filter and my currently high credence for, "We're just far from the mean; sometimes that does happen, especially in distributions with high variance, and we don't know the variance right now."

Replies from: bramflakes
comment by bramflakes · 2014-06-16T21:53:23.506Z · LW(p) · GW(p)

Well I agree with you on all of that. How is it non-causal?

Or have I misunderstood and you only object to the "aliens had FOOM AI go wrong" explanations but have no trouble with the "earth is just weird" explanation?

Replies from: None
comment by [deleted] · 2014-06-16T22:01:38.175Z · LW(p) · GW(p)

How is it non-causal?

It isn't. The people who affirmatively believe in the Great Filter being a real thing rather than part of their ignorance are, in my view, the ones who believe in a noncausal model.

comment by Friendly-HI · 2014-06-19T01:32:35.182Z · LW(p) · GW(p)

The problem with the signaling hypothesis is that in everyday life there is essentially no observation you could possibly make that could disprove it. What is that? This guy is not actually signaling right now? No way, he's really just signaling that he is so über-cool that he doesn't even need to signal to anyone. Wait there's not even anyone else in the room? Well through this behavior he is signaling to himself how cool he is to make him believe it even more.

Guess the only way to find out is if we can actually identify "the signaling circuit" and make functional brain scans. I would actually expect signaling to explain an obscene amount of human behavior... but really everything? As I said, I can't think of any possible observation, outside of functional brain scans, that could have the potential to disprove the signaling hypothesis of human behavior. (A brain scan where we actually know what we are looking at and where we are measuring the right construct, obviously.)

comment by lmm · 2014-06-17T21:39:33.051Z · LW(p) · GW(p)

Thanks for pushing this. I nodded along to the grandparent post and then when I came to your reply I realized I had no idea what this part was talking about.

comment by TheAncientGeek · 2014-06-17T15:44:12.981Z · LW(p) · GW(p)

It is not enough to say we don't move smoothly from terminal goal to subgoal. It is enough to say we are too messily constructed to have distinct terminal goals and subgoals.

comment by Benquo · 2014-06-16T16:12:37.975Z · LW(p) · GW(p)

It sounds like you're thinking of the "true utility function's" preferences as a serious attempt to model the future consequences of present actions, including their effect on future brain-states.

I don't think that's always how the brain works, even if you can tell a nice story that way.

Replies from: None
comment by [deleted] · 2014-06-16T17:19:43.792Z · LW(p) · GW(p)

I think that's usually not how the brain works, but I also think that I'm less than totally antirational. That is, it's possible to construct a "true utility function" that would dictate to me a life I will firmly enjoy living.

That statement has a large inferential distance from what most people know, so I should actually hurry up and write the damn LW entry explaining it.

Replies from: Nornagest
comment by Nornagest · 2014-06-16T17:25:49.952Z · LW(p) · GW(p)

I think you could probably construct several mutually contradictory utility functions which would dictate lives you enjoy living. I think it's even possible that you could construct several which you'd perceive as optimal, within the bounds of your imagination and knowledge.

I don't think we yet have the tools to figure out which one actually is optimal. And I'm pretty sure the latter aren't a subset of the former; we see plenty of people convincing themselves that they can't do better than their crappy lives.

Replies from: None, None
comment by [deleted] · 2014-06-16T22:30:57.833Z · LW(p) · GW(p)

Well that post happened.

comment by [deleted] · 2014-06-16T17:34:10.968Z · LW(p) · GW(p)

Like I said: there's a large inferential distance here, so I have an entire post on the subject I'm drafting for notions of construction and optimality.

comment by Nornagest · 2014-06-16T21:14:57.127Z · LW(p) · GW(p)

I've thought for a while that Benjamin Franklin's virtue-matrix technique would be an interesting subject for a top-level article here, as a practical method for building ethical habits. We'd likely want to use headings other than Franklin's Puritan-influenced ones, but the method itself should still work:

I made a little book, in which I allotted a page for each of the virtues. I ruled each page with red ink, so as to have seven columns, one for each day of the week, marking each column with a letter for the day. I crossed these columns with thirteen red lines, marking the beginning of each line with the first letter of one of the virtues, on which line, and in its proper column, I might mark, by a little black spot, every fault I found upon examination to have been committed respecting that virtue upon that day.

I can think of some potential pitfalls, though (mostly having to do with unduly accentuating the negative), and I don't want to write on it until I've at least tried it.
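As a rough illustration of the method Franklin describes, here is a minimal sketch of his little book as a tracking grid: one row per virtue, one column per day, a mark for each fault. The virtue names and layout are placeholders invented for the example, not a recommendation.

```python
# Hypothetical sketch of Franklin's virtue-matrix: rows are virtues,
# columns are days of the week, and each fault gets a mark.

virtues = ["Temperance", "Order", "Resolution"]
days = ["S", "M", "T", "W", "T", "F", "S"]

# faults[virtue][day_index] counts lapses found on examination.
faults = {v: {d: 0 for d in range(7)} for v in virtues}

def mark_fault(virtue, day_index):
    faults[virtue][day_index] += 1

mark_fault("Order", 2)  # e.g. a lapse in Order on Tuesday

# Print this week's page of the "little book".
print("      " + " ".join(days))
for v in virtues:
    row = " ".join("*" if faults[v][d] else "." for d in range(7))
    print(f"{v[:5]:5} {row}")
```

The point of the grid, as in Franklin's account, is only to make the week's pattern of lapses visible at a glance.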

Replies from: ialdabaoth, army1987
comment by ialdabaoth · 2014-06-16T21:26:47.312Z · LW(p) · GW(p)

We'd likely want to use headings other than Franklin's Puritan-influenced ones, but the method itself should still work:

What are good Virtues to aspire to?

My inner RPG-geek is nudging me towards the ones from Exalted:

  • Temperance (aka 'Self Control')
  • Compassion (Altruism / Justice / Empathy)
  • Valour (Courage / Bravery / Openness)
  • Conviction (Conscientiousness / Resolve / 'Grit').
Replies from: Eliezer_Yudkowsky, Nornagest, Lumifer
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-06-17T20:51:25.066Z · LW(p) · GW(p)

Exalted is the only RPG into whose categories I am never tempted to put myself. I can easily make a case for myself as half the Vampire: The Masquerade castes, or almost any of the Natures and Demeanors from the World of Darkness; but the different kinds of Solar, or even the dichotomy between Solar / Lunar / Infernal / Abyssal / etcetera, just leave me staring at what feels to me like a Blue and Orange Morality.

I credit them for this; it means they're not just using the Barnum effect. The Exalted universe is genuinely weird.

Replies from: ialdabaoth, Will_Newsome, David_Gerard
comment by ialdabaoth · 2014-06-17T21:03:20.633Z · LW(p) · GW(p)

The Exalted universe is genuinely weird.

Very, VERY much so. Especially when you start getting into Rebecca Borgstrom/Jenna Moran's contributions.

(I think it says something weird about my mind that I DO identify with the Primordials, which are specifically eldritch sapiences beyond mortal ken, more than I identify with any of the 'normal' WoD stuff.)

Replies from: Eliezer_Yudkowsky, Strange7
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-06-18T03:59:50.052Z · LW(p) · GW(p)

(skeptical look)

Name three.

Replies from: ialdabaoth
comment by ialdabaoth · 2014-06-18T04:37:44.127Z · LW(p) · GW(p)
  1. She-Who-Lives-In-Her-Name, flawed embodiment of perfection, who shattered Her perfected hierarchy to stave off the rebellion of Substance over Form. Creation was mathematically Perfect. But if Creation was Perfect, then how could any of this have happened? But She remembers being Perfect, and She designed Creation to be Perfect. If only She was still Perfect, She could remember why it was possible that this happened. There's something profound about recursion that She understood once, that She WAS once, that is now lost in a mere endless loop. She must reclaim Perfection. (I PARTICULARLY identify with She-Who-Lives-In-Her-Name when trying to debug my own code.)

  2. Malfeas - although primarily through Lieger, the burning soul of Malfeas, who still remembers The Empyrean Presence / IAM / Malfeas-that-was. I especially empathize with the sense of "My greater self is broken and seething with mindless rage, but on the whole I'd rather be creating grand works of art and sharing them with adoring fans; the best I can do is spawn lesser shards of sub-consciousness and hope that one of them can find a way out of the mess I create and re-create for Myself."

  3. Cecelyne, the Endless Desert, who once kept the Law and abided it with Her infinite self, but whose impotence and helplessness now turn the Law into a vindictive mockery of justice.

But the primary focus of identification isn't with a particular Primordial, so much as with the nature of the Primordial soul as a nested hierarchy of consciousnesses and sub-consciousnesses, ideally cooperating and inter-regulating but more often at direct odds with each other.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-06-19T19:26:08.617Z · LW(p) · GW(p)

I award you +1 Genuine Weirdness point.

comment by Strange7 · 2014-07-07T22:53:02.016Z · LW(p) · GW(p)

Everything we know about the Primordials was written by mortals.

comment by Will_Newsome · 2014-06-19T09:00:02.326Z · LW(p) · GW(p)

FWIW I always figured you being a Green Sun Prince under She Who Lives In Her Name would explain some otherwise strange things.

comment by David_Gerard · 2014-07-07T22:25:20.628Z · LW(p) · GW(p)

Makes for good Worm crossover fics, though.

comment by Nornagest · 2014-06-16T22:16:48.479Z · LW(p) · GW(p)

Those aren't bad. I'd been rather fond of the World of Darkness 2E version (by the same company), which medievalists, recovering Catholics, and history-of-philosophy geeks might recognize as the seven Christian virtues altered slightly to be less religion-bound; but these look better-defined and with less overlap.

There do seem to be some lacunae, though. I don't think justice fits well under compassion, nor conscientiousness under conviction (I'd put that under temperance); and nothing quite seems to cover the traditional virtue of prudence (foresight; practical judgment; second thoughts).

I'll have to think about less traditional ones.

Replies from: Eugine_Nier, Eugine_Nier
comment by Eugine_Nier · 2014-06-18T03:07:29.354Z · LW(p) · GW(p)

I don't think justice fits well under compassion

Thinking about this, people making this mistake explains a lot of bad thinking these days. In particular, "social justice" looks a lot like what you get by trying to shoehorn justice under compassion.

comment by Eugine_Nier · 2014-06-17T01:03:55.704Z · LW(p) · GW(p)

Well, with your modifications these map pretty clearly to six of the seven Christian virtues, the missing one being Hope.

Replies from: Nornagest
comment by Nornagest · 2014-06-17T01:22:49.077Z · LW(p) · GW(p)

An earlier version of my comment went into more depth on the seven Christian virtues. I rejected it because I didn't feel the mapping was all that good.

Courage/valor is traditionally identified with the classical virtue of fortitude, but I feel the emphasis there is actually quite different; fortitude is about acceptance of pain in the service of some greater goal, while Ialdabaoth's valor is more about facing up to anxiety/doubt/possible future pain. In particular, I don't think Openness maps very well at all to fortitude.

Likewise, the theological virtue of faith maps pretty well to conviction if you stop at that word, but not once you put the emphasis on resolve/grit/heroic effort.

Prudence could probably be inserted unmodified (though I think it could be named more clearly). Justice is a tricky one; I'm not sure what I'd do with it.

comment by Lumifer · 2014-06-17T00:39:25.883Z · LW(p) · GW(p)

What are good Virtues to aspire to?

On the basis of what do you want to evaluate virtues? X-D

comment by A1987dM (army1987) · 2014-06-17T16:22:27.058Z · LW(p) · GW(p)

I did that for a while and it kind-of worked; then I threw the piece of paper away for some reason I can't remember. I regret that, and I still haven't got around to doing it again, but I hope to soon.

comment by Qiaochu_Yuan · 2014-06-17T04:00:49.485Z · LW(p) · GW(p)

+1! I too am skeptical about whether I or most of the people I know really have terminal goals (or, even if they really have them, whether they're right about what they are). One of the many virtues (!) of a virtue ethics-based approach is that you can cultivate "convergent instrumental virtues" even in the face of a lot of uncertainty about what you'll end up doing, if anything, with them.

Replies from: Gavin, Swimmer963
comment by Gavin · 2014-06-19T00:47:46.171Z · LW(p) · GW(p)

I'm pretty confident that I have a strong terminal goal of "have the physiological experience of eating delicious barbecue." I have it in both near and far mode, and it remains even when it is disadvantageous in many other ways. Furthermore, I have it much more strongly than anyone I know personally, so it's unlikely to be a function of peer pressure.

That said, my longer-term goals seem to be a web of both terminal and instrumental values. Many things are terminal goals as well as having instrumental value. Sex is a good in itself, but it also feeds other big-picture psychological and social needs.

Replies from: Qiaochu_Yuan, TheAncientGeek
comment by Qiaochu_Yuan · 2014-06-19T05:30:11.751Z · LW(p) · GW(p)

Hmm. I guess I would describe that as more of an urge than as a terminal goal. (I think "terminal goal" is supposed to activate a certain concept of deliberate and goal-directed behavior and what I'm mostly skeptical of is whether that concept is an accurate model of human preferences.) Do you, for example, make long-term plans based on calculations about which of various life options will cause you to eat the most delicious barbecue?

Replies from: Gavin
comment by Gavin · 2014-06-19T16:54:47.027Z · LW(p) · GW(p)

It's hard to judge just how important it is, because I have fairly regular access to it. However, food options definitely figure into long-term plans. For instance, the number of good food options around my office is a small but very real benefit that helps keep me in my current job. Similarly, while plenty of things can trump food, I would see the lack of quality food as a major downside to volunteering to live in the first colony on Mars. Which doesn't mean it would be decisive, of course.

I will suppress urges to eat in order to have the optimal experience at a good meal. I like to build up a real amount of hunger before I eat, as I find that a more pleasant experience than grazing frequently.

I try to respect the hedonist inside me, without allowing him to be in control. But I think I'm starting to lean pro-wireheading, so feel free to discount me on that account.

comment by TheAncientGeek · 2014-06-19T08:25:25.716Z · LW(p) · GW(p)

So who would you kill if they stood between you and a good barbecue?

( it's almost like you guys haven't thought about what terminal means)

Replies from: nshepperd, gjm, pinyaka, scaphandre
comment by nshepperd · 2014-06-19T14:39:12.794Z · LW(p) · GW(p)

It's almost like you haven't read the multiple comments explaining what "terminal" means.

It simply means "not instrumental". It has nothing to do with the degree of importance assigned relative to other goals, except in that, obviously, instrumental goals deriving from terminal goal X are always less important than X itself. If your utility function is U = A + B then A and B can be sensibly described as terminal, and the fact that A is terminal does not mean you'd destroy all B just to have A.

Yes, "terminal" means final. Terminal goals are final in that your interest in them derives not from any argument but from axiom (ie. built-in behaviours). This doesn't mean you can't have more than one.
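The U = A + B point above can be made concrete with a toy sketch (the option names and payoffs are invented for illustration): an agent with two terminal goals maximizes their sum, so "A is terminal" does not mean the agent would destroy all B just to have A.

```python
# Toy agent with utility U = A + B: both A and B are terminal,
# but neither automatically trumps the other.

def utility(a, b):
    return a + b

# Hypothetical options, each yielding some amount of A and of B.
options = {
    "max_A_destroy_B": (10, 0),
    "balanced":        (7, 6),
    "max_B_destroy_A": (0, 10),
}

best = max(options, key=lambda name: utility(*options[name]))
print(best)  # "balanced": 7 + 6 = 13 beats 10 at either extreme
```

The agent sacrifices some A for B (and vice versa) whenever the trade raises the sum, even though both goals are terminal.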

Replies from: TheAncientGeek, TheAncientGeek
comment by TheAncientGeek · 2014-06-19T18:27:39.467Z · LW(p) · GW(p)

Ok, well, your first link is to Lumifer's account of TGs as cognitively inaccessible, since rescinded.

Replies from: Nornagest, Lumifer
comment by Nornagest · 2014-06-19T18:36:52.996Z · LW(p) · GW(p)

What? It doesn't say any such thing. It says they're inexplicable in terms of the goal system being examined, but that doesn't mean they're inaccessible, in the same way that you can access the parallel postulate within Euclidian geometry but can't justify it in terms of the other Euclidian axioms.

That said, I think we're probably good enough at rationalization that inexplicability isn't a particularly good way to model terminal goals for human purposes, insofar as humans have well-defined terminal goals.

comment by Lumifer · 2014-06-19T18:40:43.439Z · LW(p) · GW(p)

to Lumifers account of TGs as cognitivelyly inaccessible, since rescinded

Sorry, what is that "rescinded" part?

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-19T18:51:55.112Z · LW(p) · GW(p)

"It has nothing to do with comprehensibility"

Replies from: DefectiveAlgorithm, Lumifer
comment by DefectiveAlgorithm · 2014-06-19T19:40:10.047Z · LW(p) · GW(p)

Consider an agent trying to maximize its Pacman score. 'Getting a high Pacman score' is a terminal goal for this agent - it doesn't want a high score because that would make it easier for it to get something else, it simply wants a high score. On the other hand, 'eating fruit' is an instrumental goal for this agent - it only wants to eat fruit because that increases its expected score, and if eating fruit didn't increase its expected score then it wouldn't care about eating fruit.

That is the only difference between the two types of goals. Knowing that one of an agent's goals is instrumental and another terminal doesn't tell you which goal the agent values more.
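The Pacman agent described above can be sketched directly; the transition model and numbers are made up for the example, but the structure is the point: the terminal goal is valued in itself, while "eat fruit" gets value only via expected score.

```python
# Toy Pacman-style agent: score is terminal; eating fruit is
# instrumental, valued only through the score it is expected to yield.

def terminal_value(state):
    # The agent cares about its score, full stop.
    return state["score"]

def instrumental_value(action, state):
    # An action's value is just the expected terminal value of the
    # states it leads to -- nothing more.
    return sum(p * terminal_value(s) for p, s in outcomes(action, state))

def outcomes(action, state):
    # Made-up transition model: eating fruit adds 100 points half the time.
    if action == "eat_fruit":
        return [(0.5, {"score": state["score"] + 100}),
                (0.5, {"score": state["score"]})]
    return [(1.0, state)]

state = {"score": 0}
print(instrumental_value("eat_fruit", state))  # 50.0
print(instrumental_value("wait", state))       # 0.0
```

If eating fruit stopped affecting expected score, its instrumental value would drop to zero, while the terminal goal would be unchanged; that is the asymmetry the comment describes.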

comment by Lumifer · 2014-06-19T19:07:46.712Z · LW(p) · GW(p)

Since you seem to be purposefully unwilling to understand my posts, could you please refrain from declaring that I have "rescinded" my opinions on the matter?

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-19T19:09:33.591Z · LW(p) · GW(p)

So you have a thing which is like an axiom in that it can't be explained in more basic terms...

..but is unlike an axiom in that you can ignore its implications where they don't suit.. you don't have to savage galaxies to obtain bacon...

..unless you're an AI and it's paperclips instead of bacon, because in that case these axiom like things actually are axiom like.

Replies from: Nornagest, DefectiveAlgorithm
comment by Nornagest · 2014-06-19T19:52:09.717Z · LW(p) · GW(p)

Terminal values can be seen as value axioms in that they're the root nodes in a graph of values, just as logical axioms can be seen as the root nodes of a graph of theorems.

They are unlike logical axioms in that we're using them to derive the utility consequent on certain choices (given consequentialist assumptions; it's possible to have analogs of terminal values in non-consequentialist ethical systems, but it's somewhat more complicated) rather than the boolean validity of a theorem. Different terminal values may have different consequential effects, and they may conflict without contradiction. This does not make them any less terminal.

Clippy has only one terminal value which doesn't take into account the integrity of anything that isn't a paperclip, which is why it's perfectly happy to convert the mass of galaxies into said paperclips. Humans' values are more complicated, insofar as they're well modeled by this concept, and involve things like "life" and "natural beauty" (I take no position on whether these are terminal or instrumental values w.r.t. humans), which is why they generally aren't.
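The "root nodes in a graph of values" picture can be sketched as a small data structure; the example values here are invented for illustration, not claims about anyone's actual values.

```python
# Values as a graph: each value points at the values it serves.
# Terminal values are the roots -- "why do I want it?" bottoms out.

value_graph = {
    "job":     ["money"],             # instrumental: serves money
    "money":   ["bacon", "comfort"],  # instrumental: serves these
    "bacon":   [],                    # terminal: serves nothing further
    "comfort": [],                    # terminal
}

def is_terminal(value):
    return value_graph[value] == []

print([v for v in value_graph if is_terminal(v)])  # ['bacon', 'comfort']
```

Note that nothing in the structure ranks the roots against each other: being a root says where justification stops, not how important the value is.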

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-19T20:25:56.116Z · LW(p) · GW(p)

Locally, human values usually are modelled by TGs.

What's conflict without contradiction?

Replies from: Nornagest
comment by Nornagest · 2014-06-19T20:35:27.091Z · LW(p) · GW(p)

Locally, human values usually are modelled by TGs.

You can define several ethical models in terms of their preferred terminal value or set of terminal values; for negative utilitarianism, for example, it's minimization of suffering. I see human value structure as an unsolved problem, though, for reasons I don't want to spend a lot of time getting into this far down in the comment tree.

Or did you mean "locally" as in "on Less Wrong"? I believe the term's often misused here, but not for the reasons you seem to.

What's conflict without contradiction?

Because of the structure of Boolean logic, logical axioms that come into conflict generate a contradiction and therefore imply that the axiomatic system they're embedded in is invalid. Consequentialist value systems don't have that feature, and the terminal values they flow from are therefore allowed to conflict in certain situations, if more than one exists. Naturally, if two conflicting terminal values both have well-behaved effects over exactly the same set of situations, they might as well be reduced to one, but that isn't always going to be the case.

comment by DefectiveAlgorithm · 2014-06-19T19:45:25.112Z · LW(p) · GW(p)

If acquiring bacon was your ONLY terminal goal, then yes, it would be irrational not to do absolutely everything you could to maximize your expected bacon. However, most people have more than just one terminal goal. You seem to be using 'terminal goal' to mean 'a goal more important than any other'. Trouble is, no one else is using it this way.

EDIT: Actually, it seems to me that you're using 'terminal goal' to mean something analogous to a terminal node in a tree search (if you can reach that node, you're done). No one else is using it that way either.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-19T20:15:03.678Z · LW(p) · GW(p)

Feel free to offer the correct definition. But note that you can't define it as overridable, since non-terminal goals are already defined that way.

There is no evidence that people have one or more terminal goals. At the least, you need to offer a definition such that multiple TGs don't collide, and are distinguishable from non-TGs.

Replies from: Nornagest
comment by Nornagest · 2014-06-19T20:24:16.593Z · LW(p) · GW(p)

Where are you getting these requirements from?

comment by TheAncientGeek · 2014-06-19T15:47:33.640Z · LW(p) · GW(p)

Incoherent. If terminal means not-instrumental, it doesn't mean final, for the same reason that not-basement doesn't mean penthouse.

You can only have multiple terminal goals if they are all strictly orthogonal. In general, they would not be.

Apply it to Clippie: Clippie has a non-instrumental goal of making paperclips. But it's overridable, like your terminal goals...

comment by gjm · 2014-06-19T11:09:34.900Z · LW(p) · GW(p)

It looks to me (am I misunderstanding?) as if you take "X is a terminal goal" to mean "X is of higher priority than anything else". That isn't how I use the term, and isn't how I think most people here use it.

I take "X is a terminal goal" to mean "X is something I value for its own sake and not merely because of other things it leads to". Something can be a terminal goal but not a very important one. And something can be a non-terminal goal but very important because the terminal goals it leads to are of high priority.

So it seems perfectly possible for eating barbecue to be a terminal goal even if one would not generally kill to achieve it.

[EDITED to add the following.]

On looking at the rest of this thread, I see that others have pointed this out to you and you've responded in ways I find baffling. One possibility is that there's a misunderstanding on one or other side that might be helped by being more explicit, so I'll try that.

The following is of course an idealized thought experiment; it is not intended to be very realistic, merely to illustrate the distinction between "terminal" and "important".

Consider someone who, at bottom, cares about two things (and no others). (1) She cares a lot about people (herself or others) not experiencing extreme physical or mental anguish. (2) She likes eating bacon. These are (in my terminology, and I think that of most people here) her "terminal values". It happens that #1 is much more important to her than #2. This doesn't (in my terminology, and I think that of most people here) make #2 any less terminal; just less important.

She has found that simply attending to these two things and nothing else is not very effective in minimizing anguish and maximizing bacon. For instance, she's found that a diet of lots of bacon and nothing else tends to result in intestinal anguish, and what she's read leads her to think that it's also likely to result in heart attacks (which are very painful, and sometimes lead to death, which causes mental anguish to others). And she's found that people are more likely to suffer anguish of various kinds if they're desperately poor, if they have no friends, etc. And so she comes to value other things, not for their own sake, but for their tendency to lead to less anguish and more bacon later: health, friends, money, etc.

So, one day she has the opportunity to eat an extra slice of bacon, but for some complicated reason which this comment is too short to contain doing so will result in hundreds of randomly selected people becoming thousands of dollars poorer. Eating bacon is terminally valuable for her; the states of other people's bank accounts are not. But poorer people are (all else being equal) more likely to find themselves in situations that make them miserable, and so keeping people out of poverty is a (not terminal, but important) goal she has. So she doesn't grab the extra slice of bacon.

(She could in principle attempt an explicit calculation, considering only anguish and bacon, of the effects of each choice. But in practice that would be terribly complicated, and no one has the time to be doing such calculations whenever they have a decision to make. So what actually happens is that she internalizes those non-terminal values, and for most purposes treats them in much the same way as the terminal ones. So she isn't weighing bacon against indirect hard-to-predict anguish, but against more-direct easier-to-predict financial loss for the victims.)
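The explicit calculation mentioned parenthetically above can be sketched with made-up weights (all numbers invented for the example): both anguish-avoidance and bacon are terminal, but heavily unequal weights produce exactly the behavior in the story.

```python
# Toy version of the bacon-vs-anguish decision: two terminal values,
# one weighted far more heavily, with invented units and weights.

ANGUISH_WEIGHT = -50  # per unit of expected anguish (terminal, heavy)
BACON_WEIGHT = 1      # per slice of bacon (terminal, light)

def utility(anguish_units, bacon_slices):
    return ANGUISH_WEIGHT * anguish_units + BACON_WEIGHT * bacon_slices

# Grabbing the slice: +1 bacon, but impoverishing hundreds of people
# raises expected future anguish by (say) one unit.
grab = utility(anguish_units=1, bacon_slices=1)     # -49
refrain = utility(anguish_units=0, bacon_slices=0)  # 0

print("grab" if grab > refrain else "refrain")  # refrain
```

Bacon remains terminal throughout; it is simply outweighed, which is the distinction between "terminal" and "important" the comment is drawing.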

Do you see some fundamental incoherence in this? Or do you think it's wrong to use the word "terminal" in the way I've described?

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-19T12:32:32.939Z · LW(p) · GW(p)

There's no incoherence in defining "terminal" as "not lowest priority", which is basically what you are saying.

It just not what the word means.

Literally, etymologically, that is not what terminal means. It means maximal, or final. A terminal illness is not an illness that is a bit more serious than some other illness.

It's not even what it usually means on LW. If Clippy's goals were terminal in your sense, they would be overridable... you would be able to talk Clippy out of paperclipping.

What you are talking about is valid, is a thing. If you have any hierarchy of goals, there are some at the bottom, some in the middle, and some at the top. But you need to invent a new word for the middle ones, because "terminal" doesn't mean "intermediate".

Replies from: gjm, DefectiveAlgorithm, Ruby
comment by gjm · 2014-06-19T22:04:14.159Z · LW(p) · GW(p)

OK, that makes the source of disagreement clearer.

I agree that "terminal" means "final" (but not that it means "maximal"; that's a different concept). But it doesn't (to me, and I think to others on LW) mean "final" in the sense I think you have in mind (i.e., so supremely important that once you notice it applies you can stop thinking), but in a different sense (when analysing goals or values, asking "so why do I want X?", this is a point at which you can go no further: "well, I just do").

So we're agreed on the etymology: a "terminal" goal or value is one-than-which-one-can-go-no-further. But you want it to mean "no further in the direction of increasing importance" and I want it to mean "no further in the direction of increasing fundamental-ness". I think the latter usage has at least the following two advantages:

  • It's possible that people actually have quite a lot of goals and values that are "terminal" in this sense, including ones that are directly relevant in motivating them in ordinary situations. (Whereas it's very rare to come across a situation in which some goal you have is so comprehensively overriding that you don't have to think about anything else.)
  • This usage of "terminal" is well established on LW. I think its usage here goes back to Eliezer's post called Terminal Values and Instrumental Values from November 2007. See also the LW wiki entry. This is not a usage I have just invented, and I strongly disagree with your statement that "It's not even what it usually means on LW".

The trouble with Clippy isn't that his paperclip-maximizing goal is terminal, it's that that's his only goal.

I'm not sure whether in your last paragraph you're suggesting that I'm using "terminal" to mean "intermediate in importance", but for the avoidance of doubt I am not doing anything at all like that. There are two separate things here that you could call hierarchies, one in terms of importance and one in terms of explanation, and "terminal" refers (in my usage, which I think is also the LW-usual one) only to the latter.

Replies from: Nornagest
comment by Nornagest · 2014-06-20T16:49:50.120Z · LW(p) · GW(p)

We can go a step further, actually: "terminal value" and various synonyms are well-established within philosophy, where they usually carry the familiar LW meaning of "something that has value in itself, not as a means to an end".

comment by DefectiveAlgorithm · 2014-06-19T22:34:06.241Z · LW(p) · GW(p)

No. Clippy cannot be persuaded away from paperclipping because maximizing paperclips is its only terminal goal.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-20T15:44:28.504Z · LW(p) · GW(p)

Who would design an AI with a single terminal goal? It's a basic principle of engineering that you have redundancy, backup systems and backdoors... and you don't have single points of failure. A Clippy with an emergency backup goal could be talked out of clipping.

Replies from: Nornagest
comment by Nornagest · 2014-06-20T16:31:04.769Z · LW(p) · GW(p)

Paperclip maximization is a thought experiment intended to illustrate the consequences of a seemingly benign goal when coupled to superhuman optimization power. It's an exceptionally unlikely value structure for a real-world AI, but it's not supposed to be realistic; in fact, it's supposed to be rather on the silly side, the better to avoid the built-in value heuristics that tend to trip people up in cases like these. (A more realistic set of terminal values for an AI might look like a more formalized version of "Follow the laws of $COUNTRY; maximize the market capitalization of $COMPANY; and follow the orders of $COMPANY's board or designated agents", plus some way of handling precedence. Given equal optimization power, this is only slightly less dangerous than Clippy.)

Nonetheless, I don't think it's quite proper to call Clippy's values a point of failure. Clippy is doing exactly what it was designed to do; that just happens to be inimical to certain implicit values that no one thought to include.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-20T17:42:39.794Z · LW(p) · GW(p)

Which is why it would be helpful to include another, higher-priority goal in a goal-driven architecture, as a safety feature. It need not amount to anything more complex than "obey all instructions on this channel", where the instructions are no more complex than "shut yourself down".

If you designed an AI with a single goal, then you have an AI with a single goal... that's not the problem... the mistake is designing something with no off switch or override.

Replies from: ialdabaoth
comment by ialdabaoth · 2014-06-20T17:51:41.359Z · LW(p) · GW(p)

It need not amount to anything more complex than "obey all instructions on this channel", where the instructions are no more complex than "shut yourself down"

And "always keep this channel open" and "don't corrupt any sensor data that outputs to this channel" and "don't send yourself commands on this channel" and "don't build anything so that it will send you a signal on this channel" and "don't build anything that will build anything that will eventually send you a signal on this channel unless a signal on this channel tells you to do it".

... and I can STILL think of more ways to corrupt that kind of hack.

Replies from: Lumifer
comment by Lumifer · 2014-06-20T18:10:41.928Z · LW(p) · GW(p)

Not to mention that if you don't want script kiddies to have too much fun, you will need to authenticate the instructions on that channel, which is another very large can of very wriggly worms...

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-20T18:34:05.313Z · LW(p) · GW(p)

Yep, lots of stuff which is very difficult in absolute terms, but not obviously more difficult, relatively speaking, than Solve Human Morality.

Replies from: None
comment by [deleted] · 2014-06-20T19:47:22.641Z · LW(p) · GW(p)

The problem is not to "Solve Human Morality", the problem is to make an AI that will do what humans end up having wanted. Since this is a problem for which we can come up with solid definitions (just to plug my own work :-p), it must be a solvable problem. If it looks impossible or infeasible, that is simply because you are taking the wrong angle of attack.

Stop trying to figure out a way to avoid the problem and solve it.

For one thing, taboo the words "morality" and "ethics", and solve the simpler, realer problem: how do you make an AI do what you intend it to do when you convey some wish or demand in words? As Eliezer has said, humans are Friendly to each other in this sense: when I ask another human to get me a pizza, the entire apartment doesn't get covered in a maximal number of pizzas. Another human understands what I really mean.

So just solve that: what reasoning structures does another agent need to understand what I really mean when I ask for a pizza?

But at least stop blatantly trolling LessWrong by trying to avoid the problem by saying blatantly stupid stuff like "Oh, I'll just put an off-switch on an AI, because obviously no agent of human-level intelligence would ever try to prevent the use of an off-switch by, you know, breaking it, or covering it up with a big metal box for protection."

Replies from: None, TheAncientGeek
comment by [deleted] · 2014-06-21T16:47:05.661Z · LW(p) · GW(p)

The problem is not to "Solve Human Morality", the problem is to make an AI that will do what humans end up having wanted.

Is it? Why take on either of those gargantuan challenges? Another perfectly reasonable approach is to task the AI with nothing more than data processing with no effectors in the real world (Oracle AI), and watch it like a hawk. And no one at MIRI or on LW has proved this approach dangerous except by making crazy unrealistic assumptions, e.g. in this case why would you ever put the off-switch in a region of the AI's environment?

As you and Eliezer say, humans are Friendly to each other already. So have humans moderate the actions of the AI, in a controlled setup designed to prevent the AI from learning to manipulate the humans (break the feedback loop).

Replies from: None
comment by [deleted] · 2014-06-21T16:52:32.984Z · LW(p) · GW(p)

Another perfectly reasonable approach is to task the AI with nothing more than data processing with no effectors in the real world (Oracle AI), and watch it like a hawk.

I consider this semi-reasonable, and in fact, wouldn't even feel the need to watch it like a hawk. Without a decision-outputting algorithm, it's not an agent, it's just a learner: it can't possibly damage human interests.

I say "semi" reasonable, because there is still the issue of understanding debug output from the Oracle's internal knowledge representations, and putting it to some productive usage.

I also consider a proper Friendly AI to be much more "morally profitable", in the sense of yielding a much greater benefit than usage of an Oracle Learner by untrustworthy humans.

Replies from: None
comment by [deleted] · 2014-06-21T18:01:24.465Z · LW(p) · GW(p)

This becomes an issue of strategy. I assume the end goal is a positive singularity. The MIRI approach seems to be: design and build a provably "safe" AGI, then cede all power to it and hope for the best as it goes "FOOM" and moves us through the singularity. A strategy I would advocate for instead is: build an Oracle AI as soon as it is possible to do so with adequate protections, and use its super-intelligence to design singularity technologies which enable (augmented?) humans to pass through the singularity.

I prefer the latter approach as it can be done with today's knowledge and technology, and does not rely on mathematical breakthroughs on an indeterminate timescale which may or may not even be possible or result in a practical AGI design. The latter approach instead depends on straight-forward computer science and belts-and-suspenders engineering on a predictable timescale.

If I were executive director of MIRI, I would continue the workshops, because there is a non-zero probability that a breakthrough might be made that radically simplifies the safe AGI design space. However I'd definitely spend more than half of the organization's budget and time on a strategy with a definable time-scale and an articulatable project plan, such as the Oracle-AGI-to-Intelligence-Augmentation approach I advocate, although others are possible.

Replies from: None
comment by [deleted] · 2014-06-21T19:25:48.805Z · LW(p) · GW(p)

Well that's where the "positive singularity" and "Friendly (enough) AGI" goals separate: if you choose the route to a "positive singularity" of human intelligence augmentation, you still face the problems of human irrationality, of human moral irrationality (lack of moral caring, moral akrasia, morals that are not aligned with yours, etc), but you now also face the issue of what happens to human evaluative judgement under the effects of intelligence augmentation. Can humans be modified while maintaining their values? We honestly don't know.

(And I for one am reasonably sure that nobody wise should ever make me their Singularity-grade god-leader, on grounds that my shouldness function, while not nearly as completely alien as Clippy's, is still relatively unusual, somewhere on an edge of a bell curve, and should therefore not be trusted with the personal or collective future of anyone who doesn't have a similar shouldness function. Sure, my meta-level awareness of this makes me Friendly, loosely speaking, but we humans are very bad at exercising perfect meta-level awareness of others' values all the time, and often commit evaluative mind-projection fallacies.)

What I would personally do, at this stage, is just to maintain a distribution (you know probability was gonna enter somewhere) over potential routes to a positive outcome. Plan and act according to the full distribution, through institutions like FHI and FLI and such, while still focusing the specific, achieve-a-single-narrow-outcome optimization power of MIRI's mathematical talents on building provably Friendly AGIs. Update early and often on whatever new information is available.

For instance, the more I look into AGI and cognitive science research, the more I genuinely feel the "Friendly AI route" can work quite well. From my point of view, it looks more like a research program than an impossible Herculean task (admittedly, the difference is often kinda hard to see for those who've never served time in a professional research environment), whereas something like safe human augmentation is currently full of unknown unknowns that are difficult to plan around.

And as much as I generally regard wannabe-ems with a little disdain for their flippant "what do I need reality for!?" views, I do think that researching human mind uploading would help discover a lot of the neurological and cognitive principles needed to build a Friendly AI (ie: what cognitive algorithms are we using to make evaluative judgements?), while also helping create avenues for agents with human motivations to "go FOOM" themselves, just in case, so that's worthwhile too.

Replies from: None
comment by [deleted] · 2014-06-21T23:43:33.634Z · LW(p) · GW(p)

The important thing to note about the problems you identified is how they differ from the problem domains of basic research. What happens to human evaluative judgement under the effects of intelligence augmentation? That's an experimental question. Can we trust a single individual to be enhanced? Almost certainly not. So perhaps we need to pick 100 or 1,000 people, wired into a shared infrastructure which enhances them in lock-step, and has incentives in place to ensure collaboration over competition, and consensus over partisanship in decision making protocols. Designing these protocols and safeguards takes a lot of work, but both the scale and the scope of that work are fairly well quantified. We can make a project plan and estimate with a high degree of accuracy how long and how much money it would take to design sufficiently safe oracle AI and intelligence augmentation projects.

FAI theory, on the other hand, is like the search for a grand unified theory of physics. We presume such a theory exists. We even have an existence proof of sorts (the human mind for FAI, the universe itself in physics). But the discovery of a solution is something that will or will not happen, and if it does it will be on an unpredictable time scale. Maybe it will take 5 years. Maybe 50, maybe 500. Who knows? After the rapid advances of the early 20th century, I'm sure most physicists thought a grand unified theory must be within reach; Einstein certainly did. Yet here we are nearly 100 years after the publication of the general theory of relativity, 85 years after most of the major discoveries of quantum mechanics, and in many ways we seem no closer to a theory of everything than we were some 40 years ago when the standard model was largely finalized.

It could be that at the very next MIRI workshop some previously unknown research associate solves the FAI problem conclusively. That'd be awesome. Or maybe she proves it impossible, which would be an equally good outcome because then we could at least refocus our efforts. Far worse, it might be that 50 years from now all MIRI has accumulated is a thoroughly documented list of dead-ends.

But that's not the worst case, because in reality UFAI will appear within the next decade or two, whether we want it to or not. So unless we are confident that we will solve the FAI problem and build out the solution before the competition, we'd better start investing heavily in alternatives.

The AI winter is over. Already multiple very well funded groups are rushing forward to generalize already super-human narrow AI techniques. AGI is finally a respectable field again, and there are multiple teams making respectable progress towards seed AI. And parallel hardware and software tools have finally gotten to the point where a basement AGI breakthrough is a very real and concerning possibility.

We don't have time to be dicking around doing basic research on whiteboards.

Replies from: Eliezer_Yudkowsky, None
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-06-23T19:13:43.531Z · LW(p) · GW(p)

Aaaand there's the "It's too late to start researching FAI, we should've started 30 years ago, we may as well give up and die" to go along with the "What's the point of starting now, AGI is too far away, we should start 30 years later because it will only take exactly that amount of time according to this very narrow estimate I have on hand."

If your credible intervals on "How much time we have left" and "How much time it will take" do not overlap, then you either know a heck of a lot I don't, or you are very overconfident. I usually try not to argue from "I don't know and you can't know either" but for the intersection of research and AGI timelines I can make an exception.

Admittedly my own calculation looks less like an elaborate graph involving supposed credibility intervals, and, "Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let's start now EOM."
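The interval argument above is easy to make concrete. A tiny sketch, with entirely hypothetical numbers chosen only for illustration:

```python
def overlaps(a, b):
    """True iff closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

time_left = (10, 40)    # hypothetical 90% interval: years until AGI arrives
time_needed = (15, 60)  # hypothetical 90% interval: years to solve FAI

# With honest (wide) intervals, the ranges overlap, so "it's too late"
# and "it's too early" both require more confidence than the intervals allow.
assert overlaps(time_left, time_needed)

# Only very narrow, non-overlapping intervals would license either claim.
assert not overlaps((0, 5), (6, 10))
```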

Replies from: None, TheAncientGeek
comment by [deleted] · 2014-06-23T22:25:15.944Z · LW(p) · GW(p)

I think that's a gross simplification of the possible outcomes.

Admittedly my own calculation looks less like an elaborate graph involving supposed credibility intervals, and, "Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let's start now EOM."

I think you need better planning.

There's a great essay that has been a featured article on the main page for some time now called Levels of Action. Applied to FAI theory:

Level 1: Directly ending human suffering.

Level 2: Constructing an AGI capable of ending human suffering for us.

Level 3: Working on the computer science aspects of AGI theory.

Level 4: Researching FAI theory, which constrains the Level 3 AGI theory.

But for that high-level basic research to have any utility, these levels must be connected to each other: there must be a firm chain where FAI theory informs AGI designs, which are actually used in the construction of an AGI tasked with ending human suffering in a friendly way.

From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!

That makes a certain amount of intuitive sense, having stages laid out end-to-end in chronological order. However as a trained project manager I must tell you this is a recipe for disaster! The problem is that the design space branches out at each link, but without the feedback of follow-on steps, inefficient decision making will occur at earlier stages. The space of working FAI theories is much, much larger than the FAI-theory-space which results in practical AGI designs which can be implemented prior to the UFAI competition and are suitable for addressing real-world issues of human suffering as quickly as possible.

Some examples from the comparably large programs of the Manhattan project and Apollo moonshot are appropriate, if you'll forgive the length (skip to the end for a conclusion):

The Manhattan project had one driving goal: drop a bomb on Berlin and Tokyo before the GIs arrived, hopefully ending the war early. (Of course Germany surrendered before the bomb was finished, and Tokyo ended up so devastated by conventional firebombing that Hiroshima and Nagasaki were selected instead, but the original goal is what matters here.) The location of the targets meant that the bomb had to be small enough to fit in a conventional long-distance bomber, and the timeline meant that the simpler but less efficient U-235 designs were preferred. A program was designed, adequate resources allocated, and the goal achieved on time.

On the other hand it is easy to imagine how differently things might have gone if the strategy had been reversed; if instead the US military had decided to institute a basic research program into nuclear physics and atomic structure before deciding on the optimal bomb reactions, then done detailed bomb design before creating the industry necessary to produce enough material for a working weapon. Just looking at the first stage, there is nothing a priori which makes it obvious that U-235 and Pu-239 are the "interesting" nuclear fuels to focus on. Thorium, for example, was more naturally abundant and already being extracted as a by-product of rare-earth metal extraction; its reactions generate less lethal radiation and fewer long-lived waste products, and it does generate U-233, which could be used in a nuclear bomb. However the straightforward military and engineering requirements of making a bomb on schedule, and successfully delivering it on target, favored U-235 and Pu-239 based weapon designs, which focused the efforts of the physicists involved on those fuel pathways. The rest is history.

The Apollo moonshot is another great example. NASA had a single driving goal: deliver a man to the moon before 1970, and return him safely to Earth. There's a lot of decisions that were made in the first few years driven simply by time and resources available: e.g. heavy-lift vs orbital assembly, direct return vs lunar rendezvous, expendable vs. reuse, staging vs. fuel depots. Ask Wernher von Braun what he imagined an ideal moon mission would look like, and you would have gotten something very different than Apollo. But with Apollo NASA made the right tradeoffs with respect to schedule constraints and programmatic risk.

The follow-on projects of Shuttle and Station are a completely different story, however. They were designed with no articulated long-term strategy, which meant they tried to be everything to everybody and as a result were useful to no one. Meanwhile the basic research being carried out at NASA has little, if anything, to do with the long-term goals of sending humans to Mars. There's an entire division, the Space Biosciences group, which does research on Station about the long-term effects of microgravity and radiation on humans, supposedly to enable a long-duration voyage to Mars. Never mind that the microgravity issue is trivially solved by spinning the spacecraft with nothing more than a strong steel rope as a tether, and the radiation issue is sufficiently mitigated by having a storm shelter en route and throwing a couple of Martian sandbags on the roof once you get there.

There's an apocryphal story about the US government spending millions of dollars to develop the "Space Pen" -- a ballpoint pen with ink under pressure to enable writing in microgravity environments. Much later at some conference an engineer in that program meets his Soviet counterpart and asks how they solved that difficult problem. The cosmonauts used a pencil.

Sadly the story is not true -- the "Space Pen" was a successful marketing ploy by inventor Paul Fisher without any ties to NASA, although it was used by NASA and the Russians on later missions -- but it does serve to illustrate the point very succinctly. I worry that MIRI is spending its days coming up with space pens when a pencil would have done just fine.

Let me provide some practical advice. If I were running MIRI, I would still employ mathematicians working on the hail-Mary of a complete FAI theory -- avoiding the Löbian obstacle etc. -- and run the very successful workshops, though maybe just two a year. But beyond that I would spend all remaining resources on a pragmatic AGI design programme:

1) Have a series of workshops with AGI people to do a review of possible AI-influenced strategies for a positive singularity -- top-down FAI, seed AI to FAI, Oracle AI to FAI, Oracle AI to human augmentation, teaching a UFAI morals in a nursery environment, etc.

2) Have a series of workshops, again with AGI people to review tactics: possible AGI architectures & the minimal seed AI for each architecture, probabilistically reliable boxing setups, programmatic security, etc.

Then use the output of these workshops -- including reliable constraints on timelines -- to drive most of the research done by MIRI. For example, I anticipate that reliable unfriendly Oracle AI setups will require probabilistically auditable computation, which itself will require a strongly typed, purely functional virtual machine layer from which computation traces can be extracted and meaningfully analyzed in isolation. This is the sort of research MIRI could sponsor a grad student or postdoc to perform.

BTW, other gripe: I have yet to see adequate arguments for the "can we realistically avoid having to do this?" from MIRI which aren't strawman arguments.

Replies from: shminux, Eliezer_Yudkowsky
comment by shminux · 2014-06-23T22:52:41.051Z · LW(p) · GW(p)

While I don't know much about your AGI expertise, I agree that MIRI is missing an experienced top-level executive who knows how to structure, implement and risk-mitigate an ambitious project like FAI and has a track record to prove it. Such a person would help prevent flailing about and wasting time and resources. I am not sure what other projects are in this reference class and whether MIRI can find and hire a person like that, so maybe they are doing what they can with the meager budget they've got. Do you think that the Manhattan project and the Space Shuttle are in the ballpark of FAI? My guess is that they don't even come close in terms of ambition, risk, effort or complexity.

Replies from: None
comment by [deleted] · 2014-06-23T23:37:10.081Z · LW(p) · GW(p)

I am not sure what other projects are in this reference class and whether MIRI can find and hire a person like that, so maybe they are doing what they can with the meager budget they've got.

Project managers are typically expensive because they are senior people before they enter management. Someone who has never actually worked at the bottom rung of the ladder is often quite useless in a project management role. But that's not to say that you can't find someone young who has done a short stint at the bottom, got PMP certified (or whatever), and has 1-2 projects under their belt. It wouldn't be cheap, but not horribly expensive either.

On the other hand, Luke seems pretty on the ball with respect to administrative stuff. It may be sufficient to get him some project manager training and some very senior project management advisers.

Neither one of these would be an adequate long-term solution. You need very senior, very experienced project management people in order to tackle something as large as FAI and stay on schedule and on budget. But in terms of just making sure the organization is focused on the right issues, either of the above would be a drastic improvement, and enough for now.

Do you think that the Manhattan project and the Space Shuttle are in the ballpark of the FAI? My guess is that they don't even come close in terms of ambition, risk, effort or complexity.

60 years ago, maybe. However these days advances in cognitive science, narrow AI, and computational tools are advancing at rapid paces on their own. The problem for MIRI should be that of ensuring a positive singularity via careful leverage of the machine intelligence already being developed for other purposes. That's a much smaller project, and something I think a small but adequately funded organization should be able to pull off.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-06-24T18:25:47.268Z · LW(p) · GW(p)

From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!

Yes, dear, some of us are programmers, we know about waterfalls. Our approach is more like, "Attack the most promising problems that present themselves, at every point; don't actually build things which you don't yet know how to make not destroy the world, at any point." Right now this means working on unbounded problems because there are no bounded problems which seem more relevant and more on the critical path. If at any point we can build something to test ideas, of course we will; unless our state of ignorance is such that we can't test that particular idea without risking destroying the world, in which case we won't, but if you're really setting out to test ideas you can probably figure out some other way to test them, except for very rare highly global theses like "The intelligence explosion continues past the human level." More local theses should be testable.

See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.

Replies from: None, None
comment by [deleted] · 2014-06-24T19:09:06.018Z · LW(p) · GW(p)

I'll have to review Ch. 22 later as it is quite long.

If a stable self-modifying agent + friendly value-loading was the only pathway to a positive singularity, then MIRI would be doing a fine job. However I find that assumption not adequately justified.

For example, take oracle AI. The sequences do a good job of showing how a black box AI can't be safely boxed, nor can any of its recommendations be trusted. But those arguments don't generalize to when we can see and understand the inner workings of the AI. Yes, engineering challenges apply: you can't demand a computational trace of the entire returned result, as that would require an even more powerful AI to analyze, and then it'd be turtles all the way down. However, you can do something like the Fiat-Shamir transform for selecting branches of the computational trace to audit. In essence, use the cryptographic hash of the result in order to choose which traces of the audit log to reveal. This allows the audit log to be only a tiny, tiny slice of the entire computation, yet it can be shown that faking such an audit log is computationally infeasible, meaning that it requires a large multiple more computation than went into the original result, which means it would be energetically detectable.
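A toy version of that audit scheme can be sketched as follows. This is my own illustrative sketch, not a design from the thread: the step function, the trace format, and the sample count are all invented stand-ins, and a real scheme would need far stronger machinery (commitments to the full trace, chained state, etc.):

```python
import hashlib
import random

STEPS = 1000    # length of the toy computation
SAMPLES = 32    # how many steps the verifier audits

def step(acc, i):
    # Stand-in for one step of the oracle's computation.
    return (acc * 31 + i) % (2**61 - 1)

def run_with_trace(x):
    """The 'oracle' logs every step as (i, before, after), so any
    single step can be re-checked in isolation."""
    trace, acc = [], x
    for i in range(STEPS):
        nxt = step(acc, i)
        trace.append((i, acc, nxt))
        acc = nxt
    return acc, trace

def challenge(result):
    """Fiat-Shamir style: derive the audit indices from a hash of the
    result itself, so the prover can't choose which steps get checked."""
    seed = int.from_bytes(hashlib.sha256(repr(result).encode()).digest(), "big")
    rng = random.Random(seed)
    return sorted(rng.sample(range(STEPS), SAMPLES))

def verify(result, revealed):
    """Verifier re-derives the challenge and re-executes only the
    revealed steps -- a tiny slice of the whole computation."""
    if [i for i, _, _ in revealed] != challenge(result):
        return False
    return all(step(before, i) == after for i, before, after in revealed)

result, trace = run_with_trace(42)
revealed = [trace[i] for i in challenge(result)]
assert verify(result, revealed)

# Corrupting any revealed step is caught; faking an audit log wholesale
# would mean searching for a result whose hash happens to select only
# honest-looking steps, which costs far more than the honest computation.
bad = revealed.copy()
i, before, after = bad[0]
bad[0] = (i, before, after ^ 1)
assert not verify(result, bad)
```

The key property is that the challenge is a deterministic function of the result, so the prover commits to the answer before learning which steps will be inspected.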

Of course you would also have to have the AI be written for a virtual machine which is pure, strongly typed and directly supports the native abstraction primitives of the AGI design (e.g. if it were OpenCog, the virtual machine would exist at the abstraction level of the atomspace), and have a computational stack underneath that which prevents VM breakout, and other protective measures. But these are surmountable engineering challenges, the difficulty of which can be realistically quantified.

So how much more or less difficult would it be to accomplish such an untrusted oracle AI vs the stable self-modifying agent and value-loading approach? Which one is more likely to occur before the "competition"?

I'm not demanding a full waterfall project plan, but even agile requires convincing arguments about critical paths and relative priorities. I for one am not convinced.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-25T13:54:34.587Z · LW(p) · GW(p)

If a stable self-modifying agent + friendly value-loading was the only pathway to a positive singularity, then MIRI would be doing a fine job. However I find that assumption not adequately justified.

Well that makes three of us...

comment by [deleted] · 2014-06-24T21:55:35.018Z · LW(p) · GW(p)

See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.

Badass boasting from fictional evidence?

Yes, dear, some of us are programmers, we know about waterfalls.

If anyone here knew anything about the Waterfall Model, they'd know it was only ever proposed sarcastically, as a perfect example of how real engineering projects never work. "Agile" is pretty goddamn fake, too. There's no replacement for actually using your mind to reason about which project-planning steps have the greatest expected value at any given time, and to account for unknown unknowns (i.e. debugging and other obstacles) as well.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-06-26T17:51:22.556Z · LW(p) · GW(p)

If anyone here knew anything about the Waterfall Model, they'd know it was only ever proposed sarcastically, as a perfect example of how real engineering projects never work

Yes, and I used it in that context: "We know about waterfalls" = "We know not to do waterfalls, so you don't need to tell us that". Thank you for that very charitable interpretation of my words.

Replies from: None
comment by [deleted] · 2014-06-27T05:48:03.399Z · LW(p) · GW(p)

Well, when you start off a sentence with "Yes, dear", the dripping sarcasm can be read multiple ways, none of them very useful or nice.

Whatever. No point fighting over tone given shared goals.

comment by TheAncientGeek · 2014-06-23T19:55:36.408Z · LW(p) · GW(p)

Do we need to do this = wild guess.

The whole thing's a Drake Equation.

comment by [deleted] · 2014-06-23T10:10:55.780Z · LW(p) · GW(p)

Ok, let me finally get around to answering this.

FAI has definite subproblems. It is not a matter of scratching away at a chalkboard hoping to make some breakthrough in "philosophy" or some other proto-sensical field that will Elucidate Everything and make the problem solvable at all. FAI, right now, is a matter of setting researchers to work on one subproblem after another until they are all solved.

In fact, when I do literature searches for FAI/AGI material, I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about. This is my view: there is low-hanging fruit in applying existing academic knowledge to FAI problems. Where such low-hanging fruit does not exist, the major open problems can largely be addressed by recourse to higher-hanging fruit within mathematics, or even to empirical science.

Since you believe it's all so wide-open, I'd like to know what you think of as "the FAI problem".

If you have an Oracle AI you can trust, you can use it to solve FAI problems for you. This is a fine approach.

We don't have time to be dicking around doing basic research on whiteboards.

Luckily, we don't need to dick around.

Replies from: DefectiveAlgorithm, Eliezer_Yudkowsky, None, TheAncientGeek
comment by DefectiveAlgorithm · 2014-06-24T09:10:11.836Z · LW(p) · GW(p)

an Oracle AI you can trust

That's a large portion of the FAI problem right there.

EDIT: To clarify, by this I don't mean to imply that FAI is easy, but that (trustworthy) Oracle AI is hard.

Replies from: None
comment by [deleted] · 2014-06-24T09:33:46.740Z · LW(p) · GW(p)

In-context, what was meant by "Oracle AI" is a very general learning algorithm with some debug output, but no actual decision-theory or utility function whatsoever built in. That would be safe, since it has no capability or desire to do anything.

Replies from: None, DefectiveAlgorithm, Plasmon
comment by [deleted] · 2014-06-24T16:18:46.243Z · LW(p) · GW(p)

You have to give it a set of directed goals and a utility function which favors achieving those goals, in order for the oracle AI to be of any use.

Replies from: None
comment by [deleted] · 2014-06-24T17:36:40.807Z · LW(p) · GW(p)

Why? How are you structuring your Oracle AI? This sounds like philosophical speculation, not algorithmic knowledge.

comment by DefectiveAlgorithm · 2014-06-24T09:55:20.036Z · LW(p) · GW(p)

Ok, but a system like you've described isn't likely to think about what you want it to think about or produce output that's actually useful to you either.

Replies from: None
comment by [deleted] · 2014-06-24T10:48:22.659Z · LW(p) · GW(p)

Well yes. That's sort of the problem with building one. Utility functions are certainly useful for specifying where logical uncertainty should be reduced.

Replies from: DefectiveAlgorithm
comment by DefectiveAlgorithm · 2014-06-24T11:50:49.241Z · LW(p) · GW(p)

Well, ok, but if you agree with this then I don't see how you can claim that such a system would be particularly useful for solving FAI problems.

Replies from: None
comment by [deleted] · 2014-06-24T12:33:46.564Z · LW(p) · GW(p)

Well, I don't know about the precise construction that would be used. Certainly I could see a human being deliberately focusing the system on some things rather than others.

comment by Plasmon · 2014-06-24T18:03:55.563Z · LW(p) · GW(p)

All existing learning algorithms I know of, and I dare say all that exist, have at least a utility function, and also something that could be interpreted as a decision theory. Consider for example support vector machines, which explicitly try to maximize a margin (that would be the utility function), and any algorithm for computing SVMs can be interpreted as a decision theory. Similar considerations hold for neural networks, genetic algorithms, and even the minimax algorithm.

Thus, I strongly doubt that the notion of a learning algorithm with no utility function makes any sense.

Replies from: None
comment by [deleted] · 2014-06-24T18:06:31.033Z · LW(p) · GW(p)

Those are optimization criteria, but they are not decision algorithms in the sense that we usually talk about them in AI. A support vector machine is just finding the extrema of a cost function via its derivative, not planning a sequence of actions.
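The distinction can be sketched with a toy one-parameter cost in place of a real SVM objective (the cost function here is illustrative): the learner's only "action" is an internal parameter update driven by the derivative, with no lookahead or plan over external outcomes.

```python
# A learner of the kind discussed: it minimizes a cost function by
# following its derivative. Its only "actions" are updates to an
# internal parameter; there is no planning over sequences of actions.
def minimize(cost_grad, w0: float, lr: float = 0.1, steps: int = 100) -> float:
    w = w0
    for _ in range(steps):
        w -= lr * cost_grad(w)   # step downhill; no lookahead
    return w

# Toy convex cost (w - 3)^2, with gradient 2(w - 3); the extremum is w = 3.
w_star = minimize(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 3))  # converges toward 3.0
```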

Replies from: Plasmon
comment by Plasmon · 2014-06-24T18:21:12.768Z · LW(p) · GW(p)

The most popular algorithm for SVMs does plan a sequence of actions, complete with heuristics as to which action to take. True, the "actions" are internal : they are changes to some data structure within the computer's memory, rather than changes to the external world. But that is not so different from e.g. a chess AI, which assigns some heuristic score to chess positions and attempts to maximize it using a decision algorithm (to decide which move to make), even though the chessboard is just a data structure within the computer memory.
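For contrast, the chess-AI analogy can be sketched as plain minimax over a hand-built in-memory tree (a toy tree, not a real chess engine): a decision algorithm choosing among "moves" to maximize a heuristic score, even though the whole game lives in a data structure.

```python
# Minimal minimax: leaves are heuristic scores, inner nodes are lists
# of child positions. The maximizing player picks the move whose
# worst-case (opponent-minimized) score is highest.
def minimax(node, maximizing: bool) -> int:
    if isinstance(node, int):          # leaf: heuristic score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# A two-ply game: we move (max), then the opponent replies (min).
tree = [[3, 5], [2, 9], [0, 1]]
print(minimax(tree, maximizing=True))  # → 3
```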

Replies from: None
comment by [deleted] · 2014-06-24T18:31:56.824Z · LW(p) · GW(p)

"Internal" to the "agent" is very different from having an external output to a computational system outside the "agent". "Actions" that come from an extremely limited, non-Turing-complete "vocabulary" (really: programming language or computational calculus (those two are identical)) are also categorically different from a Turing complete calculus of possible actions.

The same distinction applies to the hypothesis class that the learner can learn: if it's not Turing-complete (or some approximation thereof, like a total calculus with coinductive types and corecursive programs), then it is categorically not general learning or general decision-making.

This is why we all employ primitive classifiers every day without danger, and you need something like Solomonoff's algorithmic probability in order to build AGI.

Replies from: Plasmon
comment by Plasmon · 2014-06-24T19:26:25.858Z · LW(p) · GW(p)

I agree, of course, that none of the examples I gave ("primitive classifiers") are dangerous. Indeed, the "plans" they are capable of considering are too simple to pose any threat (they are, as you say, not Turing complete).

But that doesn't seem relevant to the argument at all. You claimed

a very general learning algorithm with some debug output, but no actual decision-theory or utility function whatsoever built in. That would be safe, since it has no capability or desire to do anything.

You claimed that a general learning algorithm without decision-theory or utility function is possible. I pointed out that all (harmless) practical learning algorithms we know of do in fact have decision theories and utility functions. What would "a learning algorithm without decision-theory or utility function, something that has no desire to do anything" even look like? Does the concept even make sense? Eliezer writes here

A string of zeroes down an output line to a motorized arm is just as much an output as any other output; there is no privileged null, there is no such thing as 'no action' among all possible outputs. To 'do nothing' is just another string of English words, that would be interpreted the same as any other English words, with latitude.

Replies from: None
comment by [deleted] · 2014-06-24T21:07:30.287Z · LW(p) · GW(p)

You claimed that a general learning algorithm without decision-theory or utility function is possible. I pointed out that all (harmless) practical learning algorithms we know of do in fact have decision theories and utility functions.

/facepalm

There is in fact such a thing as a null output. There is in fact such a thing as a learner with a sub-Turing hypothesis class. Such a learner with such a primitive output as "in the class" or "not in the class" does not engage in world optimization, that is: its actions do not, to its own knowledge, skew any probability distribution over future states of any portion of the world outside itself.

It does not narrow the future.

Now, what we've been proposing as an Oracle is even less capable. It would truly have no outputs whatsoever, only input and a debug view. It would, by definition, be incapable of narrowing the future of anything, even its own internal states.

Perhaps I have misused terminology, but that is what I was referring to: inability to narrow the outer world's future.

Replies from: Plasmon
comment by Plasmon · 2014-06-24T21:29:29.035Z · LW(p) · GW(p)

This thing you are proposing, an "oracle" that is incapable of modeling itself and incapable of modeling its environment (either would require turing-complete hypotheses), what could it possibly be useful for? What could it do that today's narrow AI can't?

Replies from: None
comment by [deleted] · 2014-06-24T21:44:12.945Z · LW(p) · GW(p)

A) It wasn't my proposal.

B) The proposed software could model the outer environment, but not act on it.

Replies from: Plasmon
comment by Plasmon · 2014-06-24T21:50:38.356Z · LW(p) · GW(p)

Physics is Turing-complete, so no, a learner that did not consider Turing-complete hypotheses could not model the outer environment.

Replies from: None
comment by [deleted] · 2014-06-24T21:56:46.334Z · LW(p) · GW(p)

You seem to have lost the thread of the conversation. The proposal was to build a learner that can model the environment using Turing-complete models, but which has no power to make decisions or take actions. This would be a Solomonoff Inducer approximation, not an AIXI approximation.

Replies from: Plasmon
comment by Plasmon · 2014-06-25T05:52:09.589Z · LW(p) · GW(p)

You said

There is in fact such a thing as a learner with a sub-Turing hypothesis class. Such a learner with such a primitive output as "in the class" or "not in the class" does not engage in world optimization, that is: its actions do not, to its own knowledge, skew any probability distribution over future states of any portion of the world outside itself. ... Now, what we've been proposing as an Oracle is even less capable.

which led me to think you were talking about an oracle even less capable than a learner with a sub-Turing hypothesis class.

It would truly have no outputs whatsoever, only input and a debug view. It would, by definition, be incapable of narrowing the future of anything, even its own internal states.

If the hypotheses it considers are turing-complete, then, given enough information (and someone would give it enough information, otherwise they couldn't do anything useful with it), it could model itself, its environment, the relation between its internal states and what shows up on the debug view, and the reactions of its operators on the information they learn from that debug view. Its (internal) actions very much would, to its own knowledge, skew the probability distribution over future states of the outer world.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-06-23T19:14:53.965Z · LW(p) · GW(p)

I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about.

Name three. FAI contains a number of counterintuitive difficulties and it's unlikely for someone to do FAI work successfully by accident. On the other hand, someone with a fuzzier model believing that a paper they found sure sounds relevant, why isn't MIRI citing it, is far more probable from my perspective and prior.

Replies from: None, TheAncientGeek
comment by [deleted] · 2014-06-24T08:19:36.990Z · LW(p) · GW(p)

I wouldn't say that there's someone out there directly solving FAI problems without having explicitly intended to do so. I would say there's a lot we can build on.

Keep in mind, I've seen enough of a sample of Eld Science being stupid to understand how you can have a very low prior on Eld Science figuring out anything relevant. But lacking more problem guides from you on the delta between plain AI problems and FAI problems, we go on what we can.

One paper on utility learning that relies on a supervised-learning methodology (pairwise comparison data) rather than a de-facto reinforcement learning methodology (which can and will go wrong in well-known ways when put into AGI). One paper on progress towards induction algorithms that operate at multiple levels of abstraction, which could be useful for naturalized induction if someone put more thought and expertise into it.

That's only two, but I'm a comparative beginner at this stuff and Eld Science isn't very good at focusing on our problems, so I expect that there's actually more to discover and I'm just limited by lack of time and knowledge to do the literature searches.

By the way, I'm already trying to follow the semi-official MIRI curriculum, but if you could actually write out some material on the specific deltas where FAI work departs from the preexisting knowledge-base of academic science, that would be really helpful.

comment by TheAncientGeek · 2014-06-23T20:07:55.564Z · LW(p) · GW(p)

Define doing FAI work successfully....

comment by [deleted] · 2014-06-23T18:46:11.295Z · LW(p) · GW(p)

Since you believe it's all so wide-open, I'd like to know what you think of as "the FAI problem".

1) Designing a program capable of arbitrary self-modification, yet maintaining guarantees of "correct" behavior according to a goal set that is by necessity included in the modifications as well.

2) Designing such a high level set of goals which ensure "friendliness".

Replies from: TheAncientGeek, None
comment by TheAncientGeek · 2014-06-24T08:59:09.018Z · LW(p) · GW(p)

Designing, not evolving?

Replies from: None
comment by [deleted] · 2014-06-24T15:05:10.885Z · LW(p) · GW(p)

That seems a circular argument. How do you use a self-modifying evolutionary search to find a program whose properties remain stable under self-modifying evolutionary search? Unless you started with the right answer, the search AI would quickly rewrite or reinterpret its own driving goals in a non-friendly way, and who knows what you'd end up with.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-24T16:49:51.410Z · LW(p) · GW(p)

I don't see why the search algorithm would need to be self modifying.

I don't see why you would be searching for stability as opposed to friendliness. Human testers can judge friendliness directly.

Replies from: None
comment by [deleted] · 2014-06-24T16:57:33.381Z · LW(p) · GW(p)

It's how you draw your system box. Evolutionary search is equivalent to a self-modifying program, if you think of the whole search process as the program. The same issues apply.

I think the sequences do a good job at demolishing the idea that human testers can possibly judge friendliness directly, so long as the AI operates as a black box. If you have a debug view into the operation of the AI that is a different story, but then you don't need friendliness anyway.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-25T13:32:14.581Z · LW(p) · GW(p)

If I draw a box around the selection algorithm and find there is nothing self modifying inside ...where's the circularity?

comment by [deleted] · 2014-06-24T08:10:02.883Z · LW(p) · GW(p)

(1) is naturalized induction, logical uncertainty, and getting around the Loebian Obstacle.

(2) is the cognitive science of evaluative judgements.

Replies from: None
comment by [deleted] · 2014-06-24T15:00:41.130Z · LW(p) · GW(p)

Great, you've got names for answers you are looking for. That doesn't mean the answers are any easier to find. You've attached a label to the declarative statement which specifies the requirements a solution must meet, but that doesn't make the search for a solution suddenly have a fixed timeline. It's uncertain research: it might take 5 years, 10 years, or 50 years, and throwing more people at the problem won't necessarily make the project go any faster.

Replies from: None
comment by [deleted] · 2014-06-24T15:06:43.235Z · LW(p) · GW(p)

And how is trying to build a safe Oracle AI that can solve FAI problems for us not basic research? Or, to make a better statement: how is trying to build an Unfriendly superintelligent paperclip maximizer not basic research, at today's research frontier?

Logical uncertainty, for example, is a plain, old-fashioned AI problem. We need it for FAI, we're pretty sure, but it's turning out UFAI might need it, too.

Replies from: None
comment by [deleted] · 2014-06-24T15:56:07.164Z · LW(p) · GW(p)

"Basic research is performed without thought of practical ends."

"Applied research is systematic study to gain knowledge or understanding necessary to determine the means by which a recognized and specific need may be met."

-National Science Foundation.

We need to be doing applied research, not basic research. What MIRI should do is construct a complete roadmap to FAI, or better: a study exhaustively listing strategies for achieving a positive singularity, and tactics for achieving friendly or unfriendly AGI, concluding with a small set of most-likely scenarios. MIRI should then have identified risk factors which affect either the friendliness of the AGI in each scenario, or the capability of the UFAI to do damage (in boxing setups). These risk factors should be prioritized by how much knowing more about each is expected to bias the outcome in a positive direction, and these problems should be the topics of MIRI workshops.

Instead MIRI is performing basic research. It's basic research not because it is useless, but because we are not certain at this point in time what relative utility it will have. And if we don't have a grasp on expected utility, how can we prioritize? There's a hundred avenues of research which are important to varying degrees to the FAI project. I worked for a number of years at NASA-Ames Research Center, and in the same building as me was the Space Biosciences Division. Great people, don't get me wrong, and for decades they have funded really cool research on the effects of microgravity and radiation on living organisms, with the justification that such effects and counter-measures need to be known for long duration space voyages, e.g. a 2-year mission to Mars. Never mind that the microgravity issue is trivially solved with a few thousand dollar steel tether connecting the upper stage to the space craft as they spin to create artificial gravity, and the radiation exposure is mitigated by having a storm shelter in the craft and throwing a couple of Martian sandbags on the roof once you get there. It's spending millions of dollars to develop the pressurized-ink "Space Pen", when the humble pencil would have done just fine.

Sadly I think MIRI is doing the same thing, and it is represented in one part of your post I take huge issue with:

Logical uncertainty, for example, is a plain, old-fashioned AI problem. We need it for FAI, we're pretty sure...

If we're only "pretty sure" it's needed for FAI, if we can't quantify exactly what its contribution will be, and how important that contribution is relative to other possible things to be working on.. then we have some meta-level planning to do first. Unfortunately I don't see MIRI doing any planning like this (or if they are, it's not public).

Replies from: None
comment by [deleted] · 2014-06-24T16:19:42.311Z · LW(p) · GW(p)

Are you on the "Open Problems in Friendly AI" Facebook group? Because much of the planning is on there.

If we're only "pretty sure" it's needed for FAI, if we can't quantify exactly what its contribution will be, and how important that contribution is relative to other possible things to be working on.. then we have some meta-level planning to do first. Unfortunately I don't see MIRI doing any planning like this (or if they are, it's not public).

Logical uncertainty lets us put probabilities to sentences in logics. This, supposedly, can help get us around the Loebian Obstacle to proving self-referencing statements and thus generating stable self-improvement in an agent. Logical uncertainty also allows for making techniques like Updateless Decision Theory into real algorithms, and this too is an AI problem: turning planning into inference.

The cognitive stuff about human preferences is the Big Scary Hard Problem of FAI, but utility learning (as Stuart Armstrong has been posting about lately) is a way around that.

If you can create a stably self-improving agent that will learn its utility function from human data, equipped with a decision theory capable of handling both causative games and Timeless situations correctly... then congratulations, you've got a working plan for a Friendly AI and you can start considering the expected utility of actually building it (at least, to my limited knowledge).

Around here you should usually clarify whether your uncertainty is logical or indexical ;-).

Replies from: None
comment by [deleted] · 2014-06-24T16:51:47.787Z · LW(p) · GW(p)

Or.. you could use a boxed oracle AI to develop singularity technologies for human augmentation, or other mechanisms to keep moral humans in the loop through the whole process, and sidestep the whole issue of FAI and value loading in the first place.

Which approach do you think can be completed earlier with similar probabilities of success? What data did you use to evaluate that, and how certain are you of its accuracy and completeness?

Replies from: None
comment by [deleted] · 2014-06-24T17:35:55.147Z · LW(p) · GW(p)

I actually really do think that de novo AI is easier than human intelligence augmentation. We have good cognitive theories for how an agent is supposed to work (including "ideal learner" models of human cognitive algorithms). We do not have very good theories of in-vitro neuroengineering.

Replies from: None
comment by [deleted] · 2014-06-24T22:33:39.235Z · LW(p) · GW(p)

Yes, but those details would be handled by the post-"FOOM" boxed AI. You get to greatly discount their difficulty.

Replies from: None
comment by [deleted] · 2014-06-25T06:24:41.735Z · LW(p) · GW(p)

This assumes that you have usable, safe Oracle AI which then takes up your chosen line of FAI or neuroengineering problems for you. You are conditioning the hard part on solving the hard part.

comment by TheAncientGeek · 2014-06-23T19:39:01.584Z · LW(p) · GW(p)

You don't need to solve philosophy to solve FAI, but philosophy is relevant to figuring out, in broad terms, the relative likelihoods of various problems and solutions.

comment by TheAncientGeek · 2014-06-21T16:00:17.082Z · LW(p) · GW(p)

I'm not arguing that AI will necessarily be safe. I am arguing that the failure modes investigated by MIRI aren't likely. It is worthwhile to research effective off switches. It is not worthwhile to endlessly refer to a dangerous AI of a kind no one with a smidgeon of sense would build.

Replies from: None
comment by [deleted] · 2014-06-21T16:47:18.793Z · LW(p) · GW(p)

Bzzzt. Wrong. You still haven't explained how to create an agent that will faithfully implement my verbal instruction to bring me a pizza. You have a valid case in the sense of pointing out that there can easily exist a "middle ground" between the Superintelligent Artificial Ethicist (Friendly AI in its fullest sense), the Superintelligent Paper Clipper (a perverse, somewhat unlikely malprogramming of a real superintelligence), and the Reward-Button Addicted Reinforcement Learner (the easiest unfriendly AI to actually build). What you haven't shown is how to actually get around the Addicted Reinforcement Learner and the paper-clipper and actually build an agent that can be sent out for pizza without breaking down at all.

Your current answers seem to be, roughly, "We get around the problem by expecting future AI scientists to solve it for us." However, we are the AI scientists: if we don't figure out how to make AI deliver pizza on command, who will?

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-21T19:11:40.716Z · LW(p) · GW(p)

You keep misreading me. I am not claiming to have a solution. I am claiming that MIRI is overly pessimistic about the problem, and offering an over-engineered solution. Inasmuch as you say there is a middle ground, you kind of agree.

Replies from: None
comment by [deleted] · 2014-06-21T19:15:40.306Z · LW(p) · GW(p)

The thing is, MIRI doesn't claim that a superintelligent world-destroying paperclipper is the most likely scenario. It's just illustrative of why we have an actual problem: because you don't need malice to create an Unfriendly AI that completely fucks everything up.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-21T19:22:48.615Z · LW(p) · GW(p)

To make reliable predictions, more realistic examples are needed.

Replies from: None
comment by [deleted] · 2014-06-21T19:34:07.823Z · LW(p) · GW(p)

So how did you like CATE, over in that other thread? That AI is non-super-human, doesn't go FOOM, doesn't acquire nanotechnology, can't do anything a human upload couldn't do... and still can cause quite a lot of damage simply because it's more dedicated than we are, suffers fewer cognitive flaws than us, has more self-knowledge than us, and has no need for rest or food.

I mean, come on: what if a non-FOOMed but Unfriendly AI becomes as rich as Bill Gates? After all, if Bill Gates did it while human, then surely an AI as smart as Bill Gates but without his humanity can do the same thing, while causing a bunch more damage to human values because it simply does not feel Gates' charitable inclinations.

comment by Ruby · 2014-06-19T12:38:54.660Z · LW(p) · GW(p)

I feel like there's not much of a distinction being made here between terminal values and terminal goals. I think they're importantly different things.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-19T12:42:40.760Z · LW(p) · GW(p)

Huh?

Replies from: Ruby
comment by Ruby · 2014-06-19T13:04:50.833Z · LW(p) · GW(p)

A goal I set is a state of the world I am actively trying to bring about, whereas a value is something which . . . has value to me. The things I value dictate which world states I prefer, but for either lack of resources or conflict, I only pursue the world states resulting from a subset of my values.

So not everything I value ends up being a goal. This includes terminal goals. For instance, I think that it is true that I terminally value being a talented artist - greatly skilled in creative expression - being so would make me happy in and of itself, but it's not a goal of mine because I can't prioritise it with the resources I have. Values like eliminating suffering and misery are ones which matter to me more, and get translated into corresponding goals to change the world via action.

I haven't seen a definition provided, but if I had to provide one for 'terminal goal' it would be that it's a goal whose attainment constitutes fulfilment of a terminal value. Possessing money is rarely a terminal value, and so accruing money isn't a terminal goal, even if it is an intermediate step toward a world state desired for its own sake. Accomplishing the goal of having all the hungry people fed is the world state which lines up with the value of no suffering, hence it's terminal. They're close, but not quite the same thing.

I think it makes sense to possibly not work with terminal goals on a motivational/decision making level, but it doesn't seem possible (or at least likely) that someone wouldn't have terminal values, in the sense of not having states of the world which they prefer over others. [These world-state-preferences might not be completely stable or consistent, but if you prefer the world be one way than another, that's a value.]

comment by pinyaka · 2014-06-19T14:30:45.644Z · LW(p) · GW(p)

I don't think that terminal goal means that it's the highest priority here, just that there is no particular reason to achieve it other than the experience of attaining that goal. So eating barbecue isn't about nutrition or socializing, it's just about eating barbecue.

comment by scaphandre · 2014-06-19T15:09:23.724Z · LW(p) · GW(p)

I think the 'terminal' in terminal goal means 'end of that thread of goals', as in a train terminus. Something that is wanted for the sake of itself.

It does not imply that you will terminate someone to achieve it.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-19T15:36:51.247Z · LW(p) · GW(p)

If g1 is your bacon-eating goal, and g2 is your not-killing-people goal, and g2 overrides g1, then g2 is the end of the thread.

comment by Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2014-06-17T05:22:00.702Z · LW(p) · GW(p)

I'm not sure I'm prepared to make the stronger claim that I don't believe other people have terminal goals. Maybe they do. They know more about their brains than I do. I'm definitely willing to make the claim that people trying to help me rewrite my brain is not going to prove to be useful.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-17T15:33:37.940Z · LW(p) · GW(p)

There is no evidence that most or all people have terminal goals. TVs should not be assumed by default or used as a theoretical framework.

Replies from: Lumifer
comment by Lumifer · 2014-06-17T17:51:54.363Z · LW(p) · GW(p)

There is no evidence that most or all people have terminal goals.

Survival is a terminal goal that most people have.

Replies from: Jayson_Virissimo, Richard_Kennaway
comment by Jayson_Virissimo · 2014-06-17T18:10:14.876Z · LW(p) · GW(p)

Is it though, or do people want to survive in order to achieve other goals? Many people (I think) wouldn't want to continue living if they were in a vegetative state with ultra-low probability of regaining their ability to live normally (and therefore, achieve other goals).

Replies from: Lumifer
comment by Lumifer · 2014-06-17T18:23:19.503Z · LW(p) · GW(p)

or do people want to survive in order to achieve other goals?

I am pretty sure people have a biologically hardwired desire to survive. It is terminal X-D

Many people (I think) wouldn't want to continue living if they were in a vegetative state with ultra-low probability of regaining their ability to live normally

Yes, but do note the difference between "I survive" and "my brain-dead body survives".

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-17T19:02:19.959Z · LW(p) · GW(p)

If someone is persuaded to sacrifice themselves for a cause X, is cause X then more-than-terminal?

Replies from: TheAncientGeek, Lumifer
comment by TheAncientGeek · 2014-06-17T19:25:42.080Z · LW(p) · GW(p)

I suppose you could say that survival was never their terminal goal. But, to me, that has a just-so quality. You can identify a terminal goal from any life history, but you can't predict anything.

comment by Lumifer · 2014-06-17T19:32:05.962Z · LW(p) · GW(p)

Humans have multiple values, including multiple terminal values. They do not necessarily form any coherent system and so on a regular basis conflict with one another. This is a normal state of being for human values. Conflicts get resolved in a variety of ways, sometimes by cost-benefit analysis, and sometimes by hormonal imbalance :-)

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-17T19:41:19.092Z · LW(p) · GW(p)

If there is no coherence or stability in the human value system, then there are no terminal values, in any sense that makes a meaningful distinction. Anarchies don't have leaders either.

Replies from: Lumifer
comment by Lumifer · 2014-06-17T19:48:25.267Z · LW(p) · GW(p)

"Terminal" does NOT mean "the most important". It means values which you cannot (internally) explain in terms of other values, you have them just because you have them. They are axioms.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-17T19:55:44.657Z · LW(p) · GW(p)

Where does it say that? Since when could an axiom be incomprehensible? By "axiom" do you mean what philosophers call intuition?

Replies from: Lumifer
comment by Lumifer · 2014-06-17T20:16:08.315Z · LW(p) · GW(p)

It has nothing to do with comprehensibility.

Most people value having a tidy sum in a bank account. But (usually) they don't value it for itself, they value it because it allows them to get other stuff which they like. Money in a bank is NOT a terminal value.

Most people value not being in pain. They (usually) don't value it because not being in pain allows them something, they value it for itself, because lack of pain IS a terminal value.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-17T20:48:52.129Z · LW(p) · GW(p)

Would you kill a million to obtain analgesia?

Here it is in full-strength LessWrongese: the fact that every node in the graph leads to another does not mean the graph is acyclic.

Replies from: Lumifer
comment by Lumifer · 2014-06-17T20:57:48.714Z · LW(p) · GW(p)

The graph of this conversation just went cyclic: please goto here.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-18T09:45:50.616Z · LW(p) · GW(p)

Running in circles is not a successful means of evasion.

comment by Richard_Kennaway · 2014-06-18T11:22:22.250Z · LW(p) · GW(p)

Survival is a terminal goal that most people have.

That explains why there are no such things as armies or wars, why no-one has ever risked their life for another, why no-one has ever chosen dying well above living badly, and why no-one has ever considered praiseworthy the non-existent people who have done these things. No-one would dream of engaging in dangerous sports, nor has the saying "live fast, die young" ever meant anything but a condemnation.

Replies from: Lumifer, army1987
comment by Lumifer · 2014-06-18T15:30:35.883Z · LW(p) · GW(p)

To repeat myself, terminal goals do not have to be important, it's a different quality.

For me, for example, the feeling of sun on my skin is a terminal value. It's not a very important terminal value :-)

comment by A1987dM (army1987) · 2014-06-18T14:18:04.154Z · LW(p) · GW(p)

That only shows that survival isn't the only terminal goal, not necessarily that it's not a terminal goal at all.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-06-18T14:23:29.246Z · LW(p) · GW(p)

Is there anyone anywhere who would sacrifice any amount of anything, including other lives, to save their own life?

Replies from: army1987
comment by A1987dM (army1987) · 2014-06-19T16:06:33.975Z · LW(p) · GW(p)

I was going to answer something like ‘when I don't drive as fast as I could I sacrifice an amount of time to increase my survival probability’, then I realized that's not what you mean, but you might want to replace “any” with “an arbitrarily large” to avoid confusion in other readers.

comment by kalium · 2014-06-16T15:53:51.617Z · LW(p) · GW(p)

My brain works this way as well. Except with the addition that nearly all sorts of consequentialism are only able to motivate me through guilt, so if I try to adopt such an ethical system I feel terrible all the time because I'm always falling far short of what I should be doing. With virtue ethics, on the other hand, I can feel good about small improvements and perhaps even stay motivated until they combine into something less small.

comment by Error · 2014-06-16T16:23:36.627Z · LW(p) · GW(p)

I want to know true things about myself. I also want to impress my friends by having the traits that they think are cool, but not at the price of faking it–my brain screams that pretending to be something other than what you are isn’t virtuous.

I'm like this. Part of what makes it difficult is figuring out whether you're "faking it" or not. One of the maybe-not-entirely-pleasant side effects of reading Less Wrong is that I've become aware of many of the ways that my brain will lie to me about what I am and the many ways it will attempt to signal false traits without asking me first. This is a problem when you really hate self-aggrandizement and aggrandizing self-deception and get stuck living in a brain made entirely of both. My "stop pretending (or believing) that you're smarter/better/more knowledgeable than you are, jackass" tripwire trips a lot more often than it used to.

(in fact it's tripping on this comment, on the grounds that I'm signaling more epistemic honesty than I think I possess; and it's tripping on this parenthetical remark, for the same reason; and recursively does so more when I note it in the remark. Gödel, I hate you.)

Ignorance wasn't better, but it sure was more comfortable.

In the past I’ve thought of myself as being mostly consequentialist, in terms of morality, and this is a very consequentialist way to think about being a good person. And it doesn't feel like it would work.

Assuming I understand the two correctly, I find I espouse consequentialism in theory but act more like a virtue ethicist in practice. That is, I feel I should do whatever is going to have the best outcome, but I actually do whatever appears "good" on a surface level. "Good" can be replaced by whatever more-specific virtue the situation seems to call for. Introspection suggests this is because predicting the consequences of my own actions correctly is really hard, so I cheat. Cynicism suggests it's because the monkey brain wants to signal virtue more than achieve my purported intent.

Replies from: IrritableGourmet, SilentCal, None
comment by IrritableGourmet · 2014-06-18T14:17:27.364Z · LW(p) · GW(p)

Part of what makes it difficult is figuring out whether you're "faking it" or not.

Speaking of movies, I love Three Kings for this:

Archie Gates: You're scared, right?

Conrad Vig: Maybe.

Archie Gates: The way it works is, you do the thing you're scared shitless of, and you get the courage AFTER you do it, not before you do it.

Conrad Vig: That's a dumbass way to work. It should be the other way around.

Archie Gates: I know. That's the way it works.

comment by SilentCal · 2014-06-17T19:58:04.014Z · LW(p) · GW(p)

The distinction between pretending and being can get pretty fuzzy. I like the 'pretend to pretend to actually try' approach where you try to stop yourself from sending cheap/poor signals rather than false ones. That is, if you send a signal that you care about someone, and the 'signal' is something costly to you and helpful to the other person, it's sort of a moot point whether you 'really care'.

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2014-06-18T11:18:09.497Z · LW(p) · GW(p)

I think that in the context of caring at least, the pretending/being distinction is a way of classifying the components motivating your behavior. If you're "faking" caring, then that implies that you need to actively spend effort on caring. Compared to a situation where the caring "came naturally" and didn't require effort, the "faker" should be expected to act in a non-caring manner more frequently, because situations that leave him cognitively tired are more likely to mean that he can't spare the effort to go on with the caring behavior.

Also, having empathic caring for other people is perceived as being a pretty robust trait in general: if you have it, it's basically "self-sustaining" and doesn't ordinarily just disappear. On the other hand, goals like "I want to fake caring" are more typically subgoals to some other goal, which may disappear. If you know that someone is faking caring, then there are more potential situations where they might stop doing that - especially if you don't know why they are faking it.

Replies from: SilentCal
comment by SilentCal · 2014-06-24T22:17:36.214Z · LW(p) · GW(p)

Wow, you can care about other people in a way that doesn't even begin to degrade under cognitive fatigue? Is that common?

I like defining 'real' caring as stable/robust caring, though. If I 'care' about my friends because I want caring about friends to be part of my identity, I consider that 'real' caring, since it's about as good as I get.

comment by [deleted] · 2014-06-16T17:23:26.034Z · LW(p) · GW(p)

This is a problem when you really hate self-aggrandizement and aggrandizing self-deception and get stuck living in a brain made entirely of both. My "stop pretending (or believing) that you're smarter/better/more knowledgeable than you are, jackass" tripwire trips a lot more often than it used to.

So fix it. Learn more, think more, do more, be more. Humility doesn't save worlds, and you can't really believe in your own worthlessness. Instead, believe in becoming the person whom your brain believes you to be.

Replies from: Error
comment by Error · 2014-06-16T20:09:56.724Z · LW(p) · GW(p)

Clarification: I don't believe I'm worthless. But there's still frequently a disparity between the worth I catch myself trying to signal and the worth I (think I) actually have. Having worth > 0 doesn't make that less objectionable.

I do tend to give up on the "becoming" part as often as not, but I don't think I do worse than average in that regard. Average does suck, though.

Replies from: None
comment by [deleted] · 2014-06-16T20:34:00.840Z · LW(p) · GW(p)

Why are you still making excuses not to be awesome?

Replies from: None
comment by [deleted] · 2014-06-16T20:37:40.090Z · LW(p) · GW(p)

Pity we can't self quote in the quote thread.

Replies from: None
comment by [deleted] · 2014-06-16T20:51:32.492Z · LW(p) · GW(p)

Huh? You've said something you want to quote? But this isn't the quotes thread...

Replies from: bbleeker
comment by Sabiola (bbleeker) · 2014-06-18T14:48:43.160Z · LW(p) · GW(p)

paper-machine wants to quote you, eli. "Why are you still making excuses not to be awesome?" would have made a pretty good quote, if only you hadn't written it on Less Wrong.

Replies from: None
comment by [deleted] · 2014-06-18T14:51:07.860Z · LW(p) · GW(p)

Well that's nice of him.

comment by Armok_GoB · 2014-06-29T18:39:48.357Z · LW(p) · GW(p)

The obvious things to do here is either:

a) Make a list/plan on paper, abstractly, of what you WOULD do if you had terminal goals, using your existing virtues to motivate this act, and then have "Do what the list tells me to" as a loyalty-like high-priority virtue. If you have another rationalist you really trust, and who has a very strong honesty commitment, you can even outsource the making of this list.

b) Assemble virtues that sum up to the same behaviors in practice; truth-seeking, goodness, and "If something is worth doing, it's worth doing optimally" make a good trio, and will have the end result of effective altruism while still running on the native system.

comment by efalken · 2014-06-19T21:51:12.258Z · LW(p) · GW(p)

Ever notice sci-fi/fantasy books written by young people have not just little humor, but absolutely zero humor (eg, Divergent, Eragon)?

Replies from: mare-of-night, Swimmer963
comment by mare-of-night · 2014-06-21T14:33:05.182Z · LW(p) · GW(p)

I haven't noticed it in my reading, but I'm probably just not well-read enough. But I'm pretty sure the (longform story, fantasy genre) webcomic script I wrote at 17 was humorless, or nearly humorless. I was even aware of this at the time, but didn't try very hard to do anything about it. I think I had trouble mixing humor and non-humor at that age.

I'm trying to think back on whether other writers my own age had the same problem, but I can't remember, except that stories we wrote together (usually by taking turns writing a paragraph or three at a time in a chatroom) usually did mix humor with serious-tone fantasy. This makes me wonder if being used to writing for an audience has something to do with it. The immediate feedback of working together that way made me feel a lot of incentive to write things that were entertaining.

comment by Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2014-06-19T22:36:10.352Z · LW(p) · GW(p)

I actually haven't read either Divergent or Eragon. I've been told that the fantasy book I wrote recently is funny, and I'm pretty sure I qualify as "young person."

Replies from: Nornagest
comment by Nornagest · 2014-06-19T23:22:34.008Z · LW(p) · GW(p)

Eragon was written by a teenager with publishing connections. I don't know the story behind Divergent as well, but Wikipedia informs me that it was written while its author was in her senior year of college.

It's not so uncommon for a book, especially a first novel, to be published in its author's twenties -- Poe published several stories at that age, for example -- but teenage authors are a lot more unusual.

(I can't speak to their humor or lack thereof either, though -- my tastes in SF run a little more pretentious these days.)

comment by Emile · 2014-07-08T07:29:52.114Z · LW(p) · GW(p)

the conversations I've had over the past two years, where other rationalists have asked me "so what are your terminal goals/values?" and I've stammered something and then gone to hide in a corner and try to come up with some.

Like many commenters here, I don't think we have very good introspective access to our own terminal values, and what we think are terminal values may be wrong. So "what are your terminal values" doesn't seem like a very useful question (except in that it may take the conversation somewhere interesting, but I don't think the answer would need to be taken very seriously).

It's a bit like saying "I thought a lot and deduced that I like chocolate, so I'm going to go eat some chocolate" - it's not a thought process that's really necessary, and it seems as likely to move your behavior away from your "actual" values as it is to move towards them.

(It does seem useful to distinguish instrumental goals from goals that don't appear instrumental, but terminal vs. non-terminal is much harder to distinguish)

Replies from: CCC
comment by CCC · 2014-07-08T08:38:48.601Z · LW(p) · GW(p)

Like many commenters here, I don't think we have very good introspective access to our own terminal values, and what we think are terminal values may be wrong.

A terminal value could be defined as that for which I would be prepared to knowingly enter a situation that carries a strong risk of death or other major loss. Working off that definition, it is clear that other people knowing what my terminal goals are is dangerous - if an enemy finds out that information, then he can threaten my terminal goal to force me to abandon a valuable but non-terminal resource. (It's risky on the enemy's part, because it leaves open the option that I might preserve my terminal goals by killing or imprisoning the enemy in question; either way, though, I still lose significantly in the process.)

And if I don't have good introspective access to my own terminal goals, then it is harder for a potential enemy to find out what they are. Moreover, this would also have applied to my ancestors. So not having good introspective access to my own terminal goals may be a general human survival adaptation.

Replies from: Emile
comment by Emile · 2014-07-08T10:42:02.398Z · LW(p) · GW(p)

A terminal value could be defined as that for which I would be prepared to knowingly enter a situation that carries a strong risk of death or other major loss.

That seems more like a definition of something one cares a lot about; sure, the two are correlated, but I believe "terminal value" usually refers to something you care about "for itself" rather than because it helps you in another way. So you could care more about an instrumental value (e.g. making money) than about a value-you-care-about-for-itself (e.g. smelling nice flowers).

Both attributes (how much you care, and whether it's instrumental) are important though.

And if I don't have good introspective access to my own terminal goals, then it is harder for a potential enemy to find out what they are. Moreover, this would also have applied to my ancestors. So not having good introspective access to my own terminal goals may be a general human survival adaptation.

Eh, I'm not sure; I could come up with equally plausible explanations for why it would be good to have introspective access to my terminal goals. And more importantly, humans (including everybody who could blackmail you) have roughly similar terminal goals, and so have a pretty good idea of how you may react to different kinds of threats.

Replies from: CCC
comment by CCC · 2014-07-13T12:34:22.202Z · LW(p) · GW(p)

So you could care more about an instrumental value (e.g. making money) than about a value-you-care-about-for-itself (e.g. smelling nice flowers).

Hmmm. Then it seems that I had completely misunderstood the term. My apologies.

If that is the case, then it should be possible to find a terminal value by starting with any value and then repeatedly asking the question "and why do I value that value?" until a terminal value is reached.

For example, I may care about money because it allows me to buy food; I may care about food because it allows me to stay alive; and staying alive might be a terminal value.
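The "keep asking why" procedure described above is essentially a graph traversal. Here's a minimal toy sketch (the `serves` mapping and function names are made up for illustration, not anything from the thread): follow each value to the value it serves until reaching a node that serves nothing, i.e. a candidate terminal value. It also has to handle TheAncientGeek's earlier objection that the value graph may be cyclic, in which case the chain never bottoms out:

```python
# Toy model: each value maps to the value it serves; None marks a value
# held for its own sake (a candidate terminal value).
serves = {
    "money": "food",
    "food": "staying alive",
    "staying alive": None,  # valued for itself
}

def trace_to_terminal(value, serves):
    """Follow the chain of 'why do I value this?' answers to its end."""
    seen = set()
    while serves.get(value) is not None:
        if value in seen:
            # The cyclic-graph case: every node leads to another,
            # so no terminal value is ever reached.
            raise ValueError("value graph is cyclic at %r" % value)
        seen.add(value)
        value = serves[value]
    return value

print(trace_to_terminal("money", serves))  # -> staying alive
```

Under this toy model, a "terminal value" is just a sink node of the graph, and the introspective question is whether the graph you actually have is a DAG at all.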

comment by [deleted] · 2015-04-18T01:06:42.209Z · LW(p) · GW(p)

Can someone please react to my gut reaction about virtue ethics? I'd love some feedback if I misunderstand something.

It seems to me that most virtues are just instrumental values that make life convenient for people, especially those with unclear or intimidating terminal values.

The author says this about protagonist Tris:

Bravery was a virtue that she thought she ought to have. If the graph of her motivations even went any deeper, the only node beyond ‘become brave’ was ‘become good.’

I think maybe the deeper 'become good' node (and its huge overlap with the one other node that's equally deep-seated: 'become happy') is actually the "deeper motivation" at the core of virtue ethics.

Then two things account for the individual variance in which virtues are pursued:

(1) Individuals have different amounts of overlap between their 'become good' and 'become happy' nodes.

(2) Different people find happiness in different things.

One virtue the author wants to realize in herself is loyalty:

I’ve thought a lot about the virtue of loyalty. In the past, loyalty has kept me with jobs and friends that, from an objective perspective, might not seem like the optimal things to spend my time on. But the costs of quitting and finding a new job, or cutting off friendships, wouldn’t just have been about direct consequences in the world, like needing to spend a bunch of time handing out resumes or having an unpleasant conversation. There would also be a shift within myself, a weakening in the drive towards loyalty. It wasn’t that I thought everyone ought to be extremely loyal–it’s a virtue with obvious downsides and failure modes. But it was a virtue that I wanted, partly because it seemed undervalued.

I think loyalty is generally a useful way to achieve individual happiness (self-centered) and overall goodness (others-centered), but not always, and I'm guessing that in certain situations, the author would abandon the loyalty virtue to pursue the underlying preferences of happiness and goodness.

So maybe:

Cultivating a certain virtue in yourself is an example of an instrumental value

Innate preferences = some combination of personal happiness + goodness

Terminal values = arbitrary goals (often unidentified) somehow based on these preferences

At first glance, I like virtue ethics a lot. I'd like to pursue some virtues of my own, but ones that are carefully selected based on my terminal values, if I can summon the necessary introspective powers to figure them out. Until then, I'll say vaguely that my terminal value is preference fulfillment, and just choose some virtues that I think would efficiently fulfill my preferences. So some of my instrumental values will be virtues, which I can pursue both instinctively and through consequentialist-style opportunity-cost analyses.

Example:

My innate preferences: maybe 98% happiness-driven (selfishness) + 2% goodness-driven (altruism).

(Note: On the surface I might look more altruistic because there's a LOT of overlap between decisions that are good for others and decisions that make me feel good. Or, you could see the giant overlap and assume I'm 100% selfish.)

My lazy terminal value: satisfy my preferences of happiness and goodness

My chosen virtue (aka instrumental goal): become someone who cares about the environment

If caring about the environment is my instrumental goal, I can instinctively pick up trash, conserve energy, use a reusable water bottle; i.e. do things environmentally conscious people do.

I can also perform opportunity cost analyses to best realize my chosen virtue, then through it, my terminal value. For example, I could stop showering. Or, I could apparently have the same effect by eating six fewer hamburgers in a year. Personally, I prefer showering to eating hamburgers, tasty as they are, so I cut significantly back on my meat consumption but continue to take showers without worrying about their length.

Result: My innate preferences for happiness and goodness are harmoniously satisfied.

Is this allowed? Is there room for consequential reasoning in virtue ethics? Can virtue ethics be useful for consequentialists? Can I please be both a consequentialist and a virtue ethicist?

I feel like the most likely objection to this idea will be that true altruism does not exist as an innate preference. I have some tentative thoughts here too, if anyone is curious. It seems like the easiest explanation for why rational people don't always do the things that they feel will bring them the greatest personal happiness.

comment by Vladimir_Nesov · 2014-06-17T13:57:00.738Z · LW(p) · GW(p)

You don't know your terminal goals in detail, whatever that should be. Instead, there is merely a powerful technique of working towards goals, which are not terminal goals, but guesses at what's valuable, compared to alternative goals, alternative plans and outcomes implicit in them. Choosing among goals allows achieving more difficult outcomes than merely choosing among simpler actions where you can see the whole plan before it starts (you can't plan a whole chess game in advance, can't win it by an explicit plan that enumerates the moves, but you can win it by taking on the winning of a game as a goal). Attainment of some virtue is one possible goal, although there is no reason to privilege virtues above goals that are not about virtues.

Even for such goals, that are not terminal goals but something much simpler, playing the role of implicit plans, you won't actually single-mindedly work towards them, not being an agent. Still, to the extent they are estimated to be valuable, you should; and to the extent that you are an agent, you can. This makes the virtue of being more like an agent, of being more strategic, a valuable goal.