Since the question is about potential dangers, I think it is worth assuming the worst here. Also, realistically, we don't have a magic wand to pop things into existence by fiat, so I would guess that by default, if such an AI was created, it would be created with ML.
So let's say that this is trained largely autonomously with ML. Is there some way that would result in dangers outside the four already-mentioned categories?
Clearly, you and I have different definitions of "easy".
This was a terrific post; insightful and entertaining in excess of what can be conveyed by an upvote. Thank you for making it.
What you're proposing sounds more like moral relativism than moral nihilism.
Ah, yes. My mistake. I stand corrected. Some cursory googling suggests that you are right. With that said, to me Moral Nihilism seems like a natural consequence of Moral Relativism, but that may be a fact about me and not the universe, so to speak (though I would be grateful if you could point out a way to be morally relativist without being morally nihilist).
I think that you're confusing moral universalism with moral absolutism and value monism.
The last paragraph of my previous post was a claim that unless you have an objective way of ordering conflicting preferences (and I don't see how you can), you are forced to work under value pluralism. I did use this as an argument against moral universalism, though that argument may not be entirely correct. I concede the point.
But there's that language again that people use when they talk about moral nihilism, where I can't tell if they're just using different words, or if they really think that morality can be whatever we want it to be, or that it doesn't mean anything to say that moral propositions are true or false.
Okay. Correct me if any of this doesn't sound right. When a person talks about "morality", you imagine a conceptual framework of some sort - some way of distinguishing what makes actions "good" or "bad", "right" or "wrong", etc. Different people will imagine different frameworks, possibly radically so - but there is generally a lot of common ground (or so we hope), which is why you and I can talk about "morality" and more or less understand the gist of each other's arguments. Now, I would claim that what I mean when I say "morality", or what you mean, or what a reasonable third party may mean, or any combination thereof - that each of these is entirely unrelated to ground truth.
Basically, moral propositions (e.g. "Murder is Bad") contain unbound variables (in this case, "Bad") which are only defined in select subjective frames of reference. "Bad" does not have a universal value in the sense that "Speed of Light" or "Atomic Weight of Hydrogen" or "The top LessWrong contributor as of midnight January 1st, 2015" do. That is the main thesis of Moral Nihilism as far as I understand it. Does that sound sensible?
I wouldn't ask people those questions. People can be wrong about what they value. The point of moral philosophy is to know what you should do.
Alright; let me rephrase my point. Let us say that you have access to everything that can be known about an individual X. Can you explain how you compute their objective contingent morality to an observer who has no concept of morality? Your previous statement of "what is moral is what you value" would need to define "what you value" before it would suffice. Note that unless you can do this construction, you don't actually have something objective.
I think that using this notation is misleading. If I am understanding you correctly, you are saying that given an individual, we can derive their morality from their (real/physically grounded) state, which gives real/physically grounded morality (for that individual). Furthermore, you are using "objective" where I used "real/physically grounded". Unfortunately, one of the common meanings of objective is "ontologically fundamental and not contingent", so your statement sounds like it is saying something that it isn't.
On a separate note, I'm not sure why you are casually dismissing moral nihilism as wrong. As far as I am aware, moral nihilism is the position that morality is not ontologically fundamental. Personally, I am a moral nihilist; my experience shows that morality as typically discussed refers to a collection of human intuitions and social constructs - it seems bizarre to believe that to be an ontologically fundamental phenomenon. I think a sizable fraction of LW is of like mind, though I can only speak for myself.
I would even go further and say that I don't believe in objective contingent morality. Certainly, most people have an individual idea of what they find moral. However, this only establishes that there is an objective contingent response to the question "what do you find moral?" There is similarly an objective contingent response to the related question "what is morality?", or the question "what is the difference between right and wrong?" Sadly, I expect the responses in each case to differ (due to framing effects, at the very least). To me, this shows that unless you define "morality" quite tightly (which could require some arbitrary decisions on your part), your construction is not well defined.
Note that I expect that last paragraph to be more relativist than most other people here, so I definitely speak only for myself there.
Okay, but at best, this shows that the immediate cause of you being shaken and coming out of it is related to fearful epiphanies. Is it not plausible that whether, at a given time, you find a particular idea horrific or are able to accept a solution as satisfying depends on your mental state?
Consider this hypothetical narrative. Let Frank (name chosen at random) be a person suffering from occasional bouts of depression. When he is healthy, he notices and enjoys interacting with the world around him. When he is depressed, he instead focuses on real or imagined problems in his life - and in particular, how stressful his work is.
When asked, Frank explains that his depression is caused by problems at work. He explains that when he gets assigned a particularly unpleasant project, his depression flares up. The depression doesn't clear up until things get easier. Frank explains that once he finishes a project and is assigned something else, his depression clears up (unless the new project is just as bad); or sometimes, through much struggle, he figures out how to make the project bearable, and that resolves the depression as well.
Frank is genuine in expressing his feelings, and correct about work problems being correlated with his depression, but he is wrong about causation between the two.
Do you find this story analogous to your situation? If not, why not?
@gjm:
Just wanted to say that this is well thought out and well written - it is what I would have tried to say (albeit perhaps less eloquently) if it hadn't been said already. I wish I had more than one up-vote to give.
@Eitan_Zohar:
I would urge you to give the ideas here more thought. Part of the point here is that you are going to be strongly biased toward thinking your explanations are of the first sort and not the second. By virtue of being human, you are almost certainly biased in certain predictable ways, this being one of them. Do you disagree?
Let me ask you this: what would it take to make you change your mind; i.e. that the explanation for this pattern is one of the latter three reasons and not the former three reasons?
I definitely know that my depression is causally tied to my existential pessimism.
Out of curiosity, how do you know that this is the direction of the causal link? The experiences you have mentioned in the thread seem to also be consistent with depression causing you to get hung up on existential pessimism.
Your argument assumes that the algorithm and the prisons have access to the same data. This need not be the case - in particular, if a prison bribes a judge to over-convict, the algorithm will be (incorrectly) relying on said conviction as data, skewing the predicted recidivism measure.
That said, the perverse incentive you mentioned is absolutely in play as well.
Great suggestion! That said, in light of your first paragraph, I'd like to point out a couple of issues. I came up with most of these by asking the questions "What exactly are you trying to encourage? What exactly are you incentivising? What differences are there between the two, and what would make those differences significant?"
You are trying to encourage prisons to rehabilitate their inmates. If, for a given prisoner, we use p to represent their propensity towards recidivism and a to represent their actual recidivism, rehabilitation is represented by p-a. Of course, we can't actually measure these values, so we use proxies; anticipated recidivism according to your algorithm and re-conviction rate (we'll call these p' and a', respectively).
With this incentive scheme, our prisons have three incentives: increasing p'-p, increasing p-a, and increasing a-a'. The first and last can lead to some problematic incentives.
To increase p'-p, prisons need to incarcerate prisoners who are less prone to recidivism than predicted. Given that past criminality is an excellent predictor of future criminality, this leads to a perverse incentive towards incarcerating those who were unfairly convicted (wrongly convicted innocents or over-convicted lesser offenders). If said prisons can influence the judges supplying their inmates, this may lead to judges being bribed to aggressively convict edge-cases or even outright innocents, and to convict lesser offenders of crimes more correlated with recidivism. (Counterpoint: We already have this problem, so this perverse incentive might not be making things much worse than they already are.)
To increase a-a', prisons need to reduce the probability of re-conviction relative to recidivism. At the comically amoral end, this can lead to prisons teaching inmates "how not to get caught." Even if that doesn't happen, I can see prisons handing out their lawyer's business cards to released inmates. "We are invested in making you a contributing member of society. If you are ever in trouble, let us know - we might be able to help you get back on track." (Counterpoint: Some of these tactics are likely to be too expensive to be worthwhile, even ignoring morality issues.)
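To make the bookkeeping explicit: assuming the scheme pays out on the measured gap p'-a' (my reading of the proposal, not something it states in these terms), the three incentives are just the terms of a telescoping sum. The labels under the braces are my own shorthand.

```latex
\[
\underbrace{p' - a'}_{\text{what is measured}}
= \underbrace{(p' - p)}_{\text{prediction error}}
+ \underbrace{(p - a)}_{\text{rehabilitation}}
+ \underbrace{(a - a')}_{\text{recidivism that is not re-convicted}}
\]
```

Only the middle term is the thing we actually want to encourage; the outer two are the gaps between the quantities and their proxies.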
Also, since you are incentivising improvement but not disincentivizing regression, prisons that are below-average are encouraged to try high-volatility reforms even if they would yield negative expected improvement. For example, if a reform has a 20% chance of making things much better but an 80% chance of making things equally worse, it is still a good business decision (since the latter consequence does not carry any costs).
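Here is a rough sketch of that 20%/80% example in Python. The function name, the `gain` size of 10 units, and the flat `reward_per_unit` payout are all made up for illustration; the only thing taken from the scheme is the assumption that improvement is paid for and regression costs nothing.

```python
def expected_outcomes(p_success, gain, reward_per_unit=1.0):
    """Compare expected improvement with expected payout for a risky reform.

    The reform improves outcomes by `gain` with probability `p_success`
    and worsens them by the same amount otherwise (hypothetical numbers).
    """
    p_fail = 1.0 - p_success

    # What society gets on average: gains and losses both count.
    expected_improvement = p_success * gain + p_fail * (-gain)

    # What the prison gets on average under a one-sided scheme:
    # improvement is paid for, regression is simply paid zero.
    expected_reward = reward_per_unit * (
        p_success * max(gain, 0.0) + p_fail * max(-gain, 0.0)
    )
    return expected_improvement, expected_reward


if __name__ == "__main__":
    improvement, reward = expected_outcomes(p_success=0.2, gain=10.0)
    print(f"expected improvement: {improvement:+.1f}")
    print(f"expected reward:      {reward:+.1f}")
```

With these numbers the expected improvement is negative while the expected reward is positive, which is the whole problem. Adding a penalty for regression would change the second calculation; whether that is practical is a separate question.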
You are of course entirely correct in saying that this is far too little to retire on. However, it is possible to save without being able to liquidate said saving; for example by paying down debts. The Emergency Fund advice is that you should make a point to have enough liquid savings tucked away to tide you over in a financial emergency before you direct your discretionary income anywhere else.
I'm afraid I don't know. You might get better luck making this question a top level post.
I am by no means an expert, but here are a couple of options that come to mind. I came up with most of these by thinking "what kind of emergency are you reasonably likely to run into at some point, and what can you do to mitigate them?"
Learn some measure of first aid, or at least the Heimlich maneuver and CPR.
Keep a seat belt cutter and window breaker in your glove compartment. And on the subject, there are a bunch of other things that you may want to keep in your car as well.
Have an emergency kit at home, and have a plan for dealing with natural disasters (fire, storms, etc). If you live with anyone, make sure that everyone is on the same page about this.
On the financial side, have an emergency fund. This might not impress your friends, but given how likely financial emergencies (e.g. unexpectedly losing a job) are relative to other emergencies, this is a good thing to plan for nonetheless. I think the standard advice is to have something on the order of 3-6 months of income tucked away for a rainy day.
You make a good point, and I am very tempted to agree with you. You are certainly correct in that even a completely non-centralized community with no stated goals can be exclusionary. And I can see "community goals" serving a positive role, guiding collective behavior towards communal improvement, whether that comes in the form of non-exclusiveness or other values.
With that said, I find myself strangely disquieted by the idea of Less Wrong being actively directed, especially by a singular individual. I'm not sure what my intuition is stuck on, but I do feel that it might be important. My best interpretation right now is that having an actively directed community may lend itself to catastrophic failure (in the same way that having a dictatorship lends itself to catastrophic failure).
If there is a single person or group of people directing the community, I can imagine them making decisions which anger the rest of the community, making people take sides or split from the group. I've seen that happen in forums where the moderators did something controversial, leading to considerable (albeit usually localized) disruption. If the community is directed democratically, I again see people being partisan and taking sides, leading to (potentially vicious) internal politics; and politics is both a mind killer and a major driver of divisiveness (which is typically bad for the community).
Now, to be entirely fair, these are somewhat "worst case" scenarios, and I don't know how likely they are. However, I am having trouble thinking of any successful online communities which have taken this route. That may just be a failure of imagination, or it could be that something like this hasn't been tried yet, but it is somewhat alarming. That is largely why I urge caution in this instance.
While your family's situation is explained by lack of scope insensitivity, I'd like to put forward an alternative. I think the behavior you described also fits with rationalization. If your family had already made up their mind about supporting the Republican party, they could easily justify it to themselves (and to you) by citing a particular close-to-the-heart issue as an iron-clad reason.
Rationalization also explains why "even people who bother thinking for themselves are likely to arrive at the same conclusion as their peers" - it just means that said people are engaging in motivated cognition to come up with reasonable-sounding arguments to support the same conclusions as their peers.
Interesting point! It seems obvious in hindsight that if you reward people for making predictions that correspond to reality, they can benefit either by fitting their predictions to reality or by fitting reality to their predictions. Certainly, it is an issue that comes up even in real life in the context of sports betting. That said, this particular spin on things hadn't occurred to me, so thanks for sharing!
I think the issue you are seeing is that Less Wrong is fundamentally an online community / forum, not a movement or even a self-help group. "Having direction" is not a typical feature of such a medium, nor would I say that it would necessarily be a positive feature.
Think about it this way. The majority of the few (N < 10) times I've seen explicit criticism of Less Wrong, one of the main points cited was that Less Wrong had a direction, and that said direction was annoying. This usually referred to Less Wrong focusing on the FAI question and X-risk, though I believe I've seen the EA component of Less Wrong challenged as well. By its nature, having direction is exclusionary - people who disagree with you stop feeling welcome in the community.
For that reason, I would strongly caution against trying to change Less Wrong to impart direction to the community as a whole (e.g. by having an official "C.E.O"). That said, organizing a sub-movement within Less Wrong for that sort of thing carries much less risk of alienating people. I think that would be the healthiest direction to take it, plus it allows you to grow organically (since people can easily join/leave your movement and you don't need to get the entire community mobilized to get started).
I consider philosophy to be a study of human intuitions. Philosophy examines different ways to think about a variety of deep issues (morality, existence, etc.) and tries to resolve results that "feel wrong".
On the other hand, I have very rarely heard it phrased this way. Often, philosophy is said to be reasoning directly about said issues (morality, existence, etc.), albeit with the help of human intuitions. This actually seems to be an underlying assumption of most philosophy discussions I've heard. I actually find that mildly disconcerting, given that I would expect it to confuse everyone involved with substantial frequency.
If anyone knows of a good argument for the assumption above, I would really like to hear it. I've only seen it assumed, never argued.
To address your first question: this has to do with scope insensitivity, hyperbolic discounting, and other related biases. To put it bluntly, most humans are actually pretty bad at maximizing expected utility. For example, when I first heard about x-risk, my thought process was definitely not "humanity might be wiped out - that's IMPORTANT. I need to devote energy to this." It was more along the lines of "huh; That's interesting. Tragic, even. Oh well; moving on..."
Basically, we don't care much about what happens in the distant future, especially if it isn't guaranteed to happen. We also don't care much more about humanity than we do about ourselves plus our close ones. Plus we don't really care about things that don't feel immediate. And so on. The end result is that most people's immediate problems are more important to them than x-risk, even if the latter might be by far the more essential according to utilitarian ethics.
No, I do not believe that it is standard terminology, though you can find a decent reference here.
Not necessarily. You are assuming that she has an explicit utility function, but that need not be the case.
Honestly, I suspect that the average person models others after themselves even if they consider themselves to be unusual. So this poll probably shouldn't be used as evidence to shift how similarly we model others to ourselves, one way or another.
That was awesome - thank you for posting the poll! The results are quite intriguing (at N = 18, anyway - might change with more votes, I guess).
Your best bet would be to find some sort of channel for communicating with your future self that your adversary does not have access to. Other posters mentioned several such examples, with channels including:
- keeping your long term memories (assuming that the memories couldn't be tampered with by the adversary)
- Swallowing a message, getting it as a tattoo, etc. (assuming that the adversary can't force you to do that)
- Using some sort of biometric lock (assuming that the adversary can't get a proper sample without causing alterations to your blood chemistry which would be detectable in the sample)
My personal addition: tell your friends/neighbors/the news the story. Unless your adversary can make you lie (or take your form or use mind magic or whatnot), these people can act as the channel you need.
If you don't have a channel of that sort, I believe you are out of luck. A formal proof eludes me at this time; I'll post again if I figure it out.
I think you are being a little too exacting here. True, most advances in well-studied fields are likely to be made by experts. That doesn't mean that non-experts should be barred from discussing the issue, for educational and entertainment purposes if nothing else.
That is not to say that there isn't a minimum level of subject-matter literacy required for an acceptable post, especially when the poster in question posts frequently. I imagine your point may be that Algon has not cleared that threshold (or is close to the line) - but your post seems to imply a MUCH higher threshold for posting.
I'm not convinced that the solution you propose is in fact easier than solving FAI. The following problems occur to me:
1) How do we explain to the creator AI what an FAI is?
2) How do we allow the creator AI to learn "properly" without letting it self modify in ways that we would find objectionable?
3) In the case of an unfriendly creator AI, how do we stop it from "sabotaging" its work in a way that would make the resulting "FAI" be friendly to the creator AI and not to us?
In general, I feel like the approach you outline just passes the issue up one level, requiring us to make the creator AI friendly instead.
On the other hand, if you limit your idea to something somewhat less ambitious, e.g. how do we make a safe AI to solve [difficult mathematical problem useful for FAI], then I think you may be right.
You are absolutely correct. If the number of states of the universe is finite, then as long as any state is reachable from any other state, then every state will be reached arbitrarily often if you wait long enough.
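For anyone who wants to poke at the finite case, here is a small simulation sketch. The three-state chain `CHAIN` and its probabilities are invented purely for illustration; the only property that matters is that every state can eventually reach every other one.

```python
import random

# A made-up three-state chain: A -> B -> C -> A is possible,
# so every state is reachable from every other state.
CHAIN = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.2, "C": 0.8},
    "C": {"A": 1.0},
}


def count_visits(transitions, start, steps, seed=0):
    """Run the chain for `steps` transitions and count arrivals in each state."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    visits = {state: 0 for state in transitions}
    state = start
    for _ in range(steps):
        next_states, weights = zip(*transitions[state].items())
        state = rng.choices(next_states, weights=weights, k=1)[0]
        visits[state] += 1
    return visits


if __name__ == "__main__":
    for steps in (100, 10_000, 1_000_000):
        print(steps, count_visits(CHAIN, "A", steps))
```

The visit counts for every state keep growing as the run gets longer, which is what "reached arbitrarily often" looks like in practice. A simulation is of course not a proof, just a sanity check.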
Mathematician here. I wanted to agree with @pianoforte611 - just because you have infinite time doesn't mean that every event will repeat over and over.
For those interested in some reading, the general question is basically the question of Transience in Markov Chains; I also have some examples. :)
Let us say that we have a particle moving along a line. In each unit of time, it moves a unit of distance either left or right, with probability 1/10 of the former and 9/10 of the latter. How often can we expect the particle to have returned to its starting point? Well, to return to the origin, we must have moved left and right an equal number of times. At odd times, this is impossible; at time 2n, the probability of this is $\left(\frac{1}{10}\right)^n \cdot \left(\frac{9}{10}\right)^n \cdot \binom{2n}{n}$ (this is not difficult to derive, and a simple explanation is given here). Summing this over all n ≥ 1, we get that the expected number of returns is 1/4 - in other words, we have no guarantee of returning even once, much less an infinite number of times!
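If you would rather check the arithmetic than trust me, here is a short Python sketch. It sums the probability of being at the origin at time 2n, which is C(2n, n) * (1/10)^n * (9/10)^n, over n = 1, 2, ..., and compares the result with the closed form 1/sqrt(1 - 4x) - 1 at x = 9/100 (the standard generating function for the central binomial coefficients). The function names and the 100-term cutoff are my own choices.

```python
from math import comb, sqrt


def return_probability(n, p_left=0.1):
    """Probability of being back at the origin at time 2n."""
    p_right = 1.0 - p_left
    # n left steps and n right steps, in any of C(2n, n) orders.
    return comb(2 * n, n) * (p_left * p_right) ** n


def expected_returns(terms=100, p_left=0.1):
    """Partial sum of the return probabilities for n = 1..terms."""
    return sum(return_probability(n, p_left) for n in range(1, terms + 1))


def closed_form(p_left=0.1):
    """Sum over n >= 1 of C(2n, n) x^n = 1/sqrt(1 - 4x) - 1, with x = p(1-p)."""
    x = p_left * (1.0 - p_left)
    return 1.0 / sqrt(1.0 - 4.0 * x) - 1.0


if __name__ == "__main__":
    print(expected_returns())  # partial sum, should be very close to 0.25
    print(closed_form())       # closed form, 1/0.8 - 1
```

The terms shrink roughly like 0.36^n, so a hundred of them is far more than enough. Note that with `p_left=0.5` the closed form blows up, which matches the fact that the symmetric walk on a line does return infinitely often.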
If this example strikes you as somewhat asymmetric, worry not - if the point was moving in three dimensions instead of one (so it could move up, down, forward, or back as well as left or right), then a weighting of 1/6 to each direction means that you won't return to the starting point infinitely often. If you don't like having a fixed origin, use two particles, and have them moving independently in 3 dimensions. They will meet after time zero with less-than-unit probability (actually, the same probability as in the previous problem, since the problems are equivalent after you apply a transformation).
I hope this helps!
Sorry - hadn't logged in for a while. I thought it would have vanishingly low probability of working, though I don't believe that it displaces any other action likely to work (though it does displace saving a person if all else fails, which has nontrivial value). Having said that, curiously enough it seems that this particular suggestion WAS implemented in the official solution, so I guess that was that. :)
I don't believe leveraging Voldemort's bargain will work the way you suggest, because Parseltongue does not enforce promises, only honesty. When Harry demands that he himself be saved, Voldemort can simply say "No."
You make a good point - in this instance, Voldemort is very difficult to bargain with. However, I don't agree that that makes the problem impossible. For one thing, there may be solutions which don't require Voldemort's cooperation - e.g. silent transfiguration while stalling for time. For another, Harry can still get Voldemort's cooperation by convincing Voldemort that his current action is not in Voldemort's interests - for example, that killing Harry will actually bring about the end of the world that Voldemort fears.
I think in this case, you and Eliezer are both correct, but for different definitions of "winning". If one's primary goal is to find a solution to the puzzle (and get the good ending), then your advice is probably correct. However, if the goal is to simulate the experience of having to solve a hard problem using one's intellect, then Eliezer's advice seems more valid. I imagine that this is in the same way that one might not want to look up a walkthrough for a game - it would help you "win" the game, but not win at getting the most benefit/enjoyment out of it.
I thought of the idea that maybe the human decision maker has multiple utility functions that when you try to combine them into one function some parts of the original functions don't necessarily translate well... sounds like the "shards of desire" are actually a bunch of different utility functions.
This is an interesting perspective, and I would agree that we humans typically have multiple decision criteria which often can't be combined well. However, I don't think it is quite right to call them utility functions. Humans are adaptation-executers, not fitness-maximizers - so it is more like we each have a bunch of behavioral patterns that we apply when appropriate, and potential conflicts arise when multiple patterns apply.
Thank you! That is exactly what I was looking for.
I'm having trouble finding the original sequence post that mentions it, but a "fully general excuse" refers to an excuse that can be applied to anything, independently of the truth value of the thing. In this case, what I mean is that "this isn't really the important stuff" can sound reasonable even when applied to the stuff that actually is important (especially if you don't think about it too long). It follows that if you accept that as a valid excuse but don't keep an eye on your behavior, you may find yourself labeling whatever you don't want to do at the moment as "not really important" - which leads to important work not getting done.
For a while now, I've been spending a lot of my free time playing video games and reading online fiction, and for a while now, I've considered it a bad habit that I should try to get rid of. Up till now, I've been almost universally unsuccessful at maintaining this resolve for any length of time.
My latest attempt consisted of making the commitment public to my closest friends, explaining the decision to them, and then asking them to help by regularly checking up on my progress. This has been more effective than anything else I've tried so far.
Usually, when I get the impulse to catch up on a story or whatnot, I will end up weighing that impulse against the fact that I ought to behave differently. Sadly, as is often the case when pitting System 1 vs System 2, even if I suppress the impulse several times, it eventually wears down my resolve. So far, what I'm experiencing now is that I end up weighing the impulse against the rather unappealing idea of explaining the fact to my friends, which has made it a lot easier to maintain my commitment.
I have two pieces of advice for you. Please take them with a grain of salt - this is merely my opinion and I am by no means an expert in the matter. Note that I can't really recommend that you do things one way or another, but I thought I would bring up some points that could be salient.
1) When thinking about the coding job, don't put a lot of emphasis on the monetary component unless you seriously need the money. You are probably earning less than you would be in a full time job, and your time is really valuable at the moment. On the other hand, if you need the money immediately or are interested in the job primarily because of networking opportunities or career advancement, then it is a different matter.
2) Keeping up a good GPA is not equivalent to learning the material well. There are certainly corners you could cut which would reduce the amount of work you need to do without losing much of the educational benefit. As the saying goes, 20% of the effort gives 80% of the results. If you are pressed for time, you may need to accept that some of your work will have to be "good enough" and not your personal best. Having said that, be very careful here, because this is also an easy way to undermine yourself. "This isn't really the important stuff" is a fully general excuse.
Pretty much any such moral standard says that you must be better than him
Why does this need to be the case? I would posit that the only paradox here is that our intuitions find it hard to accept the idea of a serial killer being a good person, much less a better person than one need strive to be. This shouldn't be that surprising - really, it is just the claim that utilitarianism may not align well with our intuitions.
Now, you can totally make the argument that not aligning with our intuitions is a flaw of utilitarianism, and you would have a point. If your goal in a moral theory is a way of quantifying your intuitions about morality, then by all means use a different approach. On the other hand, if your goal is to reason about actions in terms of their cumulative impact on the world around you, then utilitarianism presents the best option, and you may just have to bite the bullet when it comes to your intuitions.
As far as I understand it, the text quoted here is implicitly relying on the social imperative "be as moral as possible". This is where the "obligatory" comes from. The problem here is that the imperative "be as moral as possible" gets increasingly more difficult as more actions acquire moral weight. If one has internalized this imperative (which is realistic given the weight of societal pressure behind it), utilitarianism puts an unbearable moral weight on one's metaphorical shoulders.
Of course, in reality, utilitarianism implies this degree of self-sacrifice only if you demand (possibly inhuman) moral perfection from yourself. The actual weight you have to accept is defined by whatever moral standard you accept for yourself. For example, you might decide to be at least as moral as the people around you, or you might decide to be as moral as you can without causing yourself major inconvenience, or you might decide to be as immoral as possible (though you probably shouldn't do that, especially considering that it is probably about as difficult as being perfectly moral).
At the end of the day, utilitarianism is just a scale. What you do with that scale is up to you.
It might be useful to distinguish between a "moral theory" which can be used to compare the morality of different actions and a "moral standard" which is a boolean rule used to determine what is morally 'permissible' and what is morally 'impermissible'.
I think part of the point your post makes is that people really want a moral standard, not a moral theory. I think that makes sense; with a moral standard, you have a course of action guaranteed to be "good", whereas a moral theory makes no such guarantee.
Furthermore, I suspect that the commonly accepted societal standard is "you should be as moral as possible", which means that a moral theory is translated into a moral standard by treating the most moral option as "permissible" and everything else as "impermissible". This is exactly what occurs in the text quoted by OP; it takes the utilitarian moral system and projects it on a standard according to which only the most moral option is permissible, making it obligatory.
Such a moral theory can be used as one of the criteria in a multi-criterion decision system. This is useful because in general people prefer being more moral to being less moral, but not to the exclusion of everything else. For example, one might genuinely want to improve the world and yet be unwilling to make life-altering changes (like donating all but the bare minimum to charity) to further this goal.
Hello, all!
I'm a new user here at LessWrong, though I've been lurking for some time now. I originally found LessWrong by way of HPMOR, though I only started following the site when one of my friends strongly recommended it to me at a later date. I am currently 22 years old, fresh out of school with a BA/MA in Mathematics, and working a full-time job doing mostly computer science.
I am drawn to LessWrong because of my interests in logical thinking, self improvement, and theoretical discussions. I am slowly working my way through the sequences right now - slowly because I'm trying to only approach them when I think I have enough cognitive energy to actually internalize anything.
Right now, my best estimation of a terminal goal is to live a happy/fulfilling life, with instrumental subgoals of improving the lives of those around me, forming more close social bonds, and improving myself. Two of my current major projects are to smile more, and to stop wasting time on video games and the like.
I look forward to getting to know you all better and becoming a part of this community.
Get in the habit of smiling. Smile when you are greeting someone, smile at the cashier when they ring up your groceries, smile to yourself when you are alone.
As far as I understand it, the physical act of smiling (for whatever reason) improves your mood. Personally, I've tried to make it a point to smile whenever it occurs to me, and I've found that it generally improves my day. In particular, I find myself feeling more positive and optimistic.